The present invention relates to indexes of localities for geographic databases, and more particularly, to data structures in geographic databases used for indexing locality names and associated geographic features contained in the localities.
In recent years, consumers have been provided with a variety of devices and systems to enable them to locate specific street addresses on a digital map. These devices and systems are in the form of in-vehicle navigation systems that enable drivers to navigate over streets and roads, portable hand-held devices such as personal digital assistants (“PDAs”), personal navigation devices and cell phones that can do the same, and Internet applications in which users can generate maps showing desired locations. The common aspect in all of these and other types of devices and systems is a geographic database of geographic features and software to access and manipulate the geographic database in response to user inputs. Essentially, in all of these devices and systems a user can enter a target location and the returned result will be the position of the target location. Typically, users will enter an address, the name of a business, such as a restaurant, a city center, or a destination landmark, such as the Golden Gate Bridge, and then be returned the location of the requested place, or feature. The location may be shown on a map display, or may be used to calculate and display driving directions to the location, or used in other ways.
Typically, applications use top-down searching methods that search for the locality in which a desired geographic feature is located, then search for the geographic feature within that locality. Examples of geographic features that can be found in a locality are addresses, landmarks and business locations. Applications also use bottom-up searching methods that search for all geographic features matching certain criteria, then choose the desired geographic feature from the list of localities in which matching geographic features are located.
Currently, geographic databases either are not supplied with locality indexes or have locality indexes of limited functionality when searching for geographic features in localities. A locality index may be used to select a locality name and associated information to display to a user. A locality is, for example, a city or town within a state (US), province (Canada), county, or other principal geographic feature. For geographic databases currently having locality indexes, the indexes are basically lists of locality names, ordered by name source, with duplication of names between sources. Locality names can be found in many locality name sources, such as administrative, postal and colloquial sources. The term “locality name” in this application is used to refer to any datum that can be used as a locality description. Apart from the sources listed above, postal codes themselves can be used as locality names. Telephone exchange numbers also indicate locality in some countries and can be used as locality names. In Germany, license plate prefixes indicate locality and can be used as locality names. The following is a discussion of geographic database prior art regardless of whether or not a geographic database is supplied with a locality index.
Currently, a geographic database populated with locality information from various locality name sources will contain duplicate entries for a locality if the locality name appears in multiple locality name sources. The device or system manufacturers or applications developers either do not merge the duplicate localities to a unique set of names or do an incomplete merge due to differences in the representation of the duplicates across locality sources, such as spelling, punctuation, abbreviation or other differences between the duplicates. Thus, when a user then queries a geographic database application for a locality, the user's device or system may list the same locality name multiple times if the locality name appears in multiple locality name sources. This is confusing to the user who must choose between identical or nearly identical names displayed to the user's system or device screen. A further problem exists in the list of locality names if the user is unable to differentiate between actual duplicate localities and disjoint localities having the same or slightly variant names. The problem of duplicate locality names from multiple locality name sources is exacerbated in some navigation devices that have limited memory. For example, some devices can hold only two locality names per geographic feature. For a geographic feature associated with more than two locality names, any selection of two of the locality names to use in the device may be suboptimal because localities that are duplicate but disjoint and localities having more prevalent locality names may be missing from the selection. A missing duplicate disjoint locality can lead a user to pick an incorrect locality due to its apparent uniqueness in a list. For geographic databases having locality indexes, failure to merge duplicate localities also creates locality indexes that are unwieldy in size, especially for limited-memory navigation devices.
Currently, for localities having the same or slightly variant names that share the exact same geographic features, duplicate name entries are not eliminated from prior art locality indexes. For localities having the same or slightly variant names that share at least one geographic feature, the name entries are not merged into a single entry in prior art locality indexes. A geographic database populated with locality information from various locality name sources may contain slightly variant names for a locality if at least two of the different sources have slightly variant names for the locality. For example, Ho-Ho-Kus, N.J., is known by slightly different names in different sources, such as Ho-Ho-Kus, Ho Ho Kus or Ho-Ho-Kus (Hohokus). For prior art locality indexes, failure to eliminate geographic database entries having slightly variant locality names creates locality indexes that are unwieldy in size, especially for limited-memory navigation devices, and confusion for users trying to distinguish between these slightly different locality names. For duplicately named yet disjoint localities, the prior art currently distinguishes between the localities by displaying additional information, such as the county in which the locality is located. For these localities, nearby, well-known or prevalent cities displayed as additional information with the localities would be more helpful to a user because city names and locations are more likely to be recognizable to the user than county names in the US.
Further, in another example of locality definitions that are not treated consistently in common usage, certain geographic features in the state of New York are contained in the partially overlapping localities known in common usage as SoHo, Manhattan, and New York City. As mentioned above, New York City can be found in a Postal Place locality name source, and Manhattan can be found in an Incorporated Place locality name source. SoHo, on the other hand, cannot be found in a locality name source and is known colloquially. SoHo will be missing from a locality index based only on formal locality definitions.
Further, current geographic database locality indexes are not ordered by priority, or their importance for common usage. Further, for each geographical feature in a geographic database, localities associated with a geographic feature are not prioritized for the geographical feature. For a limited memory device that can store only a couple of locality names for each geographic feature, without prioritization of localities, an applications developer must choose a couple of locality names for a geographic feature associated with more than a couple of localities. Preferably, the highest priority localities associated with a geographic feature, or those localities that are the most well-known or most prevalent in common usage, would be displayed to a user's device. In presenting a list of localities to a user, the highest priority names associated with geographic features should be used since they will be the most recognizable.
Moreover, the most important name component, or primary token, of a locality name, such as “Hadley” in the name “South Hadley,” is not identified in some current geographic database locality indexes. When some currently commercially available navigation applications search for the city Hadley in Massachusetts, Hadley is retrieved, but South Hadley is not retrieved. To find South Hadley, the user has to begin with “S” and sort through many choices that begin with “South.”
A geographic database locality index is needed such that duplicate locality names and localities known by slightly variant names are merged, if and only if they represent the same locality, to eliminate confusion for a user who must otherwise choose between a list of identical or slightly variant names, especially for limited-memory devices. Such a locality index is also needed to reduce the size of the otherwise unwieldy index. While merging localities with duplicate and variant names, there is also a need to preserve meaningfully different locality names. A locality index is needed such that duplicate locality names that represent disjoint localities are distinguished. Otherwise, the user has no way to differentiate two different places with the same name. Further, a flexible locality index is needed such that formal locality definitions not treated consistently in common usage are accounted for, and such that the index is not based on these formal locality definitions. A locality index is needed that is ordered by locality priority for each geographical feature associated with multiple localities. Ordering by priority allows the most important names to be chosen to be included in limited memory applications and identifies the best name to present to the user. Finally, a locality index is needed such that the most important name component for a locality is part of the index to ensure that a search for the name component will return an expanded list of all relevant localities.
Generally described, a locality index is provided for use with electronic maps and electronic databases, as well as a method and system for creating the index.
For each geographic feature in a geographic database, locality names from various locality name sources are associated with that geographic feature. Context-sensitive tokenizing, normalizing, optimizing and matching of locality names allows for eliminating and merging of duplicate and variant locality names, while preserving meaningfully different names. Duplicate locality names are eliminated, if and only if they represent the same locality, to reduce confusion for a user who must otherwise choose between a list of identical or similar names. Geographic database entries for localities known by slightly variant names are merged into a single entry if the localities share at least one geographic feature in common. Disjoint localities having duplicate or slightly variant locality names are distinguished by adorning them with the name of a nearby locality if and only if they represent different localities, again to reduce confusion for a user who must otherwise choose between a list of identical names, or names that are distinguished in ways that are less meaningful to the user, for example, by adorning with county names whose locations are not generally known to users.
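The merge-or-adorn rule described above can be illustrated with a minimal sketch. It assumes that names have already been normalized to a common form, that each entry carries a set of geographic feature identifiers, and that a nearby locality name is available for adornment. The function name `merge_and_adorn`, the entry layout and the placeholder city names are illustrative assumptions only:

```python
from collections import Counter


def merge_and_adorn(entries):
    """Merge same-named entries that share a feature; adorn the ones left over.

    Each entry is a dict with 'name', 'features' (feature ids) and 'near'.
    Returns the display names, one per distinct locality.
    """
    merged = []
    for entry in entries:
        current = {
            "name": entry["name"],
            "features": set(entry["features"]),
            "near": entry["near"],
        }
        remaining = []
        for other in merged:
            if other["name"] == current["name"] and other["features"] & current["features"]:
                # At least one shared feature: treat as the same locality.
                current["features"] |= other["features"]
                current["near"] = other["near"]  # keep the earlier entry's hint
            else:
                remaining.append(other)
        merged = remaining + [current]

    # Names that still occur more than once belong to disjoint localities.
    counts = Counter(e["name"] for e in merged)
    return [
        f'{e["name"]} (near {e["near"]})' if counts[e["name"]] > 1 else e["name"]
        for e in merged
    ]


display_names = merge_and_adorn([
    {"name": "Brentwood", "features": {1, 2}, "near": "City A"},
    {"name": "Brentwood", "features": {2, 3}, "near": "City A"},
    {"name": "Brentwood", "features": {9}, "near": "City B"},
    {"name": "Hadley", "features": {5}, "near": "City C"},
])
```

In this sketch the first two entries collapse into one because they share feature 2, while the third remains separate and both survivors are adorned so that a user can tell them apart.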
A locality name table is created and includes the full name of the locality, the locality's primary token for indexing and other associated information, such as an adornment, city center information and size of the locality. A main source mask is created by allocating a bit for each locality name source used in the method. For each geographic feature in a feature locality priority table, a separate source mask is stored for each locality associated with the geographic feature, a bit set for each source in which the locality can be found. In this table are links to the locality name table and a priority for each locality associated with a geographic feature. The feature locality table also includes links to the find feature table, which includes associated geographic feature information for each geographic feature.
The locality names for each geographic feature are indexed in order of priority. In the preferred embodiment, the highest priority locality associated with a geographic feature is that found in a preferred postal name source, then priority of the remaining localities is determined by the number of bits set in each locality source mask. In such an index, a first locality has a higher priority than a second locality if the first locality is more well-known or prevalent in common usage.
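A minimal sketch of this ordering follows. It assumes a hypothetical bit layout in which bit 0 marks the preferred postal name source; the constant names, the helper `priority_key` and the sample masks are illustrative and are not taken from the described embodiment:

```python
# Hypothetical bit assignments; actual sources and bit positions are assumptions.
PREFERRED_POSTAL = 1 << 0
PERMISSIBLE_POSTAL = 1 << 1
FIPS55 = 1 << 2
GNIS = 1 << 3


def priority_key(source_mask):
    """Sort key: preferred postal names first, then names found in more sources."""
    is_preferred = bool(source_mask & PREFERRED_POSTAL)
    bits_set = bin(source_mask).count("1")
    # False sorts before True, so the flag is inverted; the count is negated
    # so that masks with more bits set come first.
    return (not is_preferred, -bits_set)


def order_localities(masks_by_name):
    """Return the locality names of one geographic feature, highest priority first."""
    return sorted(masks_by_name, key=lambda name: priority_key(masks_by_name[name]))


feature_localities = {
    "Locality C": FIPS55,
    "Locality B": FIPS55 | GNIS | PERMISSIBLE_POSTAL,
    "Locality A": PREFERRED_POSTAL,
}
ordered = order_localities(feature_localities)
```

A limited-memory device that can hold only two names per geographic feature could then keep the first two entries of `ordered`.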
Ordering by priority allows the most important names to be chosen to be included in limited memory applications and identifies the best name to present to the user in a bottom-up search. The unwieldy size of the locality index that would have contained duplicate and slightly variant locality names is thus reduced. Further, the locality index takes into account locality definitions that are not treated consistently in common usage because the index is not based on these formal locality definitions. Finally, the most important name component for a locality from the tokenizing step is part of the index to ensure that a search for the name component will return an expanded list of all relevant localities.
In order to create a better locality index, a thorough list of locality names must first be created by gathering names from a variety of locality name sources, including administrative, postal and colloquial locality name sources, among others. Using locality names from any number and type of sources allows for a universal schema for international data. Without this feature, only a fixed number of sources may be used, such as postal or administrative name sources, potentially missing important names and constraining the types of sources that may be used in different countries.
Although the language used in this description is specific to the United States, in embodiments, the same principles can be applied internationally with only nominal adjustments. Examples of foreign locality name source equivalents include the Ordnance Survey and Royal Mail in the United Kingdom, and Stats Can and Canada Post in Canada.
In embodiments, for a given set of locality name sources, a list of locality names is taken from each locality name source. In embodiments, the sources are those containing localities in one or more selected states, territories, provinces, or districts, for example. In the preferred embodiment, the sources are those containing localities in the United States. In the United States, for example, sources of locality names include, but are not limited to:
1. Federal Information Processing Standards 55 (FIPS55). This component of the United States Geological Survey (USGS) TIGER database is in the public domain (http://geonames.usgs.gov/fips55.html). FIPS55 is a standard source describing locality structure for administrative localities as defined by the government, for example, codes for named populated places, primary county divisions, and other locations of the United States, Puerto Rico and the outlying areas.
2. United States Postal Service (USPS) City/State file. This file is a component of the USPS ZIP+4 product. These city and state names are found at the address range or ZIP code level. Five-digit ZIP codes and four-digit extensions (ZIP+4) are treated as locality names in an index and point to the appropriate set of names in the USPS City State File. While there is generally only one preferred postal locality name for each location, the postal service also includes any number of permissible and non-permissible postal locality names for the same location. A “preferred” postal locality name is the name the USPS recommends for use in addressing mail. A “permissible” postal locality name is an alias name which the USPS has approved and allows for mail delivery. A “non-permissible” postal locality name is one the USPS does not allow for mail delivery. In embodiments, the locality index will include all of the preferred and permissible postal locality names for each geographic feature.
3. Geographic Names Information System (GNIS) provided by the United States Geological Survey (USGS). This is a public domain database of locality names in the United States, including the fifty states and the territories. GNIS lists city names, their center points, their populations, and similar information.
4. Points of Interest (POIs) for City Centers.
5. POIs for USPS Post Offices.
6. United States Census Bureau's Topologically Integrated Geographic Encoding and Referencing system (TIGER) Record Type C for entity “P” (Incorporated places in TIGER).
7. TIGER Record Type C for entity “M” (County Subdivisions in TIGER).
Locality names that are wholly contained within a state can be associated with the state for indexing purposes. Localities that are not wholly contained within a state, such as certain zip codes in the United States, can be multiply indexed under their containing states.
In embodiments, the following use case example, as used by a user of a software application or device that accesses the geographic database, illustrates the benefits of using locality names from multiple sources to build an index. If only one source of names is used, important names are omitted. Postal names, administrative names, and even colloquial names are all important.
Without postal name sources in Index:
With postal name sources in Index:
Without administrative name sources in Index:
With administrative name sources in Index:
In embodiments, the following four use case examples show that another benefit of compiling locality names from multiple locality name sources is to differentiate between ambiguous street addresses within a locality. A city in the United States can have duplicate street addresses located in different parts of the city. This is especially true in large cities, such as Boston, Mass. As mentioned above, Boston can be found as a County Subdivision in the Administrative locality name source FIPS55. In embodiments, the first of these four use case examples shows the typical, non-problematic case in which a particular street address is unique within a city; in that case there is no problem for navigation purposes, even if the city is large. An example of this is Newbury Street in Boston. This street is ten blocks long and its name is not duplicated anywhere else in Boston:
With administrative name sources in Index:
At this point, the precise destination awaits more input from the user, such as a particular street number, the nearest intersection or the nearest block. When the input is supplied, a destination is pin-pointed on a map for the user:
In embodiments, the second of these four use case examples occurs when the street name is duplicated within a city, but the house number serves to make the destination unique. A long street that runs through several smaller towns within a large city is one such example. For example, Commonwealth Avenue runs through Boston, as well as smaller towns of Allston and Chestnut Hill within Boston. As mentioned above, Boston is a County Subdivision found in Administrative locality name source. Allston and Chestnut Hill are towns that can be found in Postal locality name sources under postal codes 02134 and 02467, respectively.
Without administrative name sources in Index:
Because Boston is not a legitimate postal name for postal code 02467 according to the U.S. Postal Service, “2000 Commonwealth Ave, Chestnut Hill, Mass. 02467” is not found in the above example for Boston even though Chestnut Hill is a small town within Boston.
With both administrative and postal name sources in Index:
At this point, Commonwealth Avenue is found to run through Boston, Allston and Chestnut Hill. The precise destination awaits more input from the user, such as a particular street number, the nearest intersection or the nearest block. When the input is supplied, a destination is pin-pointed on a map for the user:
In embodiments, the third of these four use case examples as illustrated in
Without postal name sources in Index:
With postal name sources in Index:
In this use case example, the application processes each user entry before requesting more information from the user. In other embodiments, for “With postal name sources in Index,” the user enters the city of Boston, the street of Adams Street, and a street number before the application processes these three entries. Assuming the street number is not duplicated in the small towns of Charlestown, Hyde Park, Roxbury and Dorchester, the street name and number will be found for one of these four towns and pin-pointed on a map to display to the user.
In embodiments, the fourth of these four use case examples shows that even street numbers, for example “2 Adams St.,” are duplicated on separate streets with the same name within a city. In this case, the only proper response is to present the user with a list of smaller towns in which the duplicates are located, in order to derive a unique destination. Thus, using the example from the third use case example above:
With administrative and postal name sources in Index:
In embodiments, in another use case example as illustrated in
Using this same use case example, in other embodiments, if the user enters the state, city and street name before the application processes the user entries, the application can determine the correct Brentwood. For example:
In embodiments, in a further use case example as illustrated in
Unfortunately, the Quechee locality name cannot be attached to the street address because the boundary of Quechee is not known. Instead, White River Junction is the designated locality for the street address. This choice is in accordance with Postal addresses. A navigation application can determine that it has found the desired location through use of the locality index, created as discussed below. Even though Quechee is not the locality for “1760 Quechee Main Street,” the locality index can expand the Quechee locality to locate the street in White River Junction, Vt. A navigation application can ask for the user's confirmation when the matched locality differs from user input. Even though only one street has been found, it might be only a possible match, which the user of the navigation application could accept or decline. Map enhancements could make the right answer possible in the future with the addition of the boundary of Quechee. In that case, the name of the locality in which “1760 Quechee Main Street” is located will in fact be Quechee.
In embodiments, in a further use case example as illustrated in
Without names from various sources:
With names from various sources:
In this use case example, using names from various sources, an enhanced map could include the boundary of Greenwich Village.
Enter street → Carmine Street
Enter street number → 13
Found: “13 Carmine Street, Greenwich Village, N.Y.”
In embodiments, in a further use case example as illustrated in
Found: “10700 70th Road, Forest Hills, N.Y.”
For this use case example, the locality index can also handle requests for the names of villages located in Queens:
Found: “10700 70th Road, Forest Hills, N.Y.”
For a given set of locality name sources from above and for a given proprietary geographic database, the process begins in step 805. If another locality name source exists to process in step 810, then in step 815 the process determines whether map matching is possible, that is, whether the source contains geographic features that match those in the geographic database. If in step 815 map matching for the source is found to be possible, then in step 820 map matching directly associates locality names from the locality name source with geographic features in the geographic database. Direct association can be performed automatically, through conflation or attribute matching, or manually, by inspection. Direct association is typically used for locality name sources that share attributes with the geographic database. In the preferred embodiment, conflation can be used when the locality name source has spatial information attached to it indicating its location and extent on the earth. With conflation, direct association is made by overlaying localities from the locality name source spatially on the geographic database, assigning a locality to any geographic database features that occur within the boundary of that locality. Attribute matching is performed by matching common attributes between a source and the geographic database, which then allows a direct association to be made. Attributes that can be matched are those that can be represented by strings or numbers. Indirect association is typically used for the other sources.
In embodiments, in step 820 when the locality name source shares attributes with the geographic database, a direct association to the geographic features in the geographic database is made by matching attributes in the source against the same attributes in the map or geographic database. For example, range-matching can be used to match address attributes between a locality source and the geographic database. Range-matching can be done using any source that has locality names associated with street detail, including TIGER, and the USPS City Place Names directory. County Subdivision (entity “M”) and Incorporated Place (entity “P”) codes are directly propagated from the matched TIGER geographic features onto the geographic features in the map or database of interest. Range-matching takes a street name, range of house numbers, and locality from TIGER and tries to match these items to a corresponding street segment in the proprietary geographic database of interest. In TIGER, each side of a street block not only has an address range but also has tags representing the entity type P (incorporated place name) in that location, the entity type M (county subdivision name) in that location, a state code, a block code, a tract code, as well as Minor Civil Division (MCD). Ranges that match make it possible to transfer information from TIGER onto the geographic database. A range match can be either an exact match of street segments, street segments that touch or are exactly aligned, or street segments that partially overlap.
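The three kinds of range match can be illustrated with a minimal sketch. It assumes that each street segment side is reduced to an inclusive (low, high) house-number range on a street name that has already been matched, and it approximates touching segments as ranges whose house numbers are consecutive on one side of the street (a step of at most 2). The function name `classify_range_match` and these simplifications are illustrative assumptions, not the matching rules of any particular embodiment:

```python
def classify_range_match(source_range, database_range):
    """Classify two inclusive house-number ranges on the same street side.

    Returns "exact", "overlap", "touch" or "none".
    """
    a_lo, a_hi = sorted(source_range)
    b_lo, b_hi = sorted(database_range)
    if (a_lo, a_hi) == (b_lo, b_hi):
        return "exact"
    if a_lo <= b_hi and b_lo <= a_hi:
        return "overlap"
    # Assumed rule: ranges touch when one ends where the next begins, allowing
    # for the step of 2 between same-parity house numbers.
    if 0 < b_lo - a_hi <= 2 or 0 < a_lo - b_hi <= 2:
        return "touch"
    return "none"


def transfer_locality(source_segment, database_segment):
    """Copy the source locality onto the database segment when the ranges match."""
    kind = classify_range_match(source_segment["range"], database_segment["range"])
    if kind != "none":
        database_segment["locality"] = source_segment["locality"]
    return kind
```

A stricter embodiment might transfer attributes only on an exact match and queue the other cases for review; that policy choice is outside this sketch.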
In step 820, where the USPS City/State File is the locality name source, the deliverable address ranges from the source's USPS ZIP+4 catalog are geocoded against the map or database. In embodiments, ZIP codes from this source are treated as locality names themselves. ZIP codes from this source also point to the appropriate set of locality names in the City/State file. For each successful match, the five-digit ZIP code and one four-digit plus4 code from the ZIP+4 are treated as locality names and are propagated onto the corresponding geographic feature.
In step 825, for geographic features in a geographic database that were not matched to the locality name source, face voting is used to match the geographic features with other features in the geographic database, thereby inheriting locality assignments from the matched features.
For example, in
In embodiments, face voting can be used for other geographic features besides city block faces, such as street segment sides or road edges. In embodiments, face voting can be used for two or more other street segment sides besides the street segment associated with an unknown city name. In embodiments, face voting may also be used where two or more of the block faces are associated with unknown city names. In this case, a majority vote is taken from the remaining block faces, and either a majority vote or a tie is found and handled as discussed above. In embodiments, face voting may be used to associate the block faces with other locality names besides cities or towns. For example, locality names in the USPS City/State File are the five-digit ZIP code and one four-digit building code from the ZIP+4 file.
Other embodiments of face voting include a weighted vote or a linear length vote instead of a majority vote. In embodiments using a weighted vote, certain block faces adjacent to a block face not associated with a locality are given preference, or weighted more heavily in the voting process. A weighted vote could have any weighting component that measures the confidence of the adjacent block face assignments. For example, preference might be given to block faces corresponding to major streets or that are located in larger regions. Length of the block faces is another such weighting. In embodiments using a linear length vote, for a given block face not associated with a locality, for each known locality associated with block faces adjacent to the given block face, the total length of the block faces is taken to determine which locality associated with the adjacent block faces has block faces of the longest total linear length. This resulting locality is then assigned to the given block face not associated with a locality.
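The majority vote and the linear length vote can be sketched as follows. The sketch assumes that the block faces adjacent to the unassigned face have already been collected, that `None` stands for an unknown locality, and that a tie in the majority vote is simply reported as unresolved; the function names and the tie behavior are illustrative assumptions:

```python
from collections import Counter, defaultdict


def majority_vote(adjacent_localities):
    """Return the locality held by most adjacent block faces, or None on a tie."""
    counts = Counter(name for name in adjacent_localities if name is not None)
    if not counts:
        return None
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: left unresolved in this sketch
    return ranked[0][0]


def linear_length_vote(adjacent_faces):
    """Return the locality whose adjacent block faces have the longest total length.

    adjacent_faces is an iterable of (locality, length) pairs.
    """
    totals = defaultdict(float)
    for locality, length in adjacent_faces:
        if locality is not None:
            totals[locality] += length
    if not totals:
        return None
    return max(totals, key=totals.get)
```

A weighted vote follows the same pattern as `linear_length_vote`, with the length replaced by whatever confidence weight is chosen for each adjacent block face.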
In
Locality names taken from the various locality name sources are tokenized, normalized, optimized and/or matched, merged, or adorned to eliminate duplicate and variant locality names, in embodiments. In the preferred embodiment, all the steps of tokenizing, normalizing, optimizing, matching, and merging or adorning are performed. This process reduces the number of locality names for each locality that has two or more similar names, while also preserving locality names that are meaningfully different. These steps accommodate differences in name encoding between the various sources. One example of similar locality names from various sources is the city of Ho-Ho-Kus, N.J., which appears as follows in various locality name sources:
GNIS: Ho-Ho-Kus
From steps 825 and 865 in
Tokenizing locality names from the first two locality name sources listed above for the Ho-Ho-Kus, N.J. example produces the following body and suffix tokens:
Tokenization is helpful to isolate those components that define a unique name and, by association, those tokens that can be ignored in the matching process. Most end users will desire that “Rutland” match “Rutland Township,” that is, that the term “Township” be treated as insignificant. At the same time, most end users will desire that “Boston” not match “South Boston,” that is, that the term “South” be treated as significant. Another reason for tokenization is to offer a software applications developer flexibility in presenting locality names to the end user because the significant portion of the name will be indexed. For example, by tokenizing “Hollywood” and “West Hollywood,” both will be presented as selection choices to an end user who enters a map search for “Hollywood.” This occurs because the “Body” token for both will be “Hollywood,” as West Hollywood will be tokenized as Body: Hollywood, Prefix: West, and Hollywood will be tokenized as Body: Hollywood.
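A minimal sketch of prefix, body and suffix tokenization is given here. The token vocabularies are small assumed lists, and the helper names `tokenize`, `names_match` and `build_body_index` are hypothetical; a practical tokenizer would need far larger, context-sensitive vocabularies:

```python
# Assumed token vocabularies; a production list would be far larger.
PREFIXES = {"NORTH", "SOUTH", "EAST", "WEST"}
SUFFIXES = {"TOWNSHIP", "BOROUGH", "VILLAGE", "TOWN"}


def tokenize(name):
    """Split a locality name into upper-case prefix, body and suffix tokens."""
    words = name.upper().split()
    prefix = suffix = None
    if len(words) > 1 and words[0] in PREFIXES:
        prefix = words.pop(0)
    if len(words) > 1 and words[-1] in SUFFIXES:
        suffix = words.pop()
    return {"prefix": prefix, "body": " ".join(words), "suffix": suffix}


def names_match(a, b):
    """Prefix and body are significant for matching; the suffix is ignored."""
    ta, tb = tokenize(a), tokenize(b)
    return (ta["prefix"], ta["body"]) == (tb["prefix"], tb["body"])


def build_body_index(names):
    """Index full names by body token so a body search returns every variant."""
    index = {}
    for name in names:
        index.setdefault(tokenize(name)["body"], []).append(name)
    return index


index = build_body_index(["Hollywood", "West Hollywood", "Rutland Township"])
```

Looking up `index["HOLLYWOOD"]` returns both "Hollywood" and "West Hollywood", while `names_match` still treats "Boston" and "South Boston" as different names.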
In another embodiment, tokenization helps to determine the correct expansion of context-sensitive abbreviations. For example, a locality prefix token “St.” most likely refers to “Saint,” whereas a locality suffix token “St.” most likely refers to “State.”
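One way to express such context-sensitive expansion is a lookup keyed by both the token position and the abbreviation. The table contents and the function name `expand_abbreviation` below are illustrative assumptions that cover only the “St.” example given above:

```python
# Assumed expansion table keyed by (token position, abbreviation without period).
EXPANSIONS = {
    ("prefix", "ST"): "SAINT",
    ("suffix", "ST"): "STATE",
}


def expand_abbreviation(token, position):
    """Expand an abbreviation according to where the token sits in the name."""
    key = token.upper().rstrip(".")
    # Tokens without a known expansion are returned in normalized upper case.
    return EXPANSIONS.get((position, key), key)
```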
The following are other types of tokens and examples of those tokens:
In step 835 of
The following use case example illustrates the benefits of the tokenizing and normalizing features that can be stored in the locality index, the creation of which is discussed below. Without these features in the index, variant abbreviations appear as different city names. With these features in the index, abbreviations are put into a common form, allowing the applications developer to collapse the list into a single unambiguous entry. Although capitalization of tokens is normalized to consistent upper case to facilitate matching, tokens are typically presented to the user with only the first letter of each token capitalized.
Without tokenized and normalized locality names in the Index:
With tokenized and normalized locality names in the Index:
The following use case example illustrates the benefits of tokenizing and normalizing directional tokens in locality names. By identifying directional tokens, locality names can be indexed by their body, rather than by directional. After directionals are normalized, an applications developer only needs to check for normalized tokens but not any abbreviations of those tokens.
Without tokenized and normalized locality names in the Index:
With tokenized and normalized locality names in the Index:
In step 840 of
In step 845 of
In step 850, the next bit position in the source mask for each locality name in the source is set to this source. Names that appear in multiple sources will have bits set in the mask for each source in which they appear. For example, the name “Boston” is simultaneously a county subdivision name, an administrative place and the preferred postal name for a number of ZIP codes. Names that do not appear in multiple sources will have only a single bit set in their mask corresponding to their source. The process loops back to step 810 to process the next locality name source if one exists.
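The source mask described in step 850 can be illustrated as follows. This is a sketch under stated assumptions: the function name `build_source_masks`, the source labels and the sample names are hypothetical, and each source is simply assigned the next bit position in the order it is processed.

```python
def build_source_masks(sources):
    """Assign each source the next bit position and OR it into each name's mask.

    `sources` is an ordered list of (source_label, locality_names) pairs.
    """
    masks = {}
    for bit, (_label, names) in enumerate(sources):
        for name in set(names):
            masks[name] = masks.get(name, 0) | (1 << bit)
    return masks


# Hypothetical sample data for illustration only.
SOURCES = [
    ("county_subdivision", ["BOSTON", "RUTLAND"]),
    ("administrative_place", ["BOSTON"]),
    ("preferred_postal", ["BOSTON", "ALLSTON"]),
]
MASKS = build_source_masks(SOURCES)
```

Here “BOSTON” appears in all three sources and so has three bits set, while “RUTLAND” and “ALLSTON” each have a single bit set corresponding to their one source.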
If in step 810 of
Many different algorithms are possible for name matching. Examples of name-matching techniques include context-sensitive matching, phonetic matching and Soundex. Context-sensitive matching is string matching of the names or matching of the spelling of names. Because this type of matching is performed with knowledge of which tokens are being matched, special rules can be applied. For example, in the body token, a good context-sensitive matching algorithm can match “John F. Kennedy” and “John Fitzgerald Kennedy.” An excellent context-sensitive matching algorithm can match “MLK” and “Martin Luther King.” Phonetic matching, on the other hand, matches the sounds of words as opposed to the spelling of the words. For example, “fish” and “phish” match phonetically. For name matching in various languages, different phonetic matching algorithms can be used. Soundex is a phonetic algorithm for indexing names by their sound when pronounced in English. The basic aim is for names with the same pronunciation to be encoded to the same string so that matching can occur despite minor differences in spelling. More detailed information regarding phonetic algorithms can be found in application Ser. No. 11/377,764, filed Mar. 16, 2006, entitled “Geographic Feature Name Reduction Using Phonetic Algorithms” to Jesse Sheridan.
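For illustration, a compact version of the commonly used American Soundex encoding is sketched below. It is offered only as an example of a phonetic key, not as the algorithm of the referenced application; the function name `soundex` and the four-character key length follow the usual convention.

```python
# Letter groups of the conventional American Soundex code.
_CODES = {}
for _digit, _letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(name):
    """Return the four-character Soundex key of a name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    key = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = _CODES.get(ch, "")  # vowels map to '' and reset the previous code
        if code and code != prev:
            key.append(code)
        prev = code
    return ("".join(key) + "000")[:4]
```

Names such as “Robert” and “Rupert” encode to the same key, so an index keyed on `soundex(name)` would treat them as candidate matches despite the spelling difference.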
In embodiments, in order for two full names to match, the strings must match exactly. If full names do not match, in embodiments, a match of body tokens is attempted. Body tokens must match and direction and type tokens must also match for a successful token match. Thus, matching of the tokens may not start with one or both leading tokens, and one token must be a leading substring of the other. Thus, matching tokens must also ignore certain tokens. In embodiments, minor spelling variations can be allowed between two matching names. In embodiments, name matching is implemented fairly conservatively in order to prevent false matches. Thus:
In step 870 of
If in step 872 of
If in step 873 of
In embodiments, merging of locality names occurs only when the overlapping localities contain no features that cannot be distinguished from each other. For example, if Randolph and Randolph Center both have a Main Street with no overlapping street numbers, the two towns can be merged. If both towns have a “2 Main Street,” however, the towns should not be merged.
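The merge condition can be sketched as a check for colliding addresses. The representation below is an assumption made for illustration: each locality is a mapping from street name to a list of inclusive (low, high) house-number ranges, and `can_merge` is a hypothetical helper name.

```python
def ranges_overlap(a, b):
    """True if two inclusive (low, high) house-number ranges share a number."""
    return a[0] <= b[1] and b[0] <= a[1]


def can_merge(locality_a, locality_b):
    """Two localities may merge only if no same-named street has colliding numbers."""
    for street, ranges_a in locality_a.items():
        for range_a in ranges_a:
            for range_b in locality_b.get(street, []):
                if ranges_overlap(range_a, range_b):
                    return False  # e.g. both contain "2 Main Street"
    return True
```

For example, a Randolph with Main Street numbers 1 to 99 and a Randolph Center with Main Street numbers 100 to 199 can be merged, whereas two localities that both contain number 2 on Main Street cannot.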
The following use case example illustrates the benefit of eliminating all but one of the duplicate locality names from multiple sources that have overlapping geometry. Without this feature, a locality name is multiply listed in choices presented to the user.
Without eliminating duplicates:
After eliminating duplicates:
The following use case example also illustrates the benefit of merging localities having slightly different names. Without merging, the user may not know which slightly different name is the locality in which a desired destination is located. With merging, the user does not need to distinguish between names. For example, the localities “Randolph,” “Randolph Center” and “Randolph Township” overlap, and thus are merged into a common area, represented by the single name “Randolph.” Thus for a user search:
Without merging:
With merging:
In step 876 of
In step 872 of
The following use case example shows adornments for disjoint localities having duplicate or slightly variant names:
Adorning with county names:
Adorning with large, nearby, easily recognizable city names:
In this use case example, the application processes each user entry before requesting more information from the user. In other embodiments, for “Adorning with large, nearby, easily recognizable city names,” if the user enters the state, city and street name before the application processes these three user entries, a unique destination can be determined if the street address is found in only one of the choices. For example:
Adorning with large, nearby, easily recognizable city names:
If in step 870, another set of matched locality names does not exist, then in step 880 of
In embodiments, priority order of the localities associated with a geographic feature is based on prevalence of each locality name in common usage for an intended application. In embodiments, prioritization based on common usage allows the locality names to be ordered differently for different users. In the example of overlapping localities such as “New York City,” “Manhattan” and “SoHo,” in common usage, a local user who knows the area well would most likely use the most specific of the three localities, “SoHo.” If an application is intended for this local user, the highest priority locality name would most likely be the one found in the fewest sources. Thus, the order of priority from highest to lowest would be “SoHo,” “Manhattan,” then “New York City.”
Using the same example of overlapping localities in New York City, a non-local user who does not know the area well would, in common usage, most likely use the more well-known, easily recognizable locality. If an application is intended for this non-local user, the highest priority locality name would most likely be the one found in the greatest number of sources. Thus, the order of priority from highest to lowest would be “New York City,” “Manhattan,” then “SoHo.”
In embodiments, algorithms for determining priority order in an application can be applied differently to meet different common usages for a user. For example, for a local user navigating within a locality such as a large city, the user might want a priority of locality names based on common usage for a local user. While the same user navigates to the same large city from afar, however, the user might want a different priority based on common usage for a non-local user. Once the user reaches the large city and crosses the boundary into the city, however, the user might want the priority to change back to that of a local user.
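One way to picture the local and non-local orderings is to sort the same locality names by the number of sources in which each appears, in opposite directions. The sketch below assumes each name carries an integer source mask with one bit per source; the mask values and the name `order_by_usage` are hypothetical.

```python
def order_by_usage(masks, local_user):
    """Order locality names by source count: fewest first for a local user,
    most first for a non-local user."""
    def source_count(name):
        return bin(masks[name]).count("1")

    return sorted(masks, key=source_count, reverse=not local_user)


# Hypothetical masks for the overlapping localities in the example.
NYC_MASKS = {"New York City": 0b111, "Manhattan": 0b011, "SoHo": 0b001}
```

An application could call `order_by_usage` with `local_user=True` while the device is inside the city boundary and with `local_user=False` while it is approaching from afar, which matches the switching behavior described above.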
Many different priority ordering schemes are possible. In the preferred embodiment, the highest priority locality associated with a geographic feature is that found in a preferred postal name source; the priority of the remaining localities is then determined by the number of bits set in each locality source mask. In embodiments, a first locality has a higher priority than a second locality if the first locality is more well-known or prevalent in common usage. In embodiments, the priority of a locality name is determined by the number of sources in which the name can be found. The locality name with the highest priority for a geographic feature is the one found in the greatest number of sources, and thus the one with the most bits set in its source mask. Priority order of the locality names for a geographic feature is from highest to lowest.
In embodiments, an applications developer can also use the source mask to override this default priority scheme by preferring certain locality name sources over others. In other embodiments, priority is defined in terms of the largest physical locality size or largest locality population. In other embodiments, priority is defined as the largest number of geographic features, for example street segments, in a locality. Priority can also be defined in terms of the largest number of major geographic features located within the locality, as opposed to the number of geographic features located within the locality, in other embodiments. An example of a major geographic feature is an important highway. In embodiments, priority can be defined using the locality source masks to determine a preference of certain locality name sources over others. In embodiments, an applications developer can use locality names from locality sources indicated as “Trump” in
In embodiments, in the case of locality priority ties, a primary sort is performed using one of the above schemes, and where necessary, by a secondary sort based on one of the above schemes. In the preferred embodiment, a primary sort is performed on the number of sources from highest to lowest in which each locality can be found. A secondary sort is based, for example, on the number of geographic features, or street segments, from highest to lowest contained in each locality.
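The preferred ordering, with its tie-break, can be sketched as a single composite sort key. This is an illustrative reading rather than a definitive implementation: the record layout, the `postal_bit` parameter and the function name `prioritize` are assumptions, and the secondary sort uses a feature count per locality.

```python
def prioritize(localities, postal_bit):
    """Sort localities from highest to lowest priority.

    Assumed order: preferred postal names first, then the number of sources
    (bits set in the mask, descending), then the feature count (descending).
    """
    def sort_key(loc):
        in_postal = bool(loc["mask"] & (1 << postal_bit))
        source_count = bin(loc["mask"]).count("1")
        return (not in_postal, -source_count, -loc["loc_size"])

    return sorted(localities, key=sort_key)


# Hypothetical records; bit 2 is assumed to be the preferred postal source.
SAMPLE = [
    {"name": "SoHo", "mask": 0b001, "loc_size": 40},
    {"name": "Manhattan", "mask": 0b011, "loc_size": 900},
    {"name": "New York", "mask": 0b110, "loc_size": 5000},
    {"name": "Midtown", "mask": 0b001, "loc_size": 120},
]
RANKED = [loc["name"] for loc in prioritize(SAMPLE, postal_bit=2)]
```

In this sample, “SoHo” and “Midtown” tie on source count, so the secondary sort on feature count places “Midtown” ahead of “SoHo.”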
The process of
The variable format of the locality index allows any number of table entries to be included for each geographic feature in the Feature Locality Priority table. This is especially important in North America for postal names. While there is generally only one preferred postal locality name for each location, the postal service also includes any number of permissible postal locality names for the same location. The locality index includes all of the preferred and permissible postal names for each geographic feature.
In embodiments, the Locality Name table of
The NAME_LC field is a three character code for the language of the locality name. In embodiments, NAME_LC is set for each locality name to indicate the native language of the name to support multi-lingual countries. In embodiments, NAME_LC can be any number of characters. LOC_SIZE indicates a count of the number of geographic features associated with this locality name and can be used by applications developers to override the default PRIORITY scheme supplied in the Feature Locality Priority table. COUNTRY is a country code and is a three character abbreviation of the country in which the locality is located. In embodiments, COUNTRY can be a standard country code such as ISO 3166-1, which is part of the ISO 3166 standard first published by the International Organization for Standardization. In embodiments, COUNTRY can be any number of characters. CENTER_ID is a link to city center point features found elsewhere in the geographic database for this locality. In embodiments, these city center point features are the locality center point latitude and longitude coordinates, as well as a street segment corresponding to the city center. City centers provide a point within a locality to a user when a specific street address is not requested or cannot be found.
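As a rough picture of one row of the Locality Name table, the record below collects the fields named in this description. The Python types, the lower-case attribute names and the sample values are assumptions for illustration; the actual table layout is defined by the figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalityNameRow:
    """Hypothetical in-memory form of one Locality Name table entry."""
    name_id: int              # NAME_ID: key used by the Feature Locality Priority table
    full_name: str            # FULL_NAME: complete locality name
    name_key: str             # NAME_KEY: normalized key used for name matching
    name_lc: str              # NAME_LC: language code of the name
    loc_size: int             # LOC_SIZE: count of features associated with the name
    country: str              # COUNTRY: country code
    center_id: Optional[int]  # CENTER_ID: link to the city center point feature


# Sample values are invented for illustration.
ROW = LocalityNameRow(1, "Boston", "BOSTON", "ENG", 12000, "USA", 501)
```

Keeping `center_id` optional reflects the assumption that some locality names may have no city center point feature linked to them.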
In embodiments, the Locality Name table of
In embodiments, the Find Feature table of
In embodiments, the locality index is provided in multiple formats, including international formats, to enable easy integration with proprietary geographic databases. The locality index is provided to accommodate data from any country. While the format is generalized, the content is tailored to include specific locality sources and types appropriate in each country. A proprietary application provides the correct pronunciation for each locality name.
In embodiments, for locality index table usage, in a top-down implementation of finding an address, the locality is resolved first, and then the correct geographic feature is found within the locality. A navigation application will first perform name matching to find the desired locality name in the Locality Name table. Once the locality is found, the Feature Locality Priority table is searched using the NAME_ID of the chosen locality to determine the geographic features contained in that locality. The FF_IDs of those features are used as an index into the Find Feature table to retrieve information about those features needed to find a particular feature, such as street names and address ranges in the case of street segments, and then matching is performed to select the desired specific geographic feature. For example, [Enter City → Boston]. “Boston” is matched to the names in the Locality Name table, returning the NAME_ID for “Boston.” [Enter Street → Adams]. The Feature Locality Priority table is searched for a list of FF_IDs whose NAME_ID is the NAME_ID for “Boston.” The Find Feature table is searched for the FEAT_ID that points to “Adams” in the geographic database. Subsequently, the desired house number can be requested from the user and the Find Feature table is searched for the FEAT_ID that points to the address range containing the requested house number in the geographic database. The Find Feature table could also be searched for the FEAT_ID that points to the latitude and longitude point for this feature in the geographic database, in order to display to the user the location of the feature on a navigation application or device, for example. For improved performance, the locality index will often be pre-compiled to eliminate many of these indirect references.
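The top-down lookup can be sketched with small in-memory stand-ins for the three tables. The field names NAME_ID, FULL_NAME, FF_ID, PRIORITY and FEAT_ID come from the description; the STREET, LOW and HIGH fields, the sample rows and the function name `find_top_down` are assumptions added so the example can run.

```python
# Hypothetical table contents for illustration only.
LOCALITY_NAME = [
    {"NAME_ID": 1, "FULL_NAME": "BOSTON"},
    {"NAME_ID": 2, "FULL_NAME": "QUINCY"},
]
FEATURE_LOCALITY_PRIORITY = [
    {"FF_ID": 10, "NAME_ID": 1, "PRIORITY": 1},
    {"FF_ID": 11, "NAME_ID": 1, "PRIORITY": 1},
    {"FF_ID": 12, "NAME_ID": 2, "PRIORITY": 1},
]
FIND_FEATURE = {
    10: {"FEAT_ID": 1000, "STREET": "ADAMS", "LOW": 1, "HIGH": 99},
    11: {"FEAT_ID": 1001, "STREET": "BEACON", "LOW": 1, "HIGH": 199},
    12: {"FEAT_ID": 1002, "STREET": "ADAMS", "LOW": 1, "HIGH": 50},
}


def find_top_down(city, street, house_number):
    """Resolve the locality first, then the street and house number within it."""
    # Step 1: match the locality name to obtain its NAME_ID(s).
    name_ids = {r["NAME_ID"] for r in LOCALITY_NAME if r["FULL_NAME"] == city.upper()}
    # Step 2: collect the FF_IDs of the features contained in that locality.
    ff_ids = [r["FF_ID"] for r in FEATURE_LOCALITY_PRIORITY if r["NAME_ID"] in name_ids]
    # Step 3: keep features whose street and address range match the request.
    return [
        FIND_FEATURE[ff]["FEAT_ID"]
        for ff in ff_ids
        if FIND_FEATURE[ff]["STREET"] == street.upper()
        and FIND_FEATURE[ff]["LOW"] <= house_number <= FIND_FEATURE[ff]["HIGH"]
    ]
```

The linear scans here mirror the indirect references between tables; a pre-compiled index would typically replace them with direct lookups.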
In embodiments, for locality index usage, in a bottom-up implementation of finding addresses, a list of target geographic features is chosen first, then the correct feature is selected by resolving the desired locality from the list of all localities containing a feature by that name. A navigation application will first perform matching to find a list of geographic features in the Find Feature table. The corresponding FF_IDs from the Find Feature table are then used as indexes into the Feature Locality Priority table. The entries in the Priority table for these FF_IDs can then be scanned for a NAME_ID whose name in the Locality Name table matches the desired locality. If the applications developer wishes to present locality choices to the user, the application should consider the locality NAME_IDs in priority order, choosing the highest priority locality names that are unique for the FF_IDs under consideration. These names can then be presented to the user from which to choose. As in the top-down case, the locality index will often be pre-compiled to eliminate many of the indirect references between tables.
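A bottom-up lookup can be sketched in the same style. The STREET, LOW and HIGH fields, the sample rows, the convention that PRIORITY 1 is the highest, and the interpretation of “unique” as “not shared with another candidate feature” are all assumptions made for this illustration.

```python
from collections import Counter

# Hypothetical table contents for illustration only.
NAMES = {1: "BOSTON", 2: "DORCHESTER", 3: "QUINCY"}
FEATURE_LOCALITY_PRIORITY = [
    {"FF_ID": 10, "NAME_ID": 1, "PRIORITY": 1},
    {"FF_ID": 10, "NAME_ID": 2, "PRIORITY": 2},
    {"FF_ID": 12, "NAME_ID": 1, "PRIORITY": 1},
    {"FF_ID": 12, "NAME_ID": 3, "PRIORITY": 2},
]
FIND_FEATURE = {
    10: {"FEAT_ID": 1000, "STREET": "ADAMS", "LOW": 1, "HIGH": 99},
    12: {"FEAT_ID": 1002, "STREET": "ADAMS", "LOW": 1, "HIGH": 50},
}


def find_bottom_up(street, house_number):
    """Match features first, then offer one distinguishing locality per feature."""
    ff_ids = [
        ff for ff, f in FIND_FEATURE.items()
        if f["STREET"] == street.upper() and f["LOW"] <= house_number <= f["HIGH"]
    ]
    rows = [r for r in FEATURE_LOCALITY_PRIORITY if r["FF_ID"] in ff_ids]
    usage = Counter(r["NAME_ID"] for r in rows)  # how many candidates share a name
    choices = []
    for ff in ff_ids:
        ranked = sorted((r for r in rows if r["FF_ID"] == ff), key=lambda r: r["PRIORITY"])
        if not ranked:
            continue
        # Prefer the highest priority name that no other candidate feature uses.
        unique = [r for r in ranked if usage[r["NAME_ID"]] == 1]
        pick = unique[0] if unique else ranked[0]
        choices.append((NAMES[pick["NAME_ID"]], FIND_FEATURE[ff]["FEAT_ID"]))
    return choices
```

In the sample data both candidate segments list “BOSTON” first, so the sketch falls back to the next names in the priority chain, which distinguish the two choices for the user.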
In embodiments, the locality index can be used to find named places such as points of interest and landmarks. Lists of such places are first associated with street segments from the proprietary geographic database. The application will then match the name of the desired point of interest or landmark to find the street segment. The application then uses the implementation of finding addresses above using the street segment in order to determine the correct locality.
In embodiments, the locality index can be used to find a city center. An application will name match the desired locality using FULL_NAME and NAME_KEY in the Locality Name table to find the correct entry in the table. Once the correct entry is found, the CENTER_ID field is used to find the corresponding proprietary locality center information in the geographic database, such as latitude and longitude coordinates or the street segment corresponding to the city center.
In embodiments, the locality index can be used to disambiguate localities with duplicate names, but distinct geography. An application will name match the desired locality using FULL_NAME and NAME_KEY in the Locality Name table to find the correct entry in the table. For example, if the locality is “Brentwood, Calif.,” two matches will be found as shown in
In embodiments, the locality index can be used to resolve ambiguity in address features. For example, for the “2 Adams Street” example in
In embodiments, the locality index can be used to search neighboring areas for a requested feature in a top-down application. In some cases a desired feature may not be found in a locality specified by a user and the navigation application will wish to expand the search to neighboring or larger containing localities. The application will first match the name of the desired locality in the Locality Name table, retrieving the corresponding NAME_ID. After determining that there are no FF_IDs corresponding to the requested feature in the Feature Locality Priority table with this locality NAME_ID, the application will find one or more FF_IDs in the Feature Locality Priority table that do contain this NAME_ID. The priority chain may be followed, either higher or lower priority, for these FF_IDs in the Feature Locality Priority table to retrieve other NAME_IDs corresponding to these FF_IDs. The Find Feature table can be consulted to determine if the requested address is within any of these other, related localities.
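The expansion to related localities can be sketched as follows. The sample rows, the STREET field and the function name `expand_search` are hypothetical, and PRIORITY 1 is assumed to be the highest priority.

```python
# Hypothetical rows: segment 30 lies in both localities, segment 31 only in NAME_ID 2.
FEATURE_LOCALITY_PRIORITY = [
    {"FF_ID": 30, "NAME_ID": 1, "PRIORITY": 2},
    {"FF_ID": 30, "NAME_ID": 2, "PRIORITY": 1},
    {"FF_ID": 31, "NAME_ID": 2, "PRIORITY": 1},
]
FIND_FEATURE = {30: {"STREET": "MAIN"}, 31: {"STREET": "NORTH MAIN"}}


def expand_search(name_id, street):
    """Return (NAME_ID, FF_ID) hits, widening to related localities if needed."""
    def features_in(nid):
        return [
            r["FF_ID"] for r in FEATURE_LOCALITY_PRIORITY
            if r["NAME_ID"] == nid and FIND_FEATURE[r["FF_ID"]]["STREET"] == street.upper()
        ]

    hits = features_in(name_id)
    if hits:
        return [(name_id, ff) for ff in hits]
    # Features that do carry the requested NAME_ID anchor the priority chain.
    anchors = {r["FF_ID"] for r in FEATURE_LOCALITY_PRIORITY if r["NAME_ID"] == name_id}
    related = []
    for r in sorted(FEATURE_LOCALITY_PRIORITY, key=lambda r: r["PRIORITY"]):
        if r["FF_ID"] in anchors and r["NAME_ID"] != name_id and r["NAME_ID"] not in related:
            related.append(r["NAME_ID"])
    # Retry the street search in each related locality, highest priority first.
    return [(nid, ff) for nid in related for ff in features_in(nid)]
```

Here a request for “North Main” in locality 1 finds nothing directly, so the sketch follows segment 30 to locality 2 and finds segment 31 there.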
In embodiments, the following use case example illustrates the benefit of the prioritization feature of the locality index. Without prioritization, it is unclear to the applications developer how to use the most recognizable name when querying the user. In some places, postal names are the most common. In other areas, administrative names are well known. With the prioritization feature, the most common name can be chosen.
Without prioritization:
With prioritization:
In embodiments, in a further use case example as illustrated in
The navigation application would note an inconsistency here. The application will first search all FF_IDs in the Feature Locality Priority table where the NAME_ID points to Chicago. The application will note that “North Crawford Avenue” does not exist in Chicago. The application will search for all FF_IDs in the Feature Locality Priority table where the FF_ID points to “North Crawford Avenue.” The application will find “North Crawford Avenue” in the Chicago suburb of Lincolnwood. If the application had found “North Crawford Avenue” in several localities, the application would use the highest priority locality name for this FF_ID using PRIORITY in the Feature Locality Priority table. The application can note that “South Crawford Avenue” exists in Chicago. The application then requests the street number:
In this example, if the correct street number was found in both places, the application could offer the user a choice: “6600 South Crawford Avenue, Chicago” or “6600 North Crawford Avenue, Lincolnwood.” Since street number “6600” is not found on “South Crawford Avenue” in Chicago, this address choice is not displayed to the user. Even though the street number “6600” found for “North Crawford Avenue” is located in Lincolnwood and not in Chicago, the application can assume that is the address the user intended to request.
In embodiments, in a further use case example, the application can provide for handling whether one of a user's inputs for the street or for the city is inconsistent and should be fixed. The address for Chandler Music Hall on its website is “71-73 Main Street, Randolph, Vt.” In the city of Randolph, Main Street is divided into “North Main Street” and “South Main Street.” “Main Street” also exists in the nearby town of Randolph Center. For the end user, if the street is really Main Street, then the Hall must be in Randolph Center. If the Hall is in Randolph, then it is located on North Main Street or on South Main Street. The Hall is actually located in Randolph, at 71 North Main Street. If an end user were using the website address in a top-down application, the user would correctly be led from Randolph to North or South Main Street, but the application would ask the user for a decision because street number 71 exists on both streets. If the user were using the website address in a bottom-up application, the user would incorrectly be led from Main Street to Randolph Center. In embodiments, one way for a navigation application to handle this kind of situation is to present all the choices to the user:
In embodiments, one or more steps of the present invention are carried out automatically. The automatic feature is implemented using appropriate software. The automatic feature creates a substantial increase in efficiency and speed with which locality indexes are created.
Embodiments of the present invention with modification can be applied to non-navigation applications and devices. For example, in a spatial Yellow Pages application, it is desirable to find all businesses of a certain type sorted by distance from a point. In embodiments, indexing localities for this type of application may use a priority scheme based on frequency of occurrence in a Yellow Pages directory.
As shown in
A geographic database 930 is shown as external storage to computing device or system 910, but the geographic database 930 in some instances may be the same storage as storage 916. In embodiments, locality name entries are merged for duplicate and variant localities 932 in geographic database 930. In embodiments, geographic database 930 contains a main source mask of locality sources 934. In embodiments, a locality index including Feature Locality Priority, Locality Name and Find Feature tables 936 is stored in the geographic database 930.
Proprietary geographic database creation software 940 can use real-world locality sources and definitions 960 to merge and/or adorn the duplicate and variant locality name entries 932, create the main source mask of locality sources 934 and create the locality index 936. Examples of real-world locality sources and definitions are described above in the discussion for
For an example of a geographic database-to-application converter and device application software 950 as used by a user on the Internet, or on a navigation device, the user can select a locality to be displayed on a map. Alternatively, if the user requests driving directions, for example, the locality can be either the starting or ending locality.
In embodiments, the type of software application that queries the user can be a drill-down, either top-down or bottom-up, application. The drill-down approach is useful in automobile-based navigation systems with limited memory. In embodiments useful for limited memory devices, the applications developer can include in the device only locality names that rank high in priority. A top-down application first requests the user to enter a principal geographic feature, for example a state or province. The application then requests the user to enter a locality, for example a city or town, located in the principal geographic feature. The application then requests the user to enter the name of the street in the locality. Finally, the application requests the user to enter the street number. In most cases, the queries result in specification of an unambiguous geographic database feature for use by an application, for example displaying the locality to the user on GUI 920 of display device 918. A bottom-up application first requests the user to enter a house number and street name. The application then displays all the localities in which such an address can be found. Finally, the application requests the user to choose or enter the name of the desired locality. The bottom-up methodology also usually results in specification of an unambiguous geographic database feature which can then be used by the application.
In embodiments, the application software can use the geographic database index in a drill-down application, which allows the end user to enter a partial or full locality name, usually within a given state. In embodiments, the application presents names to the end user that match the user's input, and the user chooses the best option. Matching against the tokenized name bodies, the application can present both “Hollywood” and “West Hollywood” when any of the first letters of “Hollywood” are input by the end user.
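Matching partial input against tokenized name bodies can be sketched as a prefix test on the body token. The index layout, a list of (display name, body token) pairs, the sample entries and the function name `suggest` are assumptions for illustration.

```python
# Hypothetical index entries: (name shown to the user, normalized body token).
BODY_INDEX = [
    ("Hollywood", "HOLLYWOOD"),
    ("West Hollywood", "HOLLYWOOD"),
    ("Holyoke", "HOLYOKE"),
    ("Boston", "BOSTON"),
]


def suggest(typed, index):
    """Return display names whose body token starts with the typed letters."""
    prefix = typed.strip().upper()
    if not prefix:
        return []
    return sorted(display for display, body in index if body.startswith(prefix))
```

Because the comparison uses the body token rather than the full name, typing “Holl” offers both “Hollywood” and “West Hollywood,” even though the latter does not begin with those letters.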
In other embodiments, the software application is not a drill-down application and instead queries the user for street number and street, locality and principal geographic feature at one time. In most cases, the query results in specification of an unambiguous geographic database feature, and the process returns the location to the user. If the user enters a street name of “Main Street” and a locality of “Springfield,” a duplicate locality “Springfield” will be found if it also has a street by the name of “Main Street.” If duplicate localities exist for the geographical feature, then a list of localities and their adornments can be displayed to the user in order to ask the user to choose one, such as on GUI 920 of display device 918. For an example pair of duplicate localities for “Washington, N.J.,” the two localities can be adorned with the counties in which they are found or with names of nearby larger cities. “Easton, N.J.” and “Hammonton, N.J.,” respectively, are nearby large cities of the two duplicate localities. Thus, “Washington (Easton), N.J.,” and “Washington (Hammonton), N.J.,” are displayed to the GUI 920 of
Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art. Embodiments of the present invention may also be implemented by the preparation of application specific integrated circuits or by interconnecting an appropriate network of conventional component circuits, as will be readily apparent to those skilled in the art.
Embodiments of the present invention can include a computer program product which is a storage medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of embodiments of the present invention. The storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems, including molecular memory ICs, or any type of system or device suitable for storing instructions and/or data.
Stored on any one of the computer readable medium (media), embodiments of the present invention can include software for controlling both the hardware of the general purpose/specialized computer or microprocessor, and for enabling the computer or microprocessor to interact with a human user or other mechanism utilizing the results of embodiments of the present invention. Such software may include, but is not limited to, device drivers, operating systems, and user applications. Ultimately, such computer readable media further includes software for performing embodiments of the present invention, as described above.
Included in the programming or software of the general purpose/specialized computer or microprocessor are software modules for implementing the teachings of the present invention. Embodiments of the present invention may be conveniently implemented using a conventional general purpose or a specialized digital computer or microprocessor programmed according to the teachings of the present disclosure, as will be apparent to those skilled in the computer art.
The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit embodiments of the present invention to the precise forms disclosed. Many modifications and variations will be apparent to a practitioner skilled in the art. The embodiments were chosen and described in order to best explain the principles of the present invention and its practical application, thereby enabling others skilled in the art to understand the present invention for various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope of the present invention be defined by the following claims and their equivalents.