This invention relates to a method of indexing data relating to geographical locations and a geographical location index produced by the method.
Indexing geographical location data which has a spatial component is becoming increasingly difficult as data volumes grow and as service providers, such as search engines, attempt to organize geographical data for fast retrieval. For many applications dealing with geographical data, such as, but not limited to, “local” searches through the World Wide Web, the most important function is to query data concerning a given geographical point and then to return data ordered by its proximity to the given geographical point, starting with the data nearest to the given geographical point.
A number of spatial indexing technologies allow the querying of objects within a bounded rectangle or circle and will return all data within the requested area. Problems occur when the bounded area is too small and consequently too little (or no) data is retrieved. On the other hand, if the bounded area is too large, then too much data is retrieved. In the first case, the application has to extend the search area to find some or more relevant data and in the second case the volume of data is too great to be efficiently processed either by the application or the user. This means that the scalability in retrieving data is compromised.
There are algorithms that attempt to address this problem from a purely mathematical perspective but these are not discussed further.
The present invention seeks to provide an alternative method for spatial indexing of geographical data which allows results proximal to a target geographical location to be obtained, i.e. determining from the target geographical location one or more nearby geographical locations or data concerning the nearby geographical locations. This allows the geographical data relating to the target geographical location or nearby geographical locations to be returned more easily. The method differs from traditional spatial indexing by making use of predefined geography (either real or artificial) to create data which can be indexed using standard (non-spatial) indexing technology. Hence, the method of the invention allows ready scaling of search results in a manner consistent with normal non-spatial indexing. The method of the invention can also be viewed as offering search results of increased relevance at a local level since it provides improvements in ranking based either on locality names or on geographical hierarchical information.
One aspect of the present invention provides a geographical location index comprising a plurality of layers of geographical information concerning a geographical area, each layer comprising a division of the geographical area into a plurality of discrete zones which each have a zone identifier and associated geographical co-ordinates of one or more geographical locations contained within that zone.
Preferably, each zone has a finite number of neighboring zones in the same layer.
Conveniently, each layer defines a different set of zones.
Advantageously, the zones in one layer represent a predetermined geographical area.
Preferably, the predetermined geographical area in one layer is a country, in another layer is a state, in another layer is a county, in another layer is a postcode/zip code and in another layer is a building.
Conveniently, there is a hierarchy of layers having respective zones of diminishing area so that a top layer provides low resolution division of the geographical area and a bottom layer provides high resolution division of the geographical area.
Advantageously, for a particular layer, a record of a subject zone contains a zone identifier for the subject zone, the zone identifiers of zones neighboring the subject zone, the or each zone identifier of a zone in another layer containing the subject zone and the zone identifiers of all zones contained within the subject zone.
Preferably, the zones of one layer do not overlap any other zones of the same layer.
Conveniently, a zone of one layer overlaps one or more zones of another layer.
Advantageously, the associated geographical co-ordinates comprise the longitude and latitude or x,y co-ordinates of at least one geographical location in a subject zone.
Preferably, each zone is a polygonal area.
Another aspect of the invention provides a database incorporating an index embodying the invention.
A further aspect of the present invention provides a method of indexing data relating to geographical locations comprising: providing a plurality of layers of geographical information concerning a geographical area, each layer comprising a division of the geographical area into a plurality of discrete zones, each of which has a zone identifier and associated geographical co-ordinates each of which co-ordinates defines a geographical location; and associating for each layer a geographical location with those zones containing the geographical location such that a geographical hierarchy is provided for each geographical location.
Another aspect provides a method of utilizing a geographical location index comprising a plurality of layers of geographical information concerning a geographical area, each layer comprising a division of the geographical area into a plurality of discrete zones which each have a zone identifier and associated geographical co-ordinates of one or more geographical locations contained within that zone, the method comprising searching the index for a target geographical location and determining therefrom one or more nearby geographical locations or data concerning the nearby geographical locations.
In order that the present invention may be more readily understood, embodiments thereof will now be described, by way of example, with reference to the accompanying drawings, in which:
a-d show a set of tables containing information relating to
Referring to
Each zone is described as a polygon with an associated name (such as a country name in the case of
It is possible to form a finite list of the zones in a geographical area by listing all zones by their identification (e.g. in the case of
Referring to
Additionally, each zone contains a finite number of zones in the layer below. Thus, if the country is the USA, then that country contains all the states of the USA. Each of the zones “contained” in the top layer zones is listed under the respective top layer zone in which it is located. The “contained” zones can be regarded as “child” zones. Zones in the top layer have “child” zones, but there are no “parent” zones, i.e. no zones which “contain” a top layer zone. Zones in the intermediate layers will have “child” zones and one or more “parent” zones. Conversely, zones in the bottom layer will not have any “child” zones but will only have one or more “parent” zones.
Thus, the index for a particular layer comprises a record of the subject zone identifiers in the layer, the zone identifiers of zones neighboring each subject zone, the or each zone identifier of a “parent” zone and the or each zone identifier of all “child” zones.
By combining this information, the index is compiled to produce a list of all geographical areas, their adjoining neighbors, “parent” areas and contained “child” areas. This information is represented in the combination of the tables for each of the layers in
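By way of illustration only, the record described above (zone identifier, neighbors, “parent” zones and “child” zones) might be sketched as follows. The names `ZoneRecord` and `build_index`, and the sample zones, are assumptions made for this sketch and do not appear in the tables.

```python
from dataclasses import dataclass, field


@dataclass
class ZoneRecord:
    """One index entry: a zone and its relations (illustrative layout)."""
    zone_id: str
    layer: str
    neighbors: set = field(default_factory=set)   # same-layer adjoining zones
    parents: set = field(default_factory=set)     # containing zones in a higher layer
    children: set = field(default_factory=set)    # contained zones in a lower layer


def build_index(layer_of, adjacency, containment):
    """Compile records from (zone, zone) adjacency pairs and
    (parent, child) containment pairs."""
    index = {zid: ZoneRecord(zid, layer) for zid, layer in layer_of.items()}
    for a, b in adjacency:
        index[a].neighbors.add(b)
        index[b].neighbors.add(a)
    for parent, child in containment:
        index[parent].children.add(child)
        index[child].parents.add(parent)
    return index


# Hypothetical sample data: two countries and two states.
layer_of = {"USA": "country", "Canada": "country",
            "Washington": "state", "Oregon": "state"}
index = build_index(
    layer_of,
    adjacency=[("USA", "Canada"), ("Washington", "Oregon")],
    containment=[("USA", "Washington"), ("USA", "Oregon")],
)
```

Because every relation is held as a set of plain zone identifiers, such records can be stored and looked up with ordinary keyed (non-spatial) indexing.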
Utilizing this system allows a query for data concerning a given geographical point to be analyzed and to return data ordered by its proximity to the given geographical point—in the first instance data would be returned for the subject zone and then data concerning neighboring zones in the same layer.
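A minimal sketch of such a proximity-ordered query is given below. It returns data for the subject zone first and then for its neighbors; widening the search to neighbors of neighbors via a breadth-first walk (`max_hops`) is an assumption of this sketch rather than a stated step, and the function names and sample zones are hypothetical.

```python
from collections import deque


def zones_by_proximity(neighbors, start, max_hops=1):
    """List zones starting with the subject zone, then zones one hop away,
    then two hops away, and so on up to max_hops."""
    order = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        zone, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in sorted(neighbors.get(zone, ())):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, hops + 1))
    return order


def query(neighbors, data_by_zone, start, max_hops=1):
    """Collect data zone by zone, nearest zones first."""
    results = []
    for zone in zones_by_proximity(neighbors, start, max_hops):
        results.extend(data_by_zone.get(zone, []))
    return results


# Hypothetical same-layer neighbor lists and per-zone data.
neighbors = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}
data_by_zone = {"A": ["a1"], "B": ["b1"], "D": ["d1"]}
near = query(neighbors, data_by_zone, "A")
wider = query(neighbors, data_by_zone, "A", max_hops=2)
```

If too little data is found in the subject zone and its immediate neighbors, the caller can simply raise `max_hops` instead of guessing a larger bounding rectangle.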
The process is repeated for each layer in a geographical system where there are multiple layers of information. So “countries” are processed separately from “states”, which are processed separately from “counties”, which are processed separately from “postcodes”. As each layer of information is processed separately, it is of no importance whether the polygons in one layer share edges with those of another layer. (In some countries, such as the UK, postcodes can cross county borders.)
Given a point (x,y) or (longitude, latitude) and given a set of named polygons describing the geographical structure, it is possible to determine to which named geographical identifier a point belongs by checking which polygons it falls within, and hence derive all the neighboring polygons as well.
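The polygon check can be sketched as follows, assuming simple (non-self-intersecting) polygons given as lists of (x, y) vertices and using the well-known ray-casting test. The text does not prescribe a particular point-in-polygon algorithm; `point_in_polygon`, `locate` and the two square zones are illustrative assumptions.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray
    from (x, y) towards +x crosses; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def locate(x, y, layer_polygons):
    """Return the identifier of the first zone in one layer containing the point."""
    for zone_id, polygon in layer_polygons.items():
        if point_in_polygon(x, y, polygon):
            return zone_id
    return None


# Hypothetical layer with two adjoining square zones.
layer_polygons = {
    "West": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "East": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
zone = locate(1.5, 0.5, layer_polygons)
```

Once the zone identifier is known, its neighbors follow from a keyed lookup in the index, so the geometric test is needed only once per layer for each point.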
This invention can be viewed as, but is not limited to, a method of increasing relevancy for search engines when doing local searching, by allowing improvements in ranking based either on locality names or on geographical hierarchical information.
In one example, the invention is used to index pages from the World Wide Web. A World Wide Web page which mentions “Eiffel Tower” but not “Paris” or “France” can, through this invention, still be indexed using the keywords “Paris” and “France” since the hierarchical structure inherent in the invention contains this information, France and Paris being the parent zones of “Eiffel Tower”.
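The keyword expansion in this example might be sketched as below, assuming the “parent” relation is available as a mapping from a zone identifier to its immediate parents. The helper names `ancestors` and `index_keywords` are hypothetical.

```python
def ancestors(parents, zone):
    """Walk the parent links upward and return every containing zone."""
    found = []
    seen = set()
    stack = list(parents.get(zone, ()))
    while stack:
        z = stack.pop()
        if z in seen:
            continue
        seen.add(z)
        found.append(z)
        stack.extend(parents.get(z, ()))
    return found


def index_keywords(page_locations, parents):
    """Keywords for a page: the locations it names plus all their parent zones."""
    keywords = set(page_locations)
    for location in page_locations:
        keywords.update(ancestors(parents, location))
    return keywords


# Parent links taken from the example in the text.
parents = {"Eiffel Tower": ["Paris"], "Paris": ["France"]}
keywords = index_keywords(["Eiffel Tower"], parents)
```

A page naming only “Eiffel Tower” is thereby indexed under “Paris” and “France” as well.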
A World Wide Web search engine can also use this information to improve the internal page ranking for pages which are known to consistently use the hierarchical information. Thus a page which mentions corresponding locations in different layers (for example “Eiffel Tower” and “Paris”) can be given improved ranking for correct use of both terms. Using the hierarchical information in this manner can prove beneficial in countering the practice of “web spamming” where authors of commercial web pages attempt to gain higher search engine ranking by including long lists of location names.
For example, a web page author will, of course, wish for as many people to visit the web page as possible, so as to increase the number of potential customers for the products and/or services advertised thereon. Should an Internet user wish to locate a web page with particular information, it is common to use the services of an Internet search engine. The user inserts a search term and the search engine then scans the available pages on the Internet to find pages containing the search term, then returns details of these pages to the user. There are, of course, certain search terms that are used very often in Internet searching, for example: “News” or “MP3”. The authors of some web pages—which web pages are not necessarily related to “News” or “MP3”—may wish to improve the likelihood of the web page being returned in a search and will include a list of these common search terms on the web page. Commonly, such web pages will “hide” these terms by using a white colored font on a white background, so that the user is unaware of their existence on the web page.
The invention therefore also encompasses an analysis whereby the content of a body of information, such as a web page or the like, is reviewed to determine whether the locations named on the page fall consistently into neighboring zones, child zones or parent zones. If the determination indicates consistent inclusion, then the page ranking can be approved or possibly improved. If the determination indicates that the page contains random locations not linked to a coherent set (predetermined by the analysis criteria) of zones, then the page ranking can be lowered.
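One possible form of this analysis is sketched below. The scoring rule (the fraction of location pairs that are directly linked as neighbor, parent or child), the threshold and the penalty factor are all assumptions chosen for illustration; the analysis criteria themselves are left open by the description.

```python
from itertools import combinations


def related(a, b, neighbors, parents):
    """True if two zones are direct neighbors or in a direct parent/child relation."""
    return (b in neighbors.get(a, ()) or a in neighbors.get(b, ())
            or b in parents.get(a, ()) or a in parents.get(b, ()))


def coherence(locations, neighbors, parents):
    """Fraction of distinct location pairs that are directly related."""
    pairs = list(combinations(sorted(set(locations)), 2))
    if not pairs:
        return 1.0
    linked = sum(related(a, b, neighbors, parents) for a, b in pairs)
    return linked / len(pairs)


def adjust_rank(rank, locations, neighbors, parents, threshold=0.5, penalty=0.5):
    """Keep the rank of a coherent page; lower the rank of an incoherent one."""
    if coherence(locations, neighbors, parents) >= threshold:
        return rank
    return rank * penalty


# Hypothetical index fragments.
parents = {"Eiffel Tower": {"Paris"}, "Paris": {"France"}}
neighbors = {"France": {"Spain"}}
genuine = adjust_rank(10.0, ["Eiffel Tower", "Paris"], neighbors, parents)
spam = adjust_rank(10.0, ["Paris", "Tokyo", "Lima", "Oslo"], neighbors, parents)
```

A page listing unrelated place names scores zero under this rule and is demoted, whereas a page that names a landmark together with its city keeps its rank.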
Indexing and search of data can now follow any of the following methods:
Method 1:
Method 2:
Coding all spatial data to an identifier means that the spatial search is now reduced to traditional keyed indexing technology. This would be carried out using the following method.
Method 3:
In order to satisfy a query for specific data in the vicinity of a target geographical location, the following steps would apply.
Still further, to aid efficient data retrieval, the index can store other information in relation to a set of given zones. For example, as well as recording a list of neighbors, parents and children of any given zone, other information, such as the time it takes to travel between zones can be recorded. For example, a user could use an index embodying the present invention to query all of the churches within a two hour drive of a target geographical location (e.g. their home).
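The travel-time query could be sketched as follows, assuming the index stores a travel time in minutes between adjoining zones and that times along a route simply add up. Accumulating them with Dijkstra's shortest-path algorithm is a choice made for this sketch, and the zone names and church names are hypothetical.

```python
import heapq


def zones_within(travel_minutes, start, limit):
    """Return {zone: minutes} for every zone reachable from start within limit."""
    best = {start: 0}
    heap = [(0, start)]
    while heap:
        cost, zone = heapq.heappop(heap)
        if cost > best.get(zone, float("inf")):
            continue  # stale heap entry
        for nxt, minutes in travel_minutes.get(zone, {}).items():
            new_cost = cost + minutes
            if new_cost <= limit and new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return best


# Hypothetical travel times between zones, in minutes.
travel_minutes = {
    "Home": {"A": 45, "B": 90},
    "A": {"Home": 45, "C": 60},
    "B": {"Home": 90, "C": 50},
    "C": {"D": 30},
}
churches = {"A": ["St Mary"], "D": ["St John"]}

reachable = zones_within(travel_minutes, "Home", limit=120)
# Nearest zones first, then look up the stored data for each zone.
nearby_churches = [name
                   for zone in sorted(reachable, key=reachable.get)
                   for name in churches.get(zone, [])]
```

Here zone “D” lies 135 minutes from “Home” and is therefore excluded from a two hour query.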
Number | Date | Country | Kind |
---|---|---|---|
0415072.8 | Jul 2004 | GB | national |