ENHANCED SPATIAL INDEX FOR POINT IN POLYGON OPERATIONS

Information

  • Patent Application
  • 20170303106
  • Publication Number
    20170303106
  • Date Filed
    April 15, 2016
  • Date Published
    October 19, 2017
Abstract
An improved method is provided for determining whether a sample point is within a defined geographic area. Indexes for the geographic area of interest are generated in advance. Such indexes complement the traditional spatial indexing techniques such as quad tree and r-tree. The geographic area, as defined by an outer boundary, is subdivided into some regular geometric shape, preferably a rectangle, encoded into a suitable form, and indexed. Then, a simplified comparison of the sample point to the indexed regular shapes is made.
Description
TECHNICAL FIELD

The present invention relates to a pre-generated geographic index to facilitate analysis of location data.


BACKGROUND OF THE INVENTION

In many location based services applications, it is important to know whether a particular geographic point is inside a particular geographic region. For example, for 9-1-1 emergency services, it is important to assign a context for an event triggered from a mobile device, and to notify the authorities in the appropriate jurisdiction. Depending on which county the mobile device user is in, a different emergency response team may be assigned. This is commonly referred to as “point in poly operations.” There are many other similar use cases where dynamically generated points get compared in real time against static (or slowly changing) polygons for geometric interaction. Another common application is geofencing, in which it is determined when a mobile device user enters a defined region, and that information is used to send the user targeted messages or for tracking.


Conventionally, accurate point in poly operations are very processor intensive. Geographic boundaries are often irregular, and accurate results can require a full geometric analysis. Processing power and processing speed can be critical when large numbers of points are being processed, and when timely decisions need to be made based on the output.


SUMMARY OF THE INVENTION

An improved method is provided for determining whether a sample point is within a defined geographic area. This technique improves processing speed for responding to “point in polygon” operation requests for a large majority of locations. In this method, indexes for the geographic area of interest are generated in advance. Such an index can complement a regular quad-tree or r-tree index. The geographic area, as defined by an outer boundary, is then subdivided into some regular geometric shape and indexed. Then, a simplified comparison of the sample point to the indexed regular shapes is made.


Creation of Index:


Using the rectangle implementation for this description, a first step for indexing is to define a rectangular boundary box that fully encloses the geographic area. As is conventionally known, this bounding box can be checked against a sample point. But since most geographic areas are irregular in shape, there are many areas inside the bounding box that are not inside the geographic area of interest. Thus, the initial bounding box can only be used to determine with certainty whether the sample point is outside of the geographic area.
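
The bounding-box pre-check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the specification: the function names `bounding_box` and `outside_box` and the L-shaped sample region are hypothetical and chosen only for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bounding_box(vertices: List[Point]) -> Box:
    """Smallest axis-aligned rectangle that encloses all boundary vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


def outside_box(point: Point, box: Box) -> bool:
    """True only when the point is certainly outside the enclosed region."""
    x, y = point
    min_x, min_y, max_x, max_y = box
    return x < min_x or x > max_x or y < min_y or y > max_y


# Hypothetical L-shaped region: its bounding box has an empty upper-right corner.
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
box = bounding_box(l_shape)

print(outside_box((5, 1), box))  # beyond the box, so outside the region
print(outside_box((3, 3), box))  # within the box, yet not within the L shape
```

The second point shows why the bounding box alone can only confirm that a point is outside: a point within the box may still fall in the gap between the box and the irregular boundary.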


In a next step, the boundary box is divided into a plurality of equal first level rectangular boxes at a first dividing level. Each of the first level boxes is then examined to determine whether the first level box is (a) entirely outside the geographic area; (b) entirely inside the geographic area; or (c) intersecting with the defined boundary. The result of this determination will determine which boxes can be indexed, and which require further processing.


If the first level box is entirely outside the geographic area, then that box is discarded and it is not part of the indices. This step is to identify and exclude larger areas that fall into that gap between the geographic boundary and the initial bounding box.


If the first level box is entirely inside the geographic area, then that box is added to the indices and is assigned an index number that is stored with the corresponding range of coordinates for that box. This step identifies large blocks that are entirely contained within the geographic areas.


If the first level box intersects with the defined boundary then that box requires further processing, since it includes both desirable area from inside the boundary, and unwanted area that is outside. This further processing includes dividing that box into a plurality of equal second level rectangular boxes at the second dividing level.


The same steps are applied again for the second level boxes that have been created, i.e., determining whether they are fully outside, fully inside, or intersecting, and the same indexing is performed. In the preferred embodiment, the examination, determination and indexing steps are performed iteratively for a subsequent number of dividing levels. The iterations may continue until (1) there are no more boxes that intersect with the defined boundary, or (2) a maximum dividing level has been reached.


Searching Against an Index:


In a preferred embodiment, the search point, or sample point, to be analyzed can be checked at a high level prior to comparison to the index. That is, prior to the step of comparing the sample point to the index data, it is determined whether the sample point is inside or outside the initial boundary box. Then, if the sample point is outside the boundary box, an output indicates that the sample point is outside of the geographic area. If the sample point is inside the boundary box, then subsequent steps for comparison to the index are followed.


For analysis with the index, the sample point is compared to the coordinate ranges of the indexed boxes to determine if the sample point is included in the indices. If the sample point is within the indexed boxes then a response is provided indicating that the sample point is within the geographic area. If the sample point is not within the indexed boxes then a full geometric analysis is performed of the sample point in comparison to a full geometry of the defined boundary.


In another embodiment, multiple indices are generated for multiple layers of maps having different boundaries for a same physical location. Then, the step of comparing the sample point to the coordinate ranges of the indexed boxes is done concurrently for the multiple indices. An exemplary application of the multiple indices processing is for when the multiple layers represent different types of insurance risks. Then, the method determines whether the sample point is located in designated risk zones (for example, flood, fire, earthquake) in each of the respective layers.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 depicts a single boundary box around a geographic region of interest.



FIG. 2 depicts subdivisions used for indexing.



FIG. 3 depicts a simplified example of a first level division of a bounding box around a simplified geographic region.



FIG. 4 depicts a simplified example of a second level division of the first level boxes of FIG. 3.



FIG. 5 depicts a simplified example of a third level division of the second level boxes of FIG. 4.



FIG. 6 depicts an index numbering scheme that can be applied to index boxes.





DETAILED DESCRIPTION

Traditionally, data structures such as R-trees and quad-trees can be used for speeding up such point in polygon searches. In the classical R-tree implementation, the primary spatial filter is performed using the bounding box of the geometry; and returns either NO (when the bounding box of the searched object does not overlap the bounding box of a record in the table) or MAYBE (when the bounding boxes overlap). In the latter case, this means that we need to actually compare the individual geometries for intersection. A quad tree index has the same characteristics, except that it uses tiles for the exterior approximation, rather than bounding boxes.


In FIG. 1, an R-tree search can show us the bounding box 11 around the Albany region 12. Point A can be eliminated as a candidate for overlapping the Albany county region 12 quickly. However, an estimation using bounding box 11 cannot distinguish between points B (just outside the border) and C (inside the border).


The current method adds an interior approximation of the polygon to speed up the searches. The method adapted for interior approximation is a novel use of quad tree tiles. The quad tree tiles are used as an inexpensive approximation of the interior of the polygon. This complements and improves the currently used exterior approximation provided by simple shapes, such as bounding boxes (in r-trees) or quad tiles (for quad tree).


For each polygon that we need to index, we store the ‘interior’ quad tree tiles that are entirely contained in the polygon in addition to the traditional R-tree or quad tree index. Any point within the interior quad tree tiles can be quickly determined to be inside the polygon without constructing the topology of the polygon. The inscribed rectangles are persisted either using the usual quad tree code (a string formed by the integer encoding using quadrants 0, 1, 2, 3) or using an integer encoding scheme, and are indexed using traditional B-tree or hash indices.


As seen in FIG. 2, interior quad tree tiles (boxes) of different sizes 13, 14, 15, and 16 have been generated to approximately fill in the interior of the Albany region 12 that is of interest. Each of the boxes 13, 14, 15, and 16 of various sizes is recorded in an index that includes their respective ranges of x, y coordinates.


To determine if a point overlaps a polygon, we first check whether the point overlaps any of the inscribed rectangles. If it does, the question has been answered purely from the indices, without decoding the polygon and constructing the topology. If not, we fall back on prior techniques, such as ray tracing. Thus this technique avoids processor-intensive full geometric analysis for the vast majority of the region, but it does not help for the smaller part of the search polygon that is not covered by the tiles.
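
The two-stage lookup in the preceding paragraph can be sketched as follows. This is an illustrative sketch only: the diamond-shaped polygon, the single inscribed rectangle, and the function names are assumptions made for the example, and the even-odd ray casting test is used here as one possible instance of the “prior techniques” fallback.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def in_any_box(x: float, y: float, boxes: List[Box]) -> bool:
    """Cheap test: is the point within one of the indexed interior rectangles?"""
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in boxes)


def ray_cast(x: float, y: float, vertices: List[Point]) -> bool:
    """Full geometric test: count boundary crossings of a ray going in +x."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon(x, y, boxes, vertices) -> Tuple[bool, str]:
    """Return the answer and which stage produced it."""
    if in_any_box(x, y, boxes):
        return True, "index"
    return ray_cast(x, y, vertices), "geometry"


# Hypothetical region: a diamond with one inscribed rectangle as its index.
diamond = [(2, 0), (4, 2), (2, 4), (0, 2)]
interior_boxes = [(1, 1, 3, 3)]

print(point_in_polygon(2, 2, interior_boxes, diamond))      # answered by the index
print(point_in_polygon(2, 3.5, interior_boxes, diamond))    # needs the geometry
print(point_in_polygon(3.5, 3.5, interior_boxes, diamond))  # outside the diamond
```

Only points that miss every indexed rectangle pay for the full boundary traversal, which is the saving the paragraph above describes.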



FIGS. 3, 4, and 5 depict an extremely simplified example of steps for forming interior boxes for purposes of indexing in accordance with the improved method herein. In addition to this explanation, representative pseudocode for implementing the preferred embodiment is provided at the end of this specification.


In the example explained with the figures, the geographic region 20 happens to be a circle. A bounding box 11 is positioned to completely enclose the region 20. Preferably, bounding box 11 is the smallest possible rectangle that can contain the geographic region 20. The bounding box 11 can be defined by its MINX, MINY coordinates at the lower left, and the MIDX, MIDY coordinates in the middle of the box 11.


A first step for indexing is to divide the initial box 11 into equal sized smaller boxes. In this example, the box 11 is divided into four equal quadrants, numbered 0, 1, 2, and 3. Those four quadrants are now a “first level” of boxes that are the result of a “first level” division. A further step is to determine whether any of the new first level boxes can be excluded from the index, or indexed. Boxes are excluded from the index when no part of the box is within the geographic region 20. Boxes are added to the index when all of the box is contained within the geographic region 20. Neither of these cases applies in FIG. 3, so it is time for the next step.


In FIG. 4, each of the first level quadrants has been further divided into four equal subdivisions. These new “second level” boxes are labeled within their original boxes as 0′, 1′, 2′, and 3′. At this second level of dividing, we see that four of the second level boxes are completely contained within the geographic region 20. These are the shaded boxes 21, and those boxes are each given a unique indexing number and they are recorded with their corresponding x, y coordinates defining those boxes. The remaining unshaded boxes still intersect with the boundary of the geographic region 20, so those boxes are going to be further subdivided.



FIG. 5 shows an exemplary portion of the third level subdivision for the interior box structure that was developed in FIGS. 3 and 4. The third level boxes are labelled 0″, 1″, 2″, and 3″. At this third level of dividing we can see that more third level boxes (marked 22) can be added to the index because they are completely within the boundary 20. Also, at this third level, the corner boxes 23 are completely outside the boundary. Boxes 23 can therefore be ignored going forward and they are not made part of the index.
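
The walk-through of FIGS. 3, 4, and 5 can be reproduced numerically. The sketch below is an assumption-laden illustration, not the patented implementation: it assumes a circle of radius 2 inscribed in a 4 by 4 bounding box, and the names `classify`, `split`, and `subdivide` are hypothetical. It classifies each box as inside, outside, or intersecting, and subdivides only the intersecting boxes.

```python
from collections import Counter

CX, CY, R = 2.0, 2.0, 2.0  # assumed circle inscribed in the box (0, 0, 4, 4)


def classify(min_x, min_y, max_x, max_y):
    """Three-way test of an axis-aligned box against the assumed circle."""
    # The box is inside when its farthest corner from the centre is inside.
    far_x = max(abs(min_x - CX), abs(max_x - CX))
    far_y = max(abs(min_y - CY), abs(max_y - CY))
    if far_x ** 2 + far_y ** 2 <= R ** 2:
        return "inside"
    # The box is outside when its nearest point to the centre is outside.
    near_x = min(max(CX, min_x), max_x)
    near_y = min(max(CY, min_y), max_y)
    if (near_x - CX) ** 2 + (near_y - CY) ** 2 > R ** 2:
        return "outside"
    return "intersecting"


def split(box):
    """Divide a box into four equal quadrants."""
    min_x, min_y, max_x, max_y = box
    mid_x, mid_y = (min_x + max_x) / 2, (min_y + max_y) / 2
    return [(min_x, mid_y, mid_x, max_y), (mid_x, mid_y, max_x, max_y),
            (min_x, min_y, mid_x, mid_y), (mid_x, min_y, max_x, mid_y)]


def subdivide(box, max_level):
    """Return per-level counts of inside / outside / intersecting boxes."""
    pending, per_level = [box], []
    for _ in range(max_level):
        counts, next_pending = Counter(), []
        for parent in pending:
            for child in split(parent):
                kind = classify(*child)
                counts[kind] += 1
                if kind == "intersecting":
                    next_pending.append(child)  # only these are divided again
        per_level.append(dict(counts))
        pending = next_pending
    return per_level


levels = subdivide((0.0, 0.0, 4.0, 4.0), 3)
for number, counts in enumerate(levels, start=1):
    print(number, counts)
```

Under these assumptions the first level yields no indexable boxes, the second level yields four interior boxes, and the third level is the first at which corner boxes fall entirely outside the boundary, matching the description of the figures.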



FIG. 6 shows a preferred numbering scheme for identifying boxes generated in a process like that described above. The box set 60 includes only first level boxes, so the numbers only include two digits. As boxes with greater levels of resolution are added, additional digits are added to indicate their level.


In the preferred embodiment, the “0” digit indicates the upper left quadrant box for the level that the digit represents; “1” is the upper right; “2” is the lower left; and “3” is the lower right quadrant.


In box set 61, the upper right box (index number 11) has been subdivided to the second level. Thus a new digit has been added at the end after “11,” and that new digit is indicative of which quadrant it is in. The same process has been followed in box set 62 where box 111 has been divided into boxes 1110, 1111, 1112, and 1113.
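
The numbering scheme can be turned back into coordinates with a short routine. The sketch below is hypothetical: it assumes that the first digit of an index number (the leading “1” in 11, 111, and 1110) is a fixed prefix rather than a quadrant digit, which the text does not state explicitly, and the function name `box_for_index` and the 8 by 8 extent are illustrative choices.

```python
def box_for_index(index_number: str, extent):
    """Decode an index number into (min_x, min_y, max_x, max_y).

    Assumption: the first character is a prefix and is skipped; every later
    digit selects a quadrant (0 upper left, 1 upper right, 2 lower left,
    3 lower right) of the box chosen so far.
    """
    min_x, min_y, max_x, max_y = extent
    for digit in index_number[1:]:
        if digit not in "0123":
            raise ValueError(f"not a quadrant digit: {digit!r}")
        mid_x, mid_y = (min_x + max_x) / 2, (min_y + max_y) / 2
        if digit in "01":   # upper half
            min_y = mid_y
        else:               # lower half
            max_y = mid_y
        if digit in "02":   # left half
            max_x = mid_x
        else:               # right half
            min_x = mid_x
    return (min_x, min_y, max_x, max_y)


extent = (0, 0, 8, 8)  # hypothetical initial bounding box
for number in ("11", "111", "1112"):
    print(number, box_for_index(number, extent))
```

Because each added digit halves the box in both directions, the length of the index number also encodes the dividing level of the box.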


Note that the use of quad tree tiles implies that the operations are on a Cartesian plane. However, the core idea is the use of a hierarchical spatial tessellation scheme, whose key can be indexed using traditional scalar indexing techniques. Hence the same idea can be used to index the spherical model of the world (i.e., points and lines expressed in longitude, latitude systems) using an existing tessellation model, such as the hierarchical triangular mesh (HTM), an indexing method proposed by skyserver.org. The only requirements for the index on the sphere are that it is hierarchical in nature, and that it has an inexpensive way to translate a point to a tile, i.e., an encoded area at the lowest level of tessellation under the hierarchical scheme.


A nice property of this algorithm is that we can search a point against multiple polygon tables at the same time. For example, if a point representing an insured property needs to be checked against fire risk, flood risk, and earthquake risk boundaries so that we can assign an insurance risk score, we could check the point against a tile index that is a union of all tiles from the three individual tables. We fall back on a full polygon check against a table only in cases where the search does not yield a match for that particular layer.
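
A union tile index of this kind might look as follows. This is a sketch under assumptions: the layer names, the tile keys, and the sample point key are invented for illustration, and quad key strings are assumed as the tile encoding.

```python
from collections import defaultdict


def build_union_index(layers):
    """Merge per-layer tile keys into one map: tile key -> set of layer names."""
    union = defaultdict(set)
    for name, keys in layers.items():
        for key in keys:
            union[key].add(name)
    return union


def layers_hit(point_key, union):
    """Layers with an interior tile whose key is a prefix of the point's key."""
    hit = set()
    for length in range(len(point_key) + 1):
        hit |= union.get(point_key[:length], set())
    return hit


# Hypothetical interior tiles for three risk layers.
layers = {
    "flood": {"03", "12"},
    "fire": {"0"},
    "earthquake": {"31"},
}
union = build_union_index(layers)

point_key = "0312"  # quad key of the insured property (assumed)
resolved = layers_hit(point_key, union)
unresolved = set(layers) - resolved  # these layers need a full polygon check

print(sorted(resolved), sorted(unresolved))
```

One pass over the prefixes of the point's key resolves every layer that has a matching interior tile; only the remaining layers require the fallback geometric check.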


Although the invention has been described with respect to preferred embodiments thereof, it will be understood by those skilled in the art that the foregoing and various other changes, omissions and deviations in the form and detail thereof may be made without departing from the spirit and scope of this invention.


Pseudocode

The following is an exemplary pseudocode algorithm for implementing the improved indexing method. This is a recursive algorithm that works by dividing the area of interest into four quadrants at each step of the way, as discussed previously. The starting arguments for the recursive procedure are: the polygon to be indexed, an empty string, and the bounding box covering the whole extent of the data.














Pseudocode: Recursive Algorithm GetQuadTilesForPolygon

Constants:
  MAXLEVEL: Maximum number of levels that we are willing to go into

Inputs:
  G: Geometry representing the region
  VisitedQuads: String representing the hierarchy of the quad tiles visited so far in the depth first traversal
  MinX, MinY, MaxX, MaxY: bounding box of the area being examined by this invocation

Returns: QuadKeyList

START Procedure
  If length(VisitedQuads) >= MAXLEVEL
    // no need to split further
    Return;

  // Compute midpoint of current level
  MidX = (MinX+MaxX)/2;
  MidY = (MinY+MaxY)/2;

  // termination condition for recursion
  // if current box is entirely outside the polygon being indexed, we are done
  If (Intersects(G, Box(MinX, MinY, MaxX, MaxY)) = false)
    Return;

  // termination condition for recursion
  // if the box is entirely within the geometry, add the VisitedQuads string to the output list
  If (Contains(G, Box(MinX, MinY, MaxX, MaxY)) = true)
    Add(QuadKeyList, VisitedQuads);
    Return;

  // otherwise split current box down to the next level
  // quadrant layout: 0 = top left, 1 = top right, 2 = bottom left, 3 = bottom right

  // recurse into top left (quadrant 0)
  If (Intersects(G, Box(MinX, MidY, MidX, MaxY)) = true)
    NewQuad = VisitedQuads + ‘0’;
    Invoke GetQuadTilesForPolygon G, NewQuad, MinX, MidY, MidX, MaxY;

  // recurse into top right (quadrant 1)
  If (Intersects(G, Box(MidX, MidY, MaxX, MaxY)) = true)
    NewQuad = VisitedQuads + ‘1’;
    Invoke GetQuadTilesForPolygon G, NewQuad, MidX, MidY, MaxX, MaxY;

  // recurse into bottom left (quadrant 2)
  If (Intersects(G, Box(MinX, MinY, MidX, MidY)) = true)
    NewQuad = VisitedQuads + ‘2’;
    Invoke GetQuadTilesForPolygon G, NewQuad, MinX, MinY, MidX, MidY;

  // recurse into bottom right (quadrant 3)
  If (Intersects(G, Box(MidX, MinY, MaxX, MidY)) = true)
    NewQuad = VisitedQuads + ‘3’;
    Invoke GetQuadTilesForPolygon G, NewQuad, MidX, MinY, MaxX, MidY;

END Procedure









The above algorithm is invoked with the following starting values:

  • G=the polygon or multi-polygon that needs to be indexed
  • visitedQuads=an empty string.
  • MinX, MinY, MaxX, MaxY: the bounding box of the coordinate system
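
For readers who prefer runnable code, the recursive procedure can be sketched in Python. This is an illustrative translation under assumptions, not the authors' implementation: the geometry is passed as two predicate functions, a circle stands in for the polygon G so that exact containment and intersection tests stay short, and the names `get_quad_tiles` and `make_circle` are hypothetical.

```python
def make_circle(cx, cy, r):
    """Return (contains, intersects) box predicates for an assumed circular region."""
    def contains(box):
        min_x, min_y, max_x, max_y = box
        far_x = max(abs(min_x - cx), abs(max_x - cx))
        far_y = max(abs(min_y - cy), abs(max_y - cy))
        return far_x ** 2 + far_y ** 2 <= r ** 2

    def intersects(box):
        min_x, min_y, max_x, max_y = box
        near_x = min(max(cx, min_x), max_x)
        near_y = min(max(cy, min_y), max_y)
        return (near_x - cx) ** 2 + (near_y - cy) ** 2 <= r ** 2

    return contains, intersects


def get_quad_tiles(contains, intersects, visited, box, max_level, out):
    """Collect quad keys of tiles that lie entirely inside the region."""
    if len(visited) >= max_level:
        return                      # no need to split further
    if not intersects(box):
        return                      # box is entirely outside the region
    if contains(box):
        out.append(visited)         # box is entirely inside: index it
        return
    min_x, min_y, max_x, max_y = box
    mid_x, mid_y = (min_x + max_x) / 2, (min_y + max_y) / 2
    children = (
        ("0", (min_x, mid_y, mid_x, max_y)),  # top left
        ("1", (mid_x, mid_y, max_x, max_y)),  # top right
        ("2", (min_x, min_y, mid_x, mid_y)),  # bottom left
        ("3", (mid_x, min_y, max_x, mid_y)),  # bottom right
    )
    for digit, child in children:
        get_quad_tiles(contains, intersects, visited + digit, child, max_level, out)


contains, intersects = make_circle(2.0, 2.0, 2.0)
keys = []
get_quad_tiles(contains, intersects, "", (0.0, 0.0, 4.0, 4.0), 3, keys)
print(keys)
```

As in the pseudocode, the level check comes first, so with a maximum level of 3 the longest key that can be emitted has two digits. The per-child intersection test of the pseudocode is folded into the check at the top of each call, which gives the same result.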


The following pseudocode shows an exemplary algorithm for determining point overlap:














Pseudocode: Algorithm GetQuadKeyForPoint

Inputs:
  x, y: floating point numbers representing ordinates of the 2D point to be encoded

Output:
  When the algorithm completes, the result will be in string s.

Set s = “”
Set minx = COORDSYS_MINX
Set miny = COORDSYS_MINY
Set maxx = COORDSYS_MAXX
Set maxy = COORDSYS_MAXY
Set POINT_INDEX_LEVEL to one level higher than the MAXLEVEL used for polygon tile construction

If (x is between minx & maxx) AND (y is between miny & maxy)
  Iterate for (int i = 0; i < POINT_INDEX_LEVEL; i++)
  BEGIN
    Set midx = (minx+maxx)/2;
    Set midy = (miny+maxy)/2;

    // quadrant layout: 0 = top left, 1 = top right, 2 = bottom left, 3 = bottom right
    // a point lying exactly on a dividing line is assigned to the upper or right tile

    if (x < midx AND y >= midy)
      // quadrant 0
      Set s = s + “0”;
      Set maxx = midx;
      Set miny = midy;
    else if (x >= midx AND y >= midy)
      // quadrant 1
      Set s = s + “1”;
      Set minx = midx;
      Set miny = midy;
    else if (x < midx AND y < midy)
      // quadrant 2
      Set s = s + “2”;
      Set maxx = midx;
      Set maxy = midy;
    else
      // quadrant 3
      Set s = s + “3”;
      Set minx = midx;
      Set maxy = midy;
  END

The result is now in string s.


To determine if a point overlaps any of the polygons that are being searched, this algorithm is invoked to assign a quad key to the point at MAXLEVEL+1. Then you can search for the polygons whose quad keys are prefixes of the point's quad key.
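
The point encoding and the prefix search can be sketched together in Python. This is a hedged illustration, not the authors' code: the function names, the 4 by 4 extent, and the four sample tile keys are assumptions, and a point lying exactly on a dividing line is assigned to the upper or right tile by choice.

```python
def quad_key_for_point(x, y, extent, levels):
    """Encode a point as a quad key with one digit per level, or None if outside."""
    min_x, min_y, max_x, max_y = extent
    if not (min_x <= x <= max_x and min_y <= y <= max_y):
        return None
    key = ""
    for _ in range(levels):
        mid_x, mid_y = (min_x + max_x) / 2, (min_y + max_y) / 2
        if y >= mid_y:              # upper half: quadrant 0 or 1
            min_y = mid_y
            digit = "0" if x < mid_x else "1"
        else:                       # lower half: quadrant 2 or 3
            max_y = mid_y
            digit = "2" if x < mid_x else "3"
        if digit in "02":           # left half
            max_x = mid_x
        else:                       # right half
            min_x = mid_x
        key += digit
    return key


def point_in_index(x, y, extent, tile_keys, max_level):
    """True when some indexed tile key is a prefix of the point's quad key."""
    key = quad_key_for_point(x, y, extent, max_level + 1)
    if key is None:
        return False
    return any(key[:n] in tile_keys for n in range(len(key) + 1))


extent = (0.0, 0.0, 4.0, 4.0)           # assumed coordinate system bounds
tile_keys = {"03", "12", "21", "30"}    # assumed interior tiles of one polygon

print(quad_key_for_point(1.5, 2.5, extent, 4))
print(point_in_index(1.5, 2.5, extent, tile_keys, 3))
print(point_in_index(0.1, 0.1, extent, tile_keys, 3))
```

A result of False from `point_in_index` means only that the index could not decide; the point may still be inside the polygon, so the full geometric comparison is then required.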

Claims
  • 1. A computer implemented method for determining whether a sample point is within a geographic area by generating indices for subdivided areas within the geographic area, and the geographic area having a defined boundary, the method comprising: defining a rectangular boundary box that encloses the geographic area; dividing the boundary box into a plurality of equal first level rectangular boxes at a first dividing level; determining for each of the first level boxes whether the first level box is (a) entirely outside the geographic area; (b) entirely inside the geographic area; or (c) intersecting with the defined boundary; for each of the first level boxes performing the following indexing steps: if the first level box is (a) entirely outside the geographic area, then that box is discarded and it is not part of the indices; if the first level box is (b) entirely inside the geographic area, then that box is added to the indices and is assigned an index number that is stored with the corresponding range of coordinates for that box; if the first level box is (c) intersecting with the defined boundary then dividing that box into a plurality of equal second level rectangular boxes at the second dividing level; repeating the determining and indexing steps for each of the second level boxes as described above for the first level boxes.
  • 2. The method of claim 1 wherein the step of repeating further comprises: iteratively repeating the determining and indexing steps for a subsequent number of dividing levels until (1) there are no more boxes that intersect with the defined boundary, or (2) a maximum dividing level has been reached.
  • 3. The method of claim 2 wherein the indexing step includes including the level of an indexed box as part of the index number.
  • 4. The method of claim 1 further including the steps of: receiving the sample point for which a response is required as to whether the sample point is located within the geographic area; comparing the sample point to the coordinate ranges of the indexed boxes to determine if the sample point is included in the indices; if the sample point is within the indexed boxes then providing a response indicating that the sample point is within the geographic area; and if the sample point is not within the indexed boxes then performing a full geometric analysis of the sample point in comparison to a geometry of the defined boundary.
  • 5. The method of claim 4 further including, prior to the step of comparing the sample point to the coordinate ranges of the indexed boxes, determining whether the sample point is inside or outside the boundary box; and if the sample point is outside the boundary box, then providing an output indicating that the sample point is outside of the geographic area; if the sample point is inside the boundary box, then proceeding with the comparing step and subsequent steps thereafter.
  • 6. The method of claim 4 wherein conventional analysis includes comparing the sample point to a complete geometry of the defined boundary.
  • 7. The method of claim 4 wherein multiple indices are generated for multiple layers of maps having different boundaries for a same physical location, and wherein the step of comparing the sample point to the coordinate ranges of the indexed boxes is done concurrently for the multiple indices.
  • 8. The method of claim 7 wherein the multiple layers represent different types of insurance risks, and whereby the method determines whether the sample point is located in designated risk zones in each of the respective layers.
  • 9. A computer implemented method for determining whether a sample point is within a geographic area by generating indices for subdivided areas within the geographic area, and the geographic area having a defined boundary, the method comprising: defining a geometric boundary shape that encloses the geographic area; dividing the boundary shape into a plurality of equal first level shapes at a first dividing level; determining for each of the first level shapes whether the first level shape is (a) entirely outside the geographic area; (b) entirely inside the geographic area; or (c) intersecting with the defined boundary; for each of the first level shapes performing the following indexing steps: if the first level shape is (a) entirely outside the geographic area, then that shape is discarded and it is not part of the indices; if the first level shape is (b) entirely inside the geographic area, then that shape is added to the indices and is assigned an index number that is stored with the corresponding range of coordinates for that shape; if the first level shape is (c) intersecting with the defined boundary then dividing that shape into a plurality of equal second level shapes at the second dividing level; repeating the determining and indexing steps for each of the second level shapes as described above for the first level shapes.
  • 10. The method of claim 9 wherein the step of repeating further comprises: iteratively repeating the determining and indexing steps for a subsequent number of dividing levels until (1) there are no more shapes that intersect with the defined boundary, or (2) a maximum dividing level has been reached.
  • 11. The method of claim 10 wherein the indexing step includes including the level of an indexed shape as part of the index number.
  • 12. The method of claim 10 further including the steps of: receiving the sample point for which a response is required as to whether the sample point is located within the geographic area; comparing the sample point to the coordinate ranges of the indexed shapes to determine if the sample point is included in the indices; if the sample point is within the indexed shapes then providing a response indicating that the sample point is within the geographic area; and if the sample point is not within the indexed shapes then performing a full geometric analysis of the sample point in comparison to a geometry of the defined boundary.
  • 13. The method of claim 12 further including, prior to the step of comparing the sample point to the coordinate ranges of the indexed shapes, determining whether the sample point is inside or outside the boundary shape; and if the sample point is outside the boundary shape, then providing an output indicating that the sample point is outside of the geographic area; if the sample point is inside the boundary shape, then proceeding with the comparing step and subsequent steps thereafter.
  • 14. The method of claim 12 wherein conventional analysis includes comparing the sample point to a complete geometry of the defined boundary.
  • 15. The method of claim 12 wherein multiple indices are generated for multiple layers of maps having different boundaries for a same physical location, and wherein the step of comparing the sample point to the coordinate ranges of the indexed shapes is done concurrently for the multiple indices.
  • 16. The method of claim 15 wherein the multiple layers represent different types of insurance risks, and whereby the method determines whether the sample point is located in designated risk zones in each of the respective layers.