Database systems, including Relational Database Management Systems ("RDBMS"), are sometimes required to accommodate geospatial data types that can store such spatial features as points, linestrings, polygons, multipoints, multilinestrings, multipolygons, and geometry collections. Such spatial features can be used to represent a wide variety of useful information, such as geographical elements on maps, network elements on a chart of a company's communications network, or elements on a company's organizational chart. Some of these spatial features, such as a polygon (a planar area bounded by a closed path or circuit composed of a finite sequence of straight line segments meeting at vertices) and a multipolygon (a set of two or more polygons), can contain a large number of vertices. They can also contain one or more interior rings, which are open areas (holes) bounded by a closed path or circuit.
Representing a spatial feature containing one or more interior rings within a database is a challenge.
In general, in one aspect, the invention features a method. The method includes considering a spatial feature for storage in a database system running on a computer. The spatial feature includes a polygon, P. The polygon includes an outer ring, OR, and the area within the outer ring. The outer ring includes a plurality of vertices, ORV1 . . . ORVM, with the first and last vertices being the same (i.e., ORV1 = ORVM), and lines connecting consecutive vertices (e.g., ORV1 and ORV2, ORV2 and ORV3). The polygon includes an interior ring contained within the polygon. The computer a) removes the interior ring from the polygon. The computer b) determines a line along which to split the polygon without regard for the location of the interior ring within the polygon. The computer c) splits the polygon into two polygons, SP1 and SP2, along the line. The computer d) applies the interior ring to the two split polygons by invoking a point set difference between the interior ring and the two split polygons. The computer e) stores the split polygons in the database system on the computer.
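The ring definitions above can be modeled as a small data structure. The following Python sketch is illustrative only; the names `Polygon`, `is_closed`, and `ring_area` are hypothetical and are not part of any Teradata or library API. It checks that each ring is closed (ORV1 = ORVM) and computes the polygon's area as the area inside the outer ring minus the area of each interior ring, using the shoelace formula.

```python
from dataclasses import dataclass, field

# Hypothetical model for illustration; not a Teradata or library API.
Point = tuple[float, float]

def is_closed(ring: list[Point]) -> bool:
    # Closed: at least four entries and the first vertex equals the last (ORV1 = ORVM).
    return len(ring) >= 4 and ring[0] == ring[-1]

def ring_area(ring: list[Point]) -> float:
    # Shoelace formula; the repeated closing vertex makes zip() visit every edge.
    s = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(s) / 2.0

@dataclass
class Polygon:
    outer: list[Point]                                           # OR
    interiors: list[list[Point]] = field(default_factory=list)   # interior rings

    def __post_init__(self) -> None:
        for ring in [self.outer, *self.interiors]:
            if not is_closed(ring):
                raise ValueError("every ring must be closed (first vertex == last vertex)")

    def area(self) -> float:
        # Area inside the outer ring minus each interior ring (assumes the
        # interior rings lie inside OR and do not overlap one another).
        return ring_area(self.outer) - sum(ring_area(r) for r in self.interiors)
```

For a 10 by 10 square with a 2 by 2 interior ring, `area()` gives 100 minus 4, or 96.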
Implementations of the invention may include one or more of the following. Element a) may include storing the interior ring in a vector on the computer. Element b) may include performing a triangulation technique selected from the group of triangulation techniques consisting of selecting the location of the line so that the number of outer ring vertices in each of the split polygons is substantially equal, and selecting the location of the line so that the area of each of the split polygons is substantially equal. Invoking the point set difference in element d) may include d1) determining that the interior ring does not intersect SP2, and d2) applying the entire interior ring to SP1. Alternatively, invoking the point set difference in element d) may include d1) determining that the interior ring intersects both SP1 and SP2, and d2) subtracting the portion of the interior ring that intersects SP1 from SP1 and the portion of the interior ring that intersects SP2 from SP2. After a first iteration of elements a) through d), and prior to performing element e), the method may determine that one of SP1 and SP2 is too large and repeat elements a) through d) on SP1 and SP2 to produce SP11, SP12, SP21, and SP22, wherein element e) comprises determining that split polygons SP11, SP12, SP21, and SP22 are not too large and storing the split polygons SP11, SP12, SP21, and SP22 in the database system on the computer. The spatial feature may include a plurality of polygons. A subset of the plurality of polygons may include interior rings. The method may include performing elements a) through e) for each polygon that contains an interior ring.
In general, in another aspect, the invention features a computer program stored in a tangible medium. The program includes executable instructions that cause a computer to consider a spatial feature for storage in a database system running on a computer. The spatial feature includes a polygon, P. The polygon includes an outer ring, OR, and the area within the outer ring. The outer ring includes a plurality of vertices, ORV1 . . . ORVM, with the first and last vertices being the same (i.e., ORV1 = ORVM), and lines connecting consecutive vertices (e.g., ORV1 and ORV2, ORV2 and ORV3). The spatial feature further includes an interior ring contained within the polygon. The program includes executable instructions that cause the computer to a) remove the interior ring from the polygon. The program includes executable instructions that cause the computer to b) determine a line along which to split the polygon without regard for the location of the interior ring within the polygon. The program includes executable instructions that cause the computer to c) split the polygon into two polygons, SP1 and SP2, along the line. The program includes executable instructions that cause the computer to d) apply the interior ring to the two split polygons by invoking a point set difference between the interior ring and the two split polygons. The program includes executable instructions that cause the computer to e) store the split polygons in the database system on the computer.
In general, in another aspect, the invention features a database system. The database system includes one or more nodes. The database system includes a plurality of CPUs, each of the one or more nodes providing access to one or more CPUs. The database system includes a plurality of virtual processes, each of the one or more CPUs providing access to one or more virtual processes. Each virtual process is configured to manage data, including rows from the set of database table rows, stored in one of a plurality of data-storage facilities. The database system includes a process that considers a spatial feature for storage in the database system. The spatial feature includes a polygon, P. The polygon includes an outer ring, OR, and the area within the outer ring. The outer ring includes a plurality of vertices, ORV1 . . . ORVM, with the first and last vertices being the same (i.e., ORV1 = ORVM), and lines connecting consecutive vertices (e.g., ORV1 and ORV2, ORV2 and ORV3). The polygon includes an interior ring contained within the polygon. The process includes a) removing the interior ring from the polygon. The process includes b) determining a line along which to split the polygon without regard for the location of the interior ring within the polygon. The process includes c) splitting the polygon into two polygons, SP1 and SP2, along the line. The process includes d) applying the interior ring to the two split polygons by invoking a point set difference between the interior ring and the two split polygons. The process includes e) storing the split polygons in the database system on the computer.
The technique disclosed herein has particular application, but is not limited, to large databases that might contain many millions or billions of records managed by a database system (“DBS”) 100, such as a Teradata Active Data Warehousing System available from the assignee hereof.
For the case in which one or more virtual processors are running on a single physical processor, the single physical processor swaps between the set of N virtual processors.
For the case in which N virtual processors are running on an M-processor node, the node's operating system schedules the N virtual processors to run on its set of M physical processors. If there are 4 virtual processors and 4 physical processors, then typically each virtual processor would run on its own physical processor. If there are 8 virtual processors and 4 physical processors, the operating system would schedule the 8 virtual processors against the 4 physical processors, in which case swapping of the virtual processors would occur.
Each of the processing modules 110_1 . . . 110_N manages a portion of a database that is stored in a corresponding one of the data-storage facilities 120_1 . . . 120_N. Each of the data-storage facilities 120_1 . . . 120_N includes one or more disk drives. The DBS may include multiple nodes 105_2 . . . 105_N in addition to the illustrated node 105_1, connected by extending the network 115.
The system stores data in one or more tables in the data-storage facilities 120_1 . . . 120_N. The rows 125_1 . . . 125_Z of the tables are stored across multiple data-storage facilities 120_1 . . . 120_N to ensure that the system workload is distributed evenly across the processing modules 110_1 . . . 110_N. A parsing engine 130 organizes the storage of data and the distribution of table rows 125_1 . . . 125_Z among the processing modules 110_1 . . . 110_N. The parsing engine 130 also coordinates the retrieval of data from the data-storage facilities 120_1 . . . 120_N in response to queries received from a user at a mainframe 135 or a client computer 140. The DBS 100 usually receives queries and commands to build tables in a standard format, such as SQL.
In one implementation, the rows 125_1 . . . 125_Z are distributed across the data-storage facilities 120_1 . . . 120_N by the parsing engine 130 in accordance with their primary index. The primary index defines the columns of the rows that are used for calculating a hash value. The function that produces the hash value from the values in the columns specified by the primary index is called the hash function. Some portion, possibly the entirety, of the hash value is designated a "hash bucket". The hash buckets are assigned to data-storage facilities 120_1 . . . 120_N and associated processing modules 110_1 . . . 110_N by a hash bucket map. The characteristics of the columns chosen for the primary index determine how evenly the rows are distributed.
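The row-distribution path described above (primary-index values, hash value, hash bucket, bucket map, processing module) can be sketched in Python. This is a minimal model under stated assumptions: SHA-256 truncated to 32 bits stands in for the system's hash function, the high-order 16 bits serve as the hash bucket, and buckets are assigned round-robin to modules. None of these choices is claimed to be Teradata's actual implementation, and all names are hypothetical.

```python
import hashlib

BUCKET_BITS = 16                 # hypothetical: high-order bits used as the bucket
NUM_BUCKETS = 1 << BUCKET_BITS

def row_hash(pi_values: tuple) -> int:
    """Stand-in hash function: a 32-bit value derived from the primary-index columns."""
    key = "|".join(repr(v) for v in pi_values).encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")

def hash_bucket(h: int) -> int:
    # A portion of the 32-bit hash value (its high-order bits) names the bucket.
    return h >> (32 - BUCKET_BITS)

def build_bucket_map(num_modules: int) -> list[int]:
    # Hash bucket map: bucket number -> processing module (round-robin in this sketch).
    return [b % num_modules for b in range(NUM_BUCKETS)]

def module_for_row(row: dict, pi_columns: list[str], bucket_map: list[int]) -> int:
    # Only the primary-index columns feed the hash; other columns are ignored.
    values = tuple(row[c] for c in pi_columns)
    return bucket_map[hash_bucket(row_hash(values))]
```

Because only the primary-index columns feed the hash, two rows with the same primary-index values always land on the same module, and a primary index with few distinct values concentrates rows on a few modules, which is the unevenness the last sentence of the paragraph warns about.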
In addition to the physical division of storage among the storage facilities illustrated in
In one example system, the parsing engine 130 is made up of three components: a session control 200, a parser 205, and a dispatcher 210, as shown in
Once the session control 200 allows a session to begin, a user may submit a SQL query, which is routed to the parser 205. As illustrated in
An example polygon 405, shown in
An example multipolygon 505 (note that the dashed line represents a grouping and possibly, although not necessarily, a bounding area), shown in
A polygon with inner rings, illustrated in
While the example shown in
In one embodiment, a polygon with interior rings is divided for storage. A polygon without interior rings can be iteratively divided using a technique such as triangulation, in which the polygon is divided into a set of triangles. One known technique for triangulating is to add lines from one vertex of the polygon to all other non-adjacent vertices. In one embodiment, the location of the splitting line is determined without consideration of the location of the interior ring or rings within the polygon. That is, in one embodiment, no attempt is made to have the line intersect the interior ring or rings.
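The vertex-fan triangulation mentioned above can be sketched in a few lines of Python. This is illustrative only: `fan_triangulate` is a hypothetical name, and the sketch assumes the ring is convex, because a fan drawn from an arbitrary vertex is not guaranteed to stay inside a concave polygon (ear clipping is a common alternative in that case).

```python
Point = tuple[float, float]

def fan_triangulate(ring: list[Point]) -> list[tuple[Point, Point, Point]]:
    """Split a closed, convex ring into triangles that all share ring[0].

    Assumes convexity; the result is not guaranteed valid for concave rings.
    """
    if ring[0] == ring[-1]:
        ring = ring[:-1]          # drop the repeated closing vertex
    if len(ring) < 3:
        raise ValueError("need at least three distinct vertices")
    apex = ring[0]
    # Each consecutive pair (ring[i], ring[i + 1]) plus the apex forms one triangle.
    return [(apex, ring[i], ring[i + 1]) for i in range(1, len(ring) - 1)]
```

A convex polygon with n distinct vertices always yields n - 2 triangles this way.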
One embodiment of an iterative technique for dividing a polygon with interior rings is described below:
In one embodiment, a line 710 is defined through the polygon to divide it into two polygons. In one embodiment, the line 710 is located so as to divide the polygon into two polygons of equal or substantially equal area (i.e., areas within 10 percent of each other). In one embodiment, the line 710 is located so as to divide the polygon into two polygons having the same, or substantially the same, number of vertices. In one embodiment, the polygon is then divided along the line 710 to form two split polygons 715, 720.
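Of the two line-placement criteria above, the vertex-count one is the easier to sketch. The Python below (hypothetical names) splits a closed ring along the chord from vertex 0 to vertex n//2, so each half receives the same or nearly the same number of outer-ring vertices. It also includes a check for the "substantially equal" test (within 10 percent). It assumes a convex ring; in a concave polygon that chord can pass outside the polygon.

```python
Point = tuple[float, float]

def shoelace_area(ring: list[Point]) -> float:
    # Ring is closed (first vertex repeated at the end).
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))) / 2.0

def split_by_vertex_count(ring: list[Point]) -> tuple[list[Point], list[Point]]:
    """Split a closed convex ring along the chord from vertex 0 to vertex n//2.

    Assumes convexity so that the chord lies inside the polygon.
    """
    pts = ring[:-1]                     # drop the repeated closing vertex
    n = len(pts)
    if n < 4:
        raise ValueError("need at least four distinct vertices to split")
    k = n // 2
    sp1 = pts[: k + 1] + [pts[0]]       # vertices 0..k, closed back at 0
    sp2 = pts[k:] + [pts[0], pts[k]]    # vertices k..n-1 and 0, closed back at k
    return sp1, sp2

def substantially_equal(a1: float, a2: float, tolerance: float = 0.10) -> bool:
    # "Substantially equal": values within 10 percent of each other.
    return abs(a1 - a2) <= tolerance * max(a1, a2)
```

For an equal-area line, the same 10 percent check could serve as the acceptance test while searching over candidate line positions; that search is not shown here.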
In one embodiment, the interior rings 610, 615, 620, and 625 are then applied to the two split polygons 715, 720, as indicated by the broad arrow in
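Because the split is along a straight line, SP1 is the part of the polygon on one side of the line and SP2 the part on the other. The portion of an interior ring to subtract from each split polygon is therefore that ring clipped to the corresponding half-plane. The Python sketch below computes those portions with Sutherland-Hodgman clipping against a single line. This technique was chosen for illustration and is not necessarily what the described embodiment uses; the names are hypothetical. An empty result on one side corresponds to case d1) (the whole ring goes to the other split polygon), and two non-empty results correspond to case d2). The final point set difference, which can turn a hole that touches the split line into a notch in the outer boundary, is left to a geometry library and is not shown.

```python
Point = tuple[float, float]

def side(a: Point, b: Point, p: Point) -> float:
    # Cross product: > 0 if p is left of the directed line a->b, < 0 if right.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def clip_to_half_plane(ring: list[Point], a: Point, b: Point, sign: float) -> list[Point]:
    """Sutherland-Hodgman clip of a closed ring to {p : sign * side(a, b, p) >= 0}."""
    pts, out = ring[:-1], []
    for i, cur in enumerate(pts):
        prev = pts[i - 1]                # i == 0 wraps around to the last vertex
        dp, dc = sign * side(a, b, prev), sign * side(a, b, cur)
        if dp < 0 < dc or dc < 0 < dp:   # edge prev->cur strictly crosses the line
            t = dp / (dp - dc)
            out.append((prev[0] + t * (cur[0] - prev[0]), prev[1] + t * (cur[1] - prev[1])))
        if dc >= 0:                      # cur lies in (or on the edge of) the half-plane
            out.append(cur)
    return out + out[:1] if len(out) >= 3 else []   # re-close, or empty if nothing left

def apply_interior_ring(ring: list[Point], a: Point, b: Point):
    """Return (portion to subtract from SP1, portion to subtract from SP2).

    SP1 is taken to be the side left of a->b. An empty list means the interior
    ring does not intersect that split polygon.
    """
    return clip_to_half_plane(ring, a, b, 1.0), clip_to_half_plane(ring, a, b, -1.0)
```

With a vertical split line at x = 5, a ring lying entirely at x < 5 comes back unchanged for SP1 and empty for SP2, while a ring spanning x = 4 to x = 6 is cut into two pieces that meet along x = 5.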
In one embodiment, the split polygons are then checked to see whether they are an appropriate size to be stored in the database. In this case, for the sake of further explanation, assume that neither split polygon 715, 720 is an appropriate size. In that case, in one embodiment, the process continues by further splitting the split polygons. In one embodiment, it is possible that one of the split polygons is an appropriate size, and the process of further splitting need only be applied to the split polygon that is not. In one embodiment, any split polygon that is not of an appropriate size is split, and the process iterates until all of the split polygons are of an appropriate size for storage in the database.
Continuing the example, as shown in
In one embodiment, a line 915 is defined through split polygon 715, using a technique such as one of the triangulation techniques described above. Similarly, in one embodiment, a line 920 is defined through split polygon 720, using a technique such as one of the triangulation techniques described above. In one embodiment, the split polygons 715 and 720 are then split along lines 915 and 920, respectively, to form 4 new split polygons 925, 930, 935, and 940.
In one embodiment, the interior rings 620, 625 are then applied to split polygons 925 and 930 and the interior rings 610 and 615 are applied to split polygons 935 and 940, as indicated by the broad arrows in
In one embodiment, the technique next determines whether split polygons 925, 930, 935, and 940 are of an appropriate size to store in the database. Assume in this case that they are determined to be of an appropriate size. In that case, in one embodiment, the split polygons are stored.
In one embodiment, this technique can be used in at least two different contexts.
In one embodiment, the table function specification would be
In one embodiment, the rows that are returned are placed in a table and partitioned or hashed on the SliceNumber so that the slices of each geometry are distributed across a parallel database. In one embodiment, the geom_id that is returned is the geom_id that was passed into the table function for that geometry. In one embodiment, every slice of a geometry has the same geom_id, so that each slice can be associated with its original geometry.
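The shape of the returned rows can be sketched as a Python generator. The text does not reproduce the table function's specification, so everything here is an assumption for illustration: the column order (geom_id, SliceNumber, slice), the use of OGC Well-Known Text for the slice geometry, the function names, and the modulo partitioning on SliceNumber are all hypothetical.

```python
from typing import Iterator

Point = tuple[float, float]
Row = tuple[int, int, str]   # hypothetical layout: (geom_id, SliceNumber, slice as WKT)

def ring_wkt(ring: list[Point]) -> str:
    return "(" + ", ".join(f"{x:g} {y:g}" for x, y in ring) + ")"

def slice_rows(geom_id: int, slices: list[list[Point]]) -> Iterator[Row]:
    # Every slice keeps the geom_id passed in, so it can be tied back to its
    # original geometry; SliceNumber runs 1..k within that geometry.
    for number, ring in enumerate(slices, start=1):
        yield geom_id, number, f"POLYGON({ring_wkt(ring)})"

def target_module(slice_number: int, num_modules: int) -> int:
    # Hypothetical partitioning on SliceNumber alone.
    return slice_number % num_modules
```

Because the slice number, not the geometry, drives placement, the slices of one large geometry land on different modules of the parallel database.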
In use, in one embodiment shown in
In one embodiment, the technique then considers the next polygon (block 1110). If the spatial feature is a polygon, in one embodiment, this block selects the polygon. If the spatial feature is a multipolygon, in one embodiment, this block selects the next polygon. In the first iteration, in one embodiment, the next polygon is the first polygon.
In one embodiment, the technique next determines if the selected polygon is too large to store in the database (block 1115). If it is not, in one embodiment the technique checks to see if there are more polygons to consider (block 1120). If there are not, in one embodiment the split polygon (or the original polygon if it was not too large) is stored (block 1125). If there are more polygons to consider, in one embodiment the technique returns to block 1110 and selects the next polygon.
If the polygon is too large (block 1115), in one embodiment the interior rings are removed (block 1130) and stored in a vector or array. In one embodiment, a line is then determined to split the polygon (block 1135) using one of the techniques described above. In one embodiment, the polygon is then split along the line (block 1140). In one embodiment, the interior rings are then applied to the split polygons (block 1145). In one embodiment, the technique then returns to block 1115 to determine if the resulting split polygons are too large.
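The flowchart loop (blocks 1110 through 1145) can be sketched as a worklist in Python. This is a minimal sketch under several hypothetical assumptions. "Too large" means more than `max_vertices` outer-ring vertices. Each outer ring is convex with no three consecutive collinear vertices. The split line is the vertex-count chord. Block 1145 is approximated by clipping each interior ring to the split polygon's side of the line, so the final point set difference is not computed. Each result pairs a slice's outer ring with the interior-ring portions to subtract from it. Passing several polygons models a multipolygon.

```python
from collections import deque

Point = tuple[float, float]
Ring = list[Point]

def side(a: Point, b: Point, p: Point) -> float:
    # > 0 if p is left of the directed line a->b, < 0 if right.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def clip(ring: Ring, a: Point, b: Point, sign: float) -> Ring:
    # Sutherland-Hodgman clip of a closed ring to {p : sign * side(a, b, p) >= 0}.
    pts, out = ring[:-1], []
    for i, cur in enumerate(pts):
        prev = pts[i - 1]
        dp, dc = sign * side(a, b, prev), sign * side(a, b, cur)
        if dp < 0 < dc or dc < 0 < dp:
            t = dp / (dp - dc)
            out.append((prev[0] + t * (cur[0] - prev[0]), prev[1] + t * (cur[1] - prev[1])))
        if dc >= 0:
            out.append(cur)
    return out + out[:1] if len(out) >= 3 else []

def split(outer: Ring, holes: list[Ring]):
    pts = outer[:-1]
    k = len(pts) // 2
    a, b = pts[0], pts[k]                          # block 1135: chord as split line (assumes convex)
    sp1 = pts[: k + 1] + [pts[0]]                  # block 1140: split the outer ring
    sp2 = pts[k:] + [pts[0], pts[k]]
    s1 = 1.0 if side(a, b, pts[1]) > 0 else -1.0   # which side of the line SP1 is on
    # Blocks 1130/1145: interior rings were set aside; give each split polygon its share.
    h1 = [c for h in holes if (c := clip(h, a, b, s1))]
    h2 = [c for h in holes if (c := clip(h, a, b, -s1))]
    return (sp1, h1), (sp2, h2)

def slice_polygons(polygons, max_vertices: int = 6):
    """polygons: iterable of (outer_ring, interior_rings) pairs.

    max_vertices is a hypothetical "too large" threshold for block 1115.
    """
    if max_vertices < 3:
        raise ValueError("max_vertices must be at least 3")
    done, work = [], deque(polygons)               # block 1110: next polygon
    while work:
        outer, holes = work.popleft()
        if len(outer) - 1 <= max_vertices:         # block 1115: not too large
            done.append((outer, holes))            # block 1125: ready to store
        else:
            work.extend(split(outer, holes))       # back to block 1115 for each half
    return done
```

Requiring `max_vertices` to be at least 3 guarantees termination: any ring that gets split has at least four distinct vertices, and both halves then have strictly fewer.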
The foregoing description of the preferred embodiment of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto.