The present application incorporates by reference the entire contents of U.S. application Ser. No. 13/780,990, titled “Method For Query Processing Of Topological Relationships Among Complex Spatial Objects” and filed on Feb. 28, 2013.
The present application incorporates by reference the entire contents of U.S. application Ser. No. 12/871,824, titled “Query And Exadata Support For Hybrid Columnar Compressed Data”, and filed on Aug. 30, 2010.
The present application incorporates by reference the entire contents of U.S. application Ser. No. 13/840,811, titled “Method To Accelerate Queries Using Dynamically Generated Alternate Data Formats In Flash Cache”, and filed on Mar. 15, 2013.
The present application incorporates by reference the entire contents of U.S. application Ser. No. 13/797,063, titled “OLTP Compression Of Wide Tables”, and filed on Mar. 12, 2013.
The present application incorporates by reference the entire contents of U.S. application Ser. No. 13/631,434, titled “Predicate Result Cache”, and filed on Sep. 28, 2012, the contents of which are incorporated herein by reference.
The present invention relates generally to handling point spatial data in a database system and more particularly to storing and retrieving point spatial data in a database system without using spatial indexes.
Light Detection and Ranging or Laser Imaging Detection and Ranging (LIDAR) is an optical remote sensing technology that is used to gather information regarding targeted surfaces. The technology generates a large number of data points, in some cases billions of data points, over the targeted surface where the data points include information such as the range or distance from the laser to the target, orientation, scan angle, calibration data, and often other attributes relating to the surface such as intensity, color and texture. The data points are stored in a database and retrieved for processing to generate digital elevation models, canopy models, building models, and contours.
In the database, the data points can be stored in traditional database tables with the help of a spatial index. However, it is time consuming to load large volumes of data into a table with a spatial index because the overhead of maintaining the index is high.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
In the drawings:
In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
General Overview
Techniques described herein provide for the efficient loading and storage of point data without having to create indexes for the point data. Loading data without creating indexes can be very fast because point data only needs to be entered into a table row; no additional processing is needed because no index needs to be updated. In addition, the point data is stored so that efficient access and query optimizations are realized in other ways that do not require indexing.
Specifically, the point data is stored by a DBMS in an enhanced storage device in either uncompressed form, or in compressed hybrid-columnar format, enabling efficient querying of the point data. The enhanced storage device is also capable of processing “smart scan requests” issued by a database server of a DBMS. A smart scan request specifies requested data blocks, the operation to be performed on the requested data blocks, and one or more filtering criteria. In response, the enhanced storage device retrieves the data blocks from persistent storage, filters rows in the data blocks, and returns data blocks containing only rows that satisfy the filtering criteria. Details of “smart scan requests” are found in U.S. application Ser. No. 13/840,811, titled “Method To Accelerate Queries Using Dynamically Generated Alternate Data Formats In Flash Cache”, and filed on Mar. 15, 2013.
This smart scan capability enables a DBMS to off load portions of the work of computing a query to an enhanced storage device. Spatial queries are written into database statements that contain predicate clauses that can be computed efficiently through smart scan requests.
In addition, storing point data in hybrid-columnar format leverages the power and efficiency of storing and scanning columns stored in column-major format. In hybrid-columnar format, the rows are stored in data blocks and pivoted into column-major format. Hybrid-columnar format enables an enhanced storage device to perform fast columnar scanning of columns. Storing columns in column-major format also enhances compressibility and columnar scanning. Compressed columns can also be evaluated more quickly without decompressing column values in the compressed columns. Approaches for scanning compressed column-major data in hybrid-columnar format can be found in U.S. application Ser. No. 13/631,434, titled “Predicate Result Cache”, and filed on Sep. 28, 2012.
Data Storage
Data relating to each point collected over the targeted surface is stored in a row of a table stored in hybrid-columnar format in enhanced storage device. Each column in the table is an attribute of the data point (x and y).
Preferably, none of columns of point_tab 102 should be indexed. Thus, when data is loaded or otherwise inserted as rows into point_tab 102, the loading is relatively much quicker because no indexes are updated during the loading. In another embodiment, indexes on any of columns of point_table 102 are created asynchronously or synchronously with the loading the data. The data can be loaded programmatically, or by building a SQL*Loader control file and populated in direct path mode.
The table resides in an enhanced storage device, either uncompressed, or in hybrid-columnar format. As such, within each data block data of data blocks in which rows for the table are stored, if compressed, one or more of columns of point_tab 102 is stored in column-major format in the data block and/or are compressed in the data block.
Flow Chart of Overall Operation
In step 206, a set of “BETWEEN” clauses is generated based on the extents of the interior tiles. In step 208, a set of corresponding “BETWEEN” clauses is generated based on the extents of the boundary tiles. A DBMS offloads computation of such BETWEEN clauses by issuance of smart scan requests to an enhanced storage device.
In step 210, the SQL statement for the interior tiles is generated and when executed in a subsequent step (216), returns all data points from the table that are within the interior tiles. In step 212, a SQL statement is generated for the boundary tiles by calling a function, PointInPolygon, whose first parameter is the cursor created in step 212 and second parameter is the query window. The PointInPolygon function further evaluates every point returned by the cursor and only returns points that fall inside the perimeter of the query window. In step 216, the SQL statement for the interior tiles and the SQL statement for the boundary tiles are executed and their results are combined together. In step 218, the combined result is then returned to answer the query.
The Query Window
The Interior And Boundary Tiles
The Extents
The “BETWEEN” Clauses
Table 1 depicts the “BETWEEN” clauses generated for the extents of the tiles in
The “BETWEEN” clauses for the boundary tiles depicted in
The SQL Statement For The Interior Tiles
The SQL statement generated in step 210 of
The statement includes a SELECT clause with * to return all of the columns of the rows meeting the WHERE condition in the table specified in the FROM clause. The WHERE condition is an OR'ed list of BETWEEN clauses, where each clause specifies the extent of an interior tile of which there are four in
The SQL Statement For The Boundary Tiles
Table 6 depicts the SQL statement generated in step 214 for the boundary tiles of which there are 12 in
The SQL statement includes a table of the data points, a SELECT*clause, which selects all of the columns returned from the PointInPolygon function whose first argument is a CURSOR that selects all of the points that interact with boundary tiles and whose second argument is the query window. The PointInPolygon function further evaluates each point returned by the CURSOR and returns only those points that fall inside the perimeter of the query window.
One or more smart scan requests may be issued to an enhanced storage device for each of the BETWEEN clauses. Each such smart scan request specifies one or more filtering criteria based on the predicate conditions specified for column X (i.e. “a.x”) and column Y (i.e. “y.x”) in the respective BETWEEN clause. If the point data is compressed in hybrid-columnar format, the enhanced storage device performs columnar scanning for rows that satisfy the one or more filtering criteria. No index is used to identify rows that satisfy the one or more filtering criteria.
The CURSOR
The CURSOR is passed in as the first parameter to the PointInPolygon function. The CURSOR represents a SQL statement that returns points from the point table that fall within boundary tiles using BETWEEN clauses built from the extents of the boundary tiles. Thus, the CURSOR returns rows of the table that are within the boundary tiles, but not necessarily within the query window. A subsequent test determines whether the rows returned by the CURSOR are within the query window.
PointInPolygon Function
The pointInPolygon function takes as one input the CURSOR, which generates a set of rows that fall within the boundary tiles. The function takes in a query window as its other input and determines whether the rows generated by the CURSOR fall within the query window perimeter. Thus, the PointInPolygon function returns the rows that fall within the boundary tile and within the query perimeter.
To perform the query window perimeter check, the pointInPolygon function builds an in-memory R-tree index for the query window and uses the in-memory R-tree to process all of the cursor-generated rows against the R-tree. For more details regarding determining whether a point falls within a polygon, refer to U.S. application Ser. No. 13/780,990, titled “Method For Query Processing Of Topological Relationships Among Complex Spatial Objects” and filed on Feb. 28, 2013.
If the CURSOR includes additional predicates added to the WHERE clauses, this further reduces the number of points processed by the PointInPolygon function and thus overall query time.
Execution and Joining of SQL Statements
Once generated, the SQL statement for the interior tiles and the SQL statement for the boundary tiles are coupled with a UNION ALL and executed, as depicted in step 218 of
Hardware Overview
According to one embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.
For example,
Computer system 600 also includes a main memory 606, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 602 for storing information and instructions to be executed by processor 604. Main memory 606 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 604. Such instructions, when stored in non-transitory storage media accessible to processor 604, render computer system 600 into a special-purpose machine that is customized to perform the operations specified in the instructions.
Computer system 600 further includes a read only memory (ROM) 608 or other static storage device coupled to bus 602 for storing static information and instructions for processor 604. A storage device 610, such as a magnetic disk or optical disk, is provided and coupled to bus 602 for storing information and instructions.
Computer system 600 may be coupled via bus 602 to a display 612, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 614, including alphanumeric and other keys, is coupled to bus 602 for communicating information and command selections to processor 604. Another type of user input device is cursor control 616, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 604 and for controlling cursor movement on display 612. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
Computer system 600 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 600 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 600 in response to processor 604 executing one or more sequences of one or more instructions contained in main memory 606. Such instructions may be read into main memory 606 from another storage medium, such as storage device 610. Execution of the sequences of instructions contained in main memory 606 causes processor 604 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
The term “storage media” as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operation in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 610. Volatile media includes dynamic memory, such as main memory 606. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, and EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.
Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 602. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 604 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 600 can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector can receive the data carried in the infrared signal and appropriate circuitry can place the data on bus 602. Bus 602 carries the data to main memory 606, from which processor 604 retrieves and executes the instructions. The instructions received by main memory 606 may optionally be stored on storage device 610 either before or after execution by processor 604.
Computer system 600 also includes a communication interface 618 coupled to bus 602. Communication interface 618 provides a two-way data communication coupling to a network link 620 that is connected to a local network 622. For example, communication interface 618 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 618 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 618 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 620 typically provides data communication through one or more networks to other data devices. For example, network link 620 may provide a connection through local network 622 to a host computer 624 or to data equipment operated by an Internet Service Provider (ISP) 626. ISP 626 in turn provides data communication services through the worldwide packet data communication network now commonly referred to as the “Internet” 628. Local network 622 and Internet 628 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 620 and through communication interface 618, which carry the digital data to and from computer system 600, are example forms of transmission media.
Computer system 600 can send messages and receive data, including program code, through the network(s), network link 620 and communication interface 618. In the Internet example, a server 630 might transmit a requested code for an application program through Internet 628, ISP 626, local network 622 and communication interface 618.
The received code may be executed by processor 604 as it is received, and/or stored in storage device 610, or other non-volatile storage for later execution.
In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The sole and exclusive indicator of the scope of the invention, and what is intended by the applicants to be the scope of the invention, is the literal and equivalent scope of the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction.
Number | Name | Date | Kind |
---|---|---|---|
5265246 | Li | Nov 1993 | A |
5278783 | Edmondson | Jan 1994 | A |
5404510 | Smith et al. | Apr 1995 | A |
5574835 | Duluk et al. | Nov 1996 | A |
5794229 | French et al. | Aug 1998 | A |
6061763 | Rubin et al. | May 2000 | A |
6216125 | Johnson | Apr 2001 | B1 |
6334123 | Ross et al. | Dec 2001 | B1 |
6959300 | Caldwell et al. | Oct 2005 | B1 |
7031994 | Lao et al. | Apr 2006 | B2 |
7219108 | Kothuri et al. | May 2007 | B2 |
7272589 | Guay | Sep 2007 | B1 |
7496589 | Jain et al. | Feb 2009 | B1 |
7552130 | Cook et al. | Jun 2009 | B2 |
7558290 | Nucci et al. | Jul 2009 | B1 |
7580927 | Abugov | Aug 2009 | B1 |
7720878 | Caldwell et al. | May 2010 | B2 |
8392382 | Marwah et al. | Mar 2013 | B2 |
8832142 | Marwah et al. | Sep 2014 | B2 |
20020075714 | Pereira et al. | Jun 2002 | A1 |
20030212694 | Potapov et al. | Nov 2003 | A1 |
20040078461 | Bendich et al. | Apr 2004 | A1 |
20050192941 | Biedenstein | Sep 2005 | A1 |
20050203932 | Kothuri et al. | Sep 2005 | A1 |
20050210054 | Harris | Sep 2005 | A1 |
20050222978 | Dory et al. | Oct 2005 | A1 |
20060074941 | Flynn | Apr 2006 | A1 |
20060074977 | Kothuri | Apr 2006 | A1 |
20060190425 | Chang | Aug 2006 | A1 |
20070109155 | Fallon | May 2007 | A1 |
20070112736 | Okamoto et al. | May 2007 | A1 |
20070143248 | Uppala | Jun 2007 | A1 |
20080050025 | Bashyam et al. | Feb 2008 | A1 |
20080071818 | Apanowicz et al. | Mar 2008 | A1 |
20080162424 | Adler et al. | Jul 2008 | A1 |
20080162523 | Kraus et al. | Jul 2008 | A1 |
20080281784 | Zane et al. | Nov 2008 | A1 |
20080294676 | Faerber et al. | Nov 2008 | A1 |
20080294863 | Faerber et al. | Nov 2008 | A1 |
20090019029 | Tommaney | Jan 2009 | A1 |
20090094010 | Kothuri et al. | Apr 2009 | A1 |
20090248648 | Johnson et al. | Oct 2009 | A1 |
20100030796 | Netz et al. | Feb 2010 | A1 |
20100278446 | Ganesh et al. | Nov 2010 | A1 |
20100281004 | Kapoor et al. | Nov 2010 | A1 |
20100281079 | Marwah et al. | Nov 2010 | A1 |
20110016157 | Bear et al. | Jan 2011 | A1 |
20110029569 | Ganesh et al. | Feb 2011 | A1 |
20110137940 | Gradin et al. | Jun 2011 | A1 |
20110219020 | Oks et al. | Sep 2011 | A1 |
20120054195 | Hu et al. | Mar 2012 | A1 |
20120054225 | Marwah et al. | Mar 2012 | A1 |
20120143833 | Ganesh et al. | Jun 2012 | A1 |
20120166446 | Bowman et al. | Jun 2012 | A1 |
20120173515 | Jeong et al. | Jul 2012 | A1 |
20130036101 | Marwah | Feb 2013 | A1 |
20140244635 | Hu et al. | Aug 2014 | A1 |
20140279959 | Marwah et al. | Sep 2014 | A1 |
20140281247 | Loaiza et al. | Sep 2014 | A1 |
20150032763 | Marwah et al. | Jan 2015 | A1 |
Number | Date | Country |
---|---|---|
WO 2012032184 | Mar 2012 | WO |
Entry |
---|
Dittrich et al., “Towards a One Size Fits All Database Architecture” Proceedings of the 5th Biennial Conference on Innovative Data Systems Research, dated Jan. 6, 2011, 4 pages. |
Rabb, David, “How to Judge a Columnar Database”, Information Management, website http://license.icopyright.net/user/viewFreeUse.act?fuid=MTMxMDAzMjU%3D, dated Dec. 14, 2007, 2 pages. |
Macnicol Roger et al, “Sybase IQ Multiplex-Designed for Analytics”, Proceedings of the 30th VLDB Conference, dated 2004, 4 pages. |
Sybase IQ, “Administration Guide”, Document ID: 35375-01-1121-02, dated Sep. 19, 1997, 426 pages. |
Sybase IQ, “Gaining the Performance Edge Using a Column-Oriented Database Management System”, 12 pages, dated Mar. 2009. |
Sybase IQ, “An Advanced Columnar Data Warehouse Architecture”, Winter Corporation, 17 pages, Dated Jun. 2010. |
Winter Corporation, “Demonstrating Efficiency in Large-Scale Data Warehousing”, A review of new TPC-H results for the Sun-Sybase IQ Platform, 20 pages, dated in 2003. |
Macnicol, Roger et. al., “Sybase IQ Multiplex—Designed for Analytics”, Proceeding VLDB 2004 Proceedings of the Thirteenth international conference on Very large data bases—vol. 30, 2004, VLDB Endowment, ISBN:0-12-088469-0, 4 pages. |
Abadi, Daniel J. et. al., “Column-Stores vs. Row-Stores: How Different Are They Really?” ACM 2008, SIGMOD, Jun. 9-Jun. 12, 2008, 14 pages. |
Ailamaki, Anastassia, et al, “Weaving Relations for Cache Performance,” Proceedings of the 27th International Conference on Very Large Data Bases, Rome, Italy, Sep. 11-14, 2001, 14 pages. |
Elmasri, et al., “Fundatmentals of Database Systems,” Third Edition, Addison-Wesley Longman, Inc., Copyright © 2000, ISBN-0-8053-1755-4, pp. 32, 70, 118, 131-132, 134, 155-159, 170, 252-254, 558, 569-573, 591-592, and 789-790 (26 pgs). |
W3Schools “SQL Update Statement”, Wayback Machine darted Apr. 19, 2012, on the internet www.w3schools.com/sql/sql—update.asp>, 1 page. |
Number | Date | Country | |
---|---|---|---|
20150149479 A1 | May 2015 | US |