Claims
- 1. A method of performing stratified sampling in a database system, comprising:
receiving a query containing a clause indicating stratified sampling of a source table is to be performed, the clause containing plural stratification conditions; and
generating one or more commands to send to a processing module, the one or more commands containing instructions to evaluate the stratification conditions and to perform sampling of data from the source table.
- 2. The method of claim 1, wherein the plural stratification conditions correspond to plural strata, the method further comprising writing data from a row of the source table into one of plural files depending on which of the stratification conditions the row satisfies.
- 3. The method of claim 2, further comprising performing sampling of data in the plural files in response to the one or more commands.
- 4. The method of claim 3, wherein the database system has plural access modules across which each file is partitioned,
wherein performing the sampling comprises performing sampling in each of the plural access modules of data in a corresponding partition of the file.
- 5. The method of claim 4, further comprising determining a number of samples to request from each access module.
- 6. The method of claim 5, wherein determining the number of samples to request from each access module comprises calculating a number that is proportional to the number of rows in the corresponding partition.
- 7. The method of claim 1, wherein generating the one or more commands comprises generating a command to process the query and to perform stratification actions if a criterion is satisfied.
- 8. The method of claim 7, further comprising determining if the criterion is satisfied, wherein the criterion comprises the query being a simple query that does not specify a join, an aggregate, or an online analytical processing function.
- 9. The method of claim 7, further comprising generating a command to process the query and an extra command to perform the stratification actions if the criterion is not satisfied.
- 10. An article comprising at least one storage medium containing instructions that when executed cause a database system to:
generate one or more commands to perform stratified sampling; and
send the one or more commands to plural access modules of the database system to cause each of the plural access modules to perform the stratified sampling in parallel.
- 11. The article of claim 10, wherein the instructions when executed cause the database system to receive a query containing a clause containing plural stratification conditions for the stratified sampling.
- 12. The article of claim 11, wherein the stratification conditions correspond to plural strata, wherein the instructions when executed cause the database system to generate plural spool files to receive rows for the plural strata.
- 13. The article of claim 12, wherein the instructions when executed cause the database system to perform random sampling of data in each spool file to obtain samples for a corresponding stratum.
- 14. The article of claim 13, wherein each spool file is partitioned across the plural access modules, wherein the instructions when executed cause the database system to perform the random sampling of each spool file by performing random sampling in each access module.
- 15. A database system comprising:
a storage to store a base table; and
a controller adapted to receive a request containing plural stratification conditions to divide data in the base table into corresponding plural strata, the controller adapted to perform random sampling, in response to the request, of data in each stratum.
- 16. The database system of claim 15, further comprising plural storage modules, wherein the controller comprises plural access modules to manage data access in the corresponding plural storage modules.
- 17. The database system of claim 16, wherein the base table is partitioned across the plural access modules.
- 18. The database system of claim 17, wherein the controller is adapted to generate plural spool files to store data in the plural strata; and
wherein the controller is adapted to perform random sampling of data in each spool file.
- 19. The database system of claim 18, wherein the controller is adapted to determine a number of samples to request from each access module.
- 20. The database system of claim 19, wherein the number of samples to request from one access module is different from the number of samples to request from another access module.
- 21. A database system comprising:
a plurality of storage modules;
a plurality of access modules to manage respective storage modules; and
a parsing engine to receive a stratified sampling query specifying plural stratification conditions, the parsing engine to generate one or more commands to indicate performance of the stratified sampling, the parsing engine to send the one or more commands to the access modules, in response to the one or more commands, each access module to generate plural input spool files corresponding to plural strata, the input spool files to store qualifying rows from a source table, the access module to selectively write a given row into one of the input spool files based on which stratification condition the given row satisfies, each access module to further perform random sampling of the rows in each input spool file.
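The row routing and per-module sampling recited in claims 2 to 6, 13, 14 and 21 can be illustrated with a minimal Python sketch. It is not the patented implementation. The row layout (dicts), the helper names `build_spools`, `allocate` and `stratified_sample`, the `sizes` and `seed` parameters, first-match routing of rows to strata, and the largest-remainder rounding used to make each access module's share proportional to its partition size are all assumptions made for illustration. Each module's work is independent, but the sketch runs the modules sequentially.

```python
import random
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]               # hypothetical row representation
Condition = Callable[[Row], bool]  # one stratification condition


def build_spools(partition: Sequence[Row],
                 conditions: Sequence[Condition]) -> List[List[Row]]:
    """Route one access module's rows into one spool (list) per stratum.

    Assumption: a row goes to the stratum of the FIRST condition it
    satisfies; rows that satisfy no condition are not written anywhere.
    """
    spools: List[List[Row]] = [[] for _ in conditions]
    for row in partition:
        for i, cond in enumerate(conditions):
            if cond(row):
                spools[i].append(row)
                break
    return spools


def allocate(total: int, counts: Sequence[int]) -> List[int]:
    """Split `total` samples across modules in proportion to their row counts.

    Largest-remainder rounding (an assumption) makes the shares sum to
    min(total, sum(counts)) without asking any module for more rows than it has.
    """
    n_rows = sum(counts)
    if n_rows == 0:
        return [0] * len(counts)
    total = min(total, n_rows)
    # Integer arithmetic avoids floating-point rounding surprises.
    shares = [total * c // n_rows for c in counts]
    remainders = [total * c % n_rows for c in counts]
    leftover = total - sum(shares)
    # Give the leftover samples to the modules with the largest remainders.
    order = sorted(range(len(counts)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def stratified_sample(partitions: Sequence[Sequence[Row]],
                      conditions: Sequence[Condition],
                      sizes: Sequence[int],
                      seed: int = 0) -> List[List[Row]]:
    """Return one list of sampled rows per stratum.

    partitions: the rows held by each access module.
    sizes: requested sample size for each stratum (hypothetical parameter).
    """
    rng = random.Random(seed)
    # Each module builds its own spools independently; a real system could
    # run this step in parallel, but this sketch runs it sequentially.
    per_module = [build_spools(p, conditions) for p in partitions]
    samples: List[List[Row]] = []
    for s, size in enumerate(sizes):
        counts = [len(spools[s]) for spools in per_module]
        picked: List[Row] = []
        for spools, k in zip(per_module, allocate(size, counts)):
            # Simple random sampling without replacement within one module.
            picked.extend(rng.sample(spools[s], k))
        samples.append(picked)
    return samples


if __name__ == "__main__":
    amp0 = [{"age": a} for a in (15, 25, 35, 45, 55, 65)]
    amp1 = [{"age": a} for a in (18, 28, 70)]
    conds = [lambda r: r["age"] < 30, lambda r: r["age"] >= 30]
    result = stratified_sample([amp0, amp1], conds, sizes=[3, 2])
    print([len(s) for s in result])  # → [3, 2]
```

In this run the second stratum has four qualifying rows on `amp0` and one on `amp1`, so `allocate` requests 2 samples from the first module and 0 from the second. This shows how the number of samples requested can differ between access modules, as in claim 20.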
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This is a continuation-in-part of U.S. Ser. No. 09/457,274, filed Dec. 8, 1999.
Continuation-in-Part

| Relation | Number | Date | Country |
|---|---|---|---|
| Parent | 09457274 | Dec 1999 | US |
| Child | 10113497 | Apr 2002 | US |