Claims
- 1. A method of improving performance of database query language statements, the method comprising:
identifying one or more duplicated data values in a data block; calculating number of occurrences of each of the one or more duplicated data values; storing each of the one or more duplicated data values and the corresponding number of occurrences of the duplicated data value in an entry of a symbol structure; receiving a database query language statement against data in the data block, the database query language statement comprising at least one predicate; and reducing number of predicate evaluations on data in the data block and amount of data in the data block accessed by the database query language statement using the data duplication information in the symbol structure.
- 2. The method of claim 1 further comprising:
eliminating one or more occurrences of at least one of the one or more duplicated data values from the data block; and configuring portions of the data block corresponding to the eliminated occurrences to reference the appropriate symbol structure entry.
- 3. The method of claim 2 wherein a link or a pointer is used to reference the appropriate symbol structure entry.
- 4. The method of claim 1 further comprising:
updating the number of occurrences of one of the one or more duplicated data values when an occurrence of the duplicated data value is added or deleted; and removing a symbol structure entry when the number of occurrences of the duplicated data value in the symbol structure entry is zero.
- 5. The method of claim 1 wherein reducing number of predicate evaluations on data in the data block and amount of data in the data block accessed by the database query language statement comprises evaluating the at least one predicate on a duplicated data value in the data block at most once.
- 6. The method of claim 1 wherein reducing number of predicate evaluations on data in the data block and amount of data in the data block accessed by the database query language statement comprises:
determining whether the at least one predicate has previously been evaluated on a duplicate of a data value in the data block before evaluating the at least one predicate on the data value; utilizing a result of a previous evaluation of the at least one predicate on a duplicate of the data value when the at least one predicate has previously been evaluated on the duplicate of the data value; and evaluating the at least one predicate on the data value when the at least one predicate has not previously been evaluated on a duplicate of the data value.
- 7. The method of claim 6 wherein determining whether the at least one predicate has previously been evaluated on a duplicate of a data value comprises verifying whether the data value appears in a context, the context storing each data value in the data block that the at least one predicate has previously been evaluated on and a corresponding result of the previous predicate evaluation.
- 8. The method of claim 1 wherein reducing number of predicate evaluations on data in the data block and amount of data in the data block accessed by the database query language statement comprises:
calculating total number of data values in the data block on which the at least one predicate will evaluate to true; setting a counter equal to the total; reducing the counter when a data value in the data block on which the at least one predicate evaluated to true is accessed; and accessing data values in the data block until the counter is equal to zero.
- 9. The method of claim 1 wherein reducing number of predicate evaluations on data in the data block and amount of data in the data block accessed by the database query language statement comprises updating the data duplication information instead of accessing data values in the data block.
- 10. The method of claim 1 wherein the symbol structure is stored in the data block.
- 11. A method of improving performance of database query language statements, the method comprising:
maintaining information describing data duplication within a data block; receiving a database query language statement against data in the data block, the database query language statement comprising at least one predicate; and reducing number of predicate evaluations on data in the data block using the data duplication information.
- 12. The method of claim 11 wherein maintaining information describing data duplication within a data block comprises:
identifying one or more duplicated data values in the data block; calculating number of occurrences of each of the one or more duplicated data values; and storing each of the one or more duplicated data values and the corresponding number of occurrences of the duplicated data value in an entry of a symbol structure.
- 13. The method of claim 12 wherein maintaining information describing data duplication within a data block further comprises:
eliminating one or more occurrences of at least one of the one or more duplicated data values from the data block; and configuring portions of the data block corresponding to the eliminated occurrences to reference the appropriate symbol structure entry.
- 14. The method of claim 13 wherein a link or a pointer is used to reference the appropriate symbol structure entry.
- 15. The method of claim 12 wherein maintaining information describing data duplication within a data block further comprises updating the number of occurrences of one of the one or more duplicated data values when an occurrence of the duplicated data value is added or deleted.
- 16. The method of claim 15 wherein maintaining information describing data duplication within a data block further comprises removing a symbol structure entry when the number of occurrences of the duplicated data value in the symbol structure entry is zero.
- 17. The method of claim 11 wherein reducing number of predicate evaluations on data in the data block comprises evaluating the at least one predicate on a duplicated data value in the data block at most once.
- 18. The method of claim 11 wherein reducing number of predicate evaluations on data in the data block comprises:
determining whether the at least one predicate has previously been evaluated on a duplicate of a data value in the data block before evaluating the at least one predicate on the data value; utilizing a result of a previous evaluation of the at least one predicate on a duplicate of the data value when the at least one predicate has previously been evaluated on the duplicate of the data value; and evaluating the at least one predicate on the data value when the at least one predicate has not previously been evaluated on a duplicate of the data value.
- 19. The method of claim 18 wherein determining whether the at least one predicate has previously been evaluated on a duplicate of a data value comprises verifying whether the data value appears in a context, the context storing each data value in the data block that the at least one predicate has previously been evaluated on and a corresponding result of the previous predicate evaluation.
- 20. The method of claim 11 further comprising reducing amount of data accessed by the database query language statement using the data duplication information.
- 21. The method of claim 20 wherein reducing amount of data accessed by the database query language statement comprises:
calculating total number of data values in the data block on which the at least one predicate will evaluate to true; setting a counter equal to the total; reducing the counter when a data value in the data block on which the at least one predicate evaluated to true is accessed; and accessing data values in the data block until the counter is equal to zero.
- 22. The method of claim 20 wherein reducing amount of data accessed by the database query language statement comprises updating the data duplication information instead of accessing data values in the data block.
- 23. The method of claim 11 wherein the data duplication information is maintained in the data block.
- 24. A computer program product that includes a computer readable medium, the computer readable medium having stored thereon a sequence of instructions which, when executed by a processor, causes the processor to execute a process for improving performance of database query language statements, the process comprising:
maintaining information describing data duplication within a data block; receiving a database query language statement against data in the data block, the database query language statement comprising at least one predicate; and reducing number of predicate evaluations on data in the data block using the data duplication information.
- 25. The computer program product of claim 24 wherein maintaining information describing data duplication within a data block comprises:
identifying one or more duplicated data values in the data block; calculating number of occurrences of each of the one or more duplicated data values; and storing each of the one or more duplicated data values and the corresponding number of occurrences of the duplicated data value in an entry of a symbol structure.
- 26. The computer program product of claim 25 wherein maintaining information describing data duplication within a data block further comprises:
eliminating one or more occurrences of at least one of the one or more duplicated data values from the data block; and configuring portions of the data block corresponding to the eliminated occurrences to reference the appropriate symbol structure entry.
- 27. The computer program product of claim 25 wherein maintaining information describing data duplication within a data block further comprises:
updating the number of occurrences of one of the one or more duplicated data values when an occurrence of the duplicated data value is added or deleted; and removing a symbol structure entry when the number of occurrences of the duplicated data value in the symbol structure entry is zero.
- 28. The computer program product of claim 24 wherein reducing number of predicate evaluations on data in the data block comprises evaluating the at least one predicate on a duplicated data value in the data block at most once.
- 29. The computer program product of claim 24 wherein reducing number of predicate evaluations on data in the data block comprises:
determining whether the at least one predicate has previously been evaluated on a duplicate of a data value in the data block before evaluating the at least one predicate on the data value; utilizing a result of a previous evaluation of the at least one predicate on a duplicate of the data value when the at least one predicate has previously been evaluated on the duplicate of the data value; and evaluating the at least one predicate on the data value when the at least one predicate has not previously been evaluated on a duplicate of the data value.
- 30. The computer program product of claim 24 further comprising reducing amount of data accessed by the database query language statement using the data duplication information.
- 31. The computer program product of claim 30 wherein reducing amount of data accessed by the database query language statement comprises:
calculating total number of data values in the data block on which the at least one predicate will evaluate to true; setting a counter equal to the total; reducing the counter when a data value in the data block on which the at least one predicate evaluated to true is accessed; and accessing data values in the data block until the counter is equal to zero.
- 32. The computer program product of claim 30 wherein reducing amount of data accessed by the database query language statement comprises updating the data duplication information instead of accessing data values in the data block.
- 33. The computer program product of claim 24 wherein the data duplication information is maintained in the data block.
- 34. A system for improving performance of database query language statements, the system comprising:
means for maintaining information describing data duplication within a data block; means for receiving a database query language statement against data in the data block, the database query language statement comprising at least one predicate; and means for reducing number of predicate evaluations on data in the data block using the data duplication information.
- 35. The system of claim 34 wherein means for maintaining information describing data duplication within a data block comprises:
means for identifying one or more duplicated data values in the data block; means for calculating number of occurrences of each of the one or more duplicated data values; and means for storing each of the one or more duplicated data values and the corresponding number of occurrences of the duplicated data value in an entry of a symbol structure.
- 36. The system of claim 35 wherein means for maintaining information describing data duplication within a data block further comprises:
means for eliminating one or more occurrences of at least one of the one or more duplicated data values from the data block; and means for configuring portions of the data block corresponding to the eliminated occurrences to reference the appropriate symbol structure entry.
- 37. The system of claim 35 wherein means for maintaining information describing data duplication within a data block further comprises:
means for updating the number of occurrences of one of the one or more duplicated data values when an occurrence of the duplicated data value is added or deleted; and means for removing a symbol structure entry when the number of occurrences of the duplicated data value in the symbol structure entry is zero.
- 38. The system of claim 34 wherein means for reducing number of predicate evaluations on data in the data block comprises means for evaluating the at least one predicate on a duplicated data value in the data block at most once.
- 39. The system of claim 34 wherein means for reducing number of predicate evaluations on data in the data block comprises:
means for determining whether the at least one predicate has previously been evaluated on a duplicate of a data value in the data block before evaluating the at least one predicate on the data value; means for utilizing a result of a previous evaluation of the at least one predicate on a duplicate of the data value when the at least one predicate has previously been evaluated on the duplicate of the data value; and means for evaluating the at least one predicate on the data value when the at least one predicate has not previously been evaluated on a duplicate of the data value.
- 40. The system of claim 34 further comprising means for reducing amount of data accessed by the database query language statement using the data duplication information.
- 41. The system of claim 40 wherein means for reducing amount of data accessed by the database query language statement comprises:
means for calculating total number of data values in the data block on which the at least one predicate will evaluate to true; means for setting a counter equal to the total; means for reducing the counter when a data value in the data block on which the at least one predicate evaluated to true is accessed; and means for accessing data values in the data block until the counter is equal to zero.
- 42. The system of claim 40 wherein means for reducing amount of data accessed by the database query language statement comprises means for updating the data duplication information instead of accessing data values in the data block.
- 43. The system of claim 34 wherein the data duplication information is maintained in the data block.
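The following Python sketch is illustrative only and is not part of the claims. It shows one possible reading of three of the claimed steps: building a symbol structure of duplicated values and their occurrence counts (claims 1, 12), reusing a cached predicate result for duplicates through a context (claims 6-7, 18-19), and stopping a scan once a counter of expected matches reaches zero (claims 8, 21). The function names, the use of a Python list as the "data block", and the restriction of the counter example to an equality predicate are all assumptions made for illustration.

```python
from collections import Counter


def build_symbol_structure(block):
    """Map each duplicated value in the block to its number of occurrences.

    Assumption: the "symbol structure" is modeled as a plain dict and only
    values occurring more than once get an entry.
    """
    counts = Counter(block)
    return {value: n for value, n in counts.items() if n > 1}


def filter_with_context(block, predicate):
    """Evaluate the predicate at most once per distinct value.

    The "context" is modeled as a dict from value to the cached predicate
    result. Returns the matching values and the number of evaluations.
    Values must be hashable for this sketch to work.
    """
    context = {}
    evaluations = 0
    matches = []
    for value in block:
        if value not in context:
            # No earlier duplicate has been evaluated: evaluate now.
            context[value] = predicate(value)
            evaluations += 1
        if context[value]:
            matches.append(value)
    return matches, evaluations


def positions_equal_to(block, target, symbols):
    """Scan for an equality predicate, stopping when a counter hits zero.

    The counter starts at the occurrence count from the symbol structure.
    A value without an entry occurs at most once, so 1 is used as an upper
    bound; if the value is absent the whole block is still scanned.
    """
    remaining = symbols.get(target, 1)
    positions = []
    accessed = 0
    for index, value in enumerate(block):
        if remaining == 0:
            break  # every expected match has been found
        accessed += 1
        if value == target:
            positions.append(index)
            remaining -= 1
    return positions, accessed


block = ["CA", "NY", "CA", "TX", "CA", "NY", "WA", "OR"]
symbols = build_symbol_structure(block)
matches, evaluations = filter_with_context(block, lambda v: v in {"CA", "TX"})
positions, accessed = positions_equal_to(block, "CA", symbols)
```

In this example the block holds eight values but only five distinct ones, so the predicate is evaluated five times rather than eight, and the scan for "CA" stops after the fifth value because the symbol structure reports three occurrences.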
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation-in-part of co-pending U.S. patent application Ser. No. 10/144,689, filed on May 10, 2002, entitled “Method and Mechanism for Storing and Accessing Data,” which is incorporated herein by reference in its entirety.
Continuation in Parts (1)
| | Number | Date | Country |
|---|---|---|---|
| Parent | 10144689 | May 2002 | US |
| Child | 10426452 | Apr 2003 | US |