Data Statement Chunking

Information

  • Patent Application
  • Publication Number
    20230297570
  • Date Filed
    December 22, 2022
  • Date Published
    September 21, 2023
  • CPC
    • G06F16/24537
    • G06F16/24532
    • G06F16/24545
    • G06F16/2455
  • International Classifications
    • G06F16/2453
    • G06F16/2455
Abstract
Techniques are presented for applying fine-grained client-specific rules to divide (e.g., chunk) data statements to achieve cost reduction and/or failure rate reduction associated with executing the data statements over a subject dataset. Data statements for the subject dataset are received from a client. Statement attributes derived from the data statements are processed with respect to fine-grained rules and/or other client-specific data to determine whether a data statement chunking scheme is to be applied to the data statements. If a data statement chunking scheme is to be applied, further analysis is performed to select a data statement chunking scheme. A set of data operations are generated based at least in part on the selected data statement chunking scheme. The data operations are issued for execution over the subject dataset. The results from the data operations are consolidated in accordance with the selected data statement chunking scheme and returned to the client.
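The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the implementation from the application: the `Statement` and `ClientRules` types, the span-based chunking rule, and the in-memory dictionary standing in for the subject dataset are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# All names here are hypothetical illustrations, not taken from the application.


@dataclass(frozen=True)
class Statement:
    """A simplified data statement: sum the values whose key is in [start, end)."""
    start: int
    end: int


@dataclass(frozen=True)
class ClientRules:
    """Assumed client-specific rule: chunk when a statement spans more than max_span keys."""
    max_span: int


def should_chunk(stmt: Statement, rules: ClientRules) -> bool:
    # The statement attribute examined here is the width of the key range.
    return (stmt.end - stmt.start) > rules.max_span


def generate_operations(stmt: Statement, rules: ClientRules) -> List[Statement]:
    """Divide the statement into contiguous, non-overlapping data operations."""
    if rules.max_span <= 0:
        raise ValueError("max_span must be positive")
    if not should_chunk(stmt, rules):
        return [stmt]
    ops = []
    lo = stmt.start
    while lo < stmt.end:
        hi = min(lo + rules.max_span, stmt.end)
        ops.append(Statement(lo, hi))
        lo = hi
    return ops


def execute(op: Statement, dataset: Dict[int, int]) -> int:
    """Stand-in for a query engine: evaluate one operation over the dataset."""
    return sum(v for k, v in dataset.items() if op.start <= k < op.end)


def consolidate(partials: List[int]) -> int:
    # For a SUM-style statement, consolidation is the sum of the partial results.
    return sum(partials)


def run(stmt: Statement, rules: ClientRules, dataset: Dict[int, int]) -> Tuple[int, int]:
    """Return the consolidated result and the number of operations issued."""
    ops = generate_operations(stmt, rules)
    return consolidate([execute(op, dataset) for op in ops]), len(ops)
```

In this sketch the consolidated result matches what the unchunked statement would return, because the chunks partition the key range and the aggregate is additive. Other aggregates would need a different consolidation step.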
Claims
  • 1. A method for chunking data statements based at least in part on a set of client-specific information in a client data statement processing layer, the method comprising: receiving one or more data statements issued by at least one client, the data statements issued by the client to operate over a subject dataset; applying at least a portion of a set of client-specific data to the data statements to determine at least one chunking scheme; accessing performance data to generate performance estimates for a set of candidate chunking schemes from the at least one chunking scheme; selecting a chunking scheme from the set of candidate chunking schemes based on the performance estimates; generating one or more data operations from the data statements, the data operations generated based at least in part on the chunking scheme and on the performance estimates; and executing the data operations over the subject dataset to generate a result set.
  • 2. The method of claim 1, wherein the client-specific data comprises at least one of, one or more statement chunking rules, a set of enhanced dataset metadata, or a set of performance data.
  • 3. The method of claim 1, wherein the data operations are executed at one or more query engines, and wherein the client-specific data is inaccessible by the query engines.
  • 4. The method of claim 1, further comprising: receiving a set of dataset metadata associated with the subject dataset; expanding the dataset metadata into a set of expanded dataset metadata; and consulting the expanded dataset metadata to perform at least one of, determining the at least one chunking scheme, or generating the one or more data operations.
  • 5. The method of claim 1, further comprising: analyzing the data statements to determine one or more statement attributes associated with the data statements; and applying the portion of the client-specific data to at least one of the statement attributes to perform at least one of, determining the at least one chunking scheme, or generating the one or more data operations.
  • 6. The method of claim 1, further comprising: generating one or more performance estimates associated with the data statements; and applying the portion of the client-specific data to at least one of the performance estimates to perform at least one of, determining the at least one chunking scheme, or generating the one or more data operations.
  • 7. The method of claim 6, wherein at least one of the performance estimates is based at least in part on a set of performance data.
  • 8. The method of claim 7, wherein the performance data comprises at least one of, a set of historical data operations performance statistics, or a set of historical data operations behavioral characteristics.
  • 9. The method of claim 1, further comprising merging two or more results from the data operations into the result set.
  • 10. The method of claim 1, wherein the data operations are executed in accordance with one or more execution directives, the execution directives indicating that one or more of the data operations be executed in parallel, in sequence, asynchronously, or synchronously.
  • 11. A computer readable medium, embodied in a non-transitory computer readable medium, the non-transitory computer readable medium having stored thereon a sequence of instructions which, when stored in memory and executed by one or more processors causes the one or more processors to perform a set of acts for chunking data statements based at least in part on a set of client-specific information in a client data statement processing layer, the acts comprising: receiving one or more data statements issued by at least one client, the data statements issued by the client to operate over a subject dataset; applying at least a portion of a set of client-specific data to the data statements to determine at least one chunking scheme; accessing performance data to generate performance estimates for a set of candidate chunking schemes from the at least one chunking scheme; selecting a chunking scheme from the set of candidate chunking schemes based on the performance estimates; generating one or more data operations from the data statements, the data operations generated based at least in part on the chunking scheme and on the performance estimates; and executing the data operations over the subject dataset to generate a result set.
  • 12. The computer readable medium of claim 11, wherein the client-specific data comprises at least one of, one or more statement chunking rules, a set of enhanced dataset metadata, or a set of performance data.
  • 13. The computer readable medium of claim 11, wherein the data operations are executed at one or more query engines, and wherein the client-specific data is inaccessible by the query engines.
  • 14. The computer readable medium of claim 11, further comprising instructions which, when stored in memory and executed by the one or more processors causes the one or more processors to perform acts of: receiving a set of dataset metadata associated with the subject dataset; expanding the dataset metadata into a set of expanded dataset metadata; and consulting the expanded dataset metadata to perform at least one of, determining the at least one chunking scheme, or generating the one or more data operations.
  • 15. The computer readable medium of claim 11, further comprising instructions which, when stored in memory and executed by the one or more processors causes the one or more processors to perform acts of: analyzing the data statements to determine one or more statement attributes associated with the data statements; and applying the portion of the client-specific data to at least one of the statement attributes to perform at least one of, determining the at least one chunking scheme, or generating the one or more data operations.
  • 16. The computer readable medium of claim 11, further comprising instructions which, when stored in memory and executed by the one or more processors causes the one or more processors to perform acts of: generating one or more performance estimates associated with the data statements; and applying the portion of the client-specific data to at least one of the performance estimates to perform at least one of, determining the at least one chunking scheme, or generating the one or more data operations.
  • 17. The computer readable medium of claim 16, wherein at least one of the performance estimates is based at least in part on a set of performance data.
  • 18. The computer readable medium of claim 17, wherein the performance data comprises at least one of, a set of historical data operations performance statistics, or a set of historical data operations behavioral characteristics.
  • 19. A system for chunking data statements based at least in part on a set of client-specific information in a client data statement processing layer, the system comprising: a storage medium having stored thereon a sequence of instructions; and one or more processors that execute the instructions to cause the one or more processors to perform a set of acts, the acts comprising: receiving one or more data statements issued by at least one client, the data statements issued by the client to operate over a subject dataset; applying at least a portion of a set of client-specific data to the data statements to determine at least one chunking scheme; accessing performance data to generate performance estimates for a set of candidate chunking schemes from the at least one chunking scheme; selecting a chunking scheme from the set of candidate chunking schemes based on the performance estimates; generating one or more data operations from the data statements, the data operations generated based at least in part on the chunking scheme and on the performance estimates; and executing the data operations over the subject dataset to generate a result set.
  • 20. The system of claim 19, wherein the client-specific data comprises at least one of, one or more statement chunking rules, a set of enhanced dataset metadata, or a set of performance data.
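Two steps recited in the claims lend themselves to a short illustration: selecting a chunking scheme from candidates using performance estimates (claims 1, 7, and 8), and executing the resulting data operations under an execution directive before merging the results (claims 9 and 10). The Python sketch below rests on assumptions. The scheme names, the use of a mean over historical timings as the estimate, and the `"parallel"` and `"sequence"` directive strings are hypothetical and do not come from the application.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Hypothetical illustration; none of these names appear in the application.


def estimate(history: Dict[str, List[float]], scheme: str) -> float:
    """Estimate cost as the mean of historical run times for a scheme.

    A scheme with no history is treated as infinitely expensive (an assumption).
    """
    samples = history.get(scheme)
    return mean(samples) if samples else float("inf")


def select_scheme(candidates: Sequence[str], history: Dict[str, List[float]]) -> str:
    """Pick the candidate chunking scheme with the lowest estimated cost."""
    if not candidates:
        raise ValueError("at least one candidate scheme is required")
    return min(candidates, key=lambda s: estimate(history, s))


def execute_operations(
    ops: Sequence[Callable[[], List[int]]], directive: str
) -> List[int]:
    """Run data operations per the directive, then merge results in order."""
    if directive == "parallel":
        with ThreadPoolExecutor() as pool:
            # pool.map yields results in submission order.
            partials = list(pool.map(lambda op: op(), ops))
    elif directive == "sequence":
        partials = [op() for op in ops]
    else:
        raise ValueError(f"unknown directive: {directive}")
    merged: List[int] = []
    for part in partials:
        merged.extend(part)
    return merged
```

Merging by concatenation suits operations that return disjoint row sets. Statements with ordering or aggregation semantics would need a merge step that respects them.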
Continuations (1)
Relation  Number    Date      Country
Parent    15836836  Dec 2017  US
Child     18086907            US