The invention relates generally to computer systems, and more particularly to an improved system and method of data partitioning for parallel processing of dynamically generated application data.
A major problem faced by an online advertising publisher is processing dynamically generated financial data for the sale of advertisement impressions to online advertisers. Online advertisers may visit a website of an online advertising publisher to place orders for displaying advertisements on display advertisement properties, which represent a collection of related web pages that have advertising space allocated for displaying advertisements. A typical order may request that advertisements be displayed on display properties 10 million times over a period of six months. There may be several running orders for any given advertiser at a time. In order for an online application to place an order for an advertiser, the application may check the account receivable balance and credit limit of the advertiser at the time the order is being placed to verify that there is sufficient credit available to place the order. For instance, the account receivable balance and any amount for running orders may be subtracted from the credit limit. To do so, an online application needs to obtain the current financial information to process the order. Such financial data may be dynamically generated as orders are placed by advertisers.
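By way of illustration only, the credit check described above may be sketched as follows. The function names `available_credit` and `can_place_order` and the use of `Decimal` amounts are assumptions made for this sketch and are not part of the described system.

```python
from decimal import Decimal


def available_credit(credit_limit, receivable_balance, running_order_amounts):
    """Credit left after subtracting the account receivable balance
    and the amounts of all running orders from the credit limit."""
    running_total = sum(running_order_amounts, Decimal("0"))
    return credit_limit - receivable_balance - running_total


def can_place_order(order_amount, credit_limit, receivable_balance,
                    running_order_amounts):
    """True when the new order fits within the remaining credit."""
    remaining = available_credit(credit_limit, receivable_balance,
                                 running_order_amounts)
    return order_amount <= remaining
```

Under these assumptions, an advertiser with a credit limit of 100000, a receivable balance of 20000, and running orders of 30000 and 10000 would have 40000 of credit available for a new order.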
Current financial database systems may store financial data such as account receivable information and credit limits in data tables kept in a proprietary database table format. An online application may receive a data table with financial information for online advertisers that may be as large as a few million rows. Processing each row in serial fashion, by reading financial data one row at a time, is inefficient for a high-volume data processing system. Although functional, sequential processing of data from data tables presents a bottleneck for online applications processing orders such as online advertising orders. Furthermore, there may be multiple data types within a large data table of dynamically generated data.
What is needed is a way for an online application to efficiently process a high volume of dynamically generated data. Such a system and method should be able to process multiple data types within the dynamically generated data.
The present invention provides a system and method of data partitioning for parallel processing of dynamically generated application data. In a data partitioning framework for parallel processing of dynamically generated application data, a data partitioning engine that partitions application data according to a data partitioning policy may be operably coupled to one or more data partition processors that may each process different partitions of the data according to processing instructions for the application data. In an implementation, an application may send a request to the data partitioning engine to partition the application data specified by a data partitioning policy and to process each of the data partitions according to processing instructions. Asynchronous data partition processors may be instantiated to perform parallel processing of the partitioned data. The data may be partitioned according to the data partitioning policy and processed according to the processing instructions. And the results may be returned to the application.
In an embodiment of a data partitioning framework for parallel processing of dynamically generated application data, a request may be received to perform parallel processing of dynamically generated data. The generated data may be partitioned according to a data partitioning policy. The data partitioning policy may be flexibly defined by an application for partitioning data any number of ways, including balancing the data volume across each of the partitions or partitioning the data by data type. Then the partitioned data may be processed according to processing instructions provided by an application. In an embodiment, the data partitions may represent different data types that may be processed in parallel by data partition processors for each data type. The processing status of the data partition may be updated after processing is finished. And the results of processing the data partitions may be returned to the application.
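By way of illustration only, the two example partitioning policies mentioned above (balancing data volume across partitions, and partitioning by data type) may be sketched as follows. The function names and the representation of rows as list items are assumptions of this sketch; the description does not prescribe a particular technique.

```python
from collections import defaultdict


def partition_balanced(rows, num_partitions):
    """Split rows into contiguous partitions whose sizes differ by at most one."""
    base, extra = divmod(len(rows), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions each take one additional row.
        size = base + (1 if i < extra else 0)
        partitions.append(rows[start:start + size])
        start += size
    return partitions


def partition_by_type(rows, type_of):
    """Group rows into one partition per data type.

    `type_of` is a caller-supplied function returning the type key of a row.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[type_of(row)].append(row)
    return dict(groups)
```

Either function could serve as the policy an application hands to the framework; which one is appropriate depends on whether the application cares more about even workloads or about type-specific processing.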
The present invention may be used by many applications to partition and process dynamically generated data. For instance, the present invention may be used by an online application of an advertising publisher for parallel processing of advertisers' financial information needed to complete advertisers' orders being placed for display advertising. Or the present invention may generally be used by an online application for batch processing of data. For any of these applications, the present invention may partition data for an application according to a data partitioning policy and perform parallel processing of the data partitions according to processing instructions that may be provided by an application.
Other advantages will become apparent from the following detailed description when taken in conjunction with the drawings, in which:
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in local and/or remote computer storage media including memory storage devices.
With reference to
The computer system 100 may include a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer system 100 and includes both volatile and nonvolatile media. For example, computer-readable media may include volatile and nonvolatile computer storage media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer system 100. Communication media may include computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For instance, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
The system memory 104 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 106 and random access memory (RAM) 110. A basic input/output system 108 (BIOS), containing the basic routines that help to transfer information between elements within computer system 100, such as during start-up, is typically stored in ROM 106. Additionally, RAM 110 may contain operating system 112, application programs 114, other executable code 116 and program data 118. RAM 110 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by CPU 102.
The computer system 100 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media, discussed above and illustrated in
The computer system 100 may operate in a networked environment using a network 136 to one or more remote computers, such as a remote computer 146. The remote computer 146 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer system 100. The network 136 depicted in
The present invention is generally directed towards a system and method of data partitioning for parallel processing of dynamically generated application data. A data partitioning framework may be provided for parallel processing of data partitions of dynamically generated data for an application. An application may send a request to partition the application data specified by a data partitioning policy and to process each of the data partitions according to processing instructions. The data partitioning policy may be flexibly defined by an application for partitioning data any number of ways, including balancing the data volume across each of the partitions or partitioning the data by data type. Asynchronous data partition processors may be instantiated to perform parallel processing of the partitioned data. The data may be partitioned according to the data partitioning policy and processed according to the processing instructions. And the results may be returned to the application.
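By way of illustration only, the request flow described above (partition according to a policy, process each partition in parallel according to processing instructions, return the results) may be sketched as follows. The class name `DataPartitioningEngine`, the method `handle_request`, and the choice of a thread pool to stand in for asynchronous data partition processors are all assumptions of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


class DataPartitioningEngine:
    """Hypothetical engine: partitions data, then processes partitions in parallel."""

    def __init__(self, max_processors=4):
        self.max_processors = max_processors

    def handle_request(self, data, policy, instructions):
        """Apply `policy` to split `data`, run `instructions` on each partition,
        and return the per-partition results in partition order."""
        partitions = policy(data)
        # Each partition is handed to exactly one worker call.
        with ThreadPoolExecutor(max_workers=self.max_processors) as pool:
            return list(pool.map(instructions, partitions))
```

In this sketch both the policy and the processing instructions are plain callables supplied by the application, which mirrors the description's point that the application, not the framework, decides how data is split and what is done with each partition.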
As will be seen, by providing a data partitioning framework for parallel processing of dynamically generated application data, the data partitions may be defined dynamically by a data partitioning policy to accommodate a high volume of dynamically generated data. The framework may be used to process any type of data in parallel, including processing multiple data types at a time. As will be understood, the various block diagrams, flow charts and scenarios described herein are only examples, and there are many other scenarios to which the present invention will apply.
Turning to
In various embodiments, a client computer 202 may be operably coupled to a server 214 by a network 212. The client computer 202 may be a computer such as computer system 100 of
The server 214 may be any type of computer system or computing device such as computer system 100 of
In particular, the server 214 may include a data partitioning engine 216 for partitioning data according to instructions of a data partitioning policy that may be provided by an application, one or more data partition processors 220 for processing data of a data partition according to processing instructions that may be provided by an application, and one or more data partition status monitors 222 for monitoring and updating the processing status of data partitions. The data partitioning engine 216 may include a request handler 218 for receiving a request to partition and process data and may include services for returning the results of partitioning and processing the data. Each of these components may be any type of executable software code that may execute on a computer such as computer system 100 of
There are many applications that may use the data partitioning framework of the present invention to partition and process dynamically generated data. For instance, the present invention may be used by an online application of an advertising publisher for parallel processing of advertisers' financial information needed to complete advertisers' orders being placed for display advertising. Or the present invention may be generally used by an online application for batch processing of data. For any of these applications, the present invention may partition data for an application according to a data partitioning policy and perform parallel processing of the data partitions according to processing instructions that may be provided by an application.
And parallel processing of the data may be performed at step 308. In an embodiment, only one instance of a data partition processor may process a data partition. In an embodiment, the partitioned data may be processed according to processing instructions provided by an application. At step 310, the results of processing the data may be returned, for instance, to an application. And at step 312, the processing status of the dynamically generated data may be updated. In an embodiment, the processing status for a partition may be updated when other partitions of the same specific type are processed completely.
At step 406, the number of partitions may be obtained and at step 408, the data table may be partitioned into the number of data partitions by applying a partitioning technique specified by the data partitioning policy. In an embodiment, the data partitions may represent different data types that may be processed in parallel by data partition processors for each data type. At step 410, the processing status of each partition may be initialized. In an embodiment, the processing status for a data partition may be stored in a data partitioning process table and set to indicate that the data partition is being processed. And the data partitions may be output at step 412. For instance, the number of data partitions and the location of each data partition in the data table may be stored in a data partitioning process table.
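By way of illustration only, a data partitioning process table of the kind described in steps 406 through 412 may be sketched as follows. The field names (`partition_id`, `start_row`, `end_row`, `status`), the status value `"PROCESSING"`, and the even split by row range are assumptions of this sketch.

```python
def build_process_table(num_rows, num_partitions):
    """Record the location and initial status of each data partition.

    Each entry covers rows start_row (inclusive) to end_row (exclusive)
    of the data table.
    """
    base, extra = divmod(num_rows, num_partitions)
    table, start = [], 0
    for pid in range(num_partitions):
        end = start + base + (1 if pid < extra else 0)
        table.append({
            "partition_id": pid,
            "start_row": start,
            "end_row": end,
            # Initialized to indicate the partition is being processed.
            "status": "PROCESSING",
        })
        start = end
    return table
```

Storing only row ranges, rather than copies of the rows, keeps the process table small even when the data table holds millions of rows.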
Once a data partition is obtained, a data partition processor may obtain processing instructions at step 504 for processing the data partition. In an embodiment, the processing instructions may be provided by an application. In other embodiments, the processing instructions may be stored for a particular data table and application, and a data partition processor may look up the processing instructions for the particular data table and application. For instance, an application may store a lookup table for an account receivable data table, a number of applications that access this data, and processing instructions for processing the data.
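By way of illustration only, a lookup of stored processing instructions keyed by data table and application may be sketched as follows. The table name `"account_receivable"`, the application name `"order_entry"`, and the example instruction `total_balance` are hypothetical and chosen only for this sketch.

```python
def total_balance(rows):
    """Example processing instructions: sum the balance of each row in a partition."""
    return sum(row["balance"] for row in rows)


# Hypothetical lookup table: (data table, application) -> processing instructions.
INSTRUCTION_TABLE = {
    ("account_receivable", "order_entry"): total_balance,
}


def look_up_instructions(data_table, application):
    """Return the stored processing instructions for a data table and application."""
    try:
        return INSTRUCTION_TABLE[(data_table, application)]
    except KeyError:
        raise LookupError(
            f"no processing instructions for {data_table!r} and {application!r}"
        ) from None
```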
At step 506, the data in the data partition may be processed by the data partition processor by applying the processing instructions. The processing instructions may be a script, one or more rules, or an object with methods. For instance, the processing instructions may be as simple as replicating the data set to one or more business applications. At step 508, the processing status of the data partition may be updated after processing is finished. In an embodiment, the status of a data partition stored in a data partitioning process table may be updated. Once a data partition processor has completed processing of a data partition, the data partition processor may continue to process data partitions according to the data processing instructions until there are no remaining unprocessed data partitions.
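By way of illustration only, the behavior described above, in which each data partition processor keeps taking unprocessed partitions until none remain and updates the status of each partition when finished, may be sketched as follows. The use of threads and a shared queue, the function name `run_processors`, and the status values `"PROCESSING"` and `"PROCESSED"` are assumptions of this sketch.

```python
import queue
import threading


def run_processors(partitions, instructions, num_processors=3):
    """Process every partition once, using a fixed number of worker threads."""
    pending = queue.Queue()
    for pid, partition in enumerate(partitions):
        pending.put((pid, partition))

    status = {pid: "PROCESSING" for pid in range(len(partitions))}
    results = {}
    lock = threading.Lock()

    def processor():
        while True:
            try:
                # Each partition is removed from the queue by exactly one processor.
                pid, partition = pending.get_nowait()
            except queue.Empty:
                return  # No unprocessed partitions remain.
            output = instructions(partition)
            with lock:
                results[pid] = output
                status[pid] = "PROCESSED"

    workers = [threading.Thread(target=processor) for _ in range(num_processors)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results, status
```

Because a partition leaves the queue when a processor takes it, no two processors work on the same partition, which is consistent with the embodiment in which only one instance of a data partition processor processes a given data partition.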
Thus the present invention may provide a partitioning framework that may process a high volume of dynamically generated data in parallel subsets. Importantly, the data partitions may be defined dynamically by a data partitioning policy to accommodate a high volume of dynamically generated data. The framework may be used to process any type of data in parallel, including processing multiple data types at a time. Any number of data partition processors may be instantiated for processing each of the data partitions asynchronously. And a data partitioning policy may be flexibly defined by an application for partitioning data any number of ways, including balancing the data volume across each of the partitions or partitioning the data by data type.
As can be seen from the foregoing detailed description, the present invention provides an improved system and method of data partitioning for parallel processing of dynamically generated application data. A data partitioning framework may be implemented for an application to specify a data partition policy for partitioning a data source and processing instructions for processing the data partitions. The application may send a request to perform parallel processing of dynamically generated data. Asynchronous data partition processors may be instantiated to perform parallel processing of the partitioned data. The data may be partitioned according to the data partitioning policy and processed according to the processing instructions. And the results may be returned to the application. As a result, the system and method provide significant advantages and benefits needed in contemporary computing, and more particularly in online applications.
While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.