The present invention is directed towards management of on-line advertising contracts based on targeting.
The marketing of products and services online over the Internet through advertisements is big business. Advertising over the Internet seeks to reach individuals within a target set having very specific demographics (e.g. male, age 40-48, graduate of Stanford, living in California or New York, etc.). This targeting of very specific demographics is in significant contrast to print and television advertising, which is generally capable of reaching only an audience within some broad, general demographics (e.g. living in the vicinity of Los Angeles, or living in the vicinity of New York City, etc.). The single appearance of an advertisement on a web page is known as an online advertisement impression. Each time a web page is requested by a user via the Internet, the request represents an impression opportunity to display an advertisement in some portion of the web page to the individual Internet user. Often, there may be significant competition among advertisers for a particular impression opportunity to be the one to provide that advertisement impression to the individual Internet user.
To participate in this competition, some advertisers enter into contracts with an ad serving company (or publisher) to receive impressions over a desired time period. An advertiser may further specify desired targeting criteria. For example, an advertiser and the ad serving company may agree to post 2,000,000 impressions over thirty days for US$15,000. Others merely enter into non-guaranteed contracts with the ad serving company and only pay for those impressions actually made by the ad serving company on their behalf. Of course, in modern Internet advertising systems, the competition among advertisers is often resolved by an auction, and the winning bidder's advertisements are shown in the available spaces of the impression.
Indeed online advertising and marketing campaigns often rely at least partially on an auction process where any number of advertisers book contracts to submit and authorize highest bids corresponding to the contract characteristics (e.g. keywords, or bid phrases or various demographics). The advertisements corresponding to the winning contracts are used for presenting the impression.
Considering that (1) the actual existence of a web page impression opportunity suited for displaying an advertisement is not known until the user clicks on a link pointing to the subject web page, and (2) that the bidding process for selecting advertisements must complete before the web page is actually displayed, it then becomes clear that the process of assembling competing contracts, completing the bidding, and compositing the web page with the winner's ads must start and complete within a matter of fractions of a second. Thus, a system that rapidly matches contracts to opportunities for the purpose of optimizing the allocation of online advertising is needed.
Other automated features and advantages of the present invention will be apparent from the accompanying drawings, and from the detailed description that follows below.
A method is provided for indexing online advertising contracts for rapid retrieval and matching, in order to match satisfying online advertising contracts to online advertising slots. The descriptions of the advertising contracts include logical predicates indicating applicability to a particular demographic or targeted web page viewer as defined by the advertiser. Also, the descriptions of advertising slots contain logical predicates indicating demographics or targets of a particular web page and/or web page viewer; thus matches can be performed at least on the basis of intersecting demographics or other sets of target descriptors. Included are structure and techniques for receiving a set of contracts with predicates, preparing a data structure index of the set of contracts, and receiving an advertising slot with predicates, as well as structure and techniques for retrieving from the data structure a set of contracts that satisfy one or more match criteria to match the advertising slot predicates. Embodiments include cases where the predicates are presented in conjoint forms and in disjoint forms, and techniques are provided to consider indexing and matching in cases of IN predicates as well as NOT-IN predicates.
The novel features of the invention are set forth in the appended claims. However, for purpose of explanation, several embodiments of the invention are set forth in the following figures.
In the following description, numerous details are set forth for purpose of explanation. However, one of ordinary skill in the art will realize that the invention may be practiced without the use of these specific details. In other instances, well-known structures and devices are shown in block diagram form in order not to obscure the description of the invention with unnecessary detail.
In the context of Internet advertising, bidding for placement of advertisements within an Internet environment (e.g. system 100 of
In the slightly more sophisticated model of
Given any of such representations of a point in N-dimensional space, any degree of N can be captured over time, and such a capture (e.g. a history) might be used in predicting future events. A finer degree of specificity is useful in targeted advertising. For example, an advertiser for a hotel in mid-town New York City might want to place advertisements only on the empirestate.com/hotels web page as shown to an Internet user, and then only if the Internet user is from California, and then only if the Internet user is male, and so on. Such an advertiser might be willing to pay a premium for a spot that is most prominently located on the web page. In fact, such an advertiser might be joined by other hoteliers who also want their advertisements to be displayed in the most prominently located spot on the web page. However, the inventory for that one web page impression being displayed to that particular user at that point in time is of course limited to just that one impression. Thus, multiple competing advertisers might elect to bid in a market (e.g. an exchange) via an exchange server or auction engine 107 in order to win the most prominent spot, or an advertiser might enter into a contract (e.g. with the Internet property or with an advertising agency, or with an advertising network, etc) to purchase in advance all of the desired spots for some time duration (e.g. all top spots in all impressions of the web page empirestate.com/hotels for all of 2008). Such an arrangement and variants as used here is termed a contract. A contract might be as simple as the one in the previous example, or a contract might be more complex, possibly involving many attribute, value pairs to describe a target. 
Alternatively, the advertiser might not enter into such a pre-arranged placement contract (also known as guaranteed delivery), and instead might decide to allow impressions to be made over time, on the fly, when the advertiser's bid is the winning bid (also known as non-guaranteed delivery). In some embodiments, the system 150 might host a variety of modules to serve management and control operations (e.g. forecasting 111, admission control 115, automated bidding management 114, objective optimization 110, etc) and storage functions (e.g. storage of advertisements 113, storage of statistics 112, etc) pertinent to both guaranteed delivery as well as non-guaranteed delivery methods. Of course there are many differences and many implications in the set-up and operation of guaranteed delivery versus non-guaranteed delivery, some of which are described below.
In most cases, the set-up and operational differences between the guaranteed delivery model and the non-guaranteed delivery model create artificial distinctions between these two models. In particular, pricing of display inventory that is priced at fixed contract prices (e.g. guaranteed delivery contracts), and pricing of inventory that is priced in a real-time auction in a spot market or through other means (non-guaranteed delivery) may differ significantly. In some cases the fixed contract price of an impression is lower than the true market value of the impression (e.g. if the fixed price contract covered some exceptionally high traffic period). In some cases, the reverse is true. Additional artificial distinctions between these two models cause difficult-to-price differences; for instance, some ad network systems always serve guaranteed contracts their quota before serving non-guaranteed contracts. This mode can result in high-quality impressions being served mostly to guaranteed contracts.
In some markets, however, advertisers demand a mix of guaranteed and non-guaranteed contracts. This creates a need for a unified marketplace whereby an impression opportunity can be allocated to a guaranteed or non-guaranteed contract based on the value of the impression opportunity to the different contracts. Such a unified marketplace enables a more equitable allocation of inventory, and also promotes increased competition between guaranteed and non-guaranteed contracts.
What is needed are techniques that enable guaranteed contracts to bid on the spot market for each impression opportunity and thus compete directly with non-guaranteed contracts. The need intensifies as display advertising increases in refinement of the target. Indeed, increased targeting allows advertisers to reach more relevant customers. For example, an advertiser selling family fitness aids might specify a target using broad targeting constraints such as “1 million Yahoo! users from 1 Aug. 2008-31 Aug. 2008”. In contrast, an advertiser selling fitness aids for surfers might specify a much more fine-grained constraint such as “10,000 Yahoo! users from 1 Aug. 2008-8 Aug. 2008 who are California males between the ages of 20-35 who are working in the healthcare industry and like surfing and autos”. Fine-grained targeting has implications for the aforementioned techniques. First, there is the need to forecast future inventory for fine-grained targeted combinations. Second, there is the need to manage contention in a high-dimensional targeting space. That is, given hundreds (or thousands, or more) of distinct targeting attributes, it is reasonable that different advertisers might specify different high-dimensioned targets, and further that multiple advertisers might specify overlapping targeting combinations. Thus there is a need to accurately forecast inventory of targeted impression opportunities such that the union of all guaranteed contracts does not substantially oversubscribe the available impression opportunities. Resolving to a statistically reliable forecast of inventory (e.g. a plan) might be supported in part by historical statistics and heuristics.
Given such an environment, the admission control portion of module 310 serves to generate quotes for guaranteed contracts and accept bookings of guaranteed contracts, the pricing portion of module 310 serves to price guaranteed contracts, the ad serving portion of module 320 selects guaranteed ads for an incoming opportunity, and the bidding portion of module 320 submits bids for the selected guaranteed ads on an exchange 340. Additionally, an optimizer 390 might communicate with a plan distribution and statistics gathering module 350, and one or more forecasting modules 360, 370, 380, and return results that optimize for an overall objective.
Given the system 300 of
In one embodiment, the operation of the entire system 300 is orchestrated by an optimization module 390. This optimization module 390 periodically takes in a forecast of supply (future impression opportunities), guaranteed demand (expected guaranteed contracts) and non-guaranteed demand (expected bids in the spot market) and matches supply to demand using an overall objective function. The optimization module then sends a plan of the optimization result to the admission control and pricing module 310. Of course, inasmuch as the plan is based on statistics relating to data gathered over time, the plan is updated every few hours based on new estimates for supply, new estimates for demand, and new estimates for deliverable impressions.
Another scenario, one that relates to techniques for finding all applicable contracts (i.e. guaranteed as well as non-guaranteed contracts) and bringing their respective bids to the unified marketplace, might operate as follows: When a sales person issues a query (to the admission control and pricing module 310) for some contract (e.g. including a target specification and duration) for future delivery (i.e. guaranteed or non-guaranteed), the system 300 invokes the supply forecasting module 360 to identify how much inventory is available for that contract. Since targeting queries can be very fine-grained in a high-dimensional space, the supply forecasting module might employ a scalable multi-dimensional database indexing technique to capture and store the correlations between different targeting attributes. The scalable multi-dimensional database indexing technique might also serve to capture and retrieve correlations found among multiple contracts. For example, if there are two sales persons submitting contracts in contention (e.g. “Yahoo! finance users who are California males” and “Yahoo! users who are aged 20-35 and interested in sports”), some number of forecasted impression opportunities might match both contracts, but of course the inventory of matching impression opportunities should not be double-counted. In order to deal with contract contention for supply in a high-dimensional space, the supply forecasting system might produce impression samples (i.e. a selected subset of the total available inventory) as opposed to just available inventory counts. Thus, impression opportunity samples from available inventory might be used to determine how many contracts can be satisfied by each impression opportunity. Given the impression samples, the admission control module uses the plan to calculate the extent of contention between contracts in the high-dimensional space.
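The role of impression samples in avoiding double-counting can be illustrated with a minimal sketch. The sample records, attribute names, and the two contract predicates below are hypothetical stand-ins for the two contracts in contention mentioned above; they are not taken from the described system, and a real forecasting module would likely attach a weight to each sample.

```python
# Hypothetical impression samples drawn from forecast inventory (assumed data).
samples = [
    {"property": "finance", "state": "CA", "gender": "M", "age": 28, "interest": "sports"},
    {"property": "finance", "state": "CA", "gender": "M", "age": 52, "interest": "autos"},
    {"property": "news", "state": "NY", "gender": "F", "age": 30, "interest": "sports"},
    {"property": "news", "state": "TX", "gender": "M", "age": 41, "interest": "travel"},
]

# Two contracts in contention, written as plain predicates over one sample.
contracts = {
    "finance_ca_males": lambda s: (s["property"] == "finance"
                                   and s["state"] == "CA" and s["gender"] == "M"),
    "sports_20_35": lambda s: 20 <= s["age"] <= 35 and s["interest"] == "sports",
}

def eligible_contracts(sample):
    """Names of all contracts that the given sample could satisfy."""
    return [name for name, accepts in contracts.items() if accepts(sample)]

per_sample = [eligible_contracts(s) for s in samples]

# Summing per-contract counts would count a shared sample once per contract.
naive_total = sum(len(names) for names in per_sample)
# Counting samples (not contract hits) counts each opportunity exactly once.
distinct_total = sum(1 for names in per_sample if names)
# Samples wanted by more than one contract measure the extent of contention.
contended = sum(1 for names in per_sample if len(names) > 1)
```

Working from samples rather than per-contract counts is what makes the overlap visible: the first sample is eligible for both contracts, so it contributes to contention but only once to allocatable inventory.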
Finally, the admission control and pricing module 310 might return allocated available inventory to each of the sales persons without any double-counting. In addition, the admission control module might calculate the price for each contract and return pricing along with the quantity of allocated impression opportunities.
Now, stating the problem to be solved more formally: given an advertising opportunity (e.g. an impression opportunity), specified as a vector (e.g. list) of (feature, value) pairs, find all of the contracts that could bid on this opportunity. For example, given the conjunctive impression opportunity profile vector {(state=CA) AND (gender=male) AND (age=50)}, some possibly matching contracts would include those asking for {(gender=male) AND (state=CA)}, and would include those asking for {(gender=male) AND (age=50)}, because each clause of each of those contracts is satisfied by the example impression opportunity vector. The embodiments of the invention herein permit both disjunctive and conjunctive types of contracts, and even contracts including more complex predicates, to be handled efficiently. As regards contracts including complex predicates, embodiments of the invention disclosed herein support both “IN” predicates (e.g. state IN (NY, CA, MA)) and “NOT-IN” predicates (e.g. state NOT-IN (NY, CA, MA)).
In various embodiments, a contract might be specified as an arbitrarily complex logic expression, which can be mathematically transformed into disjunctive normal form (DNF) or into conjunctive normal form (CNF). A contract specified as a DNF expression contains any number of “or” terms, any one of which, if satisfied, satisfies the specification of the contract. A contract specified as a CNF expression contains any number of “and” conjunctions, such that all conjunctions must be satisfied in order to satisfy the specification of the contract. Once a contract has been normalized (i.e. into DNF or into CNF), each term can be considered a subcontract. To handle contracts in DNF (OR-ing), the techniques disclosed herein might split a contract into subcontracts (one for each term), and produce an index entry for each of the subcontracts. To support contracts in CNF (AND-ing), the techniques check to confirm that each of the subcontracts is found in the index.
As indicated in the foregoing, one application served by the construction of an efficient inverted index system relates to booking and satisfying online advertisement contracts. It should be emphasized that the time between an Internet user's click on a link and the display of the corresponding page, including any advertisements, is a short period, desirably a fraction of a second. It is within this short time period that applicable contracts must be identified, some or all of those contracts compete for spots on the soon-to-be-displayed webpage, the winner's or winners' advertisements are selected and placed in the webpage, and finally the webpage is rendered at the user's terminal. Thus, an efficient inverted index might be efficient as measured by latency, as well as efficient with respect to computing cycles, especially when many contracts may be booked at any given moment in time.
Further, the inverted index system may receive any arbitrarily complex expressions that describe a contract. The indexing techniques disclosed herein address at least solving the lookup problem efficiently and even under conditions where the input data is complex.
A contract is a DNF expression using IN and NOT-IN predicates as the most basic predicates. An impression opportunity is a point within a multi-dimensional space where any point can be described using finite domains for each attribute along a dimension.
There are two types of basic predicates: IN predicates and NOT-IN predicates. For example, the predicate state IN {CA, NY} says that the state could either be CA or NY. The predicate state NOT-IN {CA, NY} indicates the state could be anything other than CA or NY. It is important to observe that state IN {CA, NY} is equivalent to state IN {CA} ∨ state IN {NY} (making it a disjunction of length 2) while state NOT-IN {CA, NY} is equivalent to state NOT-IN {CA} ∧ state NOT-IN {NY} (making it a conjunction of length 2). Notice that IN and NOT-IN predicates also cover equality and non-equality predicates. Other basic predicate types might also be supported, but are not required for construction of an inverted index. Using only IN and NOT-IN, for example, ranges of integers can be supported by converting them into equality predicates using hierarchical information of integer ranges.
A contract is a DNF or CNF expression on the two basic expressions IN and NOT-IN. For example, (state IN {CA, NY} ∧ age IN {20}) ∨ (state NOT-IN {CA, NY} ∧ interest IN {sports}) is a DNF expression using the two types of atomic expressions while (state IN {CA, NY} ∨ age IN {20}) ∧ (interest IN {sports}) is a CNF expression. Notice that a conjunction can either be a DNF expression with one disjunct or a CNF expression with conjuncts of size 1.
A profile of an impression opportunity is a set of attribute and value pairs. For example, {state=CA ∧ age=20 ∧ interest=sports} is a profile. An impression opportunity profile is a single point in a multi-dimensional space. Hence, each attribute within the set defining the impression opportunity profile has exactly one value.
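The semantics of the two basic predicates and of DNF and CNF contracts can be stated directly as code. The following is a minimal sketch for reference only: it evaluates a contract against a profile by brute force, without any index, and the names Pred, IN, NOT_IN, satisfies_dnf and satisfies_cnf are illustrative rather than part of the described system. It assumes that an attribute missing from the profile fails every IN predicate and passes every NOT-IN predicate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pred:
    attr: str
    values: frozenset
    negated: bool = False  # False means IN, True means NOT-IN

    def holds(self, profile: dict) -> bool:
        # A profile assigns exactly one value to each attribute it mentions.
        inside = profile.get(self.attr) in self.values
        return not inside if self.negated else inside

def IN(attr, *values):
    return Pred(attr, frozenset(values))

def NOT_IN(attr, *values):
    return Pred(attr, frozenset(values), negated=True)

def satisfies_dnf(dnf, profile):
    """dnf: list of conjunctions (lists of Pred); one true conjunction suffices."""
    return any(all(p.holds(profile) for p in conj) for conj in dnf)

def satisfies_cnf(cnf, profile):
    """cnf: list of disjunctions (lists of Pred); every disjunction must hold."""
    return all(any(p.holds(profile) for p in disj) for disj in cnf)

# The DNF and CNF expressions and the profile used as examples in the text.
dnf = [[IN("state", "CA", "NY"), IN("age", 20)],
       [NOT_IN("state", "CA", "NY"), IN("interest", "sports")]]
cnf = [[IN("state", "CA", "NY"), IN("age", 20)],
       [IN("interest", "sports")]]
profile = {"state": "CA", "age": 20, "interest": "sports"}
```

A brute-force evaluator of this kind is useful mainly as a correctness oracle against which an index-based matcher can be checked.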
Construction of an inverted index may commence by making posting lists of contracts for each IN predicate. For each attribute name and single value pair of an IN predicate, we make one posting list. Hence, the index structure “flattens” the IN predicates when constructing the posting lists. In the embodiments described herein, the inverted index is sorted. Furthermore, each posting list might sort its contracts by contract id, and the posting lists themselves might be sorted by the ids of their current contracts. Of course other ids or keys might be used for sorting the posting lists, and/or for sorting contracts within a posting list, and such alternative ids and keys are possible and envisioned. For example, contracts might be sorted by any arbitrary key, such as customer type.
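The construction described above might be sketched as follows. This is an illustration under stated assumptions, not a reproduction of Algorithm 1: the function name build_inverted_index, the tuple layout of predicates, and the four sample contracts are hypothetical (the contracts are chosen only so that they are consistent with the walkthroughs later in this description). Each posting entry carries an IN or NOT-IN flag, and contracts are visited in ascending id order so that every posting list comes out sorted by contract id.

```python
from collections import defaultdict

def build_inverted_index(contracts):
    """contracts: {contract_id: [(attr, values, flag), ...]}, flag "IN" or "NOT-IN".

    Returns {(attr, value): [(contract_id, flag), ...]} with one posting list
    per attribute name and single value, i.e. multi-valued predicates are
    flattened into one entry per value.
    """
    index = defaultdict(list)
    for cid in sorted(contracts):            # ascending ids keep each list sorted
        for attr, values, flag in contracts[cid]:
            for value in values:             # flatten {CA, NY} into CA and NY
                index[(attr, value)].append((cid, flag))
    return dict(index)

# Hypothetical conjunctive contracts (ids 1..4), for illustration only.
contracts = {
    1: [("age", {1}, "IN"), ("state", {"CA"}, "IN")],
    2: [("age", {1}, "IN"), ("state", {"NY"}, "IN")],
    3: [("age", {1}, "IN")],
    4: [("state", {"CA", "NY"}, "IN")],
}
index = build_inverted_index(contracts)
```

At query time only the posting lists keyed by the (attribute, value) pairs of the impression opportunity need to be fetched, which is why flattening to single values is convenient.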
Example: Consider the two contracts in Table 1. For each attribute name and possible value, Algorithm 1 constructs a posting list of contracts with flags. The final inverted index is shown in Table 2. Notice how all the IN predicates are flattened out into single values. Each posting list has its contracts sorted, and the posting lists themselves are also sorted according to the contracts they have.
state IN {CA}
state IN {NY}
In an embodiment known as the Counting Algorithm, the algorithm is applied to contract expressions in the form of conjunctions. The idea is to maintain, for each contract, a counter of how many predicates of the contract are satisfied. The inverted index for the conditions of the impression opportunity is scanned once. This algorithm can be considered as a baseline algorithm for performance comparison. Notice that the Counting Algorithm can support NOT-IN predicates by modifying Step 8 of Algorithm 2, namely by setting the Count value to minus infinity if the contract is tagged NOT-IN.
Example: Consider the impression opportunity I={age=1 ∧ state=CA}. Given the inverted index in Table 2, the posting lists for I are shown in Table 3.
Scan through the posting lists and increment the counters for each contract. The final counts are shown in Table 4.
For each contract in Table 4, compare the count value with the number of predicates in the contract (i.e., the size of the contract). As a result, contracts c1, c3, and c4 are satisfied by I because their counts are equal to their sizes.
Complexity: The complexity of the Counting algorithm is linear in the sum of the posting list sizes of P:
$$O\left(\sum_{k=0}^{|P|-1} |P[k]|\right)$$
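The counting approach can be sketched in a few lines. This is a hedged illustration rather than Algorithm 2 itself: the function name counting_match, the index literal, and the contract sizes are assumptions, and the size of a contract is taken here to be its number of IN predicates. Contracts with no IN predicates never appear in a posting list for an IN key and would need separate handling.

```python
from collections import Counter

def counting_match(index, sizes, profile):
    """Single scan over the posting lists selected by the profile.

    index:   {(attr, value): [(contract_id, flag), ...]}
    sizes:   {contract_id: number of IN predicates in the contract}
    profile: {attr: value}
    """
    counts = Counter()
    for key in profile.items():              # one posting list per (attr, value)
        for cid, flag in index.get(key, []):
            if flag == "NOT-IN":
                counts[cid] = float("-inf")  # a violated NOT-IN can never recover
            else:
                counts[cid] += 1             # -inf + 1 stays -inf
    return sorted(cid for cid, n in counts.items() if n == sizes[cid])

# Hypothetical index; contract 5 is age IN {1} AND state NOT-IN {CA}.
index = {
    ("age", 1): [(1, "IN"), (2, "IN"), (3, "IN"), (5, "IN")],
    ("state", "CA"): [(1, "IN"), (4, "IN"), (5, "NOT-IN")],
    ("state", "NY"): [(2, "IN"), (4, "IN")],
}
sizes = {1: 2, 2: 2, 3: 1, 4: 1, 5: 1}
result = counting_match(index, sizes, {"age": 1, "state": "CA"})
```

Every entry of every selected posting list is touched exactly once, which is the linear cost stated above.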
Another embodiment uses a variant of the WAND algorithm [Broder et al.]. The WAND algorithm assumes a conjunction of IN predicates for contracts. Compared to the Counting algorithm, WAND makes the following improvements.
In this algorithm, contracts of size K=0 (i.e., there are no predicates), are deemed to always match. Since contracts of size K=0 do not appear in the posting lists, a separate posting list (called Z) that contains all contracts of size 0 is maintained. When K=0, Z is always returned by the idx.GetPostingLists method.
In our examples, we denote the posting lists for contracts of size K as PK. For example, the posting lists for contracts of size 2 are denoted as P2.
Example: Algorithm 3 extracts the posting lists of I from idx. This time, however, the algorithm extracts posting lists for each possible size of contracts. In Table 1, there are shown two sizes of contracts: size K=1 contains the set of contracts (c3, c4) and size K=2 contains the set of contracts (c1, c2). Hence, Table 5 shows two sets of posting lists for each size. The current contract of each posting list is underlined. Notice that in this example, the posting lists are in sorted order according to their contract IDs.
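Table 5 (posting lists for I, grouped by contract size K):
K=1, key (age, 1): c3
K=1, key (state, CA): c4
K=2, key (state, CA): c1
K=2, key (age, 1): c1 → c2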
Processing continues with P1, that is, the posting lists of contracts with size 1. Since P1[0].Current.ID=P1[0].Current.ID=3 at Step 15 (with K=1, the first and the K-th posting list are the same list), this example adds c3 to O in Step 16. The algorithm then skips all the posting lists to c4 because P1[0].Current.ID+1=3+1=4. Hence, P1[0] reaches the end of its list while P1[1] still has c4 as its current contract. The posting lists after sorting P1 are shown in Table 6. Notice that the posting list of (age, 1) is placed at the end because it is done with processing. Since P1[0].Current.ID=P1[0].Current.ID=4 at Step 15, c4 is also accepted and included in O. After advancing the posting list P1[0], the algorithm exits the while loop in Step 13.
Next, process P2 in the second for loop. Since K is 2 and P2[0].Current.ID=P2[1].Current.ID=1, Step 16 adds c1 to O. Since NextID is 2, we advance both posting lists in P2 to c2. Notice that the posting list with key (state, CA) does not contain c2 and thus points to null, i.e., the end of the list. The posting lists after sorting P2 in Step 14 are shown in Table 7. This time, P2[0].Current=c2 while P2[1].Current=null, so go back to Step 13. Since P2[1].Current=null, terminate the while loop and return O={c1, c3, c4} as our result.
Complexity: Although WAND improves the Counting algorithm by using skipping and partitioning techniques, its complexity is actually greater than that of the Counting Algorithm. In the worst case, the WAND Algorithm needs to sort the posting list P while advancing one posting list in Step 22. Sorting in Step 14 actually takes logarithmic time to |P| because the inverted index is initially sorted, and we only need to bubble down one posting list in P using a heap to maintain a sorted order for each posting list advanced. Hence, the complexity becomes
$$O\left(\log(|P|) \times \sum_{k=0}^{|P|-1} |P[k]|\right)$$
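The walkthrough above can be traced with a small sketch of the size-partitioned matching loop. This is a reconstruction from the worked example, not the listing of Algorithm 3: the function and parameter names are hypothetical, contract ids are plain integers, the index is assumed to be already partitioned by contract size K ≥ 1, and for brevity the lists are re-sorted on every iteration instead of being kept in a heap. Contracts of size 0 are passed separately (the list Z in the text) and always match.

```python
from bisect import bisect_left

END = float("inf")  # sentinel id for an exhausted posting list

def match_conjunctions(index_by_size, profile, zero_size=()):
    """index_by_size: {K: {(attr, value): sorted list of contract ids}}."""
    matched = list(zero_size)
    for k in sorted(index_by_size):
        index = index_by_size[k]
        lists = [index[key] for key in profile.items() if key in index]
        if k < 1 or len(lists) < k:
            continue                     # fewer than K lists cannot satisfy K predicates
        pos = [0] * len(lists)

        def current(i):
            return lists[i][pos[i]] if pos[i] < len(lists[i]) else END

        while True:
            order = sorted(range(len(lists)), key=current)
            first, kth = current(order[0]), current(order[k - 1])
            if kth == END:
                break
            if first == kth:             # same contract heads K lists: all K predicates hold
                matched.append(first)
                next_id = first + 1
            else:                        # ids below kth occur in fewer than K lists
                next_id = kth
            for i in range(len(lists)):  # skip every list forward to next_id
                pos[i] = bisect_left(lists[i], next_id, pos[i])
    return sorted(matched)

# Hypothetical index consistent with the walkthrough (c1..c4 as ids 1..4).
index_by_size = {
    1: {("age", 1): [3], ("state", "CA"): [4], ("state", "NY"): [4]},
    2: {("age", 1): [1, 2], ("state", "CA"): [1], ("state", "NY"): [2]},
}
result = match_conjunctions(index_by_size, {"age": 1, "state": "CA"})
```

The skip step is what distinguishes this loop from counting: contract c2 is never counted for the example profile, because the (state, CA) list is exhausted before c2 can head two lists.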
Two possible extensions of Algorithm 3 to support NOT-IN predicates are here disclosed. A simple method is to split the inverted index into a “positive inverted index,” which contains posting lists for the IN predicates, and a “negative inverted index,” which contains posting lists for the NOT-IN predicates. Although this method supports arbitrary conjunctions with NOT-IN predicates, the number of posting lists for an impression opportunity could be large if many contracts contain different NOT-IN predicates. Thus a method that does not use the negative inverted index is desired. In this latter case (the method of which is disclosed below), the inverted index size is bounded by the size of the impression opportunity, making the method practical for real-time applications.
Using One Inverted Index: Algorithm 3 might be extended to support NOT-IN predicates without using the negative inverted index. The key idea is to prune contracts whose NOT-IN predicates are violated by the impression opportunity. The motivations for the extensions become more evident in the example presented after the discussion of the algorithm.
If so, there exists a NOT-IN predicate that is violated, and thus the iteration can immediately reject P[0].Current. Notice the exploitation of the new sorting of Extension #2 to efficiently detect a NOT-IN violation. When a contract is rejected, all the posting lists that have P[0].Current as their current contracts are advanced.
Algorithm 6 shows the extended WAND algorithm. The only code change made from Algorithm 3 is the addition of Steps 18-27, which reflect Extension 3. Notice the proper support for contracts of size 0 (i.e., they have no IN predicates) because, if K=0, the algorithm always adds the posting list Z that contains all contracts of size 0. Hence, there is no case where a matching contract is missing from the posting lists.
Example: Note the contracts in Table 11. Notice that c4 is a self-contradicting contract and cannot be satisfied in any way. Also, c3 is a contract of size 0.
state NOT-IN {CA}
state NOT-IN {NY}
state NOT-IN {NY}
age NOT-IN {1}
The inverted index constructed by simulating Algorithm 6 over the set of contracts of Table 11 is shown in Table 12. Notice that c4, the self-contradicting contract, does not appear in the posting list for (age, 1).
Given an impression opportunity I={age=1 ∧ state=CA}, the posting lists for I are shown in Table 13. Notice that c1, c2 have now been placed in the group of contracts of size 1 because they only have one IN predicate. Contract c3 is placed in the posting list Z because it has size=0.
Continuing, process P0 in Algorithm 6. Since P0[0].Current.ID=P0[0].Current.ID=3 at Step 15, accept c3 and add it to O. Now start processing P1. Since P1[0].Current.ID=P1[0].Current.ID=1 at Step 15, but P1[0].Current.flag=NOT-IN, we reject c1 by advancing both the posting lists of (state, CA) and (age, 1). After sorting P1, the intermediate result is shown in Table 14.
During the next while loop, include c2 in O because P1[0].Current.ID=P1[0].Current.ID=2 and P1[0].Current.flag≠NOT-IN. Then escape the while loop at the next while condition and terminate, returning O={c2, c3} as the result.
Complexity: Unlike Algorithm 3, the sorting in Step 14 takes O(|P|log(|P|)) time because of the new sorting we use for contracts with NOT-IN tags. For example, consider the two posting lists (age, 1): c1→c2 and (state, CA): c1→c3, which are in sorted order of contract IDs. If we do not use any NOT-IN tags, then the two posting lists are still sorted even after advancing them by one contract. However, consider the use of NOT-IN tags, giving (age, 1): c1→c2 and (state, CA): c1(NOT-IN)→c3. Then according to the new sorting, (state, CA) now precedes (age, 1) because c1(NOT-IN)<c1. However, this implies a re-sort of the two posting lists once they are advanced, because the ordering of c2 and c3 is disrupted. Hence Step 14 needs to do an entire sort again. Even skipping the new ordering (i.e., c(NOT-IN)<c), we then need to do an O(|P|) scan in Step 18 instead of a single equality check, making the overall algorithm still have the complexity:
$$O\left(|P|\log(|P|) \times \sum_{k=0}^{|P|-1} |P[k]|\right)$$
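The single-index handling of NOT-IN predicates might look like the sketch below. It is a reconstruction from the worked example, not the listing of Algorithm 6, and all names and the sample contracts are hypothetical. Posting entries are (id, flag) pairs in which the NOT-IN flag sorts before the IN flag for the same id, so a violated NOT-IN predicate surfaces at the head of the sorted lists. The size of a contract is its number of IN predicates, and size-0 contracts are represented by the extra list Z and processed with K treated as 1.

```python
NOT_IN, IN = 0, 1               # NOT-IN sorts before IN for the same contract id
END = (float("inf"), IN)        # sentinel entry for an exhausted posting list

def match_with_not_in(index_by_size, profile, zero_size=()):
    """index_by_size: {K: {(attr, value): sorted list of (contract_id, flag)}}."""
    matched = []
    for size in sorted(set(index_by_size) | {0}):
        index = index_by_size.get(size, {})
        lists = [index[key] for key in profile.items() if key in index]
        if size == 0:           # list Z: every size-0 contract is a candidate
            lists.append([(cid, IN) for cid in sorted(zero_size)])
        k = max(size, 1)
        if len(lists) < k:
            continue
        pos = [0] * len(lists)

        def current(i):
            return lists[i][pos[i]] if pos[i] < len(lists[i]) else END

        while True:
            order = sorted(range(len(lists)), key=current)
            first, kth = current(order[0]), current(order[k - 1])
            if kth == END:
                break
            if first[0] == kth[0]:
                if first[1] == IN:      # no NOT-IN entry heads the lists: accept
                    matched.append(first[0])
                next_id = first[0] + 1  # accepted or rejected, move past this id
            else:
                next_id = kth[0]
            for i in range(len(lists)):
                while current(i)[0] < next_id:
                    pos[i] += 1
    return sorted(matched)

# Hypothetical contracts: c1 = age IN {1} AND state NOT-IN {CA},
# c2 = age IN {1} AND state NOT-IN {NY}, c3 = state NOT-IN {NY} (size 0).
index_by_size = {
    0: {("state", "NY"): [(3, NOT_IN)]},
    1: {("age", 1): [(1, IN), (2, IN)],
        ("state", "CA"): [(1, NOT_IN)],
        ("state", "NY"): [(2, NOT_IN)]},
}
result = match_with_not_in(index_by_size, {"age": 1, "state": "CA"}, zero_size=[3])
```

Because a rejected contract is skipped in all lists at once, the number of posting lists consulted stays bounded by the number of attribute and value pairs in the impression opportunity.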
The WAND Algorithm can be further extended to support DNF expressions. The idea of Algorithm 7 is to decompose contracts into smaller contracts that have conjunctive expressions and run WAND as if they were separate contracts. After WAND terminates, then return the contracts that have any of their sub-contracts in the output O. Notice that Algorithm 7 can be easily combined with other techniques herein to support DNF expressions containing NOT-IN predicates.
Example: Consider the DNF contracts shown in Table 15 and the impression opportunity I={age=1 ∧ state=CA}.
state IN {CA}
(age IN {2}
state IN {NY})
state IN {NY}
First extract the disjuncts of all contracts and form “sub-contracts” as shown in Table 16.
state IN {NY}
state IN {NY}
After running WAND, we get the satisfying sub-contracts {c11, c12, c21}. Thus we return the contracts {c1, c2} as the final solution.
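The decomposition into sub-contracts can be sketched as follows. The contracts below are hypothetical and are not the contents of Table 15; they are chosen only so that the same sub-contracts (c11, c12, c21) end up satisfied. For brevity, conj_matches is a brute-force stand-in for the index-based conjunction matcher, and all function names are illustrative.

```python
def split_dnf(contracts):
    """{cid: [conjunction, ...]} -> {(cid, j): conjunction}, j numbered from 1."""
    return {(cid, j): conj
            for cid, disjuncts in contracts.items()
            for j, conj in enumerate(disjuncts, 1)}

def conj_matches(conj, profile):
    # Stand-in for the conjunction matcher: every IN predicate must hold.
    return all(profile.get(attr) in values for attr, values in conj)

def match_dnf(contracts, profile):
    """Return (matching parent contract ids, satisfied sub-contract ids)."""
    subs = split_dnf(contracts)
    hits = [sid for sid, conj in subs.items() if conj_matches(conj, profile)]
    parents = sorted({cid for cid, _ in hits})  # a parent matches if any disjunct does
    return parents, hits

# Hypothetical DNF contracts; each inner list is one conjunction of IN predicates.
contracts = {
    1: [[("state", {"CA"})], [("age", {1})]],
    2: [[("state", {"CA"}), ("age", {1})], [("state", {"NY"})]],
    3: [[("age", {2}), ("state", {"NY"})]],
}
parents, hits = match_dnf(contracts, {"age": 1, "state": "CA"})
```

Keying each sub-contract by the pair (parent id, disjunct number) makes the final mapping back to parent contracts a simple projection on the first component.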
Algorithm 3 can be extended to support CNF expressions. The idea is to use the WAND algorithm on the outer conjunctions of the CNF expressions of contracts. The following extensions from Algorithm 3 are made.
The only code change in Algorithm 8 compared to Algorithm 3 is the inclusion of Steps 18-26, which reflects the Extension #6 above.
Example: Consider the contracts in Table 17. The inverted index is shown in Table 18. Notice the conjunct ID is placed after each contract, indicating which conjunct of the contract the posting list predicate is located in. For example, posting list predicate (state, CA) is located in the second conjunct of c1, and thus, add the tag “(2)” to c1. Also notice that there are two posting lists for (age, 1) because c3 has two conjunct IDs.
Given an impression opportunity I={age=1 ∧ gender=F}, the posting lists for I are shown in Table 19.
(gender IN {F}
state IN {CA})
gender IN {F})
state IN {CA}
gender IN {F})
(age IN {1}
state IN {CA})
gender IN {F})
Processing P1 in Algorithm 8: Since P1[0].Current.ID=P1[0].Current.ID=4 at Step 15, start counting the number of distinct conjuncts for c4 by scanning the posting lists that have c4 as their current contracts (hence, consider both posting lists of P1). Since both posting list predicates (age, 1) and (gender, F) are in the first conjunct, |ConjunctIDSet|=|{1}|=1=K. Hence, accept c4 and add it to O. After processing P1, start processing P2. Since P2[0].Current.ID=P2[1].Current.ID=1 at Step 15, start counting the number of distinct conjuncts for c1. Since |ConjunctIDSet|=|{1, 2}|=2=K, add c1 to O. After advancing the two posting lists, the intermediate state of the posting lists of P2 is shown in Table 20. Since P2[0].Current.ID=P2[1].Current.ID=2 at Step 15, start counting the number of distinct conjuncts for c2. This time, however, |ConjunctIDSet|=|{1}|=1<2=K, so we reject c2. We advance the two posting lists again, arriving at Table 21. Since |ConjunctIDSet|=|{1}∪{1}∪{2}|=|{1, 2}|=2=K, add c3 to O. Hence, return the final result O={c1, c3, c4}.
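The distinct-conjunct check at the heart of this walkthrough can be sketched in isolation. The sketch below is a simplification, not Algorithm 8: it gathers the set of satisfied conjunct IDs per contract in one pass over the posting lists and omits the skipping and size partitioning. The four contracts are hypothetical (Table 17 is not reproduced here) and were chosen only so that the conjunct ID sets agree with those in the walkthrough.

```python
from collections import defaultdict

def build_cnf_index(contracts):
    """contracts: {cid: [disjunction, ...]}, disjunction = [(attr, values), ...].

    Each posting entry is (contract_id, conjunct_id), conjuncts numbered from 1.
    """
    index = defaultdict(list)
    for cid in sorted(contracts):
        for conj_id, disjunction in enumerate(contracts[cid], 1):
            for attr, values in disjunction:
                for value in values:
                    index[(attr, value)].append((cid, conj_id))
    return index

def match_cnf(contracts, profile):
    index = build_cnf_index(contracts)
    conjunct_ids = defaultdict(set)      # the ConjunctIDSet of each candidate
    for key in profile.items():
        for cid, conj_id in index.get(key, []):
            conjunct_ids[cid].add(conj_id)
    # A contract with K conjuncts matches when K distinct conjuncts are satisfied.
    return sorted(cid for cid, ids in conjunct_ids.items()
                  if len(ids) == len(contracts[cid]))

# Hypothetical CNF contracts (ids 1..4 standing for c1..c4).
contracts = {
    1: [[("age", {1})], [("gender", {"F"}), ("state", {"CA"})]],
    2: [[("age", {1}), ("gender", {"F"})], [("state", {"CA"})]],
    3: [[("age", {1}), ("gender", {"F"})], [("age", {1}), ("state", {"CA"})]],
    4: [[("age", {1}), ("gender", {"F"})]],
}
result = match_cnf(contracts, {"age": 1, "gender": "F"})
```

Counting distinct conjunct IDs rather than posting-list hits is what rejects contract 2: two of its predicates are satisfied, but both belong to the same conjunct.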
Supporting CNF Expressions with NOT-IN Predicates
Further embodiments implement two possible extensions to support CNF expressions with NOT-IN predicates. As indicated earlier, a simple method is to split the inverted index into positive and negative inverted indexes. An enhanced method, described below, does not use the negative inverted index. The inverted index size is then bounded by the size of the impression opportunity, making the enhanced method practical for real-time applications. Each option is explained in the following sections.
An important intuition is that the more complex the contract expression, the more information is needed in the posting lists and the more operations must be performed to tell whether the contract is really satisfied. To reduce complexity, the extensions are defined to use a minimum of information and expend a minimum of work to evaluate the contract. To reduce runtimes, some simplifications or restrictions (e.g. limiting depth of predicates within a conjunct) are applied.
Using one inverted index: One embodiment of an enhanced algorithm for CNF expressions with NOT-IN predicates uses one inverted index.
Algorithm 10 reflects the ideas above. The only code change compared to Algorithm 3 is the inclusion of Steps 18-40, which reflect Extension #10 above.
[Algorithm 10 listing residue; the only legible fragment is "(A[P[i].Current.ID].isType2 = false".]
Example: Consider the contracts in Table 25.
(state NOT-IN {CA} ∨ gender NOT-IN {M})
The inverted index is shown in Table 26.
Given an impression opportunity I={age=1 ∧ gender=M ∧ state=NY}, the posting lists for I are shown in Table 27.
Processing P1 in Algorithm 10: Since P1[0].Current.ID=P1[1].Current.ID=1 at Step 15, start evaluating c1 based on the information in the posting lists. Create the array A, which contains two counters for the two conjuncts of c1. Since the first posting list is an IN predicate for c1, set A[0].Cnt to 1. Since the second posting list is a NOT-IN predicate, initialize A[1].Cnt to the quantity (−2−1)=−3 and then increment it to −2. Then accept c1 because A[0].Cnt=1 and A[1].Cnt<−1.
Suppose, on the other hand, that I2={age=1 ∧ gender=M ∧ state=CA}. Then the posting lists for I2 are shown in Table 28. In this case, A[0].Cnt=1 and A[1].Cnt=−1. The algorithm thus rejects c1 because A[1].Cnt=−1.
Suppose that I3={age=1 ∧ gender=F ∧ state=NY}. Then the posting lists for I3 are shown in Table 29. In this case, A[0].Cnt=1 and A[1].Cnt=0. Notice that A[1].Cnt=0 because none of the posting lists contains the second conjunct. Since the second conjunct is type 2, it has at least one NOT-IN predicate satisfied, and thus c1 is accepted.
Finally, suppose that I4={age=2 ∧ gender=F ∧ state=NY}. Then there are no posting lists. Since A[0].Cnt=0, reject c1.
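The counter arithmetic in the four cases above can be reproduced with a short Python sketch. It is an interpretation of the walkthrough rather than Algorithm 10: the contract c1 is assumed to be (age IN {1}) ∧ (state NOT-IN {CA} ∨ gender NOT-IN {M}), the initial value −n−1 is assumed to use n, the number of NOT-IN predicates in the conjunct, and "type 2" is assumed to mean a conjunct that contains at least one NOT-IN predicate. The names build_index and evaluate are hypothetical. A contract is only examined if at least one posting list mentions it, so a contract made up entirely of NOT-IN predicates would need extra handling that this sketch omits.

```python
from collections import defaultdict

# Assumed contract, inferred from the walkthrough (not copied from Table 25):
# (age IN {1}) AND (state NOT-IN {CA} OR gender NOT-IN {M})
CONTRACTS = {
    "c1": [
        [("IN", "age", {1})],
        [("NOT-IN", "state", {"CA"}), ("NOT-IN", "gender", {"M"})],
    ],
}


def build_index(contracts):
    """One inverted index; each posting records conjunct and predicate kind."""
    index = defaultdict(list)
    for cid, conjuncts in contracts.items():
        for k, conjunct in enumerate(conjuncts):
            for op, attr, values in conjunct:
                for value in values:
                    index[(attr, value)].append((cid, k, op == "NOT-IN"))
    return index


def evaluate(contracts, impression):
    """Return (accepted contract IDs, per-contract conjunct counters)."""
    index = build_index(contracts)
    counters = {}
    for key in impression.items():
        for cid, k, negated in index.get(key, []):
            cnt = counters.setdefault(cid, [0] * len(contracts[cid]))
            n = sum(op == "NOT-IN" for op, _, _ in contracts[cid][k])
            if not negated:
                cnt[k] = 1              # an IN predicate is satisfied
            elif cnt[k] != 1:
                if cnt[k] == 0:
                    cnt[k] = -n - 1     # first violated NOT-IN predicate
                cnt[k] += 1             # count this violated NOT-IN
    accepted = []
    for cid, cnt in counters.items():
        ok = True
        for k, c in enumerate(cnt):
            is_type2 = any(op == "NOT-IN" for op, _, _ in contracts[cid][k])
            # Satisfied if an IN hit (1), some NOT-IN left unviolated (< -1),
            # or no hits at all on a conjunct that has NOT-IN predicates.
            if not (c == 1 or c < -1 or (c == 0 and is_type2)):
                ok = False
        if ok:
            accepted.append(cid)
    return sorted(accepted), counters


print(evaluate(CONTRACTS, {"age": 1, "gender": "M", "state": "NY"}))
```

Under these assumptions the counters for I, I2, and I3 come out as [1, −2], [1, −1], and [1, 0], matching the values quoted in the text, and I4 produces no postings at all, so c1 is never considered.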
Algorithm 10 thus extends the original WAND Algorithm 3 and is now able to build an inverted index of contracts when the set of contracts contains targets reduced to CNF expressions containing NOT-IN predicates.
The computer system 600 includes a processor 602, a main memory 604 and a static memory 606, which communicate with each other via a bus 608. The computer system 600 may further include a video display unit 610 (e.g. a liquid crystal display (LCD) or a cathode ray tube (CRT)). The computer system 600 also includes an alphanumeric input device 612 (e.g. a keyboard), a cursor control device 614 (e.g. a mouse), a disk drive unit 616, a signal generation device 618 (e.g. a speaker), and a network interface device 620.
The disk drive unit 616 includes a machine-readable medium 624 on which is stored a set of instructions (i.e., software) 626 embodying any one, or all, of the methodologies described above. The software 626 is also shown to reside, completely or at least partially, within the main memory 604 and/or within the processor 602. The software 626 may further be transmitted or received via the network interface device 620 over the network 130.
It is to be understood that embodiments of this invention may be used as, or to support, software programs executed upon some form of processing core (such as the CPU of a computer) or otherwise implemented or realized upon or within a machine or computer readable medium. A machine readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g. a computer). For example, a machine readable medium includes read-only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other form of propagated signals (e.g. carrier waves, infrared signals, digital signals, etc.); or any other type of media suitable for storing or transmitting information.
While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.