Advertising campaign strategies are increasingly reliant on the collection of vast amounts of data regarding potential customers to determine when and where to target advertisements in order to best ensure a successful campaign. Such large data collections are often referred to simply as “big data,” which is an expression defined, for example, by the online encyclopedia Wikipedia® as “data sets that are so voluminous and complex that traditional data-processing application software are inadequate to deal with them.”
Due to its very volume, big data can be difficult to analyze and use effectively in shaping an advertising strategy. For example, while a consumer may be expected to align according to traditional metrics such as age group, geography, or other demographic criteria identifiable through the filtering of big data, an advertisement targeted to the consumer based on those metrics may yet be received with indifference or even hostility. However, failure to consistently target consumers with advertising that is appealing to them can undesirably reduce the anticipated return on investment (ROI) of the advertising campaign, and may even compromise the overall success of the campaign.
There are provided systems and methods for automating advertisement selection using a predictive model, substantially as shown in and/or described in connection with at least one of the figures, and as set forth more completely in the claims.
The following description contains specific information pertaining to implementations in the present disclosure. One skilled in the art will recognize that the present disclosure may be implemented in a manner different from that specifically discussed herein. The drawings in the present application and their accompanying detailed description are directed to merely exemplary implementations. Unless noted otherwise, like or corresponding elements among the figures may be indicated by like or corresponding reference numerals. Moreover, the drawings and illustrations in the present application are generally not to scale, and are not intended to correspond to actual relative dimensions.
The present application discloses systems and methods for automating advertisement selection using a predictive model that overcome the drawbacks and deficiencies in the conventional art. It is noted that, as used in the present application, the terms “automation,” “automated,” and “automating” refer to systems and processes that do not require human intervention. Although, in some implementations, a human system administrator may review or even modify advertising selections made by the systems and according to the methods described herein, that human involvement is optional. Thus, the advertisement selection described in the present application may be performed under the control of hardware processing components executing the software code described herein.
It is further noted that as defined in the present application, the feature “trained predictive model” (also “machine learning model”) refers to a mathematical model for making future predictions based on patterns learned from samples of data or “training data.” Various learning algorithms can be used to map correlations between input data and output data. These correlations form the mathematical model that can be used to make future predictions on new input data.
Moreover, as defined in the present application, an artificial neural network (hereinafter “ANN”), also known simply as a neural network (NN), is a type of machine learning framework in which patterns or learned representations of observed data are processed using highly connected computational layers that map the relationship between inputs and outputs. A “deep neural network,” in the context of deep learning, may refer to a neural network that utilizes multiple hidden layers between input and output layers, which may allow for learning based on features not explicitly defined in raw data.
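As one illustration of the neural network concept defined above, the forward pass of a small fully connected network can be sketched in a few lines of Python. The weights and biases below are arbitrary values chosen for illustration, not learned parameters:

```python
import math

def forward(x, layers):
    """Propagate input x through fully connected layers.

    Each layer is (weights, biases): weights[i][j] connects input j to
    neuron i. A sigmoid maps each neuron's weighted sum into (0, 1),
    so stacked layers can represent non-linear input/output mappings.
    """
    for weights, biases in layers:
        x = [
            1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
            for row, b in zip(weights, biases)
        ]
    return x

# Two inputs -> one hidden layer of two neurons -> one output neuron.
hidden = ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2])
output = ([[1.0, -1.0]], [0.0])
prediction = forward([0.6, 0.9], [hidden, output])
print(prediction)  # a single value in (0, 1)
```

A deep neural network, as defined above, would simply insert additional hidden layers into the `layers` list.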
As further shown in
With respect to the representation of system 100 shown in
It is further noted that although
Computing platform 102 may correspond to one or more web servers, accessible over a packet-switched network such as the Internet, for example. Alternatively, computing platform 102 may correspond to one or more computer servers supporting a wide area network (WAN), a local area network (LAN), or included in another type of private or limited distribution network.
It is also noted that although user system 130 is shown as a desktop computer in
Advertising query 236 may correspond in general to first advertising query 136a and/or second advertising query 136b, in
As further shown in
As also shown in
It is further noted that the specific predictive models shown to be included among trained predictive model(s) 216 and new predictive model(s) 228 are merely exemplary, and in other implementations, trained predictive model(s) 216 and new predictive model(s) 228 may include more, or fewer, models than respective CTS+ models 216a and 228a, respective XGBoost models 216b and 228b, respective LightGBM models 216c and 228c, and respective deep ANNs 216d and 228d. Furthermore, in other implementations, trained predictive model(s) 216 and new predictive model(s) 228 may include one or more predictive models other than respective CTS+ models 216a and 228a, respective XGBoost models 216b and 228b, respective LightGBM models 216c and 228c, and respective deep ANNs 216d and 228d.
With respect to CTS+ models 216a and 228a, it is noted that those models may be an enhanced version of the CTS model known in the art. CTS+ and unenhanced CTS are tree-based models that incorporate splitting and termination rules. In general, the CTS+ tree-construction procedure includes a splitting criterion that explicitly optimizes the performance of the tree as measured on the training data. This idea is in line with the machine learning philosophy of loss minimization on the training set. CTS+ uses an ensemble of trees to mitigate the overfitting problem that commonly happens with a single tree.
CTS in its unenhanced form is described in the publication titled “Uplift Modeling with Multiple Treatments and General Response Types,” by Zhao, Fang and Simchi-Levi (see Zhao Y., X. Fang and D. Simchi-Levi, SIAM Data Mining 2017), which is hereby incorporated fully by reference into the present application. In the publication by Zhao et al. incorporated by reference herein, the performance of unenhanced CTS was tested on three benchmark data sets. The first was a 50-dimensional synthetic data set. The latter two were randomized experimental data. According to Zhao et al., on all of the data sets, unenhanced CTS demonstrated superior performance compared to other applicable methods, such as the Separate Model Approach with Random Forest/Support Vector Regression/K-Nearest Neighbors/AdaBoost, and Uplift Random Forest (upliftRF) as implemented in the R uplift package.
By contrast to unenhanced CTS, the enhanced version CTS+ introduced in the present application incorporates use of a weighted impurity function as a component for scoring of candidate advertisements 250 identified by trained CTS+ model 216a. In exemplary implementations, trained CTS+ model 216a including the new weighted impurity function may be implemented in a convenient computer language, such as Python for example, and may be embedded into a machine learning package, such as a scikit-learn package.
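The present application identifies the weighted impurity function as the component distinguishing CTS+ from unenhanced CTS, but leaves its exact form to the implementation. As a hedged illustration only, a sample-weighted Gini impurity of the kind a tree-splitting criterion might score candidates with can be sketched as follows (the function name and weighting scheme are assumptions of this sketch, not the disclosed CTS+ function):

```python
from collections import defaultdict

def weighted_gini(labels, weights):
    """Gini impurity in which each sample contributes its own weight.

    With all weights equal to 1 this reduces to the standard Gini
    impurity; unequal weights let a splitting criterion emphasize,
    e.g., samples from under-represented treatment groups.
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    mass = defaultdict(float)
    for label, w in zip(labels, weights):
        mass[label] += w
    return 1.0 - sum((m / total) ** 2 for m in mass.values())

# A pure node has zero impurity; an evenly split node is maximally impure.
print(weighted_gini(["a", "a"], [1.0, 1.0]))  # 0.0
print(weighted_gini(["a", "b"], [1.0, 1.0]))  # 0.5
```

In a tree-construction procedure such as that of CTS+, candidate splits would be scored by the weighted impurity of the resulting child nodes, with the lowest-impurity split selected.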
Referring to exemplary XGBoost models 216b and 228b, it is noted that XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. XGBoost implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides a parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way. Documentation describing XGBoost is accessible online at https://xgboost.readthedocs.io/en/latest/, and that documentation is hereby incorporated fully by reference into the present application.
Referring to exemplary LightGBM models 216c and 228c, LightGBM is a gradient boosting framework that uses tree based learning algorithms. LightGBM is designed to be distributed and efficient with the following advantages: faster training speed and higher efficiency, lower memory usage, improved accuracy, parallel and GPU learning supported, and capable of handling large-scale data. Documentation describing LightGBM is accessible online at https://lightgbm.readthedocs.io/en/latest/, and that documentation is hereby incorporated fully by reference into the present application.
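Both XGBoost and LightGBM implement gradient boosting over trees. The core idea those libraries optimize, fitting each new tree to the residuals of the current ensemble, can be illustrated with a minimal pure-Python sketch using one-dimensional decision stumps. This is a conceptual toy under squared-error loss, not the libraries' actual implementations:

```python
def fit_stump(xs, ys):
    """Find the threshold split on a 1-D feature minimizing squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must leave samples on both sides
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x, t=t, lm=lm, rm=rm: lm if x <= t else rm

def boost(xs, ys, rounds=20, lr=0.3):
    """Gradient boosting for squared loss: each stump fits the residuals
    left by the ensemble built so far, scaled by the learning rate."""
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

model = boost([1, 2, 3, 4], [1.0, 1.0, 3.0, 3.0])
print(model(1.5), model(3.5))  # approaches 1.0 and 3.0
```

The production libraries add, among much else, regularized split scoring, histogram-based threshold search, and parallelized tree construction, which is what makes them practical at big-data scale.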
Regarding deep ANNs 216d and 228d, it is noted that such neural network models may be developed utilizing a PyTorch package, which is an optimized tensor library for deep learning using GPUs and CPUs. Documentation describing PyTorch is accessible online at https://pytorch.org/docs/stable/index.html, and that documentation is hereby incorporated fully by reference into the present application.
Software code 210 corresponds in general to software code 110, and those corresponding features may share any of the characteristics attributed to either corresponding feature by the present disclosure. Thus, like software code 210, software code 110 may include predictive model(s) 216, as well as features corresponding respectively to parameter extraction module 212, parameter abstraction module 222, scoring module 218, advertisement selection module 220, training module 226, and new predictive model(s) 228. Moreover, like software code 210, software code 110 may include parameters 214 extracted from advertising query 236, abstracted parameters 224, candidate advertisements 250 identified using trained predictive model(s) 216, and desirability scores 252 determined for each of candidate advertisements 250 using scoring module 218.
The functionality of software code 110/210 will be further described by reference to
Referring now to
Flowchart 360 continues with identifying, using trained predictive model(s) 216, multiple candidate advertisements 250 for target consumer group 144 based on parameters 214 describing target consumer group 144 (action 362). Hardware processor 104 may execute software code 110/210 to utilize parameter extraction module 212 to extract raw parameters 214 described above from first advertising query 136a/236. In some implementations, action 362 may be performed using trained predictive model(s) 216 based solely on parameters 214 extracted from first advertising query 136a/236. However, in other implementations, action 362 may be performed based on abstracted parameters 224 in addition to, or instead of, parameters 214.
Abstracted parameters 224 may be generated using parameter abstraction module 222 of software code 110/210 based on parameters 214. For example, parameter abstraction module 222 may be implemented using an ANN trained using labeled or unlabeled data to infer or “abstract” parameters not expressly included in first advertising query 136a/236. Thus, in some implementations, parameter abstraction module 222 may receive parameters 214 from parameter extraction module 212 as inputs, and may provide abstracted parameters 224 based on parameters 214 as outputs to trained predictive model(s) 216. It is emphasized that, in various implementations, the identification of multiple candidate advertisements 250 in action 362 using trained predictive model(s) 216 may be based on parameters 214 alone, may be based on abstracted parameters 224 alone, or may be based on a combination of parameters 214 and abstracted parameters 224.
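A minimal sketch of how parameter extraction module 212 and parameter abstraction module 222 might interact is shown below. The query schema, the field names, and the simple rule standing in for a trained ANN are all assumptions made purely for illustration:

```python
def extract_parameters(advertising_query):
    """Pull raw targeting parameters from an advertising query.

    The field names here are hypothetical; the application does not
    specify the query schema.
    """
    return {
        "age_group": advertising_query.get("age_group"),
        "geography": advertising_query.get("geography"),
        "interests": advertising_query.get("interests", []),
    }

def abstract_parameters(parameters):
    """Infer parameters not expressly included in the query.

    A trained ANN could perform this inference; a hand-written rule
    stands in for the learned mapping purely for illustration.
    """
    abstracted = dict(parameters)
    if "sports" in parameters.get("interests", []):
        abstracted["likely_live_event_viewer"] = True
    return abstracted

query = {"age_group": "18-34", "geography": "US", "interests": ["sports"]}
params = extract_parameters(query)
print(abstract_parameters(params)["likely_live_event_viewer"])  # True
```

The outputs of both stages could then be supplied, separately or together, as inputs to the trained predictive model(s), consistent with action 362.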
As discussed above, trained predictive model(s) 216 may include one or more predictive models. Moreover, those trained predictive models may be used sequentially, in parallel, or selectively. That is to say, in some implementations, multiple predictive models of trained predictive model(s) 216 may be used in action 362, while in other implementations, as few as one of trained predictive model(s) 216 may be used in action 362.
In some implementations, trained predictive model(s) 216 used in action 362 may include a tree-based model that incorporates splitting and termination rules modified by a weighted impurity function, such as CTS+ model 216a. In other words, such a tree-based model may be a CTS model utilizing the weighted impurity function discussed above. In some implementations, trained predictive model(s) 216 used in action 362 may include a substantially optimized distributed gradient boosting library model providing a parallel tree boosting, such as exemplary XGBoost model 216b.
In addition, or alternatively, in some implementations, trained predictive model(s) 216 used in action 362 may include a gradient boosting framework using tree-based learning algorithms, such as exemplary LightGBM model 216c. Moreover, in some implementations, trained predictive model(s) 216 used in action 362 may include deep ANN 216d. Identification of candidate advertisements 250 for target consumer group 144 using trained predictive model(s) 216 may be performed by software code 110/210, executed by hardware processor 104.
Flowchart 360 continues with determining, using scoring module 218, desirability scores 252 for each one of candidate advertisements 250. Each of desirability scores 252 corresponds to the likelihood that the respective one of candidate advertisements 250 (based on which the desirability score is determined) will be enticing to target consumer group 144 (action 363). In other words, each of desirability scores 252 corresponds to the likelihood that the respective one of candidate advertisements 250 will contribute positively to brand lift. Scoring module 218 may be configured to determine desirability scores 252 using a scoring algorithm. For example, in some implementations, scoring module 218 may be configured to determine desirability scores 252 using a scoring algorithm in the form of a cumulative distribution function (CDF), as described in greater detail below. Moreover, in some implementations, desirability scores 252, once determined using scoring module 218, may be utilized to update the CDF or other scoring algorithm. Determination of desirability scores 252 for each one of candidate advertisements 250 using scoring module 218 may be performed by software code 110/210, executed by hardware processor 104.
A CDF is a group statistic that traditionally requires the use of all available data points for its calculation. This requirement that the data used in determining the CDF be substantially comprehensive can be extremely burdensome when using a traditional CDF algorithm to analyze an advertising campaign lasting weeks or months. As a result, in some implementations, it may be advantageous or desirable to utilize a fast CDF algorithm (hereinafter “fast CDF”) to approximate quantiles. An example of fast CDF is described in the publication titled “A Fast Algorithm for Approximate Quantiles in High Speed Data Streams,” by Qi Zhang and Wei Wang, (International Conference on Scientific and Statistical Database Management 2007), which is hereby incorporated fully by reference into the present application.
The CDF, whether implemented using a traditional CDF algorithm or fast CDF, is utilized herein to calculate relative scores, i.e., user quality metrics compared with those of the rest of the in-target audience of the same advertising campaign. Predictions output by trained predictive model(s) 216 are absolute user quality scores, and those absolute user quality scores are used to calculate a CDF for each individual advertising campaign. Each user's absolute user quality score is compared against his or her in-target campaign's CDF to determine the relative user quality score (i.e., approximate quantile), and that relative user quality score is used to determine whether a particular user is admitted to target consumer group 144 and served an advertisement.
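The empirical CDF comparison described above can be sketched as follows, with illustrative scores. In practice, the absolute user quality scores would come from trained predictive model(s) 216, and a fast CDF would approximate the quantiles rather than sorting all data points:

```python
from bisect import bisect_right

def empirical_cdf(scores):
    """Return a function mapping an absolute score to its quantile
    among all absolute user quality scores seen for one campaign."""
    ordered = sorted(scores)
    n = len(ordered)
    return lambda s: bisect_right(ordered, s) / n

# Illustrative absolute user quality scores for a single campaign.
campaign_scores = [0.2, 0.4, 0.4, 0.7, 0.9]
cdf = empirical_cdf(campaign_scores)

# A user's relative quality score is the quantile of their absolute
# score; admit the user to the target group only if, say, they beat
# the campaign median (the 0.5 cutoff is an assumption of this sketch).
user_score = 0.7
relative = cdf(user_score)
print(relative, relative >= 0.5)  # 0.8 True
```

A fast CDF replaces the full sort with a bounded-memory summary of the score stream, trading exactness for the ability to run continuously over a weeks-long campaign.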
Flowchart 360 can conclude with selecting one of candidate advertisements 250 based on desirability scores 252 for distribution to target consumer group 144 (action 364). For example, in one implementation, the one of candidate advertisements 250 having the highest desirability score 252 determined in action 363 may be selected for distribution to target consumer group 144. Selection of the one of candidate advertisements 250 for distribution to target consumer group 144 based on desirability scores 252 may be performed by software code 110/210, executed by hardware processor 104, and using advertisement selection module 220.
Action 364 results in first advertisement selection 138a/238 being provided as an output by software code 110/210 in response to receiving first advertising query 136a/236 as an input in action 361. As noted above, in some implementations, system 100 may utilize software code to output first advertisement selection 138a/238 in real-time with respect to receiving first advertising query 136a/236 from user 134, such as within one minute or less, for example, within 500 milliseconds, of receiving first advertising query 136a/236.
As shown in
Referring to
Alternatively, or in addition, a more comprehensive survey may be utilized for consumer rating 146/246. For example, such a survey may take the form of a full questionnaire with multiple questions (e.g., dozens of questions) prompted by API calls to a third-party survey vendor, or may be a series of predetermined questions distributed to target consumer group 144. Consumer rating 146/246 measures brand lift metrics, which are the ultimate campaign performance reports later provided to advertising clients. Distribution of these more comprehensive surveys may be delayed such that consumer rating 146/246 collected in this way may be obtained well after one of target consumer group 144 finishes viewing an advertisement. For example, distribution of a more comprehensive survey may occur as soon as a few minutes to as late as a few weeks after one of target consumer group 144 finishes viewing an advertisement.
Obtaining consumer rating 146/246 in action 365 can be a large-volume, low-cost indicator of effectiveness of advertisement 142 usable in real-time. Additionally, there may be a strong correlation between consumer rating 146/246 and consumer perception of a brand identified with advertisement 142. Consumer rating 146/246 of advertisement 142 may be obtained from at least some of target consumer group 144 by software code 110/210, executed by hardware processor 104, via communication network 108 and network communication links 118.
As noted above, in some implementations, scoring module 218 of software code 110/210 may utilize a scoring algorithm in the form of a CDF to determine desirability scores 252 for candidate advertisements 250 identified by trained predictive model(s) 216. In some of those implementations, flowchart 360 may further include periodically updating the CDF or other scoring algorithm used, based on consumer rating 146/246 of the selected one of candidate advertisements 250 distributed as advertisement 142 (action 366). For example, in various implementations, a CDF or other scoring algorithm used by scoring module 218 may be updated daily, i.e., every twenty-four hours, every other day, twice daily, or using any other time interval, based on consumer ratings 146/246 obtained since the previous update of the CDF or other scoring algorithm. Action 366 may be performed by software code 110/210, executed by hardware processor 104.
In addition to action 366, or as an alternative action, consumer rating 146/246 obtained in action 365 may be used to train new predictive model(s) 228 (action 367). New predictive model(s) 228 may be trained using consumer rating 146/246 in a manner analogous to that used for initial training of trained predictive model(s) 216. It is noted that both consumer ratings 146/246 obtained from pulse surveys and consumer ratings 146/246 obtained from more comprehensive questionnaire-type surveys may be used for new model training. Training of new predictive model(s) 228 may be performed by software code 110/210, executed by hardware processor 104, and using training module 226.
In implementations in which new predictive model(s) 228 is/are trained based on consumer rating 146/246, flowchart 360 may continue with comparing the advertisement selection performance of new predictive model(s) 228 to the advertisement selection performance of trained predictive model(s) 216 (action 368). It is also noted that consumer rating 146/246 obtained from pulse surveys and consumer rating 146/246 obtained from more comprehensive questionnaire-type surveys may be used for comparison of the advertisement selection performance of new predictive model(s) 228 to that of trained predictive model(s) 216. Moreover, in those implementations, flowchart 360 may conclude with replacing trained predictive model(s) 216 with new predictive model(s) 228 when the advertisement selection performance of new predictive model(s) 228 exceeds the advertisement selection performance of trained predictive model(s) 216, or exceeds a predetermined threshold (action 369).
Actions 368 and 369 may be performed by software code 110/210, executed by hardware processor 104, and using scoring module 218. By way of example, in some implementations, actions 368 and 369 may be performed periodically, such as daily, weekly, or monthly, to progressively improve the automated advertisement selection performance of system 100. It is noted that when trained predictive model(s) 216 are replaced by new predictive model(s) 228, the scoring algorithm used by scoring module 218 in conjunction with trained predictive model(s) 216 must also be updated or replaced with a scoring algorithm, such as a CDF, optimized for new predictive model(s) 228.
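The replacement rule of actions 368 and 369 reduces to a simple comparison. In this sketch, the performance metric and the threshold value are assumptions, since the application does not fix a specific metric; performance might be, e.g., the mean consumer rating of the advertisements each model selected:

```python
def should_replace(champion_perf, challenger_perf, threshold=0.6):
    """Replace trained predictive model(s) with new predictive model(s)
    when the new model's advertisement selection performance exceeds
    the champion's, or exceeds a predetermined threshold (per action 369).
    The threshold value here is an illustrative assumption."""
    return challenger_perf > champion_perf or challenger_perf > threshold

print(should_replace(0.55, 0.58))  # True: the new model beats the champion
print(should_replace(0.70, 0.55))  # False: the champion model is retained
```

As noted above, whenever the replacement fires, the scoring algorithm used by scoring module 218 would also need to be updated or replaced to match the new model.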
Although not included in the outline provided by flowchart 360, in some implementations, hardware processor 104 may execute software code 110/210 to identify, based on desirability scores 252 for each of candidate advertisements 250, a best predictive model for target consumer group 144 from among trained predictive model(s) 216 or new predictive model(s) 228. In those implementations, upon receiving second advertising query 136b/236 corresponding to target consumer group 144, hardware processor may execute software code to use the identified best predictive model of trained predictive model(s) 216 or new predictive model(s) 228 and scoring module 218 to output second advertisement selection 138b/238 identifying another advertisement for distribution to target consumer group 144. That is to say, where one of trained predictive model(s) 216 or new predictive model(s) 228 is identified as the best predictive model for target consumer group 144, that predictive model may be used as the default predictive model for selecting advertisements for target consumer group 144 until a still better predictive model is identified.
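The default-model behavior described above can be sketched as a small routing table. The class name, model identifiers, and group identifiers here are hypothetical illustrations, not names from the present disclosure:

```python
class ModelRouter:
    """Remember the best-performing predictive model identified for each
    target consumer group and use it as the default for later queries,
    until a still better model is identified for that group."""

    def __init__(self, fallback):
        self.fallback = fallback   # model used before any best is known
        self.best = {}             # group id -> best model for that group

    def record_best(self, group_id, model):
        self.best[group_id] = model

    def select_model(self, group_id):
        return self.best.get(group_id, self.fallback)

router = ModelRouter(fallback="cts_plus")
router.record_best("group-144", "lightgbm")
print(router.select_model("group-144"))  # lightgbm (identified best model)
print(router.select_model("group-999"))  # cts_plus (no best model yet)
```

A second advertising query for the same target consumer group would then be routed to the recorded best model together with scoring module 218 to produce the second advertisement selection.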
With respect to advertisement 142 identified by first advertisement selection 138a/238, as noted above, advertisement 142 may be transmitted via communication network 108 to user system 130 including display 132. Although not included in the outline provided by flowchart 360, in some implementations in which advertisement 142 is distributed to user system 130, the present method can include rendering advertisement 142 on display 132 of user system 130. As further noted above, display 132 may be implemented as an LCD, LED display, or an OLED display, for example.
In some implementations, user system 130 including display 132 may be integrated with system 100 such that display 132 may be controlled by hardware processor 104 of computing platform 102. In other implementations, software code 110/210 may be stored on a computer-readable non-transitory medium, as discussed above by reference to
Thus, the present application discloses systems and methods for automating advertisement selection using a predictive model that overcome the drawbacks and deficiencies in the conventional art. Moreover, in some implementations, the automated solutions disclosed by the present application can advantageously provide one or more advertisement selections to a user in real-time with respect to receiving an advertisement query from the user.
From the above description it is manifest that various techniques can be used for implementing the concepts described in the present application without departing from the scope of those concepts. Moreover, while the concepts have been described with specific reference to certain implementations, a person of ordinary skill in the art would recognize that changes can be made in form and detail without departing from the scope of those concepts. As such, the described implementations are to be considered in all respects as illustrative and not restrictive. It should also be understood that the present application is not limited to the particular implementations described herein, but many rearrangements, modifications, and substitutions are possible without departing from the scope of the present disclosure.
The present application claims the benefit of and priority to a pending Provisional Patent Application Ser. No. 62/755,347, filed Nov. 2, 2018, and titled “Methods and Systems for Advertisement Serving Decision Utilizing Online Scoring for Brand Lift Measurement,” which is hereby incorporated fully by reference into the present application.
Number | Date | Country
---|---|---
62755347 | Nov 2018 | US