The invention relates generally to computer systems, and more particularly to an improved system and method for detecting the sensitivity of web page content for serving advertisements in online advertising.
Operators of websites offering online content may manage an inventory of advertisements that may be shown to visitors viewing content of a website. When a user may visit a website, the operator of the website or a third party may choose to show one or more advertisements to the user with the expectation that the user may select an advertisement to buy advertised goods or services. Advertisers may bid to have their advertisement shown to a visitor viewing particular content of the website. Or the operator of the website or third party may choose the advertisement and may generate revenue whenever a visitor may select an advertisement shown while viewing content of the website.
Most current approaches for choosing advertisements that match the content of a requested web page only consider how well the advertisements match the topic of the content of the web page. Although advertisements with topics related to the subject matter of a web page may be relevant, choosing an advertisement solely on the basis that the topic matches a topic of a web page fails to consider whether the advertisement is appropriate for the context of the document. Such an approach may also fail to consider whether the opinions and sentiments expressed in a web page may be appropriate for specific advertisements with related topics. For example, placing advertisements for display with a web page displaying news about a war or a disaster may be considered inappropriate and even offensive. Similarly, placing an advertisement for a company or product along with a web page article expressing negative opinions about the company or product may also be inappropriate. Neither the advertiser nor the user will find the advertisement appropriate. As another example, placing adult or sexually-oriented advertisements in unrelated content pages will also be considered both inappropriate and offensive. As the online publishing and advertisement industry grows, there needs to be better optimization in matching advertisements to web pages to reflect the context of the web page content.
What is needed is a way to recognize the context of web page content beyond simply its topic in order to reduce serving inappropriate advertisements. Such a system and method should improve the user experience and increase revenue for advertisers and website operators.
Briefly, the present invention provides a system and method for detecting the sensitivity of web page content for serving advertisements in online advertising. A web page sensitivity classifier may be provided for identifying the sensitivity of the content of a web page to an advertisement. In particular, a statistical classifier, for instance, may be trained using pairs of a web page and an advertisement, each represented by features. In addition, the training data may also include a classification of the sensitivity of the web page to the advertisement for each pair. The features of each pair of a web page and advertisement may be used to train a statistical classifier to identify the sensitivity of an unseen web page to an unseen advertisement. Any type of statistical classifier may be used including a support vector machine, a naïve Bayes classifier, or other type of statistical classifier.
In an embodiment, an advertisement serving engine may be provided for serving one or more advertisements for display with content of a web page. In general, a list of candidate advertisements may be received for display with the web page, and advertisements from the list of candidate advertisements may be identified that do not match the sensitivity of the content of the web page. The web page sensitivity classifier may use the features of each web page and the advertisement to classify the sensitivity of the content of the web page to each of the advertisements. Advertisements identified from the list of candidate advertisements that do not match the sensitivity of the content of the web page may be removed. Web page placements may be allocated for advertisements from the list of candidate advertisements that match the sensitivity of the content of the web page, and the advertisements may be served for display in the allocated web page placements.
The present invention may support many applications for detecting the sensitivity of web page content for serving advertisements in online advertising. For example, online content publishing applications may use the present invention to select a list of advertisements that match the sensitivity of the content of a web page for display with content requested by a user. Similarly, ecommerce applications may use the present invention to select a list of advertisements that match the sensitivity of the product information requested by a user. Or online search advertising applications may use the present invention to identify and remove advertisements that do not match the sensitivity of the content of search results from a list of candidate advertisements predicted to be relevant for display with search results to a user. For any of these online applications, the sensitivity of web page content may be detected by the present invention for serving advertisements in online advertising.
Other advantages will become apparent from the following detailed description when taken in conjunction with the drawings, in which:
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in local and/or remote computer storage media including memory storage devices.
With reference to
The computer system 100 may include a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer system 100 and includes both volatile and nonvolatile media. For example, computer-readable media may include volatile and nonvolatile computer storage media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer system 100. Communication media may include computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For instance, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
The system memory 104 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 106 and random access memory (RAM) 110. A basic input/output system 108 (BIOS), containing the basic routines that help to transfer information between elements within computer system 100, such as during start-up, is typically stored in ROM 106. Additionally, RAM 110 may contain operating system 112, application programs 114, other executable code 116 and program data 118. RAM 110 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by CPU 102.
The computer system 100 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media, discussed above and illustrated in
The computer system 100 may operate in a networked environment using a network 136 to one or more remote computers, such as a remote computer 146. The remote computer 146 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer system 100. The network 136 depicted in
The present invention is generally directed towards a system and method for detecting the sensitivity of web page content for serving advertisements in online advertising. The web page sensitivity classifier may use the features of a web page and the features of each advertisement in a list of candidate advertisements to identify advertisements that do not match the sensitivity of the content of the web page. Any advertisements that do not match the sensitivity of the content of the web page may be removed from the list of candidate advertisements. Web page placements may be allocated for advertisements from the list of candidate advertisements that match the sensitivity of the content of the web page, and the advertisements may be served for display.
As will be seen, applications that may display advertisements to users who visit a web site, including managed content properties, may use the present invention to serve advertisements that may not only be relevant but also appropriately match the sensitivity of the context of the content requested by a user. As will be understood, the various block diagrams, flow charts and scenarios described herein are only examples, and there are many other scenarios to which the present invention will apply.
Turning to
In various embodiments, a client computer 202 may be operably coupled to one or more servers 208 by a network 206. The client computer 202 may be a computer such as computer system 100 of
The server 208 may be any type of computer system or computing device such as computer system 100 of
The server 208 may be operably coupled to computer-readable storage media such as storage 214 that may store any type of advertisements 216 and web pages 218 that may be represented by a set of features 220. In an embodiment, an advertisement 216 may be displayed according to a web page placement 224. An advertisement ID 222 associated with an advertisement 216 may be allocated to a web page placement 224, which may include a Uniform Resource Locator (URL) 228 for a web page and a position 230 for displaying an advertisement on the web page. In various embodiments, a web page may be any information that may be addressable by a URL, including a document, an image, audio, and so forth.
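The stored items described above can be pictured with a small data model. The following is a minimal sketch only: the class names, the field names, the integer position, and the `allocate` helper are illustrative assumptions and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class WebPagePlacement:
    """A slot for an advertisement: a page URL plus a position on that page."""
    url: str
    position: int  # assumption: positions are numbered slots


@dataclass
class Advertisement:
    """An advertisement with an ID and a feature set (here, weighted terms)."""
    ad_id: str
    features: Dict[str, float] = field(default_factory=dict)


def allocate(ad: Advertisement, placement: WebPagePlacement,
             allocations: Dict[WebPagePlacement, str]) -> None:
    """Record that the advertisement's ID is allocated to the placement."""
    allocations[placement] = ad.ad_id


# Hypothetical usage: one advertisement allocated to the first slot of a page.
allocations: Dict[WebPagePlacement, str] = {}
slot = WebPagePlacement(url="https://example.com/news/story", position=1)
allocate(Advertisement(ad_id="ad-42", features={"travel": 1.0}), slot, allocations)
```

Making the placement immutable lets it serve directly as a dictionary key, so a lookup by URL and position returns the allocated advertisement ID.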
There may be many applications which may use the present invention for detecting the sensitivity of web page content for serving advertisements in online advertising. For example, online content publishing applications may use the present invention to select a list of advertisements that match the sensitivity of the content of a web page for display with content requested by a user. Similarly, ecommerce applications may use the present invention to select a list of advertisements that match the sensitivity of the product information requested by a user. Or online search advertising applications may use the present invention to identify and remove advertisements that do not match the sensitivity of the content of search results from a list of candidate advertisements predicted to be relevant for display with search results to a user. For any of these online applications, the sensitivity of web page content may be detected by the present invention for serving advertisements in online advertising.
In an embodiment, a statistical classifier may be trained for binary classification of the sensitivity of the content of a web page to an advertisement. There may be a training corpus of training pairs, each pair representing a web page and an advertisement. Each web page may be represented by features and labeled to indicate whether the content of the web page may be sensitive to the advertisement. The features of a web page may include text represented as a dimensional vector of words, the topic of the web page, domain information and/or clustering features generated from unlabeled web pages. Each advertisement may similarly be represented by features including text represented as a dimensional vector of words, topic of the advertisement, clustering features generated from unlabeled advertisements, and so forth.
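As an illustration of the feature representation described above, the sketch below builds one sparse feature vector for a pair of a web page and an advertisement. The function names, the feature-key prefixes, and the tokenization rule are assumptions made for this example; clustering features are omitted because the text does not say how they are computed.

```python
import re
from collections import Counter
from typing import Dict


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphanumeric tokens in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def pair_features(page_text: str, page_topic: str, page_domain: str,
                  ad_text: str, ad_topic: str) -> Dict[str, float]:
    """Build a sparse feature vector for one (web page, advertisement) pair."""
    feats: Dict[str, float] = {}
    # Word-count features, kept separate for the page and the advertisement.
    for word, count in bag_of_words(page_text).items():
        feats["page_word=" + word] = float(count)
    for word, count in bag_of_words(ad_text).items():
        feats["ad_word=" + word] = float(count)
    # Topic and domain information as indicator features.
    feats["page_topic=" + page_topic] = 1.0
    feats["page_domain=" + page_domain] = 1.0
    feats["ad_topic=" + ad_topic] = 1.0
    return feats


# Hypothetical pair: a news page about a flood and a travel advertisement.
features = pair_features("Flood hits the city, flood warnings", "news",
                         "example.com", "Cheap city breaks", "travel")
```

Prefixing each key with its source keeps a word on the page distinct from the same word in the advertisement, so a classifier can weight them differently.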
The features of each pair of a web page and advertisement may be used to train a statistical classifier to identify the sensitivity of an unseen web page to an unseen advertisement. The statistical classifier may be a support vector machine, a naïve Bayes classifier, or other type of statistical classifier. Those skilled in the art will appreciate that other methods may be used for binary classification including collective inference.
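One of the classifier types named above, a naïve Bayes classifier, can be sketched over token features of each pair. This is a minimal multinomial formulation with add-one smoothing, chosen here for illustration; the class name, the `page:`/`ad:` token prefixes, and the toy training pairs are assumptions, and the label convention (1 when the advertisement matches the sensitivity of the page, −1 otherwise) follows the convention used for the labels in this description. The sketch assumes both labels occur in the training data.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


class NaiveBayesPairClassifier:
    """Multinomial naive Bayes over tokens of a (page, advertisement) pair."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # smoothing constant (add-one by default)
        self.token_counts: Dict[int, Counter] = {1: Counter(), -1: Counter()}
        self.class_counts: Counter = Counter()
        self.vocab: Set[str] = set()

    def fit(self, examples: Iterable[Tuple[List[str], int]]) -> "NaiveBayesPairClassifier":
        for tokens, label in examples:
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_score(self, tokens: List[str], label: int) -> float:
        """Log prior plus smoothed log likelihood of the known tokens."""
        n_examples = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_examples)
        denom = sum(self.token_counts[label].values()) + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are skipped
                score += math.log((self.token_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, tokens: List[str]) -> int:
        return 1 if self.log_score(tokens, 1) >= self.log_score(tokens, -1) else -1


# Toy training pairs (illustrative only).
train = [
    (["page:earthquake", "page:victims", "ad:vacation", "ad:deals"], -1),
    (["page:war", "page:casualties", "ad:casino", "ad:bonus"], -1),
    (["page:recipes", "page:pasta", "ad:cookware", "ad:deals"], 1),
    (["page:travel", "page:beach", "ad:vacation", "ad:hotel"], 1),
]
nb = NaiveBayesPairClassifier().fit(train)
unseen_sensitive = nb.predict(["page:earthquake", "page:casualties", "ad:hotel"])
unseen_benign = nb.predict(["page:pasta", "ad:cookware"])
```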
At step 306, a statistical classifier may be trained using the pairs and the classification of the sensitivity of the content of the web page to the advertisement in each pair. For instance, in an embodiment, a statistical classifier may apply naïve Bayesian techniques, using for example the frequency of different text appearing in the content of the web page and the advertisement, to learn the probability that the web page is sensitive to an advertisement. Or a Support Vector Machine (SVM) may be employed in another embodiment to automatically learn classification of the sensitivity of a web page to an advertisement from examples. Consider i to represent an index for pairs of a web page and an advertisement 1 . . . n, and j to represent an index for features 1 . . . d for each pair of a web page and advertisement. A training set $\{(x_i, y_i)\}_{1 \le i \le n}$ may be given, where $x_i \equiv (x_{i1}, \ldots, x_{id})^T$ is the d-dimensional vector representation of the i-th example and $y_i$ is its label, where $y_i = 1$ or $y_i = -1$. For example, the label $y_i$ may be assigned a value of 1 if the sensitivity of the pair of a web page and advertisement matches; otherwise, $y_i$ may be assigned a value of −1. A linear classifier may use a d-dimensional weight vector, w, with the classification function defined by $f(x) = w \cdot x$. Consider $w^2$ to denote the square of the Euclidean norm of w. The SVM may minimize the following objective function:
where l may represent a loss function, $l(t) = \max(0, 1 - t)^p$. Commonly used values for p are p = 1 and p = 2. Advantageously, fast methods exist to train SVMs.
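The expression for the objective function does not appear in the text above. One standard regularized form that is consistent with the definitions of w, f, and l given here is shown below. It is offered as an assumption, not as the original expression; in particular, the regularization constant $\lambda > 0$ is not named in the source.

$$
\min_{w} \;\; \frac{\lambda}{2}\, w^2 \;+\; \sum_{i=1}^{n} l\bigl(y_i\, f(x_i)\bigr),
\qquad f(x) = w \cdot x,
\qquad l(t) = \max(0,\, 1 - t)^p .
$$

With p = 1 the loss term is the hinge loss, which is zero for any example classified with margin $y_i f(x_i) \ge 1$; with p = 2 the loss is squared, which penalizes large margin violations more heavily.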
Once a classifier has been trained to detect the sensitivity of web page content to an advertisement, the classifier may be applied to identify the sensitivity of an unseen web page to an unseen advertisement.
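Training and then applying a linear classifier of the form f(x) = w·x can be sketched as follows. This is an illustrative sketch, not the training method of the specification: it uses plain full-batch subgradient descent on a regularized hinge loss (the p = 1 case), and the regularization constant `lam`, the learning rate, the epoch count, and the three toy features (a disaster-term count for the page, a leisure-term count for the advertisement, and a constant 1) are all assumptions.

```python
from typing import List, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(xs: List[List[float]], ys: List[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> List[float]:
    """Subgradient descent on (lam/2)*|w|^2 + sum_i max(0, 1 - y_i * w.x_i)."""
    d = len(xs[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]          # gradient of the regularizer
        for x, y in zip(xs, ys):
            if y * dot(w, x) < 1:              # example violates the margin
                for j in range(d):
                    grad[j] -= y * x[j]        # subgradient of the hinge loss
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def classify(w: Sequence[float], x: Sequence[float]) -> int:
    """Return 1 if the pair is predicted to match in sensitivity, else -1."""
    return 1 if dot(w, x) >= 0 else -1


# Toy pairs: [disaster terms on page, leisure terms in ad, constant 1].
xs = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
ys = [-1, 1, 1, 1]  # only a leisure ad on a disaster page is a mismatch
w = train_linear_svm(xs, ys)
train_predictions = [classify(w, x) for x in xs]
unseen_prediction = classify(w, [2.0, 2.0, 1.0])
```

The constant third feature plays the role of a bias term, since the classification function in the text has no separate offset.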
The ability to classify the sensitivity of the context of content of a web page to an advertisement may improve the quality of a general advertisement serving system, and correspondingly increase revenue for advertisers and website operators.
At step 504, a list of candidate advertisements to display with the web page may be received. For an online publishing advertising application, a list of candidate advertisements selected by relevance of matching content may be received. Or for a sponsored search advertising application, the list of candidate advertisements may be selected by a keyword auction. In any case, advertisements from the list of candidate advertisements may be identified at step 506 that do not match the sensitivity of the content of the web page. In one embodiment, the sensitivity of the content of a web page to an advertisement may be identified by classification of the pair of the web page and the advertisement using the steps described in conjunction with
At step 508, advertisements identified from the list of candidate advertisements that do not match the sensitivity of the content of the web page may be removed. At step 510, web page placements may be allocated for the list of candidate advertisements that match the sensitivity of the content of the web page. For an online publishing advertising application, web page placements may be allocated for displaying advertisements along with the content requested. Or for a sponsored search advertising application, web page placements may be allocated for the sponsored search area of the search results page displayed to a user. At step 512, the list of advertisements that match the sensitivity of the content of the web page may be served for display in the allocated web page placements.
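The flow of steps 504 through 512 can be sketched as a single function. The function name, the dictionary layout of advertisements and placements, and the `toy_matches` rule standing in for a trained web page sensitivity classifier are all hypothetical and serve only to make the example runnable.

```python
from typing import Any, Callable, Dict, List

Features = Dict[str, float]


def serve_advertisements(page_url: str, page_features: Features,
                         candidates: List[Dict[str, Any]],
                         matches: Callable[[Features, Features], bool],
                         positions: List[str]) -> List[Dict[str, str]]:
    """Filter candidate advertisements by sensitivity, then allocate placements."""
    # Steps 506 and 508: identify and remove advertisements that do not match.
    kept = [ad for ad in candidates if matches(page_features, ad["features"])]
    # Step 510: allocate the available positions to the remaining ads in order.
    # Step 512: the returned placements are what would be served for display.
    return [{"url": page_url, "position": pos, "ad_id": ad["ad_id"]}
            for pos, ad in zip(positions, kept)]


def toy_matches(page: Features, ad: Features) -> bool:
    """Stand-in classifier: a leisure ad does not match a disaster page."""
    return not (page.get("topic=disaster", 0.0) > 0 and ad.get("topic=leisure", 0.0) > 0)


# Step 504: a list of candidate advertisements is received (toy data).
candidates = [
    {"ad_id": "ad-1", "features": {"topic=leisure": 1.0}},
    {"ad_id": "ad-2", "features": {"topic=insurance": 1.0}},
    {"ad_id": "ad-3", "features": {"topic=charity": 1.0}},
]
served = serve_advertisements("https://example.com/news/flood",
                              {"topic=disaster": 1.0}, candidates,
                              toy_matches, ["top", "sidebar"])
```

Passing the classifier in as a callable keeps the serving flow independent of whether a support vector machine, a naïve Bayes classifier, or another statistical classifier makes the decision.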
Thus the present invention may be used by applications that may display advertisements to users who visit a website, including managed content properties, to serve advertisements that may not only be relevant but also appropriately match the sensitivity of the context of the content requested by a user. Advantageously, a classifier may be trained using a combination of features, including terms, a topic, and clustering features, that may provide the ability to discriminate web pages that are sensitive to an advertisement from those that are not sensitive, without requiring the creation of a taxonomy designed specifically for this task. As a result, not only may the effort of annotation be significantly reduced, but the ability to discriminate may not be restricted by the limitations imposed in the design of the taxonomy. Moreover, the features derived in classifying the sensitivity of the content of a web page to an advertisement may be used for ranking the relevance of the advertisement even if the web page may not be classified as sensitive to the advertisement. Thus, the present invention may also be used to improve the quality of advertisement ranking in online advertising applications.
As can be seen from the foregoing detailed description, the present invention provides an improved system and method for detecting the sensitivity of web page content for serving advertisements in online advertising. The system and method may use the features of a web page and the features of each advertisement in a list of candidate advertisements to identify advertisements that do not match the sensitivity of the content of the web page. Web page placements may be allocated for advertisements from the list of candidate advertisements that match the sensitivity of the content of the web page, and the advertisements may be served for display. For online content publishing applications, the present invention may be used to select a list of advertisements that match the sensitivity of the context of content of a web page for display with content requested by a user. Similarly, ecommerce applications may use the present invention to select a list of advertisements that match the sensitivity of the product information requested by a user. Or online search advertising applications may use the present invention to identify and remove sponsored advertisements that do not match the sensitivity of the content of search results from a list of candidate advertisements predicted to be relevant for display with search results to a user. Accordingly, the system and method provide significant advantages and benefits needed in contemporary computing and in online applications.
While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.