Advertisement search, or “ads” search, is a popular web technique that helps websites gain profits from free search and other online services. For example, search engines like MSN Search operate online advertising businesses within their search result pages. In general, advertisers pay the search engines for user clicks, whereby the more clicks that occur (that is, the greater the conversion rate of users' clicks on advertisements), the more profit that is made.
Typically, advertisements are ranked by automatic ranking algorithms similar to those used in web query searching, which generally calculate the similarities between advertisement content and user queries, search results, each advertiser's per-click payment amount, and so forth. However, heretofore such ranking algorithms have not recognized the characteristics of the advertisements themselves, and any mechanism that improves the user click rate on advertisements would be commercially valuable.
This Summary is provided to introduce a selection of representative concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in any way that would limit the scope of the claimed subject matter.
Briefly, various aspects of the subject matter described herein are directed towards a technology by which items corresponding to online advertisements that are to be returned with a query response are ranked using reputation data. The reputation may correspond to a reputation of a product or service and/or a seller (e.g., retailer or wholesaler, or service provider).
In one implementation, advertisement items are previously processed based on relevance, which may include relevance to the search terms and/or advertiser payment. A reputation ranking mechanism ranks (or re-ranks) the advertisement items using reputation data as a factor in the ranking. For example, for each item of information corresponding to an advertisement, the ranking mechanism determines a value based on a mathematical combination of a product reputation score, a seller reputation score and a relevance score, and ranks the items according to the values. The scores may be weighted differently relative to one another in the mathematical combination.
The product (or service) and/or seller reputation data may be mined from a review source, such as customer reviews available on the web. In one example implementation, a model is used to analyze the text of the reviews to determine whether each review is more likely positive or more likely negative with respect to the reputation. One such model is a 3-gram model that considers terms in the text along with the two terms preceding each term.
Other advantages may become apparent from the following detailed description when taken in conjunction with the drawings.
The present invention is illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:
Various aspects of the technology described herein are generally directed towards a ranking mechanism that in part uses reputation data to select and/or rank which advertisements (e.g., a link comprising an image and/or text) to provide to users in conjunction with a query response. In general, because consumers tend to be more interested in reputable products, services and/or suppliers, the ranking mechanism described herein ordinarily increases the overall user click rate (and thus profits) generated from online advertising. Indeed, reputation may be one of the most important factors for a user that is deciding whether to click on an advertisement. Notwithstanding, as can be readily appreciated, the various aspects of the ranking mechanism are independent of any particular business or revenue model. For example, the use of reputation data in selecting and/or ranking any set of data may benefit from the aspects described herein.
Further, while as described herein the term “reputation” generally includes concepts such as user opinions about advertised products or services and/or the advertisers (e.g., retailers, wholesalers or service providers) providing the products or services, there is no requirement as to any particular source of reputation data. For example, the general public's overall reviews may be one source, a professional reviewing enterprise (or the like) an alternative or additional source, a limited group of individuals or the like (e.g., only reviewers that fit a certain demographic) yet another possible source, and so forth. Moreover, as used herein, the terms “product” and “service” are interchangeable, such as in the various examples, for purposes of simplicity.
As such, the present invention is not limited to any particular embodiments, aspects, concepts, protocols, formats, structures, functionalities or examples described herein. Rather, any of the embodiments, aspects, concepts, protocols, formats, structures, functionalities or examples described herein are non-limiting, and the present invention may be used in various ways that provide benefits and advantages in computing and information retrieval technology in general.
Turning to
As described herein, a reputation ranking (or re-ranking) mechanism 108 processes the relevance-ranked set of advertisements 106, using reputation data 110 and/or the web 112 as part of the criteria to determine a set of reputation ranked relevant advertisements 114. Note that the reputation ranking mechanism 108 of
As represented in
To automatically rank advertisements by product (equivalent to service) reputation considerations, the technology described herein uses one or more various factors with respect to traditional content relevance ranking algorithms. Such factors include the reputation of products and/or services, and/or the reputation of sellers (e.g., retailers, wholesalers, service providers and the like). As described below, the reputation data may be predicted by mining reviews and the like that are available from various sources, such as online customer reviews.
For example, surveying product and other information before making an online transaction is a fairly popular consumer trend. Various product information portals usually provide product specifications, seller prices and customer reviews. Many users use such portals to compare specifications of similar products, to choose a particular seller based on price, to review others' comments to learn about their consumer experiences, and so forth. However, the number of products and sellers is very large, making it difficult and time-consuming for consumers to collect the necessary information.
To this end, an automatic prediction mechanism (e.g., incorporated into the reputation ranking mechanism 108) predicts product/seller reputations by mining customer reviews, such as those that are published on product information portals. The reputation data is represented as the positive review percentage, which in one example implementation is formalized as set forth herein.
More particularly, consider that the collected review set of a given product p is S(p)={r1, r2, . . . rn}. For each review r, the reputation R(r) can be either positive (POS) or negative (NEG). Typically, a review r is regarded as a series of terms, r=w1w2 . . . wk, where each w represents a word. As used herein, however, the concept of a “term” includes any single entity that can be represented in a data structure, such as a word, symbol, shape and so forth, and/or any phrase comprising a plurality of such entities. For example, “good,” “bad,” “excellent,” “defective,” and so forth are all terms that may be associated with a product review. The reputation value R(r) is determined by analyzing the term series using a 3-gram model (described below), so that term series such as “no good” or “not very good” are not misinterpreted as good.
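One way to write the positive review percentage in this notation is sketched below. It is offered as an illustrative reading, not as a formula taken from the description; the symbol R_p is borrowed from the scoring function used later in the text, and the counts in the worked example are hypothetical.

```latex
% Fraction of the reviews in S(p) that are classified as positive.
R_p\bigl(S(p)\bigr)
  = \frac{\bigl|\{\, r \in S(p) : R(r) = \mathrm{POS} \,\}\bigr|}{\bigl|S(p)\bigr|}

% Worked example with hypothetical counts: 8 positive reviews out of n = 10.
R_p\bigl(S(p)\bigr) = \frac{8}{10} = 0.8 \quad (\text{i.e., } 80\%)
```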
Thus, given a query, one example implementation described herein ranks advertisements by considering each advertisement's relevance to the query and/or the payment of advertisers, as well as by analyzing reviews and the like with respect to the sellers and/or the products or services. In the example implementation, three general steps are performed, including collecting the reviews (or like data, which will be considered a “review” herein), classifying review opinions, and then using the review information to rank advertisements (or re-rank candidate advertisements previously ranked based on relevance and/or payment considerations).
To collect reviews as generally represented via step 202 of
As represented by step 204, reviewer opinion classification is next performed, which classifies reviews into positive ones and negative ones. The result is a positive review percentage of each product and seller. Note that the number of reviews can also be counted, because not all of the reviews have a rating value or the like, and the reviews from different web sites usually have different rating mechanisms. For example, there may be ten ratings at xyz.com, while there are only five ratings at abcd.com.
In this example, a last step is to rank the advertisements, including ranking based on reputation data. For example, with the seller and product information provided by the advertisers, the relation between an advertisement and reviews can be easily established. The ranking mechanism generally analyzes the reviews' text and calculates the reputation, in terms of whether the reviews are positive or negative. For example, for a given query (q), a set of relevant advertisements 106 may be ranked (or re-ranked) into the reputation based set 114 by the following scoring function for each advertisement (ad):
Score(ad,q)=αRp(ReviewSeller(ad))+βRp(ReviewProduct(ad))+θRelevance(ad, q)
where α+β+θ=1.
As can be seen, the example scoring function above takes three factors into consideration, namely Rp(ReviewSeller(ad)), which represents the positive rate of the comments to the associated seller, Rp(ReviewProduct(ad)), which represents the positive rate of the comments to the associated product (or service), and Relevance(ad, q), which represents the relevance between the advertisement (ad) and the query q. Weighting each factor may be accomplished via the variables α, β and θ.
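The scoring function above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Ad` container, the function names, the example weights (0.25, 0.25, 0.5) and the assumption that all three factors are scaled to the range 0 to 1 are hypothetical and do not come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    """Hypothetical container for one candidate advertisement."""
    ad_id: str
    seller_positive_rate: float   # Rp(ReviewSeller(ad)), assumed in [0, 1]
    product_positive_rate: float  # Rp(ReviewProduct(ad)), assumed in [0, 1]
    relevance: float              # Relevance(ad, q), assumed scaled to [0, 1]


def score(ad: Ad, alpha: float, beta: float, theta: float) -> float:
    """Score(ad, q) = alpha*Rp(seller) + beta*Rp(product) + theta*Relevance."""
    if abs(alpha + beta + theta - 1.0) > 1e-9:
        raise ValueError("weights must satisfy alpha + beta + theta = 1")
    return (alpha * ad.seller_positive_rate
            + beta * ad.product_positive_rate
            + theta * ad.relevance)


def rerank(ads, alpha=0.25, beta=0.25, theta=0.5):
    """Return the ads sorted by descending score (weights are illustrative)."""
    return sorted(ads, key=lambda ad: score(ad, alpha, beta, theta), reverse=True)


candidates = [
    Ad("a", seller_positive_rate=0.5, product_positive_rate=0.5, relevance=1.0),
    Ad("b", seller_positive_rate=1.0, product_positive_rate=1.0, relevance=0.75),
]
ranked = rerank(candidates)
print([ad.ad_id for ad in ranked])
```

With these illustrative weights, advertisement "b" outranks the more relevant advertisement "a" because its seller and product reputations are higher; raising θ toward 1 would restore a purely relevance-based order.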
Turning to a consideration of mining reviews to predict product reputation, in one implementation, a 3-gram statistical approach is used. With respect to mining reviews, an online product information portal, for example, is one valuable information resource that typically provides product specifications, seller price information and user comments. This information explicitly or implicitly correlates to the product reputation and quality. As can be readily appreciated, comments/reviews on sellers may be similarly processed, but for purposes of simplicity,
After the 3-gram model is built, given a review 308 or like data of an unrated product, an analyzer 310 then analyzes the text of the user review data (e.g., comments) for that unrated product using the 3-gram model 306. Note that the web may be crawled regarding comments on that product on demand as needed for a query, or in advance, such as in an offline reputation store building state. Step 404 represents finding one or more reviews for the product.
Step 406 represents the analysis against the 3-gram model to locate series of terms that determine (step 408) whether the review is more like the positive model or the negative model. Note that the review can be discarded or otherwise handled if, for example, the text is corrupted or otherwise nonsensical. Step 410 or 412 decreases or increases that product's reputation, respectively, as set forth above (e.g., via its positive review percentage).
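The bookkeeping in steps 406 through 412 can be sketched as a running tally. This is a minimal sketch under stated assumptions: `ReputationTally`, `update_reputation` and `toy_classifier` are hypothetical names, and the keyword-based classifier is only a placeholder standing in for the analysis against the 3-gram model.

```python
from typing import Callable, Iterable


class ReputationTally:
    """Running positive/negative review counts for one product or seller."""

    def __init__(self) -> None:
        self.positive = 0
        self.negative = 0

    def record(self, is_positive: bool) -> None:
        # A positive review raises the reputation, a negative one lowers it.
        if is_positive:
            self.positive += 1
        else:
            self.negative += 1

    def positive_percentage(self) -> float:
        total = self.positive + self.negative
        if total == 0:
            raise ValueError("no usable reviews recorded")
        return 100.0 * self.positive / total


def update_reputation(tally: ReputationTally,
                      reviews: Iterable[str],
                      classify: Callable[[str], bool]) -> ReputationTally:
    """Classify each review and update the tally; empty reviews are discarded."""
    for text in reviews:
        if not text.strip():
            continue  # unusable text is skipped rather than counted
        tally.record(classify(text))
    return tally


def toy_classifier(text: str) -> bool:
    # Placeholder only: a real system would compare the review against
    # the positive and negative 3-gram models instead.
    terms = text.lower().split()
    return "good" in terms and "not" not in terms


reviews = ["good value", "not good", "", "good camera", "bad battery"]
tally = update_reputation(ReputationTally(), reviews, toy_classifier)
print(tally.positive_percentage())
```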
In one example implementation, the 3-gram statistical approach of mining customer reviews assumes that a term (e.g., “good” or “bad”) within a reviewer's comments is related to the two preceding terms (e.g., “not” or “not so”), as set forth below:
P(ω1ω2ω3)=P(ω3|ω1ω2)=#(ω1ω2ω3)/#(ω1ω2)
where #(·) denotes the frequency of the enclosed term series. The learning process is used with training data 304 (step 402) to learn the 3-gram language model of both positive and negative comments. Both the positive comment model Mp and the negative comment model Mn comprise a set of term series together with their probabilities in the training set.
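The counting behind the two models can be sketched as follows. This is a minimal sketch under stated assumptions: the `TrigramModel` class, the whitespace tokenizer and the tiny training comments are hypothetical, and the two-term context is counted only where a third term follows it, so that the estimates for a given context sum to one.

```python
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class TrigramModel:
    """Stores #(w1 w2 w3) and #(w1 w2) counts learned from labeled comments."""

    def __init__(self):
        self.trigrams = Counter()
        self.contexts = Counter()

    def train(self, comments):
        for comment in comments:
            terms = tokenize(comment)
            for i in range(len(terms) - 2):
                w1, w2, w3 = terms[i:i + 3]
                self.trigrams[(w1, w2, w3)] += 1
                # Count the two-term context only when a third term follows.
                self.contexts[(w1, w2)] += 1

    def prob(self, w1, w2, w3):
        """Estimate P(w3 | w1 w2) = #(w1 w2 w3) / #(w1 w2); 0.0 if unseen."""
        context = self.contexts[(w1, w2)]
        if context == 0:
            return 0.0
        return self.trigrams[(w1, w2, w3)] / context


# Hypothetical training comments for the positive and negative models.
model_pos = TrigramModel()
model_pos.train(["this is very good", "this is good", "not bad at all"])
model_neg = TrigramModel()
model_neg.train(["this is not good", "this is bad"])

print(model_pos.prob("this", "is", "good"))
print(model_neg.prob("is", "not", "good"))
```

Because each model is trained only on comments of one polarity, a series such as “is not good” receives probability mass in the negative model and none in the positive model, which is what keeps “not good” from being read as “good”.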
In one example implementation, to predict whether a comment c=w1 w2 w3 . . . wk is positive or negative, a decision is made as to which model the comment more closely resembles. Given m* as the model:
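One plausible form of this decision, stated here as an assumption rather than as the original expression, selects the model under which the comment's term series is most probable, using the 3-gram estimates learned for each model:

```latex
% Assumed maximum-likelihood selection between the two learned models.
m^{*} = \arg\max_{m \in \{M_p,\, M_n\}} P(c \mid m),
\qquad
P(c \mid m) \approx \prod_{i=3}^{k} P_m\bigl(w_i \mid w_{i-2}\, w_{i-1}\bigr)
```

Under this reading, the comment is predicted to be positive when m* is Mp and negative when m* is Mn. A practical implementation would likely also need some form of smoothing for term series that never occur in the training data, which is likewise an assumption here.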
Any number of new (that is, not already processed) reviews may be analyzed, as represented via step 414. The result is a prediction as to the product's reputation, shown in
In this manner, the reputation of a product and/or seller may be used as a factor in determining a ranking order of advertisements to provide as part of the response to a user query. Used in conjunction with relevance, reputation is expected to increase the click rate on advertisements.
The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to: personal computers, server computers, hand-held or laptop devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in local and/or remote computer storage media including memory storage devices.
With reference to
The computer 510 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 510 and includes both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 510. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
The system memory 530 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 531 and random access memory (RAM) 532. A basic input/output system 533 (BIOS), containing the basic routines that help to transfer information between elements within computer 510, such as during start-up, is typically stored in ROM 531. RAM 532 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 520. By way of example, and not limitation,
The computer 510 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media, described above and illustrated in
The computer 510 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 580. The remote computer 580 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 510, although only a memory storage device 581 has been illustrated in
When used in a LAN networking environment, the computer 510 is connected to the LAN 571 through a network interface or adapter 570. When used in a WAN networking environment, the computer 510 typically includes a modem 572 or other means for establishing communications over the WAN 573, such as the Internet. The modem 572, which may be internal or external, may be connected to the system bus 521 via the user input interface 560 or other appropriate mechanism. A wireless networking component 574 such as comprising an interface and antenna may be coupled through a suitable device such as an access point or peer computer to a WAN or LAN. In a networked environment, program modules depicted relative to the computer 510, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
An auxiliary subsystem 599 (e.g., for auxiliary display of content) may be connected via the user interface 560 to allow data such as program content, system status and event notifications to be provided to the user, even if the main portions of the computer system are in a low power state. The auxiliary subsystem 599 may be connected to the modem 572 and/or network interface 570 to allow communication between these systems while the main processing unit 520 is in a low power state.
While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.