This document relates to information processing.
Advertisers can run advertisement campaigns in any of multiple different platforms, including the Internet, television, radio, and billboards. Advertisements used in advertising campaigns can cover a range of products and services and can be directed toward specific audiences or more generally toward the greater population. For example, publishers operating websites can provide space to advertisers for presenting advertisements. Advertisements presented on a website are sometimes selected based on the content of the website.
The invention relates to associating an entity with a category.
In a first aspect, a computer-implemented method for associating an entity with a category includes determining a probability value for each of at least a subset of a plurality of categories, the probability value representing a likelihood that an identified entity belongs to the respective category and determined using information about the entity. The method includes recording one of the plurality of categories for the entity, the category identified using the probability value and a rule set for the plurality of categories.
Implementations can include any, all or none of the following features. The entity can be a content provider identified as enrolled in a program in which the content provider provides content to be published by at least one publisher, and the probability value can be determined using at least one keyword associated with the content provider and at least one financial value associated with the content provider. Determining the probability value can include mapping the at least one keyword at least to the subset of the plurality of categories; weighting at least the subset with the at least one financial value, wherein the financial value has been assigned to the corresponding keyword; and selecting a predetermined number of the categories as the subset. The rule set can be based on training data. The rule set can include a decision tree configured for selecting one of the plurality of categories by processing at least some of a plurality of decisions included in the decision tree. The method can further include generating the decision tree using the training data, wherein the training data comprises mappings of entities to one or more of the plurality of categories. Generating the decision tree can further include weighting the mappings using financial data regarding the entities. Weighting the mappings can further include oversampling at least a subset of the mappings based on the financial data corresponding to the subset of the mappings. Generating the decision tree can include selecting a structure for the decision tree; determining an extent of the decision tree, including how many of the plurality of decisions to be made before the one of the plurality of categories is selected; and determining threshold values to be used in the plurality of decisions. The decision tree can be generated iteratively. The content provider can be engaged in advertising and the plurality of categories can include verticals with which the content provider is to be matched. 
Generating the decision tree can further include identifying at least one of the verticals for which the determination of the probability values has a tendency to improperly assign the vertical to the content provider; and selecting at least one of the threshold values so that the tendency is reduced. The method can further include presenting information to a user based on the category having been identified for the entity. The information can indicate a seasonality associated with the category.
In a second aspect, a computer system includes a first classifier determining a probability value for each category of at least a subset of a plurality of categories, the probability value representing a likelihood that an identified entity belongs to the respective category and determined using information about the entity. The system includes a second classifier identifying one of the plurality of categories for the entity using the probability value and a rule set for the plurality of categories.
Implementations can include any, all or none of the following features. The rule set can be based on training data. The first classifier can take into account a financial value relating to the entity in determining the probability value. The rule set can include a decision tree configured for selecting one of the plurality of categories by processing at least some of a plurality of decisions included in the decision tree, and the computer system can further include a rule component generating the decision tree using the training data, wherein the training data comprises mappings of entities to one or more of the plurality of categories. The rule component can weight the mappings using financial data regarding the entities, including oversampling at least a subset of the mappings based on the financial data corresponding to the subset of the mappings. The system can further include a front end component presenting information to a user based on the second classifier having identified the category for the entity.
In a third aspect, a computer-implemented method for associating a content provider with a category includes identifying a content provider as enrolled in a program in which the content provider provides content to be published by at least one publisher. The method further includes receiving at least one keyword regarding the content provider and at least one financial value regarding the keyword. The method further includes receiving a plurality of categories, wherein the content provider is to be associated with at least one of the categories. The method further includes mapping the at least one keyword to a subset of the categories based on names of the categories. The method further includes associating each of at least the subset of the categories with a probability value representing a likelihood that the content provider should be associated with the respective category, the probability values weighted using the financial value. The method further includes receiving a rule set generated regarding the plurality of categories, the rule set configured for use in identifying one of the categories. The method further includes processing data regarding the content provider using the rule set, the data including at least: (i) the probability value for each of at least the subset of the categories; (ii) financial data regarding the content provider; and (iii) a geographic region with which the content provider is associated. The method further includes selecting one of the plurality of categories for the content provider based on the processing of the data. The method further includes associating the content provider with the selected category.
Implementations can provide any, all or none of the following advantages. Improved classification into categories can be provided. A probability-based classification can be revenue-weighted and can be made further specific by a rule-based classification previously trained using training data. Flexibility in classification can be increased.
The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features and advantages will be apparent from the description and drawings, and from the claims.
Like reference symbols in the various drawings indicate like elements.
In some implementations, one or more entities in the system 100 can be involved in a transaction in which a content provider provides content to be published by at least one publisher. For example, content such as an advertisement can be distributed from the content provider system 102 over the network 106 for publication on behalf of one or more of the content publisher systems 104. In some implementations, the content can temporarily or permanently be held by a third party, such as a content distributor system 108 (e.g., an advertisement server) and can be distributed from the system 108 for publication. For example, when a user system 110 requests media content (e.g., a web page) from the publisher system 104, the content distributor system 108 can provide associated content (e.g., an advertisement) to the user system 110 for presentation in connection with the requested content. Examples are described below in which one or more entities in the system 100, such as a content provider and/or a content publisher, can be classified using a catalog of categories. Such classification can be useful to anyone involved with the classified entity, for example a person who manages distribution of content between entities.
The system 100 can include one or more classifiers. In some implementations, the system 100 includes a probability classifier 112 and a rule based classifier 114. Names for these and other components are here used broadly, rather than narrowly; for example, the probability classifier 112 can use one or more rules in its operation, and the rule based classifier 114 can determine or use one or more probabilities in the classification process. The classifiers 112 and 114 can be implemented in any form, such as using software, hardware, firmware, or combinations thereof.
In some implementations, the classifiers 112 and 114 can be used in an effort to match a selected entity, such as the content provider operating the system 102, with one or more categories, such as verticals from a verticals catalog 116. A vertical can refer to one or more business classifications, such as the categorization terms sometimes used in marketing analysis to represent businesses and customers that trade in a common field (e.g., a consumer electronics vertical, or a cosmetics vertical). Other classifications can be used.
The probability classifier 112 can determine, for an entity such as a content provider, a probability value for at least one of the verticals in the catalog 116. The probability can represent a likelihood that the content provider belongs to the corresponding vertical. For example, the probability classifier can determine a probability that an entity “Example Company, Inc.” should be classified as belonging to a “mortgage” vertical. The probability can be determined using information about the entity. In some implementations, the probability classifier 112 can determine multiple probability values, such as a value corresponding to each of at least a subset of the verticals in the catalog 116.
The rule based classifier 114 can identify a category, such as one of the verticals in the catalog 116, for the entity. In some implementations, the rule based classifier 114 can use one or more probabilities determined by the probability classifier 112 and a rule set such as a decision tree 118. For example, the decision tree 118 can include a plurality of decisions and can be configured for selecting one of the plurality of verticals in the catalog 116 by processing at least some of the decisions. In some implementations, the system 100 can include a rule component 120 that generates the decision tree 118 or other rules based on training data 122. In some implementations, the training data 122 can include mappings of entities to respective ones of the categories, such as the verticals in the catalog 116.
A rule set such as the decision tree 118 can be generated in any of multiple ways. In some implementations, a model of the tree can be defined and the tree can then be generated based on the training data 122. For example, a structure of the tree can be selected, such as to define that the tree should include multiple levels of binary decisions. As another example, an extent of the tree can be defined (e.g., when should the decision tree end), such as how many of the plurality of decisions are to be made before the one of the plurality of categories is selected. In some implementations, one or more decisions in the tree 118 can use a threshold value. For example, a probability (e.g., one determined by the probability classifier 112) can be compared against the threshold value. One or more aspects of the decision tree 118 can be generated using any kind of iterative process. For example, a structure of the tree 118 can be chosen in an initial iteration and tested against representative data, such as the training data 122, and results of such testing can be used to generate another structure of the tree 118 in another iteration. As another example, a first set of threshold values can be determined in an initial iteration, and at least one of the values can be refined through a feedback process in one or more additional iterations.
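The structure described above (levels of binary decisions, each comparing a value against a threshold, ending in a selected category) can be illustrated with a minimal sketch. All names here (`Leaf`, `Decision`, `classify`, the feature keys, and the threshold values) are hypothetical and chosen only for illustration; they are not taken from the decision tree 118 itself.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """End of the tree: the category that is selected."""
    category: str


@dataclass
class Decision:
    """One binary decision comparing a feature value against a threshold."""
    feature: str
    threshold: float
    if_below: "Node"
    if_at_or_above: "Node"


Node = Union[Leaf, Decision]


def classify(node: Node, features: Dict[str, float]) -> str:
    """Walk the tree from the root until a leaf (a category) is reached."""
    while isinstance(node, Decision):
        value = features.get(node.feature, 0.0)  # missing features count as 0.0
        node = node.if_at_or_above if value >= node.threshold else node.if_below
    return node.category


# Hypothetical two-level tree over probabilities from a first classifier.
tree = Decision(
    feature="p_mortgage",
    threshold=0.5,
    if_below=Decision("p_entertainment", 0.8, Leaf("other"), Leaf("entertainment")),
    if_at_or_above=Leaf("mortgage"),
)

selected = classify(tree, {"p_mortgage": 0.2, "p_entertainment": 0.9})
```

In this sketch the extent of the tree is simply its depth, and an iterative process of the kind described could replace the tree structure or its thresholds between iterations without changing `classify`.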
The rule based classifier 114 can serve one or more purposes in the system 100. In some implementations, the probability classifier 112 can have a tendency to mis-classify entities in one or more regards. For example, the probability classifier 112 might frequently choose an “entertainment” vertical for entities that are in fact not involved, or involved only to a small degree, in the entertainment industry. Such characteristics in the probability determination can be artifacts of how the probability classifier 112 is configured and can depend on a number of factors, which can make it difficult or impractical to resolve the problem. In some implementations, the rule based classifier 114 can be used in combination with the probability classifier 112. For example, at least one of the threshold values in the rule set (e.g., the decision tree 118) used by the rule based classifier 114 can be selected so as to reduce or eliminate the tendency with regard to the category at issue.
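One way a threshold value might be selected to reduce such a tendency is sketched below, under the assumption that labeled examples are available for the category at issue. The function names, the candidate thresholds, the sample data, and the false positive criterion are illustrative assumptions, not a procedure stated in this description.

```python
from typing import List, Sequence, Tuple

# (probability assigned to the category, whether the entity truly belongs to it)
Example = Tuple[float, bool]


def false_positive_rate(examples: List[Example], threshold: float) -> float:
    """Fraction of non-members whose probability still meets the threshold."""
    negatives = [p for p, is_member in examples if not is_member]
    if not negatives:
        return 0.0
    return sum(p >= threshold for p in negatives) / len(negatives)


def pick_threshold(examples: List[Example],
                   candidates: Sequence[float],
                   max_fpr: float) -> float:
    """Return the lowest candidate whose false positive rate is at most max_fpr."""
    for t in sorted(candidates):
        if false_positive_rate(examples, t) <= max_fpr:
            return t
    return max(candidates)  # fall back to the strictest candidate


# Hypothetical labeled data for an over-assigned "entertainment" category.
examples = [(0.9, True), (0.85, True), (0.7, False),
            (0.6, False), (0.55, False), (0.3, False)]
threshold = pick_threshold(examples, [0.5, 0.6, 0.7, 0.8], max_fpr=0.25)
```

Raising the threshold for only the affected category leaves decisions about other categories unchanged, which is why this sketch tunes a single value.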
At least one category (e.g., one of the verticals in the catalog 116) can be selected for a given entity, such as for the content provider operating the system 102. Such a selection can be used for one or more purposes, such as to output relevant information to a user. In some implementations, the system 100 can include a front end component 124 that can use one or more category selections. For example, the front end component 124 can present information relating to the selected category or categories as a way of characterizing the entity.
The system 200 can include a base classifier 206. In some implementations, the base classifier can be configured to classify an entity, such as a content provider or a content distribution campaign, using a set of categories, such as the verticals catalog 116 (
The base classifier 206 can map multiple keywords for a particular entity to respective verticals. The respective verticals chosen for the keywords can be merged (e.g., their respective probabilities can be averaged) to form a single categorization for the entity. In some implementations, the verticals chosen for the entity can be weighted based on the financial data 204, such as based on the amounts spent on individual keywords. For example, verticals for keywords that account for a relatively large fraction of the content provider's or distribution campaign's spending can be given a relatively larger weight in computing the classification. In some implementations, the base classifier 206 can include the probability classifier 112 (
The system 200 can include a spend-weighted rule component 210. In some implementations, the component 210 can provide a policy for defining a primary one among several categories, such as among three revenue-weighted verticals. For example, the component 210 can run as an offline program with regard to other components in the system 200, such as in the form of a program in the MATLAB environment developed by The MathWorks company.
The spend-weighted rule component 210 can be configured for a multi-class classification on a multidimensional feature space. In some implementations, n dimensions of features can be used for mapping to any of m dimensions. For example, the verticals catalog 116 can include 30 verticals. As another example, additional features can be identified including, but not limited to, quarterly spend of the entity, total spend of the entity, number of keywords for the entity, and billing country of the entity. Thus, with the 30 verticals and these four additional features, a 34-dimensional feature space (i.e., n=34) can be used for a classification into any of 30 dimensions (i.e., m=30). In some implementations, one or more of the feature dimensions, such as the entity country, can be categorical. For example, a predetermined number of top countries (e.g., nine countries) can be assigned one class each, and remaining countries can be grouped in a common class. In some implementations, one or more of the feature dimensions can be a discrete or a continuous variable. For example, a keyword count can be a discrete variable and/or total spend can be a continuous variable.
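The 34-dimensional feature space can be pictured with a short sketch. It assumes that the first 30 dimensions hold one probability per vertical and that the country is encoded as a class index; the vertical names, the list of top countries, and the function names are placeholders invented for illustration.

```python
from typing import Dict, List, Sequence


def encode_country(country: str, top_countries: Sequence[str]) -> int:
    """One class per top country, plus one shared class for all others."""
    if country in top_countries:
        return list(top_countries).index(country)
    return len(top_countries)


def build_features(vertical_probs: Dict[str, float],
                   quarterly_spend: float,
                   total_spend: float,
                   keyword_count: int,
                   country: str,
                   verticals: Sequence[str],
                   top_countries: Sequence[str]) -> List[float]:
    """Concatenate per-vertical probabilities with the four additional features."""
    probs = [vertical_probs.get(v, 0.0) for v in verticals]
    return probs + [
        quarterly_spend,                                # continuous
        total_spend,                                    # continuous
        float(keyword_count),                           # discrete
        float(encode_country(country, top_countries)),  # categorical class index
    ]


# Placeholder catalog of 30 verticals and nine hypothetical top countries.
verticals = [f"vertical_{i}" for i in range(30)]
top_countries = ["US", "GB", "DE", "FR", "JP", "CA", "AU", "IN", "BR"]

features = build_features({"vertical_3": 0.6, "vertical_7": 0.4},
                          quarterly_spend=1200.0, total_spend=9000.0,
                          keyword_count=42, country="NZ",
                          verticals=verticals, top_countries=top_countries)
```

A tree-based learner can split on the country index directly; other model types would typically need a one-hot encoding of the ten country classes instead.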
In some implementations, the spend-weighted rule component 210 can include the rule based classifier (
The spend-weighted rule component 210 can output a rule set 212 that can be used in selecting the category for the entity. In some implementations, the rule set can include a decision tree. For example, the component 210 can split and grow a decision tree to optimize the determined probability that the given entity is a member of a particular category. As another example, the training data 122 (
In some implementations, a feature such as “Classification and Regression Trees” (CART) can be used. In such implementations, the spend-weighted rule component 210 can include or be based on a CART classifier. For example, CART models can be constructed with a customized pruning procedure (e.g., a stopping rule). As another example, error estimations of the CART model can be calculated using 10-fold cross validation.
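As background, CART classification trees are commonly grown by choosing, at each node, the one-dimensional split that minimizes the weighted Gini impurity of the two resulting groups. The sketch below shows that single step only; the function names and sample data are illustrative assumptions, and the customized pruning procedure and the 10-fold cross validation mentioned above are not reproduced.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values: Sequence[float],
               labels: Sequence[str]) -> Tuple[Optional[float], float]:
    """Find the threshold on one feature that minimizes weighted Gini impurity.

    Returns (threshold, impurity); threshold is None if no split improves
    on the impurity of the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold: Optional[float] = None
    best_score = gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical data: probability of a "mortgage" vertical vs. the labeled vertical.
values = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = ["other", "other", "other", "mortgage", "mortgage", "mortgage"]
threshold, impurity = best_split(values, labels)
```

Growing a full tree would repeat this search over every feature dimension and recurse into each resulting group until a stopping rule applies.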
In some implementations, the rule set 212 includes a classification decision tree of one-dimensional rules for mapping a set of (e.g., three) revenue weighted verticals into one vertical for the entity. For example, this can provide the benefit of greater generalization capability in the system 200, such as to allow pruning of “bad verticals” and/or other systemic errors from the base classifier 206.
In generating the rule set 212, financial data can be taken into account. In some implementations, data can be replicated when a CART model is constructed, such that the amount of replication is proportional to the amount(s) spent. For example, data corresponding to a relatively high total and/or quarterly spend level can be oversampled. As another example, data corresponding to a relatively low total and/or quarterly spend level can be undersampled. In some implementations, additional training data points based on revenue can tend to bias the final output (e.g., the selection of one or more categories) toward high-spending entities (e.g., content providers) and improve accuracy regarding these entities.
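Spend-proportional replication of training data might look like the following sketch. The row layout, the `spend_per_copy` and `max_copies` parameters, and the rounding rule are assumptions made for illustration; the description does not specify how the replication factor is computed, and undersampling of low-spend data is not shown.

```python
import math
from typing import List, Tuple

# (entity identifier, total spend, labeled vertical)
Row = Tuple[str, float, str]


def replicate_by_spend(rows: List[Row],
                       spend_per_copy: float,
                       max_copies: int = 10) -> List[Row]:
    """Repeat each training row roughly in proportion to its spend.

    Every row is kept at least once, and replication is capped at
    max_copies so that a single very large spender cannot dominate.
    """
    out: List[Row] = []
    for row in rows:
        copies = math.ceil(row[1] / spend_per_copy)
        copies = max(1, min(max_copies, copies))
        out.extend([row] * copies)
    return out


# Hypothetical training rows.
rows = [("a", 50.0, "mortgage"), ("b", 1200.0, "travel"), ("c", 5000.0, "retail")]
training = replicate_by_spend(rows, spend_per_copy=500.0)
```

Replicating rows is a simple stand-in for per-sample weights, which makes it usable with tree learners that do not accept weights directly.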
An example of the rule set 212, here a decision tree, is presented below in Appendix I.
The system 200 can include a primary vertical classifier 214. In some implementations, the classifier can statically map a set of revenue-weighted categories (e.g., the weighted verticals 208) into a single primary vertical for the entity. For example, the classifier 214 can use the rule set 212 (such as by loading a CART classification tree generated by the component 210) to select one of the weighted categories from the base classifier 206.
Step 410 includes determining a probability value for each of at least a subset of a plurality of categories. The probability value can represent a likelihood that an identified entity belongs to the respective category and can be determined using information about the entity. For example, the probability classifier 112 and/or the base classifier 206 can generate the weighted verticals 208 for a particular entity such as a content provider or a content publisher. The subset can include one or more categories.
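A minimal sketch of how such weighted probability values might be computed follows. It assumes each keyword has already been mapped to per-vertical probabilities and that each keyword's weight is its share of total spend; the function name, the data, and the choice of a spend-share weighted average are illustrative assumptions rather than the specific computation used in step 410.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def weighted_verticals(keyword_probs: Dict[str, Dict[str, float]],
                       keyword_spend: Dict[str, float],
                       top_n: int = 3) -> List[Tuple[str, float]]:
    """Merge per-keyword vertical probabilities into a spend-weighted ranking.

    Each keyword contributes in proportion to its share of the total spend,
    and only the top_n verticals are kept as the subset.
    """
    total_spend = sum(keyword_spend.get(k, 0.0) for k in keyword_probs)
    if total_spend <= 0:
        return []
    scores: Dict[str, float] = defaultdict(float)
    for keyword, probs in keyword_probs.items():
        weight = keyword_spend.get(keyword, 0.0) / total_spend
        for vertical, p in probs.items():
            scores[vertical] += weight * p
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical keyword mappings and spend amounts for one content provider.
keyword_probs = {
    "home loan": {"mortgage": 0.9, "real estate": 0.1},
    "concert tickets": {"entertainment": 1.0},
}
keyword_spend = {"home loan": 300.0, "concert tickets": 100.0}
subset = weighted_verticals(keyword_probs, keyword_spend)
```

Here the keyword with three quarters of the spend dominates, so the "mortgage" vertical ranks first even though the other keyword maps to its vertical with full confidence.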
Step 420 includes recording one of the plurality of categories for the entity, the category identified using the probability value and a rule set for the plurality of categories that is based on, for example, training data. For example, the rule based classifier 114 and/or the primary vertical classifier 214 can select one vertical from the catalog 116 to be associated with the particular entity.
Step 430 includes presenting information based on the identification of a category for the entity. For example, the front end component 124 can generate the user interface 300 that can present the seasonality area 306
The memory 520 stores information within the system 500. In one implementation, the memory 520 is a computer-readable medium. In one implementation, the memory 520 is a volatile memory unit. In another implementation, the memory 520 is a non-volatile memory unit.
The storage device 530 is capable of providing mass storage for the system 500. In one implementation, the storage device 530 is a computer-readable medium. In various different implementations, the storage device 530 may be a floppy disk device, a hard disk device, an optical disk device, or a tape device.
The input/output device 540 provides input/output operations for the system 500. In one implementation, the input/output device 540 includes a keyboard and/or pointing device. In another implementation, the input/output device 540 includes a display unit for displaying graphical user interfaces.
The features described can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. The apparatus can be implemented in a computer program product tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by a programmable processor; and method steps can be performed by a programmable processor executing a program of instructions to perform functions of the described implementations by operating on input data and generating output. The described features can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. A computer program is a set of instructions that can be used, directly or indirectly, in a computer to perform a certain activity or bring about a certain result. A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
Suitable processors for the execution of a program of instructions include, by way of example, both general and special purpose microprocessors, and the sole processor or one of multiple processors of any kind of computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memories for storing instructions and data. Generally, a computer will also include, or be operatively coupled to communicate with, one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).
To provide for interaction with a user, the features can be implemented on a computer having a display device such as a CRT (cathode ray tube) or LCD (liquid crystal display) monitor for displaying information to the user and a keyboard and a pointing device such as a mouse or a trackball by which the user can provide input to the computer.
The features can be implemented in a computer system that includes a back-end component, such as a data server, or that includes a middleware component, such as an application server or an Internet server, or that includes a front-end component, such as a client computer having a graphical user interface or an Internet browser, or any combination of them. The components of the system can be connected by any form or medium of digital data communication such as a communication network. Examples of communication networks include, e.g., a LAN, a WAN, and the computers and networks forming the Internet.
The computer system can include clients and servers. A client and server are generally remote from each other and typically interact through a network, such as the described one. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
A number of embodiments have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of this disclosure. Accordingly, other embodiments are within the scope of the following claims.
This application claims priority under 35 USC §119(e) to U.S. Provisional Patent Application Ser. No. 61/097,026, filed on Sep. 15, 2008, the entire contents of which are hereby incorporated by reference.
Number | Date | Country
---|---|---
61097026 | Sep 2008 | US