The present teaching generally relates to data analytics. More specifically, the present teaching relates to data analytics in content recommendation.
With the development of the Internet and the ubiquitous network connections, more and more commercial and social activities are conducted online. Networked content is served to millions, some requested and some recommended. Platform operators that make electronic content available to users may leverage their online presence to solicit advertisements (ads) to be displayed together with content to users. For each such ad display opportunity, mechanisms may be put in place where advertisers may bid for the opportunity to display their ads, which may be evaluated on-the-fly with respect to, e.g., estimated performance of each bid ad so that a winning ad may be selected based on the estimated performance. In an online ad auction, a winning ad may be recommended, and such a process of recommending a winning ad is to match supply with demand.
This is illustrated in
Due to the on-the-fly nature associated with online advertising, the computations that enable an online ad auction are generally very large in scale and must meet a low latency requirement. Field-weighted factorization machines (FwFM) are commonly used in such large scale and low latency recommender systems. However, to save computational time and meet the low latency constraints, field interactions in such systems are often pruned, which produces sub-optimal results and makes it difficult to control recommendation quality. Thus, there is a need for a solution that can enhance the performance of the traditional approaches.
The teachings disclosed herein relate to methods, systems, and programming for information management. More particularly, the present teaching relates to methods, systems, and programming related to online advertising.
In one example, a method is disclosed for online advertising, implemented on a machine having at least one processor, storage, and a communication platform capable of connecting to a network. A diagonal vector d is determined based on supply and demand data identified from ad auction related data. A predicted performance (P-P) metric is computed based on the diagonal vector d via low rank field weighted factorization machines (FwFM) for each of candidate ads included in the ad auction related data. The candidate ads are ranked based on their corresponding P-P metrics. A winning ad is selected from the ranked candidate ads according to a predetermined selection criterion.
In a different example, a system is disclosed for online advertising that includes a low rank field-weighted factorization machine (FwFM) predicted performance (P-P) metric determiner, a P-P metric based ad ranking unit, and a winning ad selection unit. The low rank FwFM P-P metric determiner is provided for processing ad auction related data to identify supply data and demand data, determining a diagonal vector d based on the supply data and the demand data, and computing a predicted performance metric via low rank FwFM for each of a plurality of candidate ads included in the ad auction related data based on the diagonal vector d. The P-P metric based ad ranking unit is provided for ranking the plurality of candidate ads based on their corresponding predicted performance metrics. The winning ad selection unit is provided for selecting one of the ranked plurality of candidate ads as a winning ad according to a predetermined selection criterion.
Other concepts relate to software for implementing the present teaching. A software product, in accordance with this concept, includes at least one machine-readable non-transitory medium and information carried by the medium. The information carried by the medium may be executable program code data, parameters in association with the executable program code, and/or information related to a user, a request, content, or other additional information.
Another example is a machine-readable, non-transitory and tangible medium having information recorded thereon for online advertising. The information, when read by the machine, causes the machine to perform the following steps. A diagonal vector d is determined based on supply and demand data identified from ad auction related data. A predicted performance (P-P) metric is computed based on the diagonal vector d via low rank field weighted factorization machines (FwFM) for each of candidate ads included in the ad auction related data. The candidate ads are ranked based on their corresponding P-P metrics. A winning ad is selected from the ranked candidate ads according to a predetermined selection criterion.
Additional advantages and novel features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings or may be learned by production or operation of the examples. The advantages of the present teachings may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities and combinations set forth in the detailed examples discussed below.
The methods, systems and/or programming described herein are further described in terms of exemplary embodiments. These exemplary embodiments are described in detail with reference to the drawings. These embodiments are non-limiting exemplary embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:
In the following detailed description, numerous specific details are set forth by way of examples in order to facilitate a thorough understanding of the relevant teachings. However, it should be apparent to those skilled in the art that the present teachings may be practiced without such details. In other instances, well known methods, procedures, components, and/or system have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present teachings.
The present teaching discloses an alternative low rank FwFM approach that reduces the rank of the field-interaction matrix rather than pruning small field-interaction weights as in the conventional solutions. The low rank FwFM approach according to the present teaching allows faster scoring of predicted performance metrics even with multi-value fields with a much-simplified computation construct and process, yet maintains the accuracy enjoyed when deploying FwFM. The low rank FwFM as disclosed herein achieves integration of multi-value features in prediction more effectively and, thus, better supports online advertising with real-time performance-based ad ranking and recommendation.
The P-P metric as computed for each of the candidate ads may be defined according to application needs.
As discussed herein, the present teaching relates to an alternative low rank FwFM formulation to enable the computation of P-P metrics with a reduced rank of the field-interaction matrix rather than pruning small field-interaction weights as in the conventional solutions. Prior to presenting the exemplary detail formulation of the low rank FwFM,
First, consistent notations are defined in order to define the low rank FwFM. Vectors are herein denoted by lowercase boldface letters, such as $\mathbf{x}$, $\mathbf{a}$, and matrices are denoted by uppercase boldface letters, such as $\mathbf{P}$. Vectors defined herein correspond to column vectors. Components of a vector are denoted by indexed lower-case letters, e.g., components of $\mathbf{x}$ are expressed as $x_1, \ldots, x_n$. Similarly, components of a matrix $\mathbf{P}$ are denoted by $p_{i,j}$. Rows of a matrix $\mathbf{P} \in \mathbb{R}^{m \times n}$ are denoted by $\mathbf{P}_1, \ldots, \mathbf{P}_m$.
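The standard inner product between two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ is denoted as:
$$\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T \mathbf{y} = \sum_{i=1}^{n} x_i y_i,$$
and the Euclidean squared norm is denoted by $\|\mathbf{x}\|^2 = \langle \mathbf{x}, \mathbf{x} \rangle$. For a given vector $\mathbf{x}$, the notation $\mathrm{diag}(\mathbf{x})$ represents a square diagonal matrix with the components of $\mathbf{x}$ on its diagonal. For a given square matrix $\mathbf{P}$, the term $\mathrm{keepdiag}(\mathbf{P})$ denotes a diagonal matrix whose diagonal is identical to that of $\mathbf{P}$. For example,
$$\mathrm{keepdiag}\begin{pmatrix} p_{1,1} & p_{1,2} \\ p_{2,1} & p_{2,2} \end{pmatrix} = \begin{pmatrix} p_{1,1} & 0 \\ 0 & p_{2,2} \end{pmatrix}.$$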
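The trace of a square matrix $\mathbf{A}$, denoted by $\mathrm{tr}(\mathbf{A})$, is the sum of the elements on the diagonal of $\mathbf{A}$. It is well-known that the trace is invariant under circular shifts of matrix products, namely,
$$\mathrm{tr}(\mathbf{A}\mathbf{B}\mathbf{C}) = \mathrm{tr}(\mathbf{C}\mathbf{A}\mathbf{B}) = \mathrm{tr}(\mathbf{B}\mathbf{C}\mathbf{A}).$$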
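The standard inner product between two matrices $\mathbf{A}, \mathbf{B} \in \mathbb{R}^{m \times n}$ can be written as:
$$\langle \mathbf{A}, \mathbf{B} \rangle = \mathrm{tr}(\mathbf{A}^T \mathbf{B}) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{i,j} b_{i,j}.$$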
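A conventional factorization machine, together with its fast algorithm version, is formulated as follows. A factorization machine (FM) receives an input $\mathbf{x} \in \mathbb{R}^m$, has as learned parameters the scalar $w_0$, the vector $\mathbf{w} \in \mathbb{R}^m$, and the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_m \in \mathbb{R}^k$, and computes:
$$\phi_{\mathrm{FM}}(\mathbf{x}) = w_0 + \langle \mathbf{w}, \mathbf{x} \rangle + \sum_{i=1}^{m} \sum_{j=i+1}^{m} \langle x_i \mathbf{v}_i, x_j \mathbf{v}_j \rangle. \tag{FM}$$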
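The above naive formula has a computational complexity of $O(m^2 k)$. Its faster, mathematically equivalent form is the following formula:
$$\phi_{\mathrm{FM}}(\mathbf{x}) = w_0 + \langle \mathbf{w}, \mathbf{x} \rangle + \frac{1}{2}\left( \Big\| \sum_{i=1}^{m} x_i \mathbf{v}_i \Big\|^2 - \sum_{i=1}^{m} \| x_i \mathbf{v}_i \|^2 \right).$$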
The complexity of this faster formulation is O(mk).
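As an illustration only, the equivalence between the naive pairwise sum and the faster form can be checked numerically. The sketch below uses plain Python lists; the function names and the example values are hypothetical and not part of the present teaching.

```python
from itertools import combinations


def dot(a, b):
    """Standard inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def fm_pairwise_naive(x, V):
    """Sum of <x_i v_i, x_j v_j> over all pairs i < j; O(m^2 k)."""
    total = 0.0
    for i, j in combinations(range(len(x)), 2):
        total += x[i] * x[j] * dot(V[i], V[j])
    return total


def fm_pairwise_fast(x, V):
    """0.5 * (||sum_i x_i v_i||^2 - sum_i ||x_i v_i||^2); O(m k)."""
    k = len(V[0])
    vec_sum = [0.0] * k   # running sum of x_i * v_i
    sq_norms = 0.0        # running sum of ||x_i * v_i||^2
    for x_i, v_i in zip(x, V):
        for t in range(k):
            vec_sum[t] += x_i * v_i[t]
        sq_norms += x_i * x_i * dot(v_i, v_i)
    return 0.5 * (dot(vec_sum, vec_sum) - sq_norms)


def fm_score(x, w0, w, V):
    """Full factorization machine output using the fast pairwise term."""
    return w0 + dot(w, x) + fm_pairwise_fast(x, V)


# Hypothetical example: m = 4 features, k = 2 embedding dimensions.
x = [1.0, 0.0, 1.0, 1.0]
w0 = 0.1
w = [0.2, -0.3, 0.05, 0.4]
V = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0], [1.0, 1.0]]

naive = fm_pairwise_naive(x, V)
fast = fm_pairwise_fast(x, V)
```

Both functions return the same pairwise term for this input, while the fast version touches each feature vector only once.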
Since factorization machines are generally trained on a table with columns representing features such as age, gender, etc., each component of $\mathbf{x}$ originates from some column of the table. These columns are termed fields. In Equation (FM), all features in the input vector $\mathbf{x}$ are treated uniformly, i.e., the vectors $\mathbf{v}_i, \mathbf{v}_j$ in the inner products $\langle x_i \mathbf{v}_i, x_j \mathbf{v}_j \rangle$ are the same vectors, regardless of whether the inner product represents the interaction between “age” and “gender”, or the interaction between “age” and “ad category.” Each component $x_i$ is associated with a field $f_i$, where $f_1, \ldots, f_m \in \{1, \ldots, n\}$ index the $n$ columns of the table. A field-weighted factorization machine, or FwFM, may have the same input and parameters as a regular factorization machine, but with an additional symmetric field interaction matrix $\mathbf{R} \in \mathbb{R}^{n \times n}$ as a learnable parameter. A FwFM then computes the following:
$$\phi_{\mathrm{FwFM}}(\mathbf{x}) = w_0 + \langle \mathbf{w}, \mathbf{x} \rangle + \sum_{i=1}^{m} \sum_{j=i+1}^{m} \langle x_i \mathbf{v}_i, x_j \mathbf{v}_j \rangle \, r_{f_i, f_j}.$$
As can be seen from this formulation, the computational complexity for FwFM is $O(m^2 k)$.
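For illustration, a direct evaluation of a field-weighted factorization machine may be sketched as follows, assuming that each pairwise inner product is scaled by the entry of the field interaction matrix R selected by the fields of the two features. The names `fwfm_score` and `fields`, and all numeric values, are hypothetical.

```python
def dot(a, b):
    """Standard inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def fwfm_score(x, fields, w0, w, V, R):
    """w0 + <w, x> + sum_{i<j} <x_i v_i, x_j v_j> * R[f_i][f_j]; O(m^2 k)."""
    m = len(x)
    score = w0 + dot(w, x)
    for i in range(m):
        for j in range(i + 1, m):
            weight = R[fields[i]][fields[j]]  # field-pair interaction weight
            score += x[i] * x[j] * dot(V[i], V[j]) * weight
    return score


# Hypothetical table with n = 2 fields: field 0 has two possible values
# (features 0-1), field 1 has three possible values (features 2-4).
fields = [0, 0, 1, 1, 1]
x = [1.0, 0.0, 0.0, 1.0, 0.0]          # one-hot encoding per field
w0 = 0.1
w = [0.2, 0.0, 0.0, 0.3, 0.0]
V = [[1.0, 2.0], [0.5, 0.5], [0.0, 1.0], [3.0, -1.0], [1.0, 1.0]]
R = [[0.0, 0.5], [0.5, 0.0]]           # symmetric, zero diagonal

score = fwfm_score(x, fields, w0, w, V, R)
```

With an all-ones matrix R, the same function reduces to the pairwise term of a regular factorization machine, which is one way to sanity-check an implementation.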
Generally, factorization machines in an ad recommendation system are employed on categorical columns, e.g., columns representing “gender” or “ad category.” When numeric feature columns are present, such as the number of ad views in the last two weeks, or the time since the user last visited the site, such numerical columns are transformed to categorical columns via, e.g., quantization. Categorical columns may then be, in turn, encoded in a one-hot encoding manner. The vector $\mathbf{x}$ may be formed through a concatenation of the one-hot encodings of each column in the table, and therefore contains only zeroes and ones. There is typically a unique “1” corresponding to each column in the table, and therefore the pairwise interactions can be written in terms of the fields, rather than the components of $\mathbf{x}$. That is, with $\mathbf{v}_i$ now denoting the vector of the single active component in field $i$, the pairwise term of a factorization machine can be written as:
$$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle = \frac{1}{2}\left( \Big\| \sum_{i=1}^{n} \mathbf{v}_i \Big\|^2 - \sum_{i=1}^{n} \| \mathbf{v}_i \|^2 \right). \tag{FMV}$$
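With respect to FwFM, the pairwise term is written as:
$$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, r_{i,j}. \tag{FwFMV}$$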
Ads are typically recommended to a given user visiting a given site at a given time via an ad auction process. As discussed herein, an auction framework may rank candidate ads according to their respective predicted performance metrics, computed with the help of some trained prediction model, such as a factorization machine or a field-weighted factorization machine. In such a setting, some of the fields may represent the user, the site, and the time of visit. Other fields may represent a candidate ad for which a P-P metric needs to be computed. As discussed herein, in some embodiments, input fields {1, . . . , n} may be partitioned into two sets: supply information and demand information, with S denoting the supply fields (describing the user and the site) and D the demand fields (describing the candidate ads). During an ad auction, all supply fields remain constant, as all candidate ads correspond to the same user visiting the same site. The demand fields, however, generally differ from one candidate ad to another.
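Based on Equation (FMV), the term inside the ½(·) parentheses can be decomposed as:
$$\Big\| \sum_{i \in S} \mathbf{v}_i + \sum_{i \in D} \mathbf{v}_i \Big\|^2 - \left( \sum_{i \in S} \| \mathbf{v}_i \|^2 + \sum_{i \in D} \| \mathbf{v}_i \|^2 \right),$$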
so that the sums over S can be computed only once per auction, as they are the same for every candidate ad considered. Given that, for a regular factorization machine, the computational complexity for each candidate ad is $O(|D|k)$. In most auction systems, the number of supply fields |S| is practically irrelevant to the complexity, especially when there are many candidate ads. Because field-weighted factorization machines (FwFMs) consider full interactions among different columns, they have more representation power and are more accurate, but they are slower to compute, so that a FwFM in its conventional treatment is less appropriate for real-time large scale ad ranking, where tens of thousands of ads need to be ranked in a matter of milliseconds.
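The per-auction caching of the supply sums for a regular factorization machine may be sketched as below, assuming one-hot encoding so that each field contributes a single vector. The helper names and the example vectors are hypothetical; a brute-force reference is included only to check the result.

```python
def dot(a, b):
    """Standard inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def supply_cache(supply_vecs):
    """Computed once per auction: the sum of the supply vectors and the
    sum of their squared norms."""
    k = len(supply_vecs[0])
    vec_sum = [sum(v[t] for v in supply_vecs) for t in range(k)]
    sq_norms = sum(dot(v, v) for v in supply_vecs)
    return vec_sum, sq_norms


def pairwise_for_ad(cache, demand_vecs):
    """Computed per candidate ad in O(|D| k), reusing the supply sums."""
    vec_sum, sq_norms = cache
    total = list(vec_sum)  # copy so the cache is not modified
    total_sq = sq_norms
    for v in demand_vecs:
        for t in range(len(total)):
            total[t] += v[t]
        total_sq += dot(v, v)
    return 0.5 * (dot(total, total) - total_sq)


def pairwise_naive(vecs):
    """Reference: sum of <v_i, v_j> over all field pairs i < j."""
    n = len(vecs)
    return sum(dot(vecs[i], vecs[j]) for i in range(n) for j in range(i + 1, n))


# Hypothetical auction: two supply fields, one demand field per ad.
supply_vecs = [[1.0, 0.0], [0.5, 0.5]]
ads = {"ad_a": [[0.0, 1.0]], "ad_b": [[2.0, -1.0]]}

cache = supply_cache(supply_vecs)
terms = {ad_id: pairwise_for_ad(cache, vecs) for ad_id, vecs in ads.items()}
```

The cache is built once and then shared by every candidate ad in the auction, so the supply fields do not add to the per-ad cost.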
It is in this context that the present teaching is developed to attain the representation power of FwFM in ad ranking yet derive a computationally feasible scheme suitable for real-time large scale ad ranking. A diagonalized form of a matrix $\mathbf{A}$ is given by a matrix $\mathbf{U}$ and a vector $\mathbf{e}$ as:
$$\mathbf{A} = \mathbf{U}^T \mathrm{diag}(\mathbf{e}) \, \mathbf{U}.$$
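Any symmetric matrix can be written in such a diagonalized form according to the spectral decomposition theorem. In addition, an extension of the above corresponds to a diagonal plus diagonalized decomposition of a matrix $\mathbf{A}$, given the matrix $\mathbf{U}$ and the vectors $\mathbf{e}$, $\mathbf{d}$,
$$\mathbf{A} = \mathbf{U}^T \mathrm{diag}(\mathbf{e}) \, \mathbf{U} + \mathrm{diag}(\mathbf{d}).$$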
It can be seen that the regular diagonalized form is merely a special case of the above, namely the case in which $\mathbf{d}$ is a zero vector. Therefore, any symmetric matrix can also be written in the above form. Such decompositions may be particularly useful when the matrix $\mathbf{U} \in \mathbb{R}^{c \times n}$ has a small number of rows $c$. These are referred to as “low rank plus diagonal” (LRPD) decompositions. Although a given symmetric matrix $\mathbf{A}$ may not be exactly representable for a given small $c$, it can often be approximated reasonably well. As $c$ increases, the approximation quality continually improves until $c=n$, at which point the representation is exact according to the spectral decomposition theorem.
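To make the structure of such a decomposition concrete, the following sketch assembles a matrix from a matrix U with c rows and vectors e and d, assuming the form A = Uᵀ diag(e) U + diag(d). The function name `lrpd` and the numbers are hypothetical.

```python
def lrpd(U, e, d):
    """Assemble A = U^T diag(e) U + diag(d) for a c x n matrix U,
    a length-c vector e, and a length-n vector d."""
    c, n = len(U), len(U[0])
    A = [[sum(e[r] * U[r][i] * U[r][j] for r in range(c)) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        A[i][i] += d[i]
    return A


# Hypothetical example with c = 1 row and n = 3 fields.
U = [[1.0, 2.0, 3.0]]
e = [2.0]

# With d = 0 this is the plain diagonalized (here rank-one) form.
A_plain = lrpd(U, e, [0.0, 0.0, 0.0])

# Choosing d as minus the diagonal of U^T diag(e) U gives a zero diagonal.
d = [-A_plain[i][i] for i in range(3)]
A_zero_diag = lrpd(U, e, d)
```

Such a representation stores c·n + c + n numbers rather than a full symmetric n × n matrix, which is what makes a small c attractive.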
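The aim is to provide a faster computational formula for the FwFM under the assumption of one-hot encoding, one that reduces the computational cost from $O(n^2 k)$ to $O(c|D|k)$, where D corresponds to the “demand” fields, and $c$ is a constant. The formulation according to the present teaching comprises two components: (a) an alternative matrix form of writing the field-weighted factorization machine formula (FwFMV), and (b) a diagonal-plus-low-rank approximation of the field interaction matrix $\mathbf{R}$. The field vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$ are embedded into the rows of the matrix $\mathbf{V}$, and it is assumed that $r_{i,i}=0$. The pairwise inner products $\langle \mathbf{v}_i, \mathbf{v}_j \rangle$ correspond to the components of the matrix $\mathbf{Q} = \mathbf{V}\mathbf{V}^T$. That is, $q_{i,j} = \langle \mathbf{v}_i, \mathbf{v}_j \rangle$. With the assumption of $r_{i,i}=0$ and the symmetry of $\mathbf{R}$, by definition, we can write the pairwise sum (FwFMV) as:
$$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, r_{i,j} = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} q_{i,j} r_{i,j} = \frac{1}{2} \langle \mathbf{Q}, \mathbf{R} \rangle = \frac{1}{2} \mathrm{tr}(\mathbf{V}\mathbf{V}^T \mathbf{R}). \tag{TR}$$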
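As discussed herein, learning directly the field interaction matrix $\mathbf{R}$ is computationally expensive. Instead, as $\mathbf{R}$ is symmetric, a diagonal plus low rank approximation may be employed. That is, a matrix $\mathbf{U} \in \mathbb{R}^{c \times n}$, for some $c$, and a vector $\mathbf{e} \in \mathbb{R}^c$ may be instead learned as formulated below:
$$\mathbf{R} = \mathbf{U}^T \mathrm{diag}(\mathbf{e}) \, \mathbf{U} - \mathrm{keepdiag}\big(\mathbf{U}^T \mathrm{diag}(\mathbf{e}) \, \mathbf{U}\big).$$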
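As the matrix $\mathbf{R}$ has a zero diagonal, the diagonal part of the decomposition of $\mathbf{R}$ corresponds to $-\mathrm{keepdiag}(\mathbf{U}^T \mathrm{diag}(\mathbf{e}) \, \mathbf{U})$. Denoting the diagonal of $\mathbf{U}^T \mathrm{diag}(\mathbf{e}) \, \mathbf{U}$ by $\mathbf{d}$, it can then be rewritten as:
$$\mathbf{R} = \mathbf{U}^T \mathrm{diag}(\mathbf{e}) \, \mathbf{U} - \mathrm{diag}(\mathbf{d}).$$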
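Substituting it into the matrix form in Equation (TR), using the invariance of the trace under circular shifts, and denoting $\mathbf{P} = \mathbf{U}\mathbf{V}$, the following is obtained:
$$\frac{1}{2} \mathrm{tr}(\mathbf{V}\mathbf{V}^T \mathbf{R}) = \frac{1}{2}\Big( \mathrm{tr}\big(\mathbf{P}\mathbf{P}^T \mathrm{diag}(\mathbf{e})\big) - \mathrm{tr}\big(\mathbf{V}^T \mathrm{diag}(\mathbf{d}) \, \mathbf{V}\big) \Big) = \frac{1}{2}\left( \sum_{i=1}^{c} e_i \| \mathbf{P}_i \|^2 - \sum_{i=1}^{n} d_i \| \mathbf{v}_i \|^2 \right).$$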
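Thus, the pairwise interactions $\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, r_{i,j}$ may be computed via the following steps:
(1) compute the vector $\mathbf{d}$ as the diagonal of $\mathbf{U}^T \mathrm{diag}(\mathbf{e}) \, \mathbf{U}$;
(2) compute the matrix $\mathbf{P} = \mathbf{U}\mathbf{V}$;
(3) compute $\frac{1}{2}\left( \sum_{i=1}^{c} e_i \| \mathbf{P}_i \|^2 - \sum_{i=1}^{n} d_i \| \mathbf{v}_i \|^2 \right)$,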
where step (1) can be computed, as discussed herein, only once upon the model being trained so that it does not affect the computational complexity. It does not depend on any specific user or ad features.
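A minimal numerical sketch of this computation is given below. It assumes, as a reading of the description, that step (1) forms the diagonal vector d of Uᵀ diag(e) U, that the matrix P = UV is then formed, and that the pairwise term is ½(Σᵢ eᵢ‖Pᵢ‖² − Σᵢ dᵢ‖vᵢ‖²). The function names and numbers are hypothetical, and the naive double sum is included only as a reference.

```python
def dot(a, b):
    """Standard inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def diagonal_vector(U, e):
    """d_i = sum_r e_r * U[r][i]^2, the diagonal of U^T diag(e) U.
    Depends only on the trained model, not on any user or ad."""
    c, n = len(U), len(U[0])
    return [sum(e[r] * U[r][i] ** 2 for r in range(c)) for i in range(n)]


def project(U, V):
    """P = U V, a c x k matrix, where V holds one field vector per row."""
    c, n, k = len(U), len(V), len(V[0])
    return [[sum(U[r][i] * V[i][t] for i in range(n)) for t in range(k)]
            for r in range(c)]


def pairwise_low_rank(U, e, d, V):
    """0.5 * (sum_r e_r ||P_r||^2 - sum_i d_i ||v_i||^2)."""
    P = project(U, V)
    low_rank_part = sum(e_r * dot(P_r, P_r) for e_r, P_r in zip(e, P))
    diagonal_part = sum(d_i * dot(v, v) for d_i, v in zip(d, V))
    return 0.5 * (low_rank_part - diagonal_part)


def pairwise_naive(U, e, V):
    """Reference: sum_{i<j} <v_i, v_j> r_ij with R = U^T diag(e) U - diag(d)."""
    c, n = len(U), len(V)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r_ij = sum(e[r] * U[r][i] * U[r][j] for r in range(c))
            total += dot(V[i], V[j]) * r_ij
    return total


# Hypothetical model with n = 3 fields, c = 2, k = 2.
U = [[1.0, 2.0, -1.0], [0.5, 0.0, 1.0]]
e = [1.0, -2.0]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

d = diagonal_vector(U, e)
fast = pairwise_low_rank(U, e, d, V)
naive = pairwise_naive(U, e, V)
```

For this input both routes give the same pairwise term, while the low rank route never forms the n × n matrix R.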
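As discussed herein, each row of $\mathbf{V}$ corresponds to a field, and thus each column of $\mathbf{U}$ corresponds to a field. Given the supply fields S and the demand fields D, matrix $\mathbf{V}$ may be split along its rows into two matrices $\mathbf{V}_D$ and $\mathbf{V}_S$. Correspondingly, $\mathbf{U}$ may also be split along the columns into $\mathbf{U}_D$ and $\mathbf{U}_S$. Because $\mathbf{P} = \mathbf{U}\mathbf{V}$, we can equivalently compute:
$$\mathbf{P} = \mathbf{U}_S \mathbf{V}_S + \mathbf{U}_D \mathbf{V}_D.$$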
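Therefore, the above computational steps become:
(1) once, upon the model being trained, compute the vector $\mathbf{d}$ as the diagonal of $\mathbf{U}^T \mathrm{diag}(\mathbf{e}) \, \mathbf{U}$;
(2) once per auction, compute $\mathbf{P}_S = \mathbf{U}_S \mathbf{V}_S$ and $s_S = \sum_{i \in S} d_i \| \mathbf{v}_i \|^2$;
(3) for each candidate ad, compute $\mathbf{P} = \mathbf{P}_S + \mathbf{U}_D \mathbf{V}_D$ and then $\frac{1}{2}\left( \sum_{i=1}^{c} e_i \| \mathbf{P}_i \|^2 - s_S - \sum_{i \in D} d_i \| \mathbf{v}_i \|^2 \right)$, at a cost of $O(c|D|k)$ per candidate ad.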
That is, the training procedure is now modified according to the present teaching to learn an approximation of R with U and e. Utilizing this modified formulation, the learned decomposition may be used to construct an alternative fast scoring approach for candidate ads, which is equivalent to a FwFM with an approximated matrix R.
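The fast scoring approach may be sketched end to end as follows. This is an illustrative reading rather than a definitive implementation: it assumes the supply part of P = UV and the supply part of the diagonal term are cached once per auction, that only the pairwise term is used as the P-P metric, and that the selection criterion is simply the highest score. All names and numbers are hypothetical.

```python
def dot(a, b):
    """Standard inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def partial_projection(U_cols, vecs, c, k):
    """Sum over the given fields of (column of U) x (field vector): c x k."""
    P = [[0.0] * k for _ in range(c)]
    for u_col, v in zip(U_cols, vecs):
        for r in range(c):
            for t in range(k):
                P[r][t] += u_col[r] * v[t]
    return P


def rank_ads(e, U_S, d_S, V_S, U_D, d_D, ads):
    """Score every candidate ad with the low rank pairwise term and
    return (score, ad_id) tuples sorted from best to worst."""
    c, k = len(e), len(V_S[0])
    # Once per auction: supply part of P and of the diagonal correction.
    P_S = partial_projection(U_S, V_S, c, k)
    diag_S = sum(d_i * dot(v, v) for d_i, v in zip(d_S, V_S))
    scored = []
    for ad_id, V_D in ads.items():
        # Per candidate ad: O(c |D| k) work on the demand fields only.
        P_D = partial_projection(U_D, V_D, c, k)
        low_rank_part = 0.0
        for r in range(c):
            row = [P_S[r][t] + P_D[r][t] for t in range(k)]
            low_rank_part += e[r] * dot(row, row)
        diag = diag_S + sum(d_i * dot(v, v) for d_i, v in zip(d_D, V_D))
        scored.append((0.5 * (low_rank_part - diag), ad_id))
    scored.sort(reverse=True)
    return scored


# Hypothetical model: fields 0 and 1 are supply, field 2 is demand; c = 2, k = 2.
e = [1.0, -2.0]
U_S = [[1.0, 0.5], [2.0, 0.0]]   # columns of U for the supply fields
U_D = [[-1.0, 1.0]]              # column of U for the demand field
d_S, d_D = [0.5, 4.0], [-1.0]    # diagonal of U^T diag(e) U, split by field
V_S = [[1.0, 0.0], [0.0, 1.0]]   # supply vectors for this auction
ads = {"ad_a": [[1.0, 1.0]], "ad_b": [[-1.0, 0.5]]}

ranked = rank_ads(e, U_S, d_S, V_S, U_D, d_D, ads)
winner = ranked[0][1]
```

In a deployed system the score would typically also include the bias and linear terms and may be combined with bid information; those parts are omitted here to keep the sketch focused on the pairwise term.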
To implement various modules, units, and their functionalities described in the present disclosure, computer hardware platforms may be used as the hardware platform(s) for one or more of the elements described herein. The hardware elements, operating systems and programming languages of such computers are conventional in nature, and it is presumed that those skilled in the art are adequately familiar therewith to adapt those technologies to appropriate settings as described herein. A computer with user interface elements may be used to implement a personal computer (PC) or other type of workstation or terminal device, although a computer may also act as a server if appropriately programmed. It is believed that those skilled in the art are familiar with the structure, programming, and general operation of such computer equipment and as a result the drawings should be self-explanatory.
Computer 700, for example, includes COM ports 750 connected to and from a network connected thereto to facilitate data communications. Computer 700 also includes a central processing unit (CPU) 720, in the form of one or more processors, for executing program instructions. The exemplary computer platform includes an internal communication bus 710, program storage and data storage of different forms (e.g., disk 770, read only memory (ROM) 730, or random-access memory (RAM) 740), for various data files to be processed and/or communicated by computer 700, as well as possibly program instructions to be executed by CPU 720. Computer 700 also includes an I/O component 760, supporting input/output flows between the computer and other components therein such as user interface elements 780. Computer 700 may also receive programming and data via network communications.
Hence, aspects of the methods of information analytics and management and/or other processes, as outlined above, may be embodied in programming. Program aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of executable code and/or associated data that is carried on or embodied in a type of machine-readable medium. Tangible non-transitory “storage” type media include any or all of the memory or other storage for the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide storage at any time for the software programming.
All or portions of the software may at times be communicated through a network such as the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, in connection with information analytics and management. Thus, another type of media that may bear the software elements includes optical, electrical, and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links, or the like, also may be considered as media bearing the software. As used herein, unless restricted to tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
Hence, a machine-readable medium may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, which may be used to implement the system or any of its components as shown in the drawings. Volatile storage media include dynamic memory, such as a main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that form a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a physical processor for execution.
Those skilled in the art will recognize that the present teachings are amenable to a variety of modifications and/or enhancements. For example, although the implementation of various components described above may be embodied in a hardware device, it may also be implemented as a software only solution, e.g., an installation on an existing server. In addition, the techniques as disclosed herein may be implemented as a firmware, firmware/software combination, firmware/hardware combination, or a hardware/firmware/software combination.
While the foregoing has described what are considered to constitute the present teachings and/or other examples, it is understood that various modifications may be made thereto and that the subject matter disclosed herein may be implemented in various forms and examples, and that the teachings may be applied in numerous applications, only some of which have been described herein. It is intended by the following claims to claim any and all applications, modifications and variations that fall within the true scope of the present teachings.