Matrix-factorization-based collaborative filtering approaches are widely used in modern recommender systems. One challenge with these approaches arises when the collaborative information is very sparse (the cold-start scenario) or entirely unavailable (the very-cold-start scenario), which degrades recommendation performance. To address this issue, several methods have been proposed that combine content or contextual information, as auxiliary information, with the collaborative information, and these methods have shown performance improvements. How best to represent the content and contextual information and combine it with the collaborative information remains an ongoing field of investigation.
Therefore, there is a need for an improved framework that addresses the above mentioned challenges.
Described is a system, method, and computer-implemented apparatus for recommending items, such as audio, video, documents, web pages, profiles, or other types of media or content. Embodiments are inspired by the observation that users enjoy content because of an emotional connection with it. As such, in one embodiment, emotions evoked by experiencing content are used in combination with implicit user feedback to produce a rank ordering of items. For example, for a set of users and a set of videos, a matrix indicating which users have ‘liked’ which videos and a vector of emotions evoked by each video is used to generate a list of ranked recommendations for one or more of the set of users.
In another embodiment, emotions evoked by experiencing content, in combination with implicit user feedback, may be used to predict a user's emotional response to a different piece of content. In yet another embodiment, implicit feedback received from users (e.g., 'likes') is used as a supervisory signal to improve emotion recognition models. These models can in turn create better recommendations by more accurately identifying emotions contained in content.
These embodiments and more are based on the intuition that the latent factors learned from factorizing user-item interaction data carry information that captures the emotive factors of content that make a user 'like' an item.
With these and other advantages and features that will become hereinafter apparent, further information may be obtained by reference to the following detailed description and appended claims, and to the figures attached hereto.
Some embodiments are illustrated in the accompanying figures, in which like reference numerals designate like parts, and wherein:
In the following description, for purposes of explanation, specific numbers, materials and configurations are set forth in order to provide a thorough understanding of the present frameworks and methods and in order to meet statutory written description, enablement, and best-mode requirements. However, it will be apparent to one skilled in the art that the present frameworks and methods may be practiced without the specific exemplary details. In other instances, well-known features are omitted or simplified to clarify the description of the exemplary implementations of the present framework and methods, and to thereby better explain the present framework and methods. Furthermore, for ease of understanding, certain method steps are delineated as separate steps; however, these separately delineated steps should not be construed as necessarily order dependent in their performance.
The content recommendation system 102 can be any type of computing device capable of responding to and executing instructions in a defined manner, such as a workstation, a server, a portable laptop computer, another portable device, a touch-based tablet, a smart phone, a mini-computer, a mainframe computer, a storage system, a dedicated digital appliance, a device, a component, other equipment, or a combination of these. The system may include a central processing unit (CPU) 104, an input/output (I/O) unit 106, a memory module 120 and a communications card or device 108 (e.g., modem and/or network adapter) for exchanging data with a network (e.g., local area network (LAN) or a wide area network (WAN)). It should be appreciated that the different components and sub-components of the system may be located on different machines or systems. Memory module 120 may include cold start matrix factorization recommendation module 110.
Cold start matrix factorization recommendation module 110 includes logic for receiving and processing a user-item implicit feedback matrix and an emotional-response vector for the purpose of ranking items for a user. Throughout this disclosure, reference is made to “items”. Items contain content. Examples of types of items include audio, video, documents, web pages, profiles, or other media. An “item” refers to the object itself, whereas “content” refers to the material contained within the item. Examples of types of content include sound, motion picture, text, hypertext, social media content, and the like.
Reciprocal rank equation 202 calculates a reciprocal rank (RR) for user i. Yij denotes the binary relevance score of item j to user i: Yij is 1 if item j is relevant to user i and 0 otherwise. Rij denotes the rank of item j in the ranked list (in descending order of relevance) of items for user i. 1(x) is an indicator function that is equal to 1 if x is true and 0 otherwise.
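As a concrete illustration, a reciprocal rank of the kind computed by equation 202 can be derived from a user's binary relevance vector and a set of predicted scores. The sketch below is illustrative only; the function name and the use of predicted scores to derive the ranks Rij are assumptions, not part of the claims:

```python
import numpy as np

def reciprocal_rank(y, scores):
    """Reciprocal rank for one user i (hypothetical helper).

    y      -- binary relevance scores Y_ij (1 if item j is relevant, else 0)
    scores -- predicted relevance scores used to rank the items
    Items are ranked by score in descending order; ranks R_ij are 1-based.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))  # descending order
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(1, len(order) + 1)           # R_ij for each item j
    relevant = np.flatnonzero(y)
    if relevant.size == 0:
        return 0.0
    return 1.0 / ranks[relevant].min()  # 1 / rank of the top-ranked relevant item

reciprocal_rank([0, 1, 0, 1], [0.1, 0.9, 0.5, 0.2])  # 1.0 (item 1 ranks first)
```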
The claimed embodiments generate latent factor representations Ui and Vj for user i and item j, respectively, where Ui is a user vector and Vj is an item vector, and the products of the user and item vectors, taken over all users and items, form an M by N relevance matrix. In this context, equation 204 depicts the relationship between item vector Vj, transformation matrix T, and content representation vector Cj, where Vj is an N-dimensional vector, transformation matrix T is an N by N matrix, and Cj is an N-dimensional vector. In one embodiment, each element of vector Cj represents an emotion evoked by content item j. In this way, emotion representations are relatable to the item vector.
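The relationship of equation 204 can be sketched as a matrix-vector product; the dimensionality and the random values below are hypothetical placeholders for learned and measured quantities:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 9                          # dimensionality of the emotion representation

T = rng.normal(size=(N, N))    # transformation matrix (learned in practice)
C_j = rng.random(N)            # content representation vector for item j
V_j = T @ C_j                  # item vector per equation 204: Vj = T * Cj

assert V_j.shape == (N,)       # the item vector shares the emotion dimension
```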
Objective function 206 illustrates a function in terms of user matrix U and transformation matrix T. In this embodiment, the function g(fij), where fij=UiTVj, approximates and is substituted for the 1/Rij term of reciprocal rank equation 202. Similarly, the function g(fik−fij) approximates and is substituted for the 1(Rik<Rij) term of reciprocal rank equation 202. λ is the regularization coefficient, an experimentally derived constant used to mitigate overfitting. ∥U∥²F is the Frobenius norm of U and ∥T∥²F is the Frobenius norm of T.
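For illustration, a smoothed reciprocal-rank objective of this general shape can be written as follows. This is a sketch under the assumptions that g is the logistic (sigmoid) function and that the objective follows the substitutions described above; it is not a reproduction of function 206 itself:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def objective(U, T, C, Y, lam):
    """Smoothed reciprocal-rank objective (a sketch, not function 206 itself).

    U : (M, d) user latent vectors        Y : (M, n) binary implicit feedback
    T : (d, d) transformation matrix      C : (n, d) emotion vectors C_j
    """
    V = C @ T.T                 # V_j = T C_j for every item j (equation 204)
    F = U @ V.T                 # f_ij = Ui^T Vj
    total = 0.0
    M, n = Y.shape
    for i in range(M):
        for j in range(n):
            if not Y[i, j]:
                continue
            total += np.log(sigmoid(F[i, j]))  # g(f_ij) replaces 1/R_ij
            # g(f_ik - f_ij) replaces the indicator 1(R_ik < R_ij)
            total += np.sum(np.log(1.0 - Y[i] * sigmoid(F[i] - F[i, j])))
    # regularization: lam/2 times the squared Frobenius norms of U and T
    reg = 0.5 * lam * (np.linalg.norm(U) ** 2 + np.linalg.norm(T) ** 2)
    return total - reg
```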
Stochastic gradient ascent is used to maximize objective function 206. Function 208 depicts one embodiment of the partial derivative of objective function 206 for user i with respect to Ui. Function 208 is one of two functions used to iteratively calculate user matrix U and transformation matrix T by taking partial derivatives of objective function 206.
The other function used to iteratively calculate user matrix U and transformation matrix T is function 210, which illustrates the partial derivative of objective function 206, for every user i and item j, with respect to transformation matrix T. Em,n is the elementary matrix of order (m×n), g′(x) is the derivative of g(x), and ⊗ (a circle with an 'x' inside) denotes the outer product.
Line 1 initializes user matrix U and transformation matrix T with random values. In one embodiment, these are real numbers represented as floating-point numbers.
Line 2 initializes a counter t to 0. The counter will be incremented at line 8 and tested against itermax at line 9. Line 3 indicates the beginning of a repeated block, lines 4-9.
Line 4 begins the block by executing, in a loop for i equal to 1 to M, the block of lines 5-7. Within this block, line 5 depicts updating the user matrix U by adding to its previous value the result of equation 208 multiplied by a learning rate γ, where equation 208 includes the partial derivative of objective function 206 with respect to user vector Ui. The purpose of the learning rate γ is to ensure convergence. Line 6 depicts an inner loop, from j equals 1 to N, executing line 7, which updates transformation matrix T by adding the result of equation 210 multiplied by the learning rate γ, where equation 210 represents a partial derivative of objective function 206 with respect to transformation matrix T.
At line 10, the final latent factor matrices U and T are returned as output.
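The pseudocode of lines 1-10 can be sketched as follows. Here, grad_U and grad_T are hypothetical stand-ins for equations 208 and 210, which are given in the referenced figure; the initialization scale and default hyperparameters are assumptions:

```python
import numpy as np

def fit(Y, C, grad_U, grad_T, gamma=0.01, itermax=50, seed=0):
    """Skeleton of lines 1-10; grad_U/grad_T stand in for equations 208/210.

    Y : (M, n) implicit feedback matrix    C : (n, d) emotion vectors
    """
    M, n = Y.shape
    d = C.shape[1]
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.01, size=(M, d))      # line 1: random initialization
    T = rng.normal(scale=0.01, size=(d, d))
    t = 0                                        # line 2: counter t <- 0
    while True:                                  # line 3: repeat ...
        for i in range(M):                       # line 4: for i = 1 .. M
            U[i] += gamma * grad_U(U, T, C, Y, i)      # line 5: ascend eq. 208
            for j in range(n):                   # line 6: for j = 1 .. N
                T += gamma * grad_T(U, T, C, Y, i, j)  # line 7: ascend eq. 210
        t += 1                                   # line 8: t <- t + 1
        if t >= itermax:                         # line 9: ... until t >= itermax
            break
    return U, T                                  # line 10: return U and T
```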
In one embodiment, the implicit feedback includes a unary indication that a user 'liked' an item—e.g., the user clicks the 'Like' button on Facebook®. User-item combinations for which no 'like' has been recorded are undefined. However, binary, ternary, or any other representation of implicit feedback is similarly contemplated. In some embodiments, some or all items may be 'undefined' for a given user. It is an object of the invention to generate a ranked recommendation of items for a user, regardless of how many (including none) of the items the user has implicit feedback for.
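One minimal way to represent such unary feedback, with undefined user-item combinations left as missing values, is sketched below; the (user, item) pairs and dimensions are hypothetical:

```python
import numpy as np

# Unary implicit feedback: 1.0 where a 'like' was recorded, NaN (undefined)
# everywhere else. The (user, item) pairs below are hypothetical examples.
likes = {(0, 2), (0, 5), (1, 0), (3, 4)}
M, N = 4, 6                       # users, items
Y = np.full((M, N), np.nan)       # all combinations start undefined
for user, item in likes:
    Y[user, item] = 1.0           # record each unary 'like'
```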
At block 404, an N dimensional vector of content-based estimations of emotional responses to items is received. Emotion modeling refers to the problem of automatically estimating the expected emotional response that an item of content will receive from users. Many techniques have been used for emotion modeling, any and all of which are contemplated for the claimed embodiments. In one embodiment, automatically estimating an expected emotional response includes representing each item by a feature representation that carries emotional information. Affective features, SentiBank features, hybrid convolutional neural network (CNN) features, and the like are examples of feature representations that carry emotional information. In the case of video, these features are based on the visual content of the item (e.g., not a text description of the item, a review of the item, etc.). In one embodiment, emotion categories, also called labels, are assigned to each item based on the feature representation. Videos may be labeled with one or more emotional categories, such as Amusement, Anger, Disgust, Fear, Interest, Joy, Sadness, Surprise, and Tension. In one embodiment, real numbers are used to value the emotions, but integers, whole numbers, tuples, or any other representation is similarly contemplated. In one embodiment, the real numbers are derived from a syntactic structure of the corresponding item. Other types of items are similarly capable of being associated with emotion labels. While in the examples above emotion labels are predicted from the pixels of an image or video, they may likewise be predicted from the characters or words of a text item, from the audio waveform of an audio item, etc.
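A minimal sketch of one such emotion-response vector, using the nine categories listed above with hypothetical real-valued scores, might look like:

```python
import numpy as np

EMOTIONS = ["Amusement", "Anger", "Disgust", "Fear", "Interest",
            "Joy", "Sadness", "Surprise", "Tension"]

# Hypothetical real-valued scores for one video, one per emotion category,
# e.g., as produced by an emotion-recognition model over the video's pixels.
scores = {"Amusement": 0.71, "Anger": 0.02, "Disgust": 0.01, "Fear": 0.05,
          "Interest": 0.55, "Joy": 0.64, "Sadness": 0.03, "Surprise": 0.31,
          "Tension": 0.08}

C_j = np.array([scores[e] for e in EMOTIONS])  # dense emotion-response vector
```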
At block 406, an M by N matrix of relevance scores is generated based on the N dimensional vector of estimated emotional responses and the received M by N matrix of implicit feedback. In one embodiment, an M by K user matrix and a K by N item matrix are generated, such that when multiplied together, produce an M by N matrix of relevance scores. In this way, the M by K user matrix and the K by N item matrix are latent factor representations of the M by N user-item relevance matrix.
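The latent factorization described here can be sketched as follows; the dimensions and random factors are hypothetical placeholders for learned values:

```python
import numpy as np

M, N, K = 4, 6, 3                       # users, items, latent dimensionality
rng = np.random.default_rng(1)
user_matrix = rng.normal(size=(M, K))   # M by K user matrix
item_matrix = rng.normal(size=(K, N))   # K by N item matrix
relevance = user_matrix @ item_matrix   # dense M by N relevance-score matrix
```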
Typically, the M by N matrix of relevance scores contains a value for each combination of user and item (i.e., is dense), in contrast to the M by N matrix of implicit feedback, which may be sparse. Relevance scores may be binary values, integers, or any other representation. Relevance scores are distinct from implicit feedback in that implicit feedback represents some user actions, typically not explicit rankings, whereas relevance scores are one result of the claimed embodiments.
In one embodiment, the K by N item matrix is generated indirectly by generating an N by N transformation matrix that, when multiplied by the N dimensional vector of estimated emotional responses, generates the K by N item matrix. One example of this is equation 204 of
In one embodiment, the M by K user matrix and the N by N transformation matrix are generated by random seeding and iterative updating based on an objective function, such as function 206 of
At block 408, a ranked list of items for one of the M users is generated. In one embodiment, the ranking is based on the reciprocal rank function 202 as described above with regard to
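Given the dense relevance matrix, generating a ranked list for a user reduces to sorting that user's row of scores; a minimal sketch with a hypothetical helper name:

```python
import numpy as np

def ranked_items(relevance, i):
    """Items for user i, sorted by relevance score in descending order."""
    return [int(j) for j in np.argsort(-relevance[i])]

relevance = np.array([[0.2, 0.9, 0.5]])
ranked_items(relevance, 0)  # [1, 2, 0]
```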
At block 410, the process 400 ends.