This disclosure relates generally to the processing of graph-based data using machine learning techniques, including the processing of bipartite graph data.
Personalized recommendation plays an important role in many online services. Accurate personalized recommendation systems can benefit users as well as content publishers and platform providers. As a result, recommender systems have attracted great interest in both academia and industry. A core method behind recommender systems is collaborative filtering (CF). A common approach to collaborative filtering is matrix factorization (MF). MF models characterize both items and users by vectors in the same latent space, inferred from the observed entries of the historical user-item interaction matrix.
Deep learning models have been introduced in various applications recently, boosting performance significantly compared to traditional models. However, deep learning methods are not sufficient to yield optimal user/item embeddings due to the lack of explicit encoding of the latent collaborative signal from user-item interactions and the reliance on explicit feedback from users, which is relatively sparse.
Therefore, researchers have turned to the emerging field of graph convolutional neural networks (GCNNs), and applied GCNNs for recommendation by modeling the user-item interaction as a bipartite graph. A number of recent works focus on using GCNNs to learn user and item representations for recommender systems. GCNNs are used to model the user-item interaction history as a bipartite graph and treat each user and each item as a respective node in the graph. The vector representation of a node is learned by iteratively combining the embedding of the node itself with the embeddings of the nodes in its local neighborhood. An embedding is a mapping of a discrete variable to a vector of continuous numbers. In the context of neural networks, embeddings are low-dimensional, learned continuous vector representations of discrete variables. Neural network embeddings are useful because they can reduce the dimensionality of categorical variables and meaningfully represent categories in the transformed space.
Most existing methods split the process of learning a vector representation of a node into two steps: neighborhood aggregation, in which an aggregation function operates over sets of vectors to aggregate the embeddings of neighbors, and center-neighbor combination, which combines the aggregated neighborhood vector with the central node embedding. These methods learn node embeddings on graphs in a convolutional manner by representing a node as a function of its surrounding neighborhood, similar to the receptive field of a center-surround convolutional kernel in computer vision. A sketch of this two-step process follows.
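The following is a minimal, purely illustrative sketch of the generic two-step embedding update described above (neighborhood aggregation followed by center-neighbor combination). All names and dimensions (update_node, W_comb, d) are hypothetical and not part of the disclosed system:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                 # embedding dimension (assumed)
W_comb = rng.normal(size=(2 * d, d))  # learnable combination weight matrix

def update_node(h_center, h_neighbors):
    """One generic graph-convolution step for a single node."""
    h_agg = np.mean(h_neighbors, axis=0)          # step 1: neighborhood aggregation
    combined = np.concatenate([h_center, h_agg])  # step 2: center-neighbor combination
    return np.tanh(combined @ W_comb)             # non-linear transformation

h_center = rng.normal(size=d)          # central node embedding
h_neighbors = rng.normal(size=(3, d))  # embeddings of three 1-hop neighbors
print(update_node(h_center, h_neighbors).shape)   # -> (8,)
```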
Despite their effectiveness, existing GCNN bipartite graph solutions have at least two limitations. First, existing solutions ignore the intrinsic difference between the two types of nodes in the bipartite graph (users and items), and thus the heterogeneous nature of the bipartite graph has not been fully considered. Second, even if most GCNN-based recommendation models exploit the user-item relationship during embedding construction, they ignore the user-user and item-item relationships, which are also very important signals.
Accordingly, there is a need for a GCNN-based recommender system that is able to take advantage of the intrinsic difference between node types of a graph (e.g., users and items) and the relationships between same-type nodes (e.g., user-user relationships, item-item relationships).
In example embodiments, a method and system are provided that extract information from bipartite graph data that includes two types of nodes. First, the similarity between nodes of a first type and nodes of a second type may be captured by modeling the historical interaction as a bipartite graph. Graph-network-generated embeddings hu, hv are derived through graph convolution over the bipartite graph representing user-item interaction. Second, similarities among nodes of the first type and among nodes of the second type are identified by constructing first node type-first node type and second node type-second node type graphs. Multi-graph embeddings are derived from proximal information extracted from these same node-type graphs. In some examples, skip connection embeddings are generated to enable learning from individual node characteristics by re-emphasizing initial features.
In some applications, the method and system can be implemented to enable a recommender system that takes advantage of the intrinsic difference between node types of a graph (e.g., users and items) and the relationships between same-type nodes (e.g., user-user relationships, item-item relationships). In at least some applications, this may enable more accurate recommendations thereby improving system efficiency.
According to a first aspect is a computer implemented method for processing a bipartite graph that comprises a plurality of first nodes of a first node type, and a plurality of second nodes of a second node type. The method includes: generating a target first node embedding for a target first node based on features of second nodes and first nodes that are within a multi-hop first node neighbourhood of the target first node, the target first node being selected from the plurality of first nodes of the first node type; generating a target second node embedding for a target second node based on features of first nodes and second nodes that are within a multi-hop second node neighbourhood of the target second node, the target second node being selected from the plurality of second nodes of the second node type; and determining a relationship between the target first node and the target second node based on the target first node embedding and the target second node embedding.
In at least some example embodiments of the preceding aspect generating the target first node embedding comprises: for each of a plurality of second nodes included within the first node neighbourhood (i) aggregating features for first nodes that are direct neighbours of the second node within the first node neighbourhood, and mapping the aggregated features to a respective second node embedding; (ii) aggregating the second node embeddings for the plurality of second nodes included within the first node neighbourhood; and (iii) mapping the aggregated second node embeddings to generate the target first node embedding. Generating the target second node embedding comprises: for each of a plurality of first nodes included within the second node neighbourhood: (i) aggregating features for second nodes that are direct neighbours of the first node within the second node neighbourhood, and mapping the aggregated features to a respective first node embedding; (ii) aggregating the first node embeddings for the plurality of first nodes included within the second node neighbourhood; and (iii) mapping the aggregated first node embeddings to generate the target second node embedding.
In at least some examples, each aggregating and mapping is performed using a respective function that is defined by a respective set of learnable parameters, wherein the aggregating and mapping is performed iteratively in respect of the target first node and the target second node and the learnable parameters updated to optimize an objective function calculated based on the target first node embedding and the target second node embedding.
In at least some examples of the preceding aspects, the functions are implemented within a graph convolution network (GCN) and the respective sets of learnable parameters are weight matrices.
In at least some examples of the preceding aspects, the method includes defining the first node neighbourhood of the target first node by randomly sampling the bipartite graph to: select a second node subset from second nodes that are direct neighbours of the target first node, and select respective subsets of first nodes from first nodes that are direct neighbours of each of the second nodes of the second node subset; and defining the second node neighbourhood of the target second node by randomly sampling the bipartite graph to: select a first node subset from first nodes that are direct neighbours of the target second node, and select respective subsets of second nodes from second nodes that are direct neighbours of each of the first nodes of the first node subset.
In at least some examples of the preceding aspects, respective predefined hyper-parameters define respective sizes of the second node subset, the respective subsets of first nodes, the first node subset and the respective subsets of second nodes.
In at least some examples of the preceding aspects, the method includes determining, based on the bipartite graph, first node to first node relationship information and constructing a first node graph that includes first nodes, including the target first node, from the bipartite graph and the first node to first node relationship information; generating a first node-first node embedding for the target first node based on the first node graph; determining, based on the bipartite graph, second node to second node relationship information and constructing a second node graph that includes second nodes, including the target second node, from the bipartite graph and the second node to second node relationship information; generating a second node-second node embedding for the target second node based on the second node graph; wherein the relationship between the target first node and the target second node is determined also based on the first node-first node embedding and the second node-second node embedding.
In at least some examples, determining the first node to first node relationship information comprises determining the presence or absence of a direct neighbor relationship between respective pairs of the first nodes based on calculating pairwise cosine similarities between the respective pairs of the first nodes, and determining the second node to second node relationship information comprises determining the presence or absence of a direct neighbor relationship between respective pairs of the second nodes based on calculating pairwise cosine similarities between the respective pairs of second nodes.
In at least some examples, generating the first node-first node embedding for the target first node comprises using a first node-first node aggregating function having learnable parameters to aggregate features of the first nodes that are direct neighbours of the target first node in the first node graph, and generating the second node-second node embedding for the target second node comprises using a second-node-second node aggregating function having learnable parameters to aggregate features of the second nodes that are direct neighbours of the target second node in the second node graph.
In at least some examples of the preceding aspects, the method includes generating, using a first skip connection transformation function having learnable parameters, a target first node skip connection embedding based on an initial target first node embedding; and generating, using a second skip connection transformation function having learnable parameters, a target second node skip connection embedding based on an initial target second node embedding, wherein the relationship between the target first node and the target second node is determined also based on the target first node skip connection and the target second node skip connection.
In at least some examples, the method includes determining a first node embedding by fusing the target first node embedding, the first node-first node embedding and the first node skip connection embedding; determining a second node embedding by fusing the target second node embedding, the second node-second node embedding and the second node skip connection embedding; the relationship between the target first node and the target second node being determined based on the first node embedding and the second node embedding.
In at least some examples, the first nodes represent users and the second nodes represent items, the bipartite graph includes historical user-item interaction data, the method further comprising determining an item recommendation for a user represented by the target first node based on the determined relationship between the target first node and the target second node.
According to a further example aspect is a graph convolution network (GCN) for processing a bipartite graph that comprises a plurality of first nodes of a first node type, and a plurality of second nodes of a second node type, the GCN being configured to: generate a target first node embedding for a target first node based on features of second nodes and first nodes that are within a multi-hop first node neighbourhood of the target first node, the target first node being selected from the plurality of first nodes of the first node type; generate a target second node embedding for a target second node based on features of first nodes and second nodes that are within a multi-hop second node neighbourhood of the target second node, the target second node being selected from the plurality of second nodes of the second node type; and determine a relationship between the target first node and the target second node based on the target first node embedding and the target second node embedding.
According to example embodiments of the preceding aspect, the GCN comprises: a first node first aggregating function configured to aggregate, for each of a plurality of second nodes included within the first node neighbourhood, features for first nodes that are direct neighbours of the second node within the first node neighbourhood; a first node first mapping function configured to map, for each of the plurality of second nodes, the features aggregated for the second node to a respective second node embedding; a first node second aggregating function configured to aggregate the second node embeddings for the plurality of second nodes included within the first node neighbourhood; a first node second mapping function configured to map the aggregated second node embeddings to generate the target first node embedding; a second node first aggregating function configured to aggregate, for each of a plurality of first nodes included within the second node neighbourhood, features for second nodes that are direct neighbours of the first node within the second node neighbourhood; a second node first mapping function configured to map, for each of the plurality of first nodes, the features aggregated for the first node to a respective first node embedding; a second node second aggregating function configured to aggregate the first node embeddings for the plurality of first nodes included within the second node neighbourhood; and a second node second mapping function configured to map the aggregated first node embeddings to generate the target second node embedding.
According to a further example aspect is a multi-graph convolution collaborative filtering system implemented in multiple layers of a multi-layer graph convolution neural network for learning about user-item preferences from a bipartite graph that includes user nodes, item nodes and interaction data about historical interactions between user nodes and item nodes, the system comprising: a bipartite-graph convolution network module configured to independently generate a user embedding for a target user node and an item embedding for a target item node based on the bipartite graph; a multi-graph encoder module configured to: construct a user-user graph representing similarities between user nodes included in the bipartite graph and generate a user-user embedding for the target user node based on the user-user graph; and construct an item-item graph representing similarities between item nodes included in the bipartite graph and generate an item-item embedding for the target item node based on the item-item graph; and a fusing operation configured to fuse the user embedding and user-user embedding, and to fuse the item embedding and item-item embedding, to output information that represents a relationship between the target user node and the target item node.
Reference is made below, by way of example, to the accompanying drawings, which show example embodiments of the present application.
Similar reference numerals may have been used in different figures to denote similar components.
A multi-graph convolutional collaborative filtering (Multi-GCCF) system is disclosed that may be incorporated into a graph convolution neural network (GCNN) based recommender system. As will be explained in greater detail below, in example embodiments, a Multi-GCCF system incorporates multiple graphs in an embedding learning process. The Multi-GCCF system expressively models high-order information via a bipartite user-item interaction graph, integrates proximal information from the bipartite user-item interaction graph by building and processing user-user and item-item graphs, and takes into account the intrinsic difference between user nodes and item nodes when performing graph convolution on the bipartite graph.
A graph is a data structure that comprises nodes and edges. Each node represents an instance or data point that is defined by measured data represented as a set of node features (e.g., a multidimensional feature vector). Each edge represents a relationship that connects two nodes. A bipartite graph is a form of graph structure in which each node belongs to one of two different node types and direct relationships (e.g., 1-hop neighbors) only exist between nodes of different types.
In example embodiments, user nodes uA to uF and item nodes vA to vF are each defined by a respective set of node features. For example, each user node u is defined by a respective user node feature vector xu that specifies a set of user node features. Each user node feature numerically represents a user attribute. Examples of user attributes may for example include user id, age, sex, relationship status, pet ownership, geographic location, etc. Each item node v is defined by a respective item node feature vector xv that specifies a set of item node features. Each item node feature numerically represents an item attribute. Examples of item attributes may for example include, in the case of a movie video: id, movie title, director, actors, genre, country of origin, release year, period depicted, etc.
The edges 102 that connect user nodes u to respective item nodes v indicate relationships between the nodes. In some example embodiments, the presence or absence of an edge 102 between nodes represents the existence or absence of a predefined type of relationship between the user represented by the user node and the item represented by the item node. For example, the presence or absence of an edge 102 between a user node u and an item node v indicates whether or not a user has previously undertaken an action that indicates a sentiment for or interest in a particular item, such as “clicking” on a representation of the item or submitting a scaled (e.g., 1 to 5 star) or binary (e.g. “like”) rating in respect of the item. For example, edges 102 can represent the click or rating history between users and items. In illustrative embodiments described below, edges 102 convey binary relationship information such that the presence of an edge indicates the presence of a defined type of relationship (e.g. user i has previously “clicked” or rated/liked an item j) and the absence of an edge indicates an absence of such a relationship. However, in further embodiments edges 102 may be associated with further attributes that indicate a relationship strength (for example a number of “clicks” by a user in respect of a specific item, or the level of a rating given by a user).
In example embodiments, the bipartite user-item interaction graph 100 can be represented as G=(Xu, Xv, A), where Xu is a feature matrix that defines the respective feature vectors xu of user nodes u; Xv is a feature matrix that defines the respective feature vectors xv of item nodes v, and A is an adjacency matrix that defines the connections (edges 102) between user nodes u and item nodes v. In example embodiments where edges 102 convey the presence or absence of a defined relationship, adjacency matrix A can be represented as a matrix of binary values that indicate the presence or absence of a connecting edge between each user node u and each item node v. In some examples, adjacency matrix A corresponds to a “click” or “rating” matrix. Thus, bipartite graph 100 includes historical information about users, items, and the interactions between users and items.
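The following is a minimal, purely illustrative sketch of this representation, assuming binary "click" interactions; the interaction records, feature dimensions and names are hypothetical:

```python
import numpy as np

num_users, num_items = 4, 5
clicks = [(0, 1), (0, 3), (1, 0), (2, 1), (3, 4)]   # historical (user, item) clicks

A = np.zeros((num_users, num_items), dtype=np.int8)  # adjacency / "click" matrix
for u, v in clicks:
    A[u, v] = 1                       # presence of edge 102: user u clicked item v

rng = np.random.default_rng(8)
Xu = rng.random((num_users, 6))       # user feature matrix (6 attributes, assumed)
Xv = rng.random((num_items, 9))       # item feature matrix (9 attributes, assumed)
G = (Xu, Xv, A)                       # bipartite graph G = (Xu, Xv, A)
```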
With reference to the drawings, in example embodiments the Multi-GCCF system 200 is implemented using a multi-layer GCNN 201 and comprises a bipartite-graph convolution network (Bipar-GCN) module 202, a multi-graph encoder module 204 and a skip connection module 206, the outputs of which are fused to generate node embeddings.
In example embodiments, the embeddings hu, su, and zu generated in respect of a user node u are fused together by a user node fusing operation 214 to provide a fused user node embedding eu*, and the embeddings hv, sv, and zv generated in respect of an item node v are fused together by an item node fusing operation 215 to provide a fused item node embedding ev*.
In example embodiments, initial node embeddings eu and ev are derived from initial node feature vectors xu, xv based on an initial set of parameters or weights of GCNN 201 and stored in respective look-up tables (LUTs) Eu LUT and Ev LUT in memory associated with Multi-GCCF system 200. Additionally, updated node embeddings are stored in the look-up tables (LUTs) Eu LUT and Ev LUT as they are learned during stochastic gradient descent (SGD) training of GCNN 201.
The function of each of bipar-GCN module 202, multi-graph encoder module 204 and skip connection module 206 will now each be described in greater detail.
Bipartite-Graph Convolution Network (Bipar-GCN) Module 202
Bipar-GCN module 202 is configured to act as an encoder to generate user and item embeddings hu, hv for target user nodes uA and target item nodes vA in bipartite graph 100. Bipar-GCN module 202 is configured to combine the features of a target node with the features of other nodes in a K-hop neighborhood of the target node in order to learn more general node embeddings that take into account the contextual information available from the node's neighborhood. In this regard, Bipar-GCN module 202 is configured to learn flexible user node u and item node v embeddings eu and ev to capture the user preference and item characteristics from the user-item interaction history that is represented in bipartite graph 100. In example embodiments, Bipar-GCN module 202 learns embeddings to encode user nodes u and item nodes v as low-dimensional vectors that summarize their interactions as determined from click or rating history. As indicated above, user nodes u and item nodes v represent different types of entities and are defined by different attributes. Accordingly, in example embodiments Bipar-GCN module 202 is configured to learn user node embeddings eu and item node embeddings ev separately using respective user embedding and item embedding components 210, 211.
In the illustrated example, K=2: all direct (i.e., 1st hop, k=1) neighbors of target user node uA are item nodes v, and all 2nd hop, k=2, neighbors are other user nodes u.
The operation of user embedding component 210 will first be described. User embedding component 210 includes a forward sampling function 301 that defines a sample node neighborhood N(uA) for target user node uA by randomly sampling a fixed number of neighbor nodes at each hop, which mitigates against favoring popular nodes over unpopular nodes.
Following sampling of the neighbors of the target user node uA from layers 1 to K by sampling function 301, user embedding component 210 encodes target user node uA by iteratively aggregating the K-hop neighborhood information using graph convolution to learn function parameters (e.g., weight transformation matrices). More particularly, in example embodiments, user embedding component 210 is implemented using K layers (Layer 1u and Layer 2u in the illustrated example) of the multi-layer GCNN 201. Layer 1u implements an aggregation function Agu1 that aggregates features from neighboring user nodes u for each item node v∈N(uA) to generate a neighborhood embedding hN(v) for each item node v, and a transformation function σu1 to map the neighborhood embeddings hN(v) to respective embeddings h′v for each item node. Layer 2u performs a further aggregation function Agu2 that aggregates the embeddings h′v from the item node neighbors of target user node uA to generate a neighborhood embedding hN(uA) for target user node uA, and a further transformation function σu2 to map the neighborhood embedding hN(uA) to a respective target node embedding hu.
In example embodiments, each of the Layer 1u and Layer 2u functions Agu1, σu1, Agu2, σu2 are machine learning functions and have respective learnable function parameters, namely weight transformation matrices W12, W11, W22, and W21. In example embodiments, all of the learnable function parameters of multi-GCCF system 200 (including weight transformation matrices W12, W11, W22, and W21, and further matrices described below) are initialized with predetermined initialization parameters.
In example embodiments, the Layer 1u aggregation function Agu1 is performed for each of the item nodes v∈N(uA) (e.g., Nk=1(uA)={vB,vC,vD} in the illustrated example) using an element-wise weighted mean aggregator that can be represented by equation (1):

hN(v) = MEAN({hu′0 · W12, ∀u′ ∈ N(v)})   Eq. (1)

where hu′0 denotes the initial embedding eu′ for user node u′, which is taken from the user embedding lookup table Eu LUT, and "·" represents a matrix multiplication operation.
The Layer 1u transformation function σu1 to map the learned neighborhood embeddings hN(v) to respective item node embeddings h′v for each item node v∈N(uA) can be represented by equation (2):

h′v = σ(W11 · [hv0; hN(v)])   Eq. (2)
where: hv0 denotes the initial embedding for the subject item node v (taken from the item node embedding lookup table Ev LUT), ";" represents concatenation, and σ(·) is the tanh activation function. In example embodiments, the Layer 2u aggregation function Agu2 that is performed to aggregate the item node embeddings h′v for the item nodes v∈N(uA) (e.g., Nk=1(uA)={vB,vC,vD} in the illustrated example) is also an element-wise weighted mean aggregator that can be represented by equation (3):

hN(uA) = MEAN({h′v · W22, ∀v ∈ N(uA)})   Eq. (3)
The Layer 2u transformation function σu2 to map the learned neighborhood embedding hN(uA) to a target user node embedding hu can be represented by equation (4):

hu = σ(W21 · [hN(uA); hu0])   Eq. (4)
where: hu0 denotes initial embeddings eu for target user node uA (taken from the user node embedding lookup table Eu LUT), “;” represents concatenation, and σ(·) is the tanh activation function.
In example embodiments, the embedding hu of only the target user node uA is updated in the user node embedding look up table Eu LUT by user embedding component 210 during each iteration of a training procedure.
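The following non-limiting sketch illustrates how equations (1) to (4) can be combined in a user-side forward pass. It is purely illustrative: the dictionary-based graph encoding and all dimensions are assumptions, and the weight matrices are randomly initialized rather than learned:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8                                     # embedding dimension (assumed)
W12, W22 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
W11, W21 = rng.normal(size=(2 * d, d)), rng.normal(size=(2 * d, d))

def user_forward(h_u0, nbr_items, item_nbr_users):
    """h_u0: initial embedding e_u of target user node uA.
    nbr_items: dict mapping each item v in N(uA) to its initial embedding h_v0.
    item_nbr_users: dict mapping each item v to an array of the initial
    embeddings h_u'0 of its sampled user-node neighbors."""
    h_prime = {}
    for v, h_v0 in nbr_items.items():
        h_Nv = np.mean(item_nbr_users[v] @ W12, axis=0)              # Eq. (1)
        h_prime[v] = np.tanh(np.concatenate([h_v0, h_Nv]) @ W11)     # Eq. (2)
    h_NuA = np.mean(np.stack([h @ W22 for h in h_prime.values()]), axis=0)  # Eq. (3)
    return np.tanh(np.concatenate([h_NuA, h_u0]) @ W21)              # Eq. (4)

nbr_items = {"vB": rng.normal(size=d), "vC": rng.normal(size=d)}
item_nbr_users = {v: rng.normal(size=(2, d)) for v in nbr_items}
h_u = user_forward(rng.normal(size=d), nbr_items, item_nbr_users)    # -> shape (8,)
```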
As indicated above, user embedding component 210 relies on initial node embeddings hu0, hv0 that are obtained from stored lookup tables. In example embodiments, the initial user node embedding hu0 is generated by user embedding component 210 when weights W12, W11, W22, and W21 are initialized, and may for example be generated using a standard initialization process such as Xavier initialization. The user embedding is a vector having a pre-defined number of dimensions; the number of dimensions may for example be a hyper-parameter. In example embodiments the initial item node embedding hv0 may be set in a similar manner.
The initial user node embedding hu0, and initial item node embedding hv0 are updated in Eu LUT and Ev LUT during the stochastic gradient descent (SGD) training of the GCNN 201 layers that implement user embedding component 210 and the item embedding component 211.
As noted above, item embedding component 211 operates in a similar manner as user embedding component 210, but from the perspective of target item node vA. In this regard, item embedding component 211 comprises a forward sampling function 302 for sampling a K-hop neighborhood of target item node vA. In the illustrated embodiment, K=2 and all direct (i.e., 1st hop, k=1) neighbors of target item node vA are user nodes u and all 2nd hop, k=2, neighbors are other item nodes v. Again, sampling function 302 is configured to mitigate against favoring popular nodes over unpopular nodes by randomly sampling a fixed number of neighbor nodes for each node to define a resulting sample node neighborhood N(vA) for target item node vA. When defining sample node neighborhood N(vA) for target item node vA, sampling function 302 randomly selects up to a predefined number of user nodes u that are direct neighbors of target item node vA (Nk=1(vA)={uB,uC,uD,uE} in the illustrated example), and then for each of the sampled user nodes u, randomly samples up to a predefined number of item nodes v (other than the target item node vA) that are direct neighbors of that user node u (e.g., in the illustrated example: Nk=1(uB)={vC,vD}, Nk=1(uC)={vB,vC}, Nk=1(uD)={vC,vD} and Nk=1(uE)={vC,vD}). In example embodiments, the predefined number of user nodes u that are randomly selected and the predefined number of item nodes v that are randomly selected for inclusion in sample item node neighborhood N(vA) are predefined hyper-parameters, as illustrated in the sketch below.
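A minimal sketch of the fixed-size random neighbor sampling performed by sampling functions 301 and 302 follows. The adjacency-list encoding, the function name sample_neighborhood and the hyper-parameter values n1 and n2 are illustrative assumptions:

```python
import random

def sample_neighborhood(adj, target, n1=3, n2=2, seed=None):
    """adj: dict mapping each node to a list of its direct neighbors in the
    bipartite graph; returns sampled 1st-hop and 2nd-hop neighbors."""
    rnd = random.Random(seed)
    hop1 = rnd.sample(adj[target], min(n1, len(adj[target])))  # up to n1 direct neighbors
    hop2 = {}
    for n in hop1:
        others = [m for m in adj[n] if m != target]            # exclude the target itself
        hop2[n] = rnd.sample(others, min(n2, len(others)))     # up to n2 per sampled neighbor
    return hop1, hop2

adj = {"vA": ["uB", "uC", "uD", "uE"],
       "uB": ["vA", "vC", "vD"], "uC": ["vA", "vB", "vC"],
       "uD": ["vA", "vC", "vD"], "uE": ["vA", "vC", "vD"]}
print(sample_neighborhood(adj, "vA", seed=0))
```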
In example embodiments, item embedding component 211 is also implemented using K layers (Layer 1v and Layer 2v) of the multi-layer GCNN 201. Layer 1v implements an aggregation function Agv1 that aggregates features from neighboring item nodes v for each user node u∈N(vA) to generate a neighborhood embedding hN(u) for each user node u, and a transformation function σv1 to map the neighborhood embeddings hN(u) to respective embeddings h′u for each user node. Layer 2v performs a further aggregation function Agv2 that aggregates the embeddings h′u from the user node neighbors of target item node vA to generate a neighborhood embedding hN(vA) for target item node vA and a further transformation function σv2 to map the target node neighborhood embedding hN(vA) to a respective target node embedding hv.
In example embodiments, each of the Layer 1v and Layer 2v functions Agv1, σv1, Agv2, σv2 are machine learning functions and have respective learnable function parameters, namely weight transformation matrices Q12, Q11, Q22, and Q21. The Layer 1v and Layer 2v functions of item embedding component 211 are used to encode target item node vA by iteratively aggregating K-hop neighborhood information using graph convolution to learn the weight transformation matrices.
In example embodiments, the Layer 1v aggregation function Agv1 is performed for each of the user nodes u∈N(vA) (e.g., Nk=1(vA)={uB,uC,uD,uE} in the illustrated example) using an element-wise weighted mean aggregator that can be represented by equation (5):

hN(u) = MEAN({hv′0 · Q12, ∀v′ ∈ N(u)})   Eq. (5)

where hv′0 denotes the initial embedding ev′ for item node v′, which is taken from the item embedding lookup table Ev LUT, and "·" represents matrix multiplication. For each user node u∈N(vA), the Layer 1v transformation function σv1 for mapping the learned neighborhood embeddings hN(u) to respective user node embeddings h′u can be represented by equation (6):
h′u = σ(Q11 · [hu0; hN(u)])   Eq. (6)
where: hu0 denotes the embeddings for the subject user node u (taken from the user node embedding lookup table Eu LUT), “;” represents concatenation, “·” represents matrix multiplication and σ(·) is the tanh activation function.
In example embodiments, the Layer 2v aggregation function Agv2 that is performed to aggregate the user node embeddings h′u for the user nodes u∈N(vA) (e.g., Nk=1(vA)={uB,uC,uD,uE} in the illustrated example) is also an element-wise weighted mean aggregator that can be represented by equation (7):

hN(vA) = MEAN({h′u · Q22, ∀u ∈ N(vA)})   Eq. (7)
The Layer 2v transformation function σv2 to map the learned neighborhood embedding hN(vA) to a target item node embedding hv can be represented by equation (8):

hv = σ(Q21 · [hN(vA); hv0])   Eq. (8)
where: hv0 denotes initial embeddings ev for target item node vA (taken from the item node embedding lookup table Ev LUT), “;” represents concatenation, and σ(·) is the tanh activation function.
In example embodiments, the embedding hv of only the target item node vA is updated in the item node embedding look up table Ev LUT by item embedding component 211 during each iteration of a training procedure.
Multi-Graph Encoder Module 204
Referring again to the drawings, multi-graph encoder module 204 includes a user graph component 212 and an item graph component 213. User graph component 212 includes a user-user graph construction function 220 that constructs a user-user graph Gu by computing pairwise cosine similarities on the rows or columns of the adjacency matrix A in order to capture the proximity information among user nodes u, and a user aggregation function Agu-u to output a user-user embedding zu for target user node uA. A threshold (which may be a predetermined hyper-parameter) is applied to the cosine similarity calculated in respect of each user node pair to determine if an edge is present or not between the respective user nodes in the resulting user-user graph Gu.
Unlike in the Bipar-GCN module 202, in example embodiments additional neighbor sampling is not performed in user-user graph construction function 220 because the constructed user-user graph Gu will typically not have a long-tailed degree distribution. In a non-limiting example embodiment, the threshold for cosine similarity is selected to provide an average degree of 10 connections for each node in the user-user graph Gu.
User aggregation function Agu-u is a learnable function configured to output a target user node embedding zu that is an aggregation of the user node embeddings over all direct user node neighbors of target user node uA in the user-user graph Gu. In example embodiments, user aggregation function Agu-u can be implemented to perform the learnable function represented by Equation (9):
zu = σ(Σi∈N′(uA) eui · Mu)   Eq. (9)

where: σ is the tanh activation function, N′(uA) denotes the one-hop neighborhood of target user node uA in the user-user graph Gu, Mu are learnable function parameters (e.g., a learnable user aggregation weight matrix), and eui is the node embedding, taken from user node embedding look-up-table Eu LUT, for neighbor user node ui.
Item graph component 213 is similar in configuration to user graph component 212. Item graph component 213 includes an item-item graph construction function 222 that constructs item-item graph Gv and an item aggregation function Agv-v to output item-item embedding zv for target item node vA. In example embodiments, item-item graph construction function 222 is also configured to construct item-item graph Gv by computing pairwise cosine similarities on the rows or columns of the adjacency matrix A in order to capture the proximity information among item nodes v. A threshold (which may be a predetermined hyper-parameter) is applied to the cosine similarity calculated in respect of each item node pair to determine if an edge is present or not between the respective item nodes in the resulting item-item graph Gv. In a non-limiting example embodiment, the threshold for cosine similarity is selected to provide an average degree of 10 for each node in the item-item graph Gv.
Item aggregation function Agv-v is also a learnable function and is configured to output a target item node embedding zv that is an aggregation of the item node embeddings over all direct item node neighbors of target item node vA in the item-item graph Gv. In example embodiments, item aggregation function Agv-v can be implemented to perform the learnable function represented by Equation (10):
zv = σ(Σj∈N′(vA) evj · Mv)   Eq. (10)

where: N′(vA) denotes the one-hop neighborhood of target item node vA in the item-item graph Gv, Mv are learnable function parameters (e.g., a learnable item aggregation weight matrix), and evj is the node embedding, taken from item node embedding look-up-table Ev LUT, for neighbor item node vj.
In example embodiments, user aggregation function Agu-u and item aggregation function Agv-v are each implemented using respective layers of the multi-layer GCNN 201.
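The following non-limiting sketch illustrates one way that graph construction function 220 and aggregation function Agu-u (Eq. (9)) could be realized. The function names and the threshold value are illustrative assumptions:

```python
import numpy as np

def build_user_user_graph(A, threshold=0.5):
    """Construct Gu by thresholding pairwise cosine similarities between
    the rows of the bipartite adjacency (click/rating) matrix A."""
    A = A.astype(float)
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                  # guard against users with no interactions
    An = A / norms
    S = An @ An.T                            # pairwise cosine similarities
    Gu = (S >= threshold).astype(np.int8)    # edge present where similarity >= threshold
    np.fill_diagonal(Gu, 0)                  # no self-loops
    return Gu

def user_user_embedding(Gu, E_u, target, M_u):
    """Eq. (9): z_u = tanh(sum over one-hop neighbors of e_ui . M_u)."""
    nbrs = np.flatnonzero(Gu[target])        # N'(uA) in the user-user graph
    return np.tanh(np.sum(E_u[nbrs] @ M_u, axis=0))

rng = np.random.default_rng(2)
A = (rng.random((6, 5)) < 0.4).astype(np.int8)   # toy 6-user / 5-item click matrix
Gu = build_user_user_graph(A)
z_u = user_user_embedding(Gu, rng.normal(size=(6, 8)), target=0, M_u=rng.normal(size=(8, 8)))
```

The same construction, applied to the columns of A (equivalently, the rows of its transpose), yields the item-item graph Gv and Eq. (10).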
Skip Connection Module 206
In the embodiments described above, both Bipar-GCN module 202 and multi-graph encoder module 204 focus on learning node embeddings based on relationships. As a result, the impact of the initial node features on the final embedding becomes indirect. In example embodiments, skip connection module 206 is included in multi-GCCF system 200 to provide skip connections that re-emphasize the initial node features. Skip connections can be used in a convolution neural network to directly copy information that is available in the primary layers to later layers. In some applications, these skip connections may enable multi-GCCF system 200 to take into account information that may be overlooked due to the focus of Bipar-GCN module 202 and multi-graph encoder module 204 on relationships through graph processing. In this regard, skip connection module 206 can exploit residual latent information included in node feature vectors xu, xv that may not have been captured by the graph processing performed in Bipar-GCN module 202 and multi-graph encoder module 204.
Accordingly, in example embodiments, skip connection module 206 is configured to supplement the embeddings hu, zu, hv, zv learned by Bipar-GCN module 202 and multi-graph encoder module 204 with information passed directly from the original embeddings eu, ev of node feature vectors xu, xv.
In example embodiments, initial embeddings eu, ev derived from node feature vectors xu, xv are respectively taken from embedding look-up tables Eu LUT and Ev LUT and processed by respective skip connection transformation functions SCu and SCv. Skip connection transformation functions SCu and SCv are each implemented using a single fully-connected layer to generate respective skip-connection embeddings su, sv. In an example embodiment, skip connection transformation functions SCu and SCv are learnable functions respectively represented by equations (11) and (12) as follows:
su = σ(eu · Su)   Eq. (11)

sv = σ(ev · Sv)   Eq. (12)
where: σ is the tanh activation function; and Su, Sv are each learnable weight transformation matrices.
In at least some applications, the embeddings learned by Bipar-GCN module 202, multi-graph encoder module 204 and skip connection module 206 may reveal latent information from three perspectives. First, the Bipar-GCN module 202 captures behavioral similarity between user nodes and item nodes by explicitly modeling the historical interaction as a bipartite graph. Bipar-GCN generated embeddings hu, hv are derived through graph convolution over the bipartite graph 100 representing user-item interaction. Second, the multi-graph encoder module 204 identifies similarities among user nodes and item nodes by constructing user-user and item-item graphs. Multi-graph encoder generated embeddings zu, zv are derived from proximal information extracted from user-user and item-item graphs. Third, the skip connection module 206 allows learning from individual node characteristics by re-emphasizing initial features. Skip connection generated embeddings su, sv are derived directly from individual node features.
To exploit these three perspectives, multi-GCCF system 200 includes user node fusing module 214 configured to perform a user node fusion operation for fusing the embeddings hu, su, and zu to provide a fused user node embedding eu*, and item node fusing module 215 configured to perform an item node fusion operation for fusing the embeddings hv, sv, and zv to provide a fused item node embedding ev*.
In different example embodiments, different fusing functions may be used, including for example element-wise sum, concatenation and attention functions; the same functions can be used in respect of both user node embeddings and item node embeddings, as illustrated in the sketch below.
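A hedged sketch of the skip-connection transformation of equation (11) and two of the fusing options named above (element-wise sum and concatenation) follows. The matrices S_u and W_fuse and the mode names are illustrative placeholders; the attention-based fusing option is omitted:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8
S_u = rng.normal(size=(d, d))          # skip-connection weight matrix (Eq. (11))
W_fuse = rng.normal(size=(3 * d, d))   # projection used only by the "concat" option

def skip_connection(e_u):
    return np.tanh(e_u @ S_u)          # Eq. (11): s_u = sigma(e_u . S_u)

def fuse(h_u, z_u, s_u, mode="sum"):
    if mode == "sum":                  # element-wise sum of the three embeddings
        return h_u + z_u + s_u
    if mode == "concat":               # concatenation followed by a linear projection
        return np.concatenate([h_u, z_u, s_u]) @ W_fuse
    raise ValueError(mode)

h_u, z_u, e_u = (rng.normal(size=d) for _ in range(3))
e_u_star = fuse(h_u, z_u, skip_connection(e_u))      # fused user embedding e_u*
```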
In the example described above, the target item node vA is described as a node that is not a direct neighbor of target user node uA. However, in some examples the target item node vA can be an item node that is a direct neighbor of target user node uA. In example embodiments, the item nodes v that are direct neighbors (e.g., have a pre-existing edge connection) of a user node are "positive" nodes with respect to that user node, and other item nodes that are not direct neighbors (e.g., do not have a pre-existing edge connection) of a user node are "negative" nodes with respect to that user node.
Training
In the example embodiments illustrated above, node embeddings eu*, ev* are learned at the same time as the function parameters (e.g., weight matrices W12, W11, W22, W21, Q12, Q11, Q22, Q21, Mu, Mv, Su, and Sv) of multi-GCCF system 200. In example embodiments, multi-GCCF system 200 is configured to be trained using forward and backward propagation for mini-batches of triplets {u, i, j}, where i refers to an item node v that is a positive node with respect to user node u, and j refers to an item node v that is a negative node with respect to user node u. The triplets are selected from mini-batch pairs and then processed to obtain low-dimensional embeddings {eu*, ei*, ej*} after information fusion, with stochastic gradient descent performed on the pairwise Bayesian Personalized Ranking (BPR) loss for optimizing recommendation models. A BPR loss, as indicated in Equation (13) below, is computed for every triplet (user, positive item, negative item): for every user node, an item node having an existing connecting edge with the user node is a positive item, while the negative item is randomly sampled from all the other items. In the loss equation, eu* represents the final output embedding of a user node, ei* represents the embedding of the positive item node, and ej* represents the embedding of the negative item node. The terms in the second line of Equation (13) are regularization terms. The BPR loss function is configured to push the positive items closer to the user in the latent space, and other (negative) items further from the user. The "·" represents the dot product between two embeddings.
The objective BPR loss function is as follows (Equation 13):

L = Σ(u,i,j)∈O −ln σ(eu*·ei* − eu*·ej*) + λ‖Θ‖2 + β(‖eu*‖2 + ‖ei*‖2 + ‖ej*‖2)   Eq. (13)

where: O = {(u, i, j) | (u, i) ∈ R+, (u, j) ∈ R−} denotes the training batch, R+ indicates observed positive interactions, and R− indicates sampled unobserved negative interactions; σ(·) here denotes the logistic sigmoid function; Θ is the model parameter set and eu*, ei*, and ej* are the learned embeddings; regularization is conducted on function parameters and generated embeddings to prevent overfitting, the regularization terms being parameterized by λ and β respectively.
The result of the BPR loss function is used to determine weight adjustments that are back propagated through the layers of the multi-GCCF system 200.
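A minimal sketch of the BPR objective of Equation (13) for one mini-batch of (u, i, j) triplets follows. The function name and the regularization weights lam and beta are illustrative assumptions:

```python
import numpy as np

def bpr_loss(E_u, E_i, E_j, params_sq_norm, lam=1e-4, beta=1e-4):
    """E_u, E_i, E_j: arrays of shape (B, d) holding the fused embeddings
    e_u*, e_i*, e_j* for a batch of B training triplets."""
    pos = np.sum(E_u * E_i, axis=1)                    # dot products e_u* . e_i*
    neg = np.sum(E_u * E_j, axis=1)                    # dot products e_u* . e_j*
    rank = -np.sum(np.log(1.0 / (1.0 + np.exp(-(pos - neg)))))   # -ln sigmoid term
    reg = lam * params_sq_norm + beta * (np.sum(E_u**2) + np.sum(E_i**2) + np.sum(E_j**2))
    return rank + reg

rng = np.random.default_rng(4)
loss = bpr_loss(*(rng.normal(size=(16, 8)) for _ in range(3)), params_sq_norm=1.0)
```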
Recommendation Predictions
Recommendation predictions are obtained based on the generated user and item embeddings. For example, for a user node u, a respective dot product can be obtained between the user node embedding eu* and each item embedding ev*. The dot products provide respective scores for that user node with respect to all item nodes. An item ranking for a user node can be determined based on the scores, and the top K ranked items selected as recommendations, as in the sketch below.
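A minimal sketch of this prediction step, with illustrative names and dimensions:

```python
import numpy as np

def recommend(e_u_star, E_v_star, k=3):
    """e_u_star: (d,) fused user embedding; E_v_star: (num_items, d) fused
    item embeddings; returns the indices and scores of the top-k items."""
    scores = E_v_star @ e_u_star               # one dot-product score per item
    top = np.argsort(-scores)[:k]              # indices of the k highest scores
    return top, scores[top]

rng = np.random.default_rng(5)
print(recommend(rng.normal(size=8), rng.normal(size=(20, 8))))
```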
In some example embodiments, neighborhood dropout is applied to mitigate against overfitting by performing message dropout on the aggregated neighborhood features for each target node, thereby making embeddings more robust against the presence or absence of single edges.
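A hedged sketch of such message dropout on aggregated neighborhood features follows; the inverted-dropout scaling and the rate value are assumptions:

```python
import numpy as np

def neighborhood_dropout(h_neighborhood, rate=0.1, rng=None):
    """Randomly zero a fraction of the aggregated neighborhood features during
    training so embeddings do not depend on the presence of any single edge."""
    rng = rng or np.random.default_rng()
    mask = rng.random(h_neighborhood.shape) >= rate
    return h_neighborhood * mask / (1.0 - rate)   # rescale to preserve expectation
```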
Fusing the outputs hu, hv of the Bipar-GCN module 202 and the respective outputs zu, zv of the multi-graph encoder module 204 may in some applications enable the different dependency relationships encoded by the three graphs (bipartite graph G, user-user graph Gu, and item-item graph Gv) to be used to learn a relationship between target user node uA and target item node vA. In example embodiments, all three graphs can be easily constructed from historical interaction data alone without requiring side data or other external data features. Further fusing the skip connection module outputs su, sv enables original feature vector information from the user and item nodes to be used when learning the relationship between target user node uA and target item node vA.
In at least some examples, the multi-GCCF system 200 may enable an unknown relationship between two different node types to be estimated with greater accuracy and/or using less computational resources (e.g., one or more of memory, processor operations, or power) than other recommender solutions.
In some applications, the input data representing user nodes u and item nodes v may be characterized by high-dimensional categorical features that are sparsely populated for most nodes. Accordingly, in some embodiments, a layer of GCNN 201 may be configured to perform an initial embedding operation 280 (shown in dashed lines in the drawings) that maps the sparse, high-dimensional feature vectors xu, xv to the dense, low-dimensional initial node embeddings eu, ev.
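A minimal sketch of such an initial embedding operation, in which sparse categorical features are mapped to dense vectors through a learnable embedding table; the table size, pooling choice (mean) and dimensions are assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
vocab_size, d = 1000, 8
E_table = rng.normal(size=(vocab_size, d))    # learnable per-category embedding table

def embed(category_ids):
    """category_ids: integer ids of a node's active categorical features;
    returns a dense initial embedding (e_u or e_v)."""
    return E_table[np.asarray(category_ids)].mean(axis=0)   # mean-pool active rows

e_u = embed([12, 407, 999])                   # dense (8,) embedding for a sparse node
```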
Operational Overview
An overview of the operation of multi-GCCF system 200 according to example embodiments will now be described with reference to the actions noted below.
In example embodiments, multi-GCCF system 200 is configured to predict node relationships for nodes in a bipartite graph G that comprises a plurality of first nodes of a first node type (e.g., user nodes u), and a plurality of second nodes of a second node type (e.g., item nodes v). In this regard, multi-GCCF system 200 includes a Bipar-GCN module that performs the following actions. A first node embedding module (e.g., user embedding module 210) is configured to perform the actions: aggregating, for each second node included within a first node neighbourhood of a target first node, features for first nodes that are direct neighbours of the second node within the first node neighbourhood; mapping the features aggregated for each second node to a respective second node embedding; aggregating the second node embeddings for all of the second nodes within the first node neighbourhood; and mapping the aggregated second node embeddings to a target first node embedding.
A second node embedding module (e.g., item embedding module 211) is configured to perform the actions: aggregating, for each first node included within a second node neighbourhood of a target second node, features for second nodes that are direct neighbours of the first node within the second node neighbourhood (block 410); mapping the features aggregated for each first node to a respective first node embedding (block 412); aggregating the first node embeddings for all of the first nodes within the second node neighbourhood (block 414); and mapping the aggregated first node embeddings to a target second node embedding (block 416).
The multi-GCCF system 200 also determines a relationship between the target first node and the target second node based on the target first node embedding and the target second node embedding (block 418).
In example embodiments, each aggregating and mapping is performed using a respective function that is defined by a respective set of learnable parameters (e.g., weight matrices), wherein the aggregating and mapping is performed iteratively in respect of the target first node and the target second node and the learnable parameters updated to optimize an objective function based on the fusing of the target first node embedding and the target second node embedding.
In example embodiments, the user embedding module 210 includes a first node sampling function (e.g., sampling function 301) that performs the following actions: defining the first node neighbourhood of the target first node by randomly sampling the bipartite graph to: select a second node subset from second nodes that are direct neighbours of the target first node, and select respective subsets of first nodes from first nodes that are direct neighbours of each of the second nodes of the second node subset. The item embedding module 211 includes a second node sampling function (e.g., sampling function 302) that performs the following actions: defining the second node neighbourhood of the target second node by randomly sampling the bipartite graph to: select a first node subset from first nodes that are direct neighbours of the target second node, and select respective subsets of second nodes from second nodes that are direct neighbours of each of the first nodes of the first node subset.
Referring to the multi-graph encoder module 204, in example embodiments the following further actions are performed: determining, based on the bipartite graph, first node to first node relationship information and constructing a first node graph that includes first nodes from the bipartite graph; generating a first node-first node embedding for the target first node based on the first node graph (Action 504); determining, based on the bipartite graph, second node to second node relationship information and constructing a second node graph that includes second nodes from the bipartite graph; and generating a second node-second node embedding for the target second node based on the second node graph (Action 508).
In some examples, in Action 504, generating the first node-first node embedding for the target first node comprises using a first node-first node aggregating function having learnable parameters to aggregate features of the first nodes that are direct neighbours of the target first node in the first node graph, and, in Action 508, generating the second node-second node embedding for the target second node comprises using a second-node-second node aggregating function having learnable parameters to aggregate features of the second nodes that are direct neighbours of the target second node in the second node graph.
In embodiments that include multi-graph encoder module 204, in Action 418, the relationship between the target first node and the target second node is determined also based on the first node-first node embedding and the second node-second node embedding.
In some example embodiments, multi-GCCF system 200 also includes a skip connection module 206 that generates, using a first skip connection transformation function having learnable parameters, a target first node skip connection embedding based on an initial target first node embedding; and generates, using a second skip connection transformation function having learnable parameters, a target second node skip connection embedding based on an initial target second node embedding.
In embodiments that include multi-graph encoder module 204 and skip connection module 206, in Action 418, the relationship between the target first node and the target second node is determined based on a fusing of the target first node embedding, the first node-first node embedding and the first node skip connection embedding; and a fusing of the target second node embedding, the second node-second node embedding and the second node skip connection embedding.
As noted above, in some example applications, the first nodes represent users and the second nodes represent items, the bipartite graph includes historical user-item interaction data, and the actions also include determining an item recommendation for a user represented by the target first node based on the determined relationship between the target first node and the target second node.
Pairwise Neighbourhood Aggregation (PNA) Graph Convolution Layer
In the embodiments described above, Bipar-GCN module 202 aggregator functions Agu1 and Agv1 are implemented as mean aggregators. In other example embodiments, alternative aggregator functions can be used.
The neighborhood aggregation step in a graph convolution layer operates over sets of vectors to aggregate the embeddings of neighbors. In an aggregator function according to an alternative example, each neighbor node is considered as a feature of the central node in order to capture the neighborhood feature interactions by applying element-wise multiplication on every neighbor-neighbor pair. Equation (14) below illustrates an example of a pairwise neighborhood aggregation (PNA) function that can be implemented in a graph convolution layer of GCNN 201 in place of the mean aggregation functions:

hpair = Σi Σj>i (qi ⊙ qj)   Eq. (14)

where: qi and qj are the ith and jth rows of the neighborhood embedding matrix Qk ∈ RNk×d formed from the embeddings of the neighbors of the central node at layer k, Nk is the number of sampled neighbors, d is the embedding dimension, and ⊙ denotes element-wise multiplication.
Direct first-order neighborhood information (a coarser summary of the entire neighborhood) is preserved by a sum aggregator, as in the embodiment described above. These two forms of neighborhood information are concatenated and the resulting vector is passed through a standard multilayer perceptron to generate the local neighborhood embedding hN(u)k, as represented by equation (15):

hN(u)k = σ(Wu,1k·[hpair; hsum])   Eq. (15)

where: [;] represents concatenation, σ(·) is the tanh activation function, hsum is the sum-aggregated first-order neighborhood vector, and Wu,1k is the layer-k (user) aggregator weight matrix (shared across all central user nodes at layer k).
After the aggregation process, every central node is assigned a new embedding by combining its aggregated neighborhood vector with the central node embedding vector itself. The layer-k embedding of the target user node u can be represented as equation (16):

huk = σ(Wu,3k·[σ(Wu,2k·hu0); hN(u)k]), hu0 = eu   Eq. (16)
where eu is the initial embedding for target user node u, Wu,2k is the weight matrix for the central node transformation, and Wu,3k is the weight matrix for the center-neighbor transformation function at layer k. The same operations (with different weight matrices) are applied to generate the layer-k item embedding hvk of item node v.
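The following non-limiting sketch illustrates a PNA layer combining equations (14) to (16); the function name and all dimensions are assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
d = 8
W1 = rng.normal(size=(2 * d, d))   # aggregator weight (Eq. (15))
W2 = rng.normal(size=(d, d))       # central-node transformation (Eq. (16))
W3 = rng.normal(size=(2 * d, d))   # center-neighbor combination (Eq. (16))

def pna_layer(h_u0, Q):
    """h_u0: central node embedding e_u; Q: (Nk, d) neighborhood embedding matrix."""
    n = Q.shape[0]
    pairs = [Q[i] * Q[j] for i in range(n) for j in range(i + 1, n)]
    h_pair = np.sum(pairs, axis=0) if pairs else np.zeros_like(h_u0)  # Eq. (14)
    h_sum = np.sum(Q, axis=0)                                # first-order sum aggregate
    h_nbr = np.tanh(np.concatenate([h_pair, h_sum]) @ W1)    # Eq. (15)
    center = np.tanh(h_u0 @ W2)                              # central-node transform
    return np.tanh(np.concatenate([center, h_nbr]) @ W3)     # Eq. (16)

h_u1 = pna_layer(rng.normal(size=d), rng.normal(size=(4, d)))   # -> shape (8,)
```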
Processing System
In example embodiments, multi-GCCF system 200 is computer implemented using one or more computing devices, such as the example processing system 170 described below.
The processing system 170 may include one or more processing devices 172, such as a processor, a microprocessor, a central processing unit, a hardware accelerator, a graphics processing unit, a neural processing unit, a tensor processing unit, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), dedicated logic circuitry, or combinations thereof. The processing device(s) 172 perform the computations described herein. The processing system 170 may also include one or more input/output (I/O) interfaces 174, which may enable interfacing with one or more appropriate input devices 184 and/or output devices 186. The processing system 170 may include one or more network interfaces 176 for wired or wireless communication with a network.
The processing system 170 may also include one or more storage units 178, which may include a mass storage unit such as a solid state drive, a hard disk drive, a magnetic disk drive and/or an optical disk drive. The processing system 170 may include one or more memories 180, which may include a volatile or non-volatile memory (e.g., a flash memory, a random access memory (RAM), and/or a read-only memory (ROM)). The memory(ies) 180 may store instructions for execution by the processing device(s) 172, such as to carry out examples described in the present disclosure. The memory(ies) 180 may include other software instructions, such as for implementing an operating system and other applications/functions.
There may be a bus 182 providing communication among components of the processing system 170, including the processing device(s) 172, I/O interface(s) 174, network interface(s) 176, storage unit(s) 178 and/or memory(ies) 180. The bus 182 may be any suitable bus architecture including, for example, a memory bus, a peripheral bus or a video bus.
Although the present disclosure describes methods and processes with steps in a certain order, one or more steps of the methods and processes may be omitted or altered as appropriate. One or more steps may take place in an order other than that in which they are described, as appropriate.
Although the present disclosure is described, at least in part, in terms of methods, a person of ordinary skill in the art will understand that the present disclosure is also directed to the various components for performing at least some of the aspects and features of the described methods, be it by way of hardware components, software or any combination of the two. Accordingly, the technical solution of the present disclosure may be embodied in the form of a software product. A suitable software product may be stored in a pre-recorded storage device or other similar non-volatile or non-transitory computer readable medium, including DVDs, CD-ROMs, USB flash disk, a removable hard disk, or other storage media, for example. The software product includes instructions tangibly stored thereon that enable a processing device (e.g., a personal computer, a server, or a network device) to execute examples of the methods disclosed herein.
The present disclosure may be embodied in other specific forms without departing from the subject matter of the claims. The described example embodiments are to be considered in all respects as being only illustrative and not restrictive. Selected features from one or more of the above-described embodiments may be combined to create alternative embodiments not explicitly described, features suitable for such combinations being understood within the scope of this disclosure.
All values and sub-ranges within disclosed ranges are also disclosed. Also, although the systems, devices and processes disclosed and shown herein may comprise a specific number of elements/components, the systems, devices and assemblies could be modified to include additional or fewer of such elements/components. For example, although any of the elements/components disclosed may be referenced as being singular, the embodiments disclosed herein could be modified to include a plurality of such elements/components. The subject matter described herein intends to cover and embrace all suitable changes in technology.
The contents of all published papers identified in this disclosure are incorporated herein by reference.
This application is a continuation of and claims the benefit of International Application No. PCT/CN2020/102481, filed Jul. 16, 2020, entitled "MULTI-GRAPH CONVOLUTION COLLABORATIVE FILTERING", the contents of which are incorporated herein by reference.
Parent: International Application No. PCT/CN2020/102481, filed Jul. 2020 (US)
Child: U.S. application Ser. No. 18/154,523 (US)