METHOD FOR IDENTIFYING OPTIMAL INFLUENCE CUSTOMER GROUP OF MARKETING NETWORK BASED ON HYPERGRAPH THRESHOLD MODEL

TECHNICAL FIELD

The present invention belongs to the field of computer applications, and particularly relates to a method for identifying the most recommendable customer group in a marketing network using a node collective influence maximization algorithm based on a hypergraph threshold model.

BACKGROUND

Screening out the most recommendable user group for promotion in a marketing network is of great significance for reducing marketing costs and improving publicity effects. Spreading processes on complex networks can describe a variety of real-world phenomena, including disease transmission, cascading failures and commodity marketing. Because network structures are heterogeneous, a small number of nodes play a key role in the spreading process, and identifying these key nodes, i.e., the seed set (maximum influence set), is of great practical significance. The maximum influence set problem, which is NP-hard, aims to select a fixed number of seed nodes that maximize the spread, and it is widely applied in product recommendation, public opinion evolution, disease transmission and other fields.

In recent years, much progress has been made on the maximum influence set problem in graphs, and algorithms such as the high degree (HD) algorithm, the PageRank algorithm, eigenvalue-based algorithms and the collective influence (CI) algorithm have been proposed successively. However, as complex system modeling has developed, it has become clear that graphs, which encode only pairwise links, cannot depict high-order interaction relationships. In a marketing network, for example, high-order interactions exist among multiple customers and multiple products and cannot be modeled by graphs. Hypergraph modeling has therefore become an important direction for complex system modeling, and the maximum influence set problem in hypergraphs is receiving growing attention; heuristic algorithms such as the high degree algorithm and the eigenvector algorithm have been proposed for it.

Although the solutions proposed so far have made some progress, real networks such as marketing networks exhibit a rich-club effect: individuals with great influence and strong purchasing power tend to cluster together, which makes it difficult to measure the collective influence of multiple nodes acting together. In graphs, an effective current approach is to construct a self-satisfying equation for the steady-state probabilities with the Message Passing method, and then derive the influence weight of each individual on the steady state using dynamical system stability theory. The present invention therefore generalizes the Message Passing analytical framework from graphs to hypergraphs. It uses a hypergraph threshold model to describe the information spreading rule in real scenarios such as a marketing network, proposes a method for measuring the collective influence of multiple nodes, and designs an HCI-TM algorithm based on a greedy strategy to select the seed set (customer group) with the maximum influence. Numerical simulations verify the robustness and effectiveness of the algorithm and show that the hypergraph collective influence index can be used to predict the occurrence of a cascade. Finally, the method is applied to a real marketing network to screen out the most recommendable customer group.

SUMMARY

The purpose of the present invention is to propose a method for measuring node collective influence based on a hypergraph threshold model, and to design an HCI-TM algorithm that selects the seed set with the maximum influence in a hypergraph. The hypergraph threshold model is used to simulate the spread of reputation among customers in a marketing network, and the HCI-TM algorithm is applied to the marketing network to identify the most recommendable customer group, yielding a method for identifying a recommendable customer group in a marketing network based on a node collective influence maximization algorithm of a hypergraph threshold model.

The technical solution adopted to achieve the above purpose is as follows:

A method for identifying an optimal influence customer group in a marketing network based on a hypergraph threshold model, comprising the following steps:

- Step 1: first, establishing a hypergraph threshold model based on sales data.
- Step 2: using the hypergraph threshold model to simulate the spread of customer reputation in the marketing network, and setting the hyperedge thresholds according to the specific scenario.
- Step 3: using the HCI-TM algorithm to screen out the most recommendable customer group for promotion, and using the hypergraph threshold rule to simulate the spread of information among customers.

Further, a process of establishing the hypergraph threshold model in step 1 is as follows:

In the hypergraph threshold model, each node represents a customer, and each hyperedge represents the interaction relationship between a product and all customers who have reviewed that product. The specific rule of the model is as follows: when the ratio of activated nodes in a hyperedge exceeds the hyperedge threshold, the hyperedge is activated; when a hyperedge is activated, all nodes in the hyperedge are activated.
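The threshold rule can be illustrated with a short simulation. This is a minimal sketch, not the patented implementation: it assumes one threshold θ shared by all hyperedges and takes the number of active nodes a hyperedge with |e| nodes needs as m_e = ⌈θ·|e|⌉, so a hyperedge activates once its active-node ratio reaches θ, which matches the worked example in the detailed description. The helper names `threshold_of` and `spread` are hypothetical.

```python
import math


def threshold_of(size, theta):
    """Hypothetical helper: m_e, the number of active nodes a hyperedge of
    `size` nodes needs before it is activated (assumed m_e = ceil(theta * size))."""
    return max(1, math.ceil(theta * size))


def spread(hyperedges, seeds, theta):
    """Apply the hypergraph threshold rule until nothing new is activated.

    hyperedges: list of node lists (one list per product / hyperedge)
    seeds:      iterable of seed node ids
    theta:      hyperedge threshold shared by all hyperedges (assumption)
    Returns (active_nodes, active_hyperedge_indices).
    """
    active_nodes = set(seeds)
    active_edges = set()
    changed = True
    while changed:  # the process is monotone, so the final state does not depend on sweep order
        changed = False
        for idx, edge in enumerate(hyperedges):
            if idx in active_edges:
                continue
            count = sum(1 for v in edge if v in active_nodes)
            if count >= threshold_of(len(edge), theta):
                active_edges.add(idx)
                active_nodes.update(edge)  # an active hyperedge activates all of its nodes
                changed = True
    return active_nodes, active_edges


example = spread([[0, 1], [1, 2, 3], [3, 4, 5, 6]], {0, 2}, 0.5)  # -> ({0, 1, 2, 3}, {0, 1})
```

Repeating sweeps until no hyperedge changes keeps the sketch simple; because activation is never undone, the same final state is reached whatever order the hyperedges are visited in.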

A Cavity method is used to establish a conditional-probability self-satisfying equation based on the hypergraph threshold rule, which weakens the strong correlation between nodes and hyperedges and describes the information spreading rule in a hypergraph accurately. On a tree-like hypergraph, the self-satisfying equation is expressed as formula (1), where m_γ is the number of active nodes required for the hyperedge e_γ to reach its threshold. Node i is activated when any hyperedge incident with it, other than the hyperedge e_γ, is activated; the hyperedge e_γ is activated when any m_γ of its nodes, other than node i, are activated:

$\begin{aligned} v_{i\to e_\gamma} &= n_i+(1-n_i)\Big[1-\prod_{e_\beta\in\partial i\setminus e_\gamma}\big(1-v_{e_\beta\to i}\big)\Big] \\ v_{e_\gamma\to i} &= 1-\prod_{P_h\in P_{e_\gamma\setminus i}^{m_\gamma}}\Big(1-\prod_{p\in P_h}v_{p\to e_\gamma}\Big) \end{aligned} \qquad (1)$

In formula (1), $v_{i\to e_\gamma}$ represents the probability that node i is activated when the hyperedge e_γ is removed, and $v_{e_\gamma\to i}$ represents the probability that the hyperedge e_γ is activated when node i is removed; $\partial i\setminus e_\gamma$ represents the set of hyperedges incident with node i other than e_γ. $P_{e_\gamma\setminus i}^{m_\gamma}=\{P_1,P_2,\dots,P_\tau\}$ is the set of all combinations of m_γ nodes chosen from the nodes of e_γ other than node i, where $\tau=C_{N_\gamma-1}^{m_\gamma}$ and N_γ is the number of nodes in e_γ; each $P_h\in P_{e_\gamma\setminus i}^{m_\gamma}$, $P_h=\{P_{h1},P_{h2},\dots,P_{hm_\gamma}\}$, is a set of m_γ nodes. n_i indicates whether node i is a seed node: n_i=1 if it is, and n_i=0 otherwise.

Final states of the node i and the hyperedge e_γ can be calculated by the following formula:

$\begin{matrix} {\begin{matrix} v_{i} = n_{i} + (1 - n_{i}) [1 - \prod_{e_{β} \in \partial i} (1 - v_{e_{β} \to i})] \\ v_{e_{γ}} = 1 - \prod_{P_{h} \in P_{e_{γ}}^{m_{γ}}} (1 - \prod_{p \in P_{h}} v_{p \to e_{γ}}) \end{matrix} & (2) \end{matrix}$

In formula (2), $v_i$ represents the final state of node i, and $v_{e_\gamma}$ represents the final state of the hyperedge e_γ.
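Formulas (1) and (2) can be evaluated numerically by iterating the messages until they stop changing. The sketch below is illustrative only; it assumes each hyperedge's activation count m_γ is supplied as an integer list `m`, that the messages start from $v_{i\to e_\gamma}=n_i$ and $v_{e_\gamma\to i}=0$, and that a fixed tolerance `tol` decides convergence. The function name `cavity_fixed_point` is hypothetical.

```python
from itertools import combinations
from math import prod


def cavity_fixed_point(hyperedges, m, seeds, max_iter=100, tol=1e-12):
    """Iterate the cavity equations (1) and return the final states of formula (2).

    hyperedges: list of node lists; m[g] is m_gamma for hyperedge g;
    seeds: set of seed nodes (n_i = 1). Returns (v_node, v_edge).
    """
    nodes = sorted({v for e in hyperedges for v in e})
    incident = {i: [g for g, e in enumerate(hyperedges) if i in e] for i in nodes}
    n = {i: 1.0 if i in seeds else 0.0 for i in nodes}
    v_ie = {(i, g): n[i] for i in nodes for g in incident[i]}  # v_{i -> e_g}
    v_ei = {(g, i): 0.0 for i in nodes for g in incident[i]}   # v_{e_g -> i}

    def edge_prob(g, excluded, msgs):
        # 1 - prod over m_g-subsets P of e_g \ {excluded} of (1 - prod_{p in P} v_{p -> e_g})
        others = [p for p in hyperedges[g] if p != excluded]
        return 1.0 - prod(1.0 - prod(msgs[(p, g)] for p in P)
                          for P in combinations(others, m[g]))

    def node_prob(i, skip, msgs):
        # n_i + (1 - n_i) [1 - prod over incident hyperedges except `skip` of (1 - v_{e -> i})]
        return n[i] + (1 - n[i]) * (1.0 - prod(1.0 - msgs[(b, i)]
                                               for b in incident[i] if b != skip))

    for _ in range(max_iter):
        new_ei = {(g, i): edge_prob(g, i, v_ie) for (g, i) in v_ei}
        new_ie = {(i, g): node_prob(i, g, new_ei) for (i, g) in v_ie}
        diff = max([abs(new_ei[k] - v_ei[k]) for k in v_ei] +
                   [abs(new_ie[k] - v_ie[k]) for k in v_ie], default=0.0)
        v_ei, v_ie = new_ei, new_ie
        if diff < tol:
            break

    # Formula (2): no cavity, so every incident hyperedge / every node of e_g counts.
    v_node = {i: node_prob(i, None, v_ei) for i in nodes}
    v_edge = {g: edge_prob(g, None, v_ie) for g in range(len(hyperedges))}
    return v_node, v_edge
```

For example, on the tree-like hypergraph with hyperedges [[0, 1], [1, 2, 3], [3, 4, 5, 6]], m = [1, 2, 2] and seeds {0, 2}, the iteration marks nodes 0 to 3 and the first two hyperedges as active, which is what the threshold rule itself produces on that hypergraph.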

A method for calculating hypergraph collective influence is as follows:

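To simplify formula (1), let $V_{\to}=(v_1,v_2)^T$, where $v_1=\{v_{i\to e_\gamma}\}_{S\times 1}$ and $v_2=\{v_{e_\gamma\to i}\}_{S\times 1}$, and $S=\sum_{i=1}^{N}k_i$ is the sum of the hyperdegrees k_i (the number of hyperedges incident with node i) of all N nodes. Similarly, n is extended to higher dimensions: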

$n_{\to}=(n_1,0)^T=\big(\underbrace{\dots,\overbrace{n_i,\dots,n_i}^{k_i},\dots}_{S},\underbrace{0,\dots,0}_{S}\big)^T \qquad (3)$

Therefore, formula (1) can be simplified as:

$\begin{matrix} V_{\to} = n_{\to} + G (V_{\to}) \Leftrightarrow {\begin{matrix} v_{1} = n_{1} + g_{1} (v_{2}) \\ v_{2} = 0 + g_{2} (v_{1}) \end{matrix} & (4) \end{matrix}$

Where g₁(v₂) represents a nonlinear function of v₂, and g₂(v₁) represents a nonlinear function of v₁;
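The block layout behind formulas (3) and (4) can be made concrete with a small helper. This is a minimal sketch, assuming the S incidence pairs (i, e_γ) are sorted by node so that the k_i copies of n_i sit next to each other, as the brace notation in formula (3) suggests; the function name `extended_seed_vector` is hypothetical.

```python
def extended_seed_vector(hyperedges, seeds):
    """Return (pairs, n_vec) for formula (3).

    pairs lists the S incidence pairs (i, g) sorted by node, so node i occupies
    k_i consecutive slots; n_vec has length 2S: n_i repeated on node i's slots
    (the v_{i->e} block, v_1) followed by S zeros (the v_{e->i} block, v_2).
    """
    pairs = sorted((i, g) for g, edge in enumerate(hyperedges) for i in edge)
    first_block = [1 if i in seeds else 0 for i, _ in pairs]
    return pairs, first_block + [0] * len(pairs)


pairs, n_vec = extended_seed_vector([[0, 1], [1, 2]], seeds={1})
# S = k_0 + k_1 + k_2 = 1 + 2 + 1 = 4, so n_vec has 2S = 8 entries
```

The same ordering of pairs can then index both halves of $V_{\to}$, which is why the Jacobian in formula (6) is a 2S×2S matrix.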

As formula (4) consists of complex nonlinear equations that cannot be solved directly, linearization and the fixed point iteration method are used to solve it:

$V_{\to}^{t+1}=n_{\to}+\mathcal{JG}|_{V_{\to}^{t}}\times V_{\to}^{t} \qquad (5)$

where $V_{\to}^{t}$ represents the state of $V_{\to}$ at iteration step t.

The specific form of the Jacobian matrix $\mathcal{JG}|_{V_{\to}^{t}}$ is solved as follows:

$\mathcal{JG}=\begin{pmatrix}\frac{\partial g_1}{\partial v_1} & \frac{\partial g_1}{\partial v_2}\\ \frac{\partial g_2}{\partial v_1} & \frac{\partial g_2}{\partial v_2}\end{pmatrix}_{2S\times 2S} \qquad (6)$

The partial derivatives of g₁ are calculated as follows:

$\frac{\partial v_{i\to e_\gamma}}{\partial v_{j\to e_\beta}}=0 \qquad (7)$

$\begin{matrix} \frac{\partial v_{i \to e_{γ}}}{\partial v_{e_{β} \to j}} |_{v_{\to}^{t}} = {\begin{matrix} 1 - n_{i} & i = j, e_{β} \neq e_{γ}, a_{e_{β} \to i, i \to e_{γ}}^{t} = 0 \\ 0 & otherwise \end{matrix} & (8) \end{matrix}$

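where $a^{t}_{e_\beta\to i,\,i\to e_\gamma}=\sum_{e_\mu\in\partial i\setminus\{e_\gamma,e_\beta\}}v^{t}_{e_\mu\to i}$, i.e., the number of hyperedges incident with node i, other than e_γ and e_β, whose messages to node i are active at step t.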

The partial derivatives of g₂ are calculated as follows:

$\begin{matrix} \frac{\partial v_{e_{γ} \to i}}{\partial v_{e_{β} \to j}} = 0 & (9) \end{matrix}$

When e_β=e_γ, j≠i:

$\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\gamma}}=\prod_{\substack{P_h\in P_{e_\gamma\setminus i}^{m_\gamma}\\ j\notin P_h}}\Big(1-\prod_{p\in P_h}v_{p\to e_\gamma}\Big)\sum_{\substack{P_h\in P_{e_\gamma\setminus i}^{m_\gamma}\\ j\in P_h}}\Big[\prod_{p\in P_h\setminus j}v_{p\to e_\gamma}\prod_{\substack{\tilde P_h\in P_{e_\gamma\setminus i}^{m_\gamma}\\ \tilde P_h\neq P_h,\ j\in\tilde P_h}}\Big(1-\prod_{p\in\tilde P_h}v_{p\to e_\gamma}\Big)\Big] \qquad (10)$

where $\tilde P_h\in P_{e_\gamma\setminus i}^{m_\gamma}$ and $\tilde P_h\neq P_h$.

Let $b^{t}_{j\to e_\gamma,\,e_\gamma\to i}=\sum_{p\in\partial e_\gamma\setminus\{i,j\}}v^{t}_{p\to e_\gamma}$ denote the number of nodes of e_γ, other than nodes i and j, that are active at step t. When $b^{t}_{j\to e_\gamma,\,e_\gamma\to i}\ge m_\gamma$, some combination $P_h$ that does not contain j satisfies $\prod_{p\in P_h}v_{p\to e_\gamma}=1$, so the first product in formula (10) vanishes and

$\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\gamma}}=0.$

When $b^{t}_{j\to e_\gamma,\,e_\gamma\to i}\le m_\gamma-2$, every product $\prod_{p\in P_h\setminus j}v_{p\to e_\gamma}$ is 0, so

$\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\gamma}}=0.$

When $b^{t}_{j\to e_\gamma,\,e_\gamma\to i}=m_\gamma-1$, exactly one combination $P_h$ containing j makes $\prod_{p\in P_h\setminus j}v_{p\to e_\gamma}=1$; every other combination containing j includes an inactive node besides j, and no combination without j is fully active. Therefore

$\prod_{\substack{\tilde P_h\in P_{e_\gamma\setminus i}^{m_\gamma}\\ \tilde P_h\neq P_h,\ j\in\tilde P_h}}\Big(1-\prod_{p\in\tilde P_h}v_{p\to e_\gamma}\Big)=1,\qquad \prod_{\substack{P_h\in P_{e_\gamma\setminus i}^{m_\gamma}\\ j\notin P_h}}\Big(1-\prod_{p\in P_h}v_{p\to e_\gamma}\Big)=1,$

so $\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\gamma}}=1$.

Therefore:

$\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\beta}}\Big|_{V_{\to}^{t}}=\begin{cases}1 & e_\beta=e_\gamma,\ j\neq i,\ b^{t}_{j\to e_\gamma,\,e_\gamma\to i}=m_\gamma-1\\ 0 & \text{otherwise}\end{cases} \qquad (11)$

As a result, the specific form of the Jacobian matrix $\mathcal{JG}|_{V_{\to}^{t}}$ is:

$\mathcal{JG}|_{V_{\to}^{t}}=\begin{pmatrix}0 & M^{t}\\ I^{t} & 0\end{pmatrix} \qquad (12)$

where

$M^{t}=\{M^{t}_{e_\beta\to j,\,i\to e_\gamma}\}=\Big\{\frac{\partial v_{i\to e_\gamma}}{\partial v_{e_\beta\to j}}\Big|_{V_{\to}^{t}}\Big\}$

is a non-backtracking matrix, and

$I^{t}=\{I^{t}_{j\to e_\beta,\,e_\gamma\to i}\}=\Big\{\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\beta}}\Big|_{V_{\to}^{t}}\Big\}$

is a subcritical non-backtracking matrix. For convenience in the subsequent iterative derivation, the non-backtracking matrix $M^{t}$ and the subcritical non-backtracking matrix $I^{t}$ are extended to higher dimensions:

$\begin{aligned} M^{t}_{e_\beta jie_\gamma} &= (1-n_i)\,H_{ie_\gamma}H_{je_\beta}\,\delta_{ij}\,(1-\delta_{e_\beta e_\gamma})\,M^{t}_{e_\beta iie_\gamma} \\ I^{t}_{je_\beta e_\gamma i} &= H_{ie_\gamma}H_{je_\beta}\,\delta_{e_\beta e_\gamma}\,(1-\delta_{ij})\,I^{t}_{je_\gamma e_\gamma i} \end{aligned} \qquad (13)$

where $M^{t}_{e_\beta jie_\gamma}$ is the element with subscript (e_β, j, i, e_γ) of a 4-dimensional tensor of size M×N×N×M, and $I^{t}_{je_\beta e_\gamma i}$ is the element with subscript (j, e_β, e_γ, i) of a 4-dimensional tensor of size N×M×M×N, N being the number of nodes and M the number of hyperedges; δ_ij=1 when i=j, and δ_ij=0 otherwise. $H=\{H_{ie_\gamma}\}_{N\times M}$ is the incidence matrix: H_{ie_γ}=1 when node i is incident with the hyperedge e_γ, and H_{ie_γ}=0 otherwise. $M^{t}_{e_\beta iie_\gamma}=1$ when $a^{t}_{e_\beta\to i,\,i\to e_\gamma}=0$, and $M^{t}_{e_\beta iie_\gamma}=0$ otherwise. Similarly, $I^{t}_{je_\gamma e_\gamma i}=1$ when $b^{t}_{j\to e_\gamma,\,e_\gamma\to i}=m_\gamma-1$, and $I^{t}_{je_\gamma e_\gamma i}=0$ otherwise.
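The case analysis behind formula (11), and hence the definition of the subcritical indicator $I^{t}_{je_\gamma e_\gamma i}$, can be checked by brute force. The sketch below evaluates the exact derivative (10) for every binary state of a small hyperedge and compares it with the rule "1 exactly when b = m_γ − 1"; the function names and the integer node labels are illustrative assumptions.

```python
from itertools import combinations, product
from math import prod


def derivative_10(v, m, j):
    """Formula (10): d v_{e->i} / d v_{j->e}.

    v: dict mapping every node of e except i to its message v_{p->e}
    m: the hyperedge's activation count m_gamma;  j: a key of v.
    """
    subsets = list(combinations(sorted(v), m))
    with_j = [P for P in subsets if j in P]
    no_j_part = prod(1 - prod(v[p] for p in P) for P in subsets if j not in P)
    sum_part = sum(
        prod(v[p] for p in P if p != j)
        * prod(1 - prod(v[p] for p in Q) for Q in with_j if Q != P)
        for P in with_j
    )
    return no_j_part * sum_part


def check_formula_11(size, m):
    """True if, for every binary state of a hyperedge with `size` nodes,
    derivative (10) equals 1 exactly when b = m - 1 (b = active nodes besides i and j)."""
    others = list(range(size - 1))  # the nodes of e other than i
    j = others[0]
    for bits in product((0, 1), repeat=len(others)):
        v = dict(zip(others, bits))
        b = sum(bits[1:])  # active nodes in e excluding both i and j
        expected = 1 if b == m - 1 else 0
        if derivative_10(v, m, j) != expected:
            return False
    return True


print(all(check_formula_11(size, m) for size in range(2, 7) for m in range(1, size)))  # prints True
```

The enumeration also covers both values of $v_{j\to e_\gamma}$ itself, confirming that the indicator depends only on the other nodes, as formula (11) states.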

The fixed point iteration method is used to iterate formula (5) as follows:

When t=1, let $V_{\to}^{0}=n_{\to}$ and $V_{\to}^{1}=n_{\to}+\mathcal{JG}|_{V_{\to}^{0}}\times n_{\to}$:

$\begin{matrix} {[\begin{matrix} v_{1} \\ v_{2} \end{matrix}]}^{1} = [\begin{matrix} n_{1} \\ 0 \end{matrix}] + [\begin{matrix} 0 & M^{0} \\ I^{0} & 0 \end{matrix}] [\begin{matrix} n_{1} \\ 0 \end{matrix}] = [\begin{matrix} n_{1} \\ I^{0} n_{1} \end{matrix}] & (14) \end{matrix}$

A specific form of each element in formula (14) can be expressed as:

$\begin{matrix} {\begin{matrix} v_{i \to e_{γ}}^{1} = n_{i} H_{{ie}_{γ}} \\ v_{e_{γ} \to i}^{1} = H_{{ie}_{γ}} \sum_{j} n_{j} H_{{je}_{γ}} (1 - δ_{ij}) I_{{je}_{γ} e_{γ} i}^{0} \end{matrix} & (15) \end{matrix}$

The 1-norm of the activation probability vector is used to measure the final activation scale in the hypergraph, and rearranging terms gives:

$\begin{aligned} \lVert V_{\to}\rVert_1 &= \sum_{ie_\gamma}v_{i\to e_\gamma}+\sum_{ie_\gamma}v_{e_\gamma\to i} \\ &= \sum_{ie_\gamma}n_iH_{ie_\gamma}+\sum_{ie_\gamma}H_{ie_\gamma}\sum_j n_jH_{je_\gamma}(1-\delta_{ij})I^{0}_{je_\gamma e_\gamma i} \\ &= \sum_i n_ik_i+\sum_i n_i\sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i}I^{0}_{ie_\gamma e_\gamma j} \\ &= \sum_i n_i\Big(k_i+\sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i}I^{0}_{ie_\gamma e_\gamma j}\Big) \end{aligned} \qquad (16)$

By selecting as a seed the node i with the largest value of $k_i+\sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i}I^{0}_{ie_\gamma e_\gamma j}$, information spread can be maximized with as few seeds as possible. Therefore, the 1-order hypergraph collective influence is defined as:

$\begin{matrix} {HCI}_{1} (i) = k_{i} + \sum_{e_{γ} \in \partial i} \sum_{j \in \partial e_{γ} / i} I_{{ie}_{γ} e_{γ} j}^{0} & (17) \end{matrix}$
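Formula (17) only requires counting, for each hyperedge incident with a node, how many nodes are already active. The sketch below is a minimal illustration under the assumption that $I^{0}_{ie_\gamma e_\gamma j}$ is evaluated on the current set of active (seed) nodes, i.e. it equals 1 exactly when m_γ−1 nodes of e_γ other than i and j are active. It also checks identity (16) by computing $\lVert V_{\to}^{1}\rVert_1$ directly from formula (15). The names `hci1` and `one_norm_v1` are hypothetical.

```python
def hci1(hyperedges, m, active):
    """Formula (17): HCI_1(i) = k_i + #{(e, j): e incident with i, j in e \\ {i}, I^0 = 1}.

    m[g] is m_gamma of hyperedge g; `active` is the set of nodes with n = 1.
    """
    nodes = sorted({v for e in hyperedges for v in e})
    score = {}
    for i in nodes:
        k_i, paths = 0, 0
        for g, edge in enumerate(hyperedges):
            if i not in edge:
                continue
            k_i += 1
            for j in edge:
                if j == i:
                    continue
                b = sum(1 for p in edge if p not in (i, j) and p in active)
                paths += (b == m[g] - 1)  # subcritical path i -> e -> j
        score[i] = k_i + paths
    return score


def one_norm_v1(hyperedges, m, seeds):
    """||V^1||_1 computed directly from formula (15)."""
    total = 0
    for g, edge in enumerate(hyperedges):
        for i in edge:
            total += (i in seeds)  # v^1_{i->e} = n_i H_{ie}
            for j in edge:
                if j != i and j in seeds:
                    b = sum(1 for p in edge if p not in (i, j) and p in seeds)
                    total += (b == m[g] - 1)  # n_j I^0_{j e e i}
    return total


edges, m = [[0, 1], [1, 2, 3], [3, 4, 5, 6]], [1, 2, 2]
seeds = {0, 2}
scores = hci1(edges, m, seeds)
assert one_norm_v1(edges, m, seeds) == sum(scores[i] for i in seeds)  # identity (16)
```

The inner count matches the reading used in the worked example of the detailed description: HCI₁(i) is the hyperdegree plus the number of subcritical paths of length 2 leaving node i.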

Using the same method with t=2 and $V_{\to}^{2}=n_{\to}+\mathcal{JG}|_{V_{\to}^{1}}\times V_{\to}^{1}$, the 2-order hypergraph collective influence can be derived:

$\begin{matrix} {HCI}_{2} (i) = k_{i} + \sum_{e_{γ} \in \partial i} \sum_{j \in \partial e_{γ} / i} I_{{ie}_{γ} e_{γ} j}^{1} + \sum_{e_{γ} \in \partial i} \sum_{j \in \partial e_{γ} / i} I_{{ie}_{γ} e_{γ} j}^{0} \sum_{e_{μ} \in \partial j / e_{γ}} (1 - n_{j}) M_{e_{γ} {jje}_{μ}}^{1} & (18) \end{matrix}$

Similarly, n-order hypergraph collective influence can be derived as:

$\begin{matrix} {HCI}_{n} (i) = k_{i} + \sum_{L \in A_{n}} O_{L}^{n} + \sum_{L \in B_{n}} E_{L}^{n} & (19) \end{matrix}$

where $A_n=\{x\in\mathbb{N}^+\mid x \bmod 2=0,\ x\le n\}$ and $B_n=\{x\in\mathbb{N}^+\mid x \bmod 2=1,\ x\le n\}$ are the even and odd integers from 1 to n, and

$\begin{aligned} O_L^n &= \sum_{e_\gamma\in\partial i}\sum_{i_1\in\partial e_\gamma\setminus i}I^{n-L}_{ie_\gamma e_\gamma i_1}\sum_{e_{\gamma_1}\in\partial i_1\setminus e_\gamma}(1-n_{i_1})M^{n-L+1}_{e_\gamma i_1i_1e_{\gamma_1}}\times\dots\times\sum_{e_{\gamma_\ell}\in\partial i_\ell\setminus e_{\gamma_{\ell-1}}}(1-n_{i_\ell})M^{n-1}_{e_{\gamma_{\ell-1}}i_\ell i_\ell e_{\gamma_\ell}}, \\ E_L^n &= \sum_{e_\gamma\in\partial i}\sum_{i_1\in\partial e_\gamma\setminus i}I^{n-L}_{ie_\gamma e_\gamma i_1}\sum_{e_{\gamma_1}\in\partial i_1\setminus e_\gamma}(1-n_{i_1})M^{n-L+1}_{e_\gamma i_1i_1e_{\gamma_1}}\times\dots\times\sum_{i_{\iota+1}\in\partial e_{\gamma_\iota}\setminus i_\iota}I^{n-1}_{i_\iota e_{\gamma_\iota}e_{\gamma_\iota}i_{\iota+1}}. \end{aligned}$

Here i₁ represents a 1-layer neighbor node of node i and i_ℓ an ℓ-layer neighbor node of node i; e_{γ₁} represents a 1-layer neighbor hyperedge of the hyperedge e_γ and e_{γ_ℓ} an ℓ-layer neighbor hyperedge of e_γ. Each term contains L alternating I and M factors whose superscripts increase from n−L to n−1, so ℓ=L/2 in $O_L^n$ and ι=(L−1)/2 in $E_L^n$ (with i₀=i and e_{γ₀}=e_γ); for n=2 this reproduces formula (18).

Further, the HCI-TM algorithm (a greedy algorithm for selecting the set with the maximum influence) used in step 3 is designed as follows:

- Step 1: initializing the seed set S=Ø, and calculating the n-order hypergraph collective influence HCI_n(i) of all nodes in the hypergraph;
- Step 2: selecting the node i with the maximum n-order hypergraph collective influence as a seed, adding it to the seed set, S={i}∪S, and then conducting spread based on the hypergraph threshold rule;
- Step 3: judging whether the node activation ratio in the hypergraph exceeds the preset ratio a_r; if it does, S is the maximum influence set under this condition;
- Step 4: if the node activation ratio does not exceed a_r, recalculating HCI_n(i) for the ⌈n/2⌉-layer neighbor nodes of the active nodes, and repeating Step 2 to Step 3.
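The four steps above can be sketched as a greedy loop. This is a simplified illustration under stated assumptions, not the full HCI-TM algorithm: it uses the 1-order index HCI₁ (n = 1), recomputes the index for every inactive node in each round instead of only the ⌈n/2⌉-layer neighbors of active nodes, takes m_e = ⌈θ·|e|⌉ for one shared threshold θ, and evaluates the subcritical indicator on the current active set. The name `hci_tm_greedy` is hypothetical.

```python
import math


def hci_tm_greedy(hyperedges, theta, a_r):
    """Greedy seed selection by HCI_1 with threshold spreading (simplified sketch).

    Returns (seeds, active_nodes) once the active-node ratio exceeds a_r
    or no inactive node is left.
    """
    m = [max(1, math.ceil(theta * len(e))) for e in hyperedges]  # assumed m_e
    nodes = sorted({v for e in hyperedges for v in e})
    seeds, active = [], set()

    def spread(start):
        # Step 2: apply the threshold rule until no hyperedge can be activated
        result = set(start)
        changed = True
        while changed:
            changed = False
            for g, edge in enumerate(hyperedges):
                if not set(edge) <= result and sum(v in result for v in edge) >= m[g]:
                    result.update(edge)
                    changed = True
        return result

    def hci1(i):
        # k_i plus the number of subcritical paths i -> e -> j (formula (17)),
        # evaluated on the current `active` set
        score = 0
        for g, edge in enumerate(hyperedges):
            if i in edge:
                score += 1
                for j in edge:
                    if j != i:
                        b = sum(p in active for p in edge if p not in (i, j))
                        score += (b == m[g] - 1)
        return score

    while len(active) / len(nodes) <= a_r:  # Step 3: stop once the ratio exceeds a_r
        candidates = [i for i in nodes if i not in active]
        if not candidates:
            break
        best = max(candidates, key=hci1)  # Steps 2 and 4: pick the current maximum
        seeds.append(best)
        active = spread(active | {best})
    return seeds, active
```

For example, on hyperedges [[0, 1], [1, 2, 3], [3, 4, 5, 6]] with θ = 0.5 and a_r = 0.5, the sketch selects nodes 1 and 3, after which nodes 0 to 3 are active. Restricting recomputation to a neighborhood of the newly activated nodes, as Step 4 describes, is what keeps the real algorithm fast on large hypergraphs; the full recomputation here trades speed for brevity.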

Robustness and effectiveness of the HCI-TM algorithm are verified on Erdös-Rényi hypergraphs with the average hyperdegrees of 2 and 3, scale-free hypergraphs with the power-law exponents of 1.5 and 2, and uniform hypergraphs with the cardinalities of hyperdegree of 4 and 5. Specific steps are as follows:

- Step 1: using a configuration model to generate the Erdös-Rényi hypergraphs, the scale-free hypergraphs and the uniform hypergraphs with different parameter settings.
- Step 2: using the HCI-TM algorithm and the other contrast algorithms to select seed sets on the different types of hypergraphs, conducting spread based on the threshold rule, and recording the activation scale of the hypergraph at fixed increments of the seed-node ratio.
- Step 3: comparing with the other algorithms to evaluate the effectiveness of the HCI-TM algorithm: for the same ratio of seed nodes, the algorithm that yields a larger activation scale is more effective.
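The evaluation in Steps 2 and 3 amounts to computing the curve Q(q) for each seed-selection method. The sketch below assumes a node ranking produced by some method and a shared threshold θ with m_e = ⌈θ·|e|⌉; for testing it draws fixed-size hyperedges uniformly at random, which is a simple stand-in and not the configuration model named in Step 1. All function names are hypothetical, and the high degree ranking is used only as an example baseline.

```python
import math
import random


def random_uniform_hypergraph(n_nodes, n_edges, size, rng):
    """Stand-in generator: each hyperedge is `size` distinct nodes drawn uniformly."""
    return [rng.sample(range(n_nodes), size) for _ in range(n_edges)]


def activation_curve(hyperedges, n_nodes, ranking, theta, ratios):
    """Q(q) for each q in `ratios`: final fraction of active nodes when the
    first round(q * N) nodes of `ranking` are used as seeds."""
    m = [max(1, math.ceil(theta * len(e))) for e in hyperedges]
    curve = []
    for q in ratios:
        active = set(ranking[: round(q * n_nodes)])
        changed = True
        while changed:  # threshold rule until no hyperedge can be activated
            changed = False
            for g, edge in enumerate(hyperedges):
                if not set(edge) <= active and sum(v in active for v in edge) >= m[g]:
                    active.update(edge)
                    changed = True
        curve.append(len(active) / n_nodes)
    return curve


rng = random.Random(0)
H = random_uniform_hypergraph(200, 150, 4, rng)
degree = [0] * 200
for edge in H:
    for v in edge:
        degree[v] += 1
high_degree_ranking = sorted(range(200), key=lambda v: -degree[v])  # HD baseline
ratios = [k / 10 for k in range(11)]
Q = activation_curve(H, 200, high_degree_ranking, 0.5, ratios)
```

Running the same function with rankings from different algorithms and comparing Q at equal q is the comparison described in Step 3; because the seed sets are nested prefixes of one ranking, each curve is non-decreasing in q.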

By analyzing how the maximum HCI value in the hypergraph evolves during seed selection, it is verified that the hypergraph collective influence index can serve as a criterion for predicting the occurrence of a cascade. The specific steps are as follows:

- Step 1: using the HCI-TM algorithm to select seed nodes on the hypergraph, and recording the HCI value of each selected seed node (i.e., the maximum HCI value in the hypergraph at that moment).
- Step 2: analyzing how the HCI value evolves as seed nodes are selected. The HCI index can serve as a criterion for predicting the occurrence of a cascade: the peak of the HCI value corresponds to the phase transition point at which the cascade occurs, so by observing the evolution of the maximum HCI value in the hypergraph, one can judge whether and when a cascade will occur in the hypergraph spreading process.
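Reading the predicted cascade point off the recorded values reduces to locating the peak. A minimal sketch, assuming the history is a list whose k-th entry is the maximum HCI value when the (k+1)-th seed is chosen; the function name is hypothetical, ties resolve to the first peak, and the sample values are invented for illustration.

```python
def predicted_cascade_step(max_hci_history):
    """Index of the seed-selection step at which the maximum HCI value peaks.

    Under the criterion in Step 2, this peak marks the phase transition point
    at which the cascade occurs.
    """
    if not max_hci_history:
        raise ValueError("the history of maximum HCI values is empty")
    # max() returns the first index that attains the largest value
    return max(range(len(max_hci_history)), key=max_hci_history.__getitem__)


history = [3, 4, 6, 9, 5, 2]  # illustrative values, not measured data
step = predicted_cascade_step(history)
```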

Compared with the prior art, the present invention has the following beneficial effects:

- (1) The present invention generalizes the Message Passing analytical framework from graphs to hypergraphs, uses the hypergraph threshold model to simulate the spread of information in real scenarios such as marketing networks, and proposes a method for measuring multi-node collective influence by analyzing the self-satisfying equation that obeys the threshold rule.
- (2) Based on the hypergraph collective influence measurement method, the present invention designs a greedy algorithm for selecting the set with the maximum influence (the HCI-TM algorithm); the algorithm is robust and effective, and it is applied to the marketing network.
- (3) Based on the concept of hypergraph collective influence, the present invention provides a criterion for predicting the occurrence of cascades in hypergraphs.

DESCRIPTION OF DRAWINGS

FIG. 1 is a flow chart of the present invention.

FIG. 2 is a schematic diagram of a hypergraph threshold rule of the present invention; ovals represent hyperedges, and circles represent nodes.

FIG. 3 is a schematic diagram of hypergraph collective influence of the present invention; a is a schematic diagram of 1-order hypergraph collective influence, and b is a schematic diagram of 2-order hypergraph collective influence. Circles represent nodes, triangles represent hyperedges, and connecting edges indicate incidence relationships between hyperedges and nodes; triangles with rhombuses indicate hyperedges in a subcritical state, i.e., hyperedges e_γ in which m_γ−1 nodes have already been activated.

FIG. 4 shows the performance of the HCI-TM algorithm of the present invention and other algorithms on generated hypergraphs under different parameter settings; in each panel, the horizontal axis q is the ratio of seed nodes, and the vertical axis Q(q) is the activation ratio of nodes in the hypergraph when a ratio q of seed nodes is selected. Panels a-d show the performance of the algorithms on Erdös-Rényi hypergraphs with average hyperdegrees of 2 and 3 and hypergraph scales of 10,000 and 100,000; panels e-h show the performance on scale-free hypergraphs with power-law exponents of 1.5 and 2 and hypergraph scales of 10,000 and 100,000; panels i-l show the performance on uniform hypergraphs with cardinalities of hyperdegree of 4 and 5 and hypergraph scales of 10,000 and 100,000.

FIG. 5 shows the actual time complexity of the HCI-TM algorithm of the present invention and other algorithms on hypergraphs. Each horizontal axis is log₁₀N, where N is the hypergraph scale, and each vertical axis is log₁₀T, where T is the running time. Panels a and d show the results on Erdös-Rényi hypergraphs with average hyperdegrees of 2 and 3, respectively; panels b and e on scale-free hypergraphs with power-law exponents of 1.5 and 2; and panels c and f on uniform hypergraphs with cardinalities of hyperdegree of 4 and 5.

FIG. 6 shows the evolution of the maximum HCI value in hypergraphs as seed nodes are selected according to the present invention: a, on an Erdös-Rényi hypergraph with an average hyperdegree of 3 and a hypergraph scale of 10,000; b, on a scale-free hypergraph with a power-law exponent of 1.5 and a hypergraph scale of 10,000; c, on a uniform hypergraph with a cardinality of hyperdegree of 5 and a hypergraph scale of 10,000.

FIG. 7 shows the performance of the HCI-TM algorithm of the present invention and other contrast algorithms on a marketing network (a Las Vegas bar review dataset).

DETAILED DESCRIPTION

In order to describe the present invention more specifically, the technical solution of the present invention will be described below in detail in combination with the drawings and specific embodiments.

First, in studying the node collective influence measurement method of the hypergraph threshold model, the solution uses the hypergraph threshold model to simulate the information spreading rule in real scenarios such as the marketing network. Each node represents a customer or individual, and each hyperedge represents a high-order interaction relationship among individuals. The specific rule of the model is that when the ratio of activated nodes in a hyperedge exceeds the hyperedge threshold, the hyperedge is activated, and when a hyperedge is activated, all nodes in it are activated.

FIG. 2 is taken as an example to describe the hypergraph threshold rule. All hyperedge thresholds in the hypergraph are set to 0.5. At t=0, node 3 is selected as the seed node, so the set of active nodes and hyperedges is U(0)={3}. At t=1, the hyperedges e_γ2 and e_γ3 reach their thresholds and are activated, and U(1)={3, e_γ2, e_γ3}. At t=2, the activation of e_γ2 and e_γ3 causes nodes 2 and 6 to be activated, and U(2)={3, e_γ2, e_γ3, 2, 6}. At t=3, the hyperedges e_γ1 and e_γ5 are activated, and U(3)={3, e_γ2, e_γ3, 2, 6, e_γ1, e_γ5}. At the next moment, nodes 1 and 7 are activated, and U(4)={3, e_γ2, e_γ3, 2, 6, e_γ1, e_γ5, 1, 7}. Finally, no new hyperedge or node can be activated, and the spreading process stops.

Secondly, a Cavity method is used to establish a conditional-probability self-satisfying equation based on the hypergraph threshold rule, which weakens the strong correlation between nodes and hyperedges and describes the information spreading rule in a hypergraph more accurately. In the self-satisfying equation, node i is activated when any hyperedge incident with it, other than the hyperedge e_γ, is activated, and the hyperedge e_γ is activated when any m_γ of its nodes, other than node i, are activated:

$\begin{aligned} v_{i\to e_\gamma} &= n_i+(1-n_i)\Big[1-\prod_{e_\beta\in\partial i\setminus e_\gamma}\big(1-v_{e_\beta\to i}\big)\Big] \\ v_{e_\gamma\to i} &= 1-\prod_{P_h\in P_{e_\gamma\setminus i}^{m_\gamma}}\Big(1-\prod_{p\in P_h}v_{p\to e_\gamma}\Big) \end{aligned} \qquad (1)$

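Final states of node i and the hyperedge e_γ can be calculated by the following formula:

$\begin{aligned} v_i &= n_i+(1-n_i)\Big[1-\prod_{e_\beta\in\partial i}\big(1-v_{e_\beta\to i}\big)\Big] \\ v_{e_\gamma} &= 1-\prod_{P_h\in P_{e_\gamma}^{m_\gamma}}\Big(1-\prod_{p\in P_h}v_{p\to e_\gamma}\Big) \end{aligned} \qquad (2)$

In formula (2), $v_i$ represents the final state of node i, and $v_{e_\gamma}$ represents the final state of the hyperedge e_γ.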

To simplify formula (1), let $V_{\to}=(v_1,v_2)^T$, where $v_1=\{v_{i\to e_\gamma}\}_{S\times 1}$ and $v_2=\{v_{e_\gamma\to i}\}_{S\times 1}$, and $S=\sum_{i=1}^{N}k_i$ is the sum of the hyperdegrees of all nodes. Similarly, n is extended to higher dimensions:

$n_{\to}=(n_1,0)^T=\big(\underbrace{\dots,\overbrace{n_i,\dots,n_i}^{k_i},\dots}_{S},\underbrace{0,\dots,0}_{S}\big)^T \qquad (3)$

Therefore, formula (1) can be simplified as:

$\begin{matrix} V_{\to} = n_{\to} + G (V_{\to}) \Leftrightarrow {\begin{matrix} v_{1} = n_{1} + g_{1} (v_{2}) \\ v_{2} = 0 + g_{2} (v_{1}) \end{matrix} & (4) \end{matrix}$

As formula (4) consists of complex nonlinear equations that cannot be solved directly, linearization and the fixed point iteration method are used to solve it:

$\begin{matrix} V_{\to}^{t + 1} = n_{\to} + JG |_{V_{\to}^{t}} \times V_{\to}^{t} & (5) \end{matrix}$

The specific form of the Jacobian matrix $\mathcal{JG}|_{V_{\to}^{t}}$ is solved as follows:

$\mathcal{JG}=\begin{pmatrix}\frac{\partial g_1}{\partial v_1} & \frac{\partial g_1}{\partial v_2}\\ \frac{\partial g_2}{\partial v_1} & \frac{\partial g_2}{\partial v_2}\end{pmatrix}_{2S\times 2S} \qquad (6)$

The partial derivatives of g₁ are calculated as follows:

$\begin{matrix} \frac{\partial v_{i \to e_{γ}}}{\partial v_{j \to e_{β}}} = 0 & (7) \end{matrix}$

$\begin{matrix} \frac{\partial v_{i \to e_{γ}}}{\partial v_{e_{β} \to j}} |_{V_{\to}^{t}} = {\begin{matrix} 1 - n_{i} & i = j, e_{β} \neq e_{γ}, a_{e_{β} \to i, i \to e_{γ}}^{t} = 0 \\ 0 & otherwise \end{matrix} & (8) \end{matrix}$

where $a^{t}_{e_\beta\to i,\,i\to e_\gamma}=\sum_{e_\mu\in\partial i\setminus\{e_\gamma,e_\beta\}}v^{t}_{e_\mu\to i}$.

The partial derivatives of g₂ are calculated as follows:

$\begin{matrix} \frac{\partial v_{e_{γ} \to i}}{\partial v_{e_{β} \to j}} = 0 & (9) \end{matrix}$

When e_β=e_γ, j≠i:

$\begin{matrix} \frac{\partial v_{e_{γ} \to i}}{\partial v_{j \to e_{γ}}} = \underset{j \notin P_{h}}{\prod_{P_{h} \in P_{e_{γ / i}}^{m_{γ}}}} (1 - \prod_{p \in P_{h}} v_{p \to e_{γ}}) \underset{j \in P_{h}}{\sum_{P_{h} \in P_{e_{γ / i}}^{m_{γ}}}} [\prod_{p \in P_{h} / j} v_{p \to e_{γ}} \underset{j \in {\tilde{P}}_{h}}{\underset{{\tilde{P}}_{h} \neq P_{h}}{\prod_{{\tilde{P}}_{h} \in P_{e_{γ / i}}^{m_{γ}}}}} (1 - \prod_{p \in {\tilde{P}}_{h}} v_{p \to e_{γ}})] & (10) \end{matrix}$

Let $b^{t}_{j\to e_\gamma,\,e_\gamma\to i}=\sum_{p\in\partial e_\gamma\setminus\{i,j\}}v^{t}_{p\to e_\gamma}$ denote the number of nodes of e_γ, other than nodes i and j, that are active at step t. When $b^{t}_{j\to e_\gamma,\,e_\gamma\to i}\ge m_\gamma$, some combination $P_h$ that does not contain j satisfies $\prod_{p\in P_h}v_{p\to e_\gamma}=1$, so

$\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\gamma}}=0.$

When $b^{t}_{j\to e_\gamma,\,e_\gamma\to i}\le m_\gamma-2$, every product $\prod_{p\in P_h\setminus j}v_{p\to e_\gamma}$ is 0, so

$\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\gamma}}=0.$

When $b^{t}_{j\to e_\gamma,\,e_\gamma\to i}=m_\gamma-1$, exactly one combination $P_h$ containing j makes $\prod_{p\in P_h\setminus j}v_{p\to e_\gamma}=1$, and then

$\prod_{\substack{\tilde P_h\in P_{e_\gamma\setminus i}^{m_\gamma}\\ \tilde P_h\neq P_h,\ j\in\tilde P_h}}\Big(1-\prod_{p\in\tilde P_h}v_{p\to e_\gamma}\Big)=1,$

$\prod_{\substack{P_h\in P_{e_\gamma\setminus i}^{m_\gamma}\\ j\notin P_h}}\Big(1-\prod_{p\in P_h}v_{p\to e_\gamma}\Big)=1,\ \text{so}\ \frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\gamma}}=1.$

Therefore:

$\begin{matrix} \frac{\partial v_{e_{γ} \to i}}{\partial v_{j \to e_{β}}} |_{V_{\to}^{t}} = {\begin{matrix} 1 & e_{β} = e_{γ}, j \neq i, b_{j \to e_{γ}, e_{γ} \to i}^{t} = m_{γ} - 1 \\ 0 & otherwise \end{matrix} & (11) \end{matrix}$

As a result, the specific form of the Jacobian matrix $\mathcal{JG}|_{V_{\to}^{t}}$ is:

$\begin{matrix} JG |_{V_{\to}^{t}} = (\begin{matrix} 0 & M^{t} \\ I^{t} & 0 \end{matrix}) & (12) \end{matrix}$

where

$M^{t}=\{M^{t}_{e_\beta\to j,\,i\to e_\gamma}\}=\Big\{\frac{\partial v_{i\to e_\gamma}}{\partial v_{e_\beta\to j}}\Big|_{V_{\to}^{t}}\Big\}$

is a non-backtracking matrix, and

$I^{t}=\{I^{t}_{j\to e_\beta,\,e_\gamma\to i}\}=\Big\{\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\beta}}\Big|_{V_{\to}^{t}}\Big\}$

is a subcritical non-backtracking matrix. For convenience in the subsequent iterative derivation, both matrices are extended to higher dimensions:

$\begin{aligned} M^{t}_{e_\beta jie_\gamma} &= (1-n_i)\,H_{ie_\gamma}H_{je_\beta}\,\delta_{ij}\,(1-\delta_{e_\beta e_\gamma})\,M^{t}_{e_\beta iie_\gamma} \\ I^{t}_{je_\beta e_\gamma i} &= H_{ie_\gamma}H_{je_\beta}\,\delta_{e_\beta e_\gamma}\,(1-\delta_{ij})\,I^{t}_{je_\gamma e_\gamma i} \end{aligned} \qquad (13)$

where $H=\{H_{ie_\gamma}\}_{N\times M}$ is the incidence matrix; $M^{t}_{e_\beta iie_\gamma}=1$ when $a^{t}_{e_\beta\to i,\,i\to e_\gamma}=0$, and $M^{t}_{e_\beta iie_\gamma}=0$ otherwise. Similarly, $I^{t}_{je_\gamma e_\gamma i}=1$ when $b^{t}_{j\to e_\gamma,\,e_\gamma\to i}=m_\gamma-1$, and $I^{t}_{je_\gamma e_\gamma i}=0$ otherwise.

The fixed point iteration method is used to iterate formula (5) as follows:

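When t=1, let $V_{\to}^{0}=n_{\to}$ and $V_{\to}^{1}=n_{\to}+\mathcal{JG}|_{V_{\to}^{0}}\times n_{\to}$:

$\begin{bmatrix}v_1\\ v_2\end{bmatrix}^{1}=\begin{bmatrix}n_1\\ 0\end{bmatrix}+\begin{bmatrix}0 & M^{0}\\ I^{0} & 0\end{bmatrix}\begin{bmatrix}n_1\\ 0\end{bmatrix}=\begin{bmatrix}n_1\\ I^{0}n_1\end{bmatrix} \qquad (14)$

The specific form of each element in formula (14) can be expressed as:

$\begin{aligned} v^{1}_{i\to e_\gamma} &= n_iH_{ie_\gamma} \\ v^{1}_{e_\gamma\to i} &= H_{ie_\gamma}\sum_j n_jH_{je_\gamma}(1-\delta_{ij})I^{0}_{je_\gamma e_\gamma i} \end{aligned} \qquad (15)$

The 1-norm of the activation probability vector is used to measure the final activation scale in the hypergraph, and rearranging terms gives: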

$\begin{matrix} \begin{matrix}  v_{\to}  = \sum_{i e_{γ}} v_{i \to e_{γ}} + \sum_{i e_{γ}} v_{e_{γ} \to i} \\ = \sum_{i e_{γ}} n_{i} H_{i e_{γ}} + \sum_{i e_{γ}} H_{i e_{γ}} \sum_{j} n_{j} H_{j e_{γ}} (1 - δ_{i j}) I_{j e_{γ} e_{γ} i}^{0} \\ = \sum_{i} n_{i} k_{j} + \sum_{i} n_{i} \sum_{e_{γ} \in \partial i} \sum_{j \in \partial e_{γ} / i} I_{i e_{γ} e_{γ} j}^{0} \\ = \sum_{i} n_{i} (k_{j} + \sum_{e_{γ} \in \partial i} \sum_{j \in \partial e_{γ} / i} I_{i e_{γ} e_{γ} j}^{0}) \end{matrix} & (16) \end{matrix}$

$\begin{matrix} {HCI}_{1} (i) = k_{i} + \sum_{e_{γ} \in \partial i} \sum_{j \in \partial e_{γ} / i} I_{i e_{γ} e_{γ} j}^{0} & (17) \end{matrix}$

As shown in part a of FIG. 3, the hyperdegree of node i is k_i=5, and $\sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i}I^{0}_{ie_\gamma e_\gamma j}=3$ indicates that there are 3 bold paths (subcritical paths) of length 2 starting from node i, so the 1-order hypergraph collective influence of node i is HCI₁(i)=5+3=8.

Using the same method with t=2 and $V_{\to}^{2}=n_{\to}+\mathcal{JG}|_{V_{\to}^{1}}\times V_{\to}^{1}$, the 2-order hypergraph collective influence can be derived:

$\begin{matrix} H C I_{2} (i) = k_{i} + \sum_{e_{γ} \in \partial i} \sum_{j \in \partial e_{γ} / i} I_{i e_{γ} e_{γ} j}^{1} + \sum_{e_{γ} \in \partial i} \sum_{j \in \partial e_{γ} / i} I_{i e_{γ} e_{γ} j}^{0} \sum_{e_{μ} \in \partial j / e_{γ}} (1 - n_{j}) M_{e_{γ} j j e_{μ}}^{1} & (18) \end{matrix}$

As shown in panel (b) of FIG. 2, the hyperdegree of node i is $k_{i} = 5$; $\sum_{e_{γ} \in \partial i} \sum_{j \in \partial e_{γ} \setminus i} I_{i e_{γ} e_{γ} j}^{0} = 3$ indicates that there are 3 bold paths (subcritical paths) of length 2 starting from node i; and $\sum_{e_{γ} \in \partial i} \sum_{j \in \partial e_{γ} \setminus i} I_{i e_{γ} e_{γ} j}^{0} \sum_{e_{μ} \in \partial j \setminus e_{γ}} (1 - n_{j}) M_{e_{γ} j j e_{μ}}^{1} = 4$ indicates that there are 4 bold paths (subcritical paths) of length 3 starting from node i. Therefore, the 2-order hypergraph collective influence of node i is $HCI_{2}(i) = 5 + 3 + 4 = 12$.

Similarly, the n-order hypergraph collective influence can be derived as:

$\begin{matrix} H C I_{n} (i) = k_{i} + \sum_{L \in A_{n}} O_{L}^{n} + \sum_{L \in B_{n}} E_{L}^{n} & (19) \end{matrix}$

Where $A_{n} = \{x \in \mathbb{N}^{+} \mid x \bmod 2 = 0, x \le n\}$ and $B_{n} = \{x \in \mathbb{N}^{+} \mid x \bmod 2 = 1, x \le n\}$.
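As a quick check of the index sets in formula (19), A_n collects the even and B_n the odd positive integers up to n, so small values of n give:

$\begin{matrix} A_{2} = \{2\},\ B_{2} = \{1\} \Rightarrow HCI_{2}(i) = k_{i} + O_{2}^{2} + E_{1}^{2} \\ A_{3} = \{2\},\ B_{3} = \{1, 3\} \Rightarrow HCI_{3}(i) = k_{i} + O_{2}^{3} + E_{1}^{3} + E_{3}^{3} \end{matrix}$

For n = 2, formula (19) therefore has the same number of terms (three) as formula (18).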

In the HCI-TM algorithm design and numerical simulation evaluation part, a greedy algorithm for selecting the maximum influence set (the HCI-TM algorithm) is first designed based on the hypergraph collective influence measure. The specific steps are as follows:

- Step 1: initializing the seed set S=Ø, and calculating the n-order hypergraph collective influence HCI_n(i) of all nodes in the hypergraph.
- Step 2: selecting the node i with the maximum hypergraph collective influence as a seed, adding it to the seed set, S={i}∪S, and then conducting spread based on the hypergraph threshold rule.
- Step 3: judging whether the node activation ratio in the hypergraph exceeds a_r; if it does, S is the maximum influence set under this condition.
- Step 4: if the node activation ratio does not exceed a_r, recalculating the n-order hypergraph collective influence HCI_n(i) of the ⌈n/2⌉-layer neighbor nodes of the active nodes, updating the state of each hyperedge (active, subcritical or non-subcritical: the active state indicates that the hyperedge has been activated, the subcritical state indicates that the hyperedge will be activated if one more of its nodes is activated, and all other hyperedges are in the non-subcritical state), and repeating steps 2 and 3.
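The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: it uses the 1-order index HCI_1 of formula (17); it assumes a simplified threshold rule in which a hyperedge is activated once ⌈θ·|e|⌉ of its members are active and an activated hyperedge then activates all of its members; and, instead of the local ⌈n/2⌉-layer update of Step 4, it recomputes HCI for every inactive node in each round. The names `hci_tm`, `spread`, `needed`, `theta` and `a_r` are hypothetical.

```python
import math
from typing import List, Set, Tuple


def needed(size: int, theta: float) -> int:
    # Active members required to activate a hyperedge (assumed fractional threshold rule).
    return math.ceil(theta * size - 1e-9)


def spread(hyperedges: List[Set[int]], active: Set[int], theta: float) -> Set[int]:
    # Assumed rule: a hyperedge activates once enough members are active and then
    # activates all of its members; repeat until no hyperedge changes state.
    active = set(active)
    changed = True
    while changed:
        changed = False
        for e in hyperedges:
            if not e <= active and len(e & active) >= needed(len(e), theta):
                active |= e
                changed = True
    return active


def hci1(i: int, hyperedges: List[Set[int]], member_of: List[List[int]],
         active: Set[int], theta: float) -> int:
    # Formula (17): hyperdegree plus length-2 paths through subcritical hyperedges.
    paths = 0
    for idx in member_of[i]:
        e = hyperedges[idx]
        if len((e - {i}) & active) == needed(len(e), theta) - 1:
            paths += len(e) - 1
    return len(member_of[i]) + paths


def hci_tm(hyperedges: List[Set[int]], n_nodes: int, theta: float,
           a_r: float) -> Tuple[List[int], Set[int]]:
    member_of: List[List[int]] = [[] for _ in range(n_nodes)]
    for idx, e in enumerate(hyperedges):
        for v in e:
            member_of[v].append(idx)
    seeds: List[int] = []  # Step 1: S starts empty
    active: Set[int] = set()
    while len(active) / n_nodes <= a_r:  # Step 3: stop once the ratio exceeds a_r
        candidates = [v for v in range(n_nodes) if v not in active]
        if not candidates:
            break
        # Step 2 (Step 4 simplified): rescore every inactive node and take the best
        best = max(candidates,
                   key=lambda v: hci1(v, hyperedges, member_of, active, theta))
        seeds.append(best)
        active = spread(hyperedges, active | {best}, theta)
    return seeds, active


print(hci_tm([{0, 1, 2}, {2, 3, 4}, {4, 5}], n_nodes=6, theta=0.6, a_r=0.9))
```

On this three-hyperedge example the loop selects nodes 2, 4, 0 and 5 before the activation ratio exceeds 0.9. Rescoring every node keeps the sketch short; the local update in Step 4 is presumably intended to avoid that global recomputation on large hypergraphs.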

Secondly, the robustness and effectiveness of the HCI-TM algorithm are verified on Erdös-Rényi hypergraphs with average hyperdegrees of 2 and 3, scale-free hypergraphs with power-law exponents of 1.5 and 2, and uniform hypergraphs with hyperdegree cardinalities of 4 and 5; to ensure reliability, each experimental result is the average of 10 independent experiments. The specific steps are as follows:

- Step 1: using the configuration method to generate 10 hypergraphs for each setting, namely Erdös-Rényi hypergraphs with average hyperdegrees of 2 and 3, scale-free hypergraphs with power-law exponents of 1.5 and 2, and uniform hypergraphs with hyperdegree cardinalities of 4 and 5, with node scales of 5,000, 10,000, 20,000, 30,000, 50,000 and 100,000.
- Step 2: using the HCI-TM algorithm and the contrast algorithms to select seed sets on hypergraphs of different scales and types, conducting spread based on the threshold rule, and recording the activation scale of the hypergraph at fixed intervals of the seed-node ratio.
- Step 3: comparing the HCI-TM algorithm with the other algorithms to evaluate its effectiveness: for the same ratio of seed nodes, the algorithm yielding the larger activation scale is more effective.
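Step 1 relies on a configuration method. One common form of it, sketched below in Python as an assumption rather than the exact generator used in the experiments, gives each node as many stubs as its hyperdegree and each hyperedge as many slots as its cardinality, then matches stubs to slots uniformly at random. Drawing the hyperdegree sequence from a Poisson or power-law distribution would roughly correspond to the Erdös-Rényi or scale-free cases. The function name and its parameters are hypothetical.

```python
import random
from typing import List, Set


def configuration_hypergraph(hyperdegrees: List[int], cardinalities: List[int],
                             seed: int = 0) -> List[Set[int]]:
    """Randomly match node stubs to hyperedge slots (configuration-style sketch)."""
    if sum(hyperdegrees) != sum(cardinalities):
        raise ValueError("total hyperdegree must equal total hyperedge cardinality")
    rng = random.Random(seed)
    # node v appears once per unit of its hyperdegree
    stubs = [v for v, k in enumerate(hyperdegrees) for _ in range(k)]
    rng.shuffle(stubs)
    hyperedges: List[Set[int]] = []
    pos = 0
    for m in cardinalities:
        # a node drawn twice for the same hyperedge is kept once (simplification)
        hyperedges.append(set(stubs[pos:pos + m]))
        pos += m
    return hyperedges


# 3-uniform example: six nodes of hyperdegree 2, four hyperedges of cardinality 3.
print(configuration_hypergraph([2] * 6, [3] * 4, seed=42))
```

Because repeated draws collapse into a single membership, realized hyperdegrees and cardinalities can fall slightly below the targets; rejecting and redrawing such matches is a common alternative.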

FIG. 3 shows the performance of the HCI-TM algorithm and the other algorithms (high degree algorithm (HD), high degree adaptive algorithm (HDA), neighbor preference algorithm (NP), neighbor preference adaptive algorithm (NPA), PageRank algorithm (PR) and Random algorithm (RA)) on Erdös-Rényi, scale-free and uniform hypergraphs of different scales. The HCI-TM algorithm outperforms the other algorithms, achieving a larger activation range with fewer seed nodes. In addition, the 2-order HCI-TM algorithm outperforms the 1-order algorithm, indicating that a higher-order HCI-TM algorithm performs better. These results verify the robustness and practicality of the HCI-TM algorithm.

The operating time of the different algorithms is recorded on Erdös-Rényi hypergraphs with average hyperdegrees of 2 and 3, scale-free hypergraphs with power-law exponents of 1.5 and 2, and uniform hypergraphs with hyperdegree cardinalities of 4 and 5, with node scales of 5,000, 10,000, 20,000, 30,000, 50,000 and 100,000. Taking log₁₀N as the horizontal axis (N is the hypergraph scale) and log₁₀T as the vertical axis (T is the operating time), the line graphs shown in FIG. 4 are drawn and fitted with a first-order polynomial. The fit shows that the time complexity of the HCI-TM algorithm is near linear in practice.
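The first-order fit in log-log coordinates amounts to estimating the exponent α in T ≈ c·N^α, so a slope near 1 indicates near-linear scaling. The Python sketch below computes that least-squares slope with the standard library only; the function name is hypothetical, and the timings in the usage lines are synthetic (generated as T = 3×10⁻⁶·N), not the measurements behind FIG. 4.

```python
import math
from typing import Sequence


def loglog_slope(sizes: Sequence[float], times: Sequence[float]) -> float:
    """Least-squares slope of log10(T) versus log10(N), i.e. a first-order polynomial fit."""
    xs = [math.log10(n) for n in sizes]
    ys = [math.log10(t) for t in times]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


sizes = [5_000, 10_000, 20_000, 30_000, 50_000, 100_000]  # node scales used in the text
synthetic_times = [3e-6 * n for n in sizes]                # synthetic, exactly linear
print(round(loglog_slope(sizes, synthetic_times), 6))      # slope of 1 for linear data
```

Data scaling as N·log N would give a slope slightly above 1, which is one reason a fitted slope close to 1 is usually described as "near linear" rather than strictly linear.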

Finally, the evolution of the maximum HCI value during seed selection is analyzed on the Erdös-Rényi hypergraph with an average hyperdegree of 3, the scale-free hypergraph with a power-law exponent of 1.5, and the uniform hypergraph with a hyperdegree cardinality of 5, each with a node scale of 10,000. It is verified that the hypergraph collective influence index can be used as a criterion for predicting the occurrence of the cascade phenomenon. The specific steps are as follows:

- Step 1: using the HCI-TM algorithm to select seed nodes on the hypergraphs, and recording the HCI value of each selected seed node (i.e., the maximum HCI value in the hypergraph at that moment).
- Step 2: analyzing how the HCI value evolves as seed nodes are selected. The HCI index can be used as a criterion for the occurrence of the cascade phenomenon: the peak of HCI corresponds to the phase transition point at which the cascade phenomenon occurs, so by observing the evolution of the maximum HCI value in the hypergraph, one can judge whether and when explosive information spread (the cascade phenomenon) will occur in the hypergraph spreading process.
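Reading the phase transition point off the recorded sequence reduces to locating the peak of the maximum-HCI values collected in Step 1. A minimal Python sketch, with a hypothetical function name and an illustrative (made-up) sequence rather than data from FIG. 5:

```python
from typing import Sequence, Tuple


def hci_peak(max_hci: Sequence[float]) -> Tuple[int, float]:
    """Return (number of seeds selected at the peak, peak value).

    max_hci[k] is the maximum HCI value recorded when the (k + 1)-th seed is chosen.
    If the peak value occurs more than once, the earliest occurrence is returned.
    """
    k = max(range(len(max_hci)), key=lambda idx: max_hci[idx])
    return k + 1, max_hci[k]


illustrative = [5, 6, 9, 14, 11, 4, 2]  # made-up values for demonstration
print(hci_peak(illustrative))           # (4, 14)
```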

FIG. 5 shows the evolution of the maximum HCI value in the hypergraphs as seeds are selected; in each panel, the horizontal axis represents the number of seed nodes and the vertical axis represents the maximum HCI value in the hypergraph at that moment. Compared with FIG. 3, the peak of hypergraph collective influence (HCI) corresponds to the phase transition point at which the cascade phenomenon occurs, and the cascade induces emergent behavior among the nodes in the hypergraph; therefore, the hypergraph collective influence index can be used as an effective criterion for the occurrence of emergent phenomena. Moreover, the 2-order hypergraph collective influence peaks earlier than the 1-order hypergraph collective influence, which further confirms that the higher-order HCI-TM algorithm performs better.

In the algorithm application study, the marketing network is taken as an example: the hypergraph threshold model is used to simulate the spread of reputation among customers, and the HCI-TM algorithm is applied to a real marketing network (the Las Vegas bar review dataset) to identify the most recommendable customer set.

- Step 1: modeling the hypergraph, which is constructed from one month of Las Vegas bar reviews extracted from the Yelp data of a Kaggle competition. The hypergraph has 1,234 nodes and 1,194 hyperedges, where the nodes represent customers and the hyperedges represent different cocktail categories. If a customer reviews a cocktail, an incidence relationship exists between the customer (node) and the cocktail (hyperedge).
- Step 2: using the hypergraph threshold model to simulate the spread of customer reputation in the marketing network, and setting the hyperedge threshold to 0.6 according to the majority rule, i.e., if at least 60% of the customer reviews of a cocktail are positive, the cocktail is considered recommended, and the overall customer rating of the bar also becomes positive.
- Step 3: using the HCI-TM algorithm to screen out the most recommendable customer group for popularization, plotting the ratio of customers with positive reviews against the ratio of selected recommendable customers, and comparing the result with other algorithms.
- Step 4: evaluating the algorithms: when the same number of recommendable customers is selected for popularization, the higher the resulting ratio of customers with positive reviews, the better the algorithm and the more marketing cost can be saved.
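Steps 1 and 2 can be illustrated with a small Python sketch. The input schema, a list of (customer, cocktail, is_positive) triples, and the function names are hypothetical; the actual Yelp field names are not given here. The sketch builds one hyperedge per cocktail and applies the 0.6 majority-rule threshold to decide which cocktails count as recommended, treating a customer as positive for a cocktail if any of their reviews of it is positive (an assumption).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_review_hypergraph(
    reviews: Iterable[Tuple[str, str, bool]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Map each cocktail (hyperedge) to its reviewers and to its positive reviewers."""
    members: Dict[str, Set[str]] = defaultdict(set)
    positive: Dict[str, Set[str]] = defaultdict(set)
    for customer, cocktail, is_positive in reviews:
        members[cocktail].add(customer)  # incidence: customer reviewed cocktail
        if is_positive:
            positive[cocktail].add(customer)
    return dict(members), dict(positive)


def recommended(members: Dict[str, Set[str]], positive: Dict[str, Set[str]],
                theta: float = 0.6) -> List[str]:
    """Cocktails whose share of positive reviewers reaches theta (majority rule)."""
    return sorted(
        c for c, reviewers in members.items()
        if len(positive.get(c, set())) >= theta * len(reviewers) - 1e-9
    )


reviews = [("a", "mojito", True), ("b", "mojito", True), ("c", "mojito", False),
           ("a", "negroni", False), ("d", "negroni", True)]
members, positive = build_review_hypergraph(reviews)
print(recommended(members, positive))  # ['mojito']
```

Here two of the three mojito reviewers are positive (about 67%, above 0.6), while only one of the two negroni reviewers is (50%), so only the mojito hyperedge passes the threshold.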

The performance of the HCI-TM algorithm and the contrast algorithms on the Las Vegas bar review dataset is shown in FIG. 6. The HCI1-TM and HCI2-TM algorithms need to select only 82 and 49 customers, respectively, for the ratio of customers with positive reviews to reach 90%, whereas the HHDA and NPA algorithms need to select 124 and 136 customers, respectively, to achieve the same goal. The HCI-TM algorithm can therefore accurately identify the most recommendable customer group, which is of great significance for reducing marketing cost.
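Taking the counts quoted for FIG. 6 at face value, the relative reduction in the number of customers that must be targeted to reach a 90% positive-review ratio works out as follows:

$\begin{matrix} \text{HCI1-TM vs. HHDA: } 1 - 82/124 \approx 33.9\% & \text{HCI1-TM vs. NPA: } 1 - 82/136 \approx 39.7\% \\ \text{HCI2-TM vs. HHDA: } 1 - 49/124 \approx 60.5\% & \text{HCI2-TM vs. NPA: } 1 - 49/136 \approx 64.0\% \end{matrix}$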

The above technical solution reflects only a preferred technical solution of the present invention. Changes made to certain parts of it by those skilled in the art still reflect the principle of the present invention and fall within the protective scope of the present invention.
