METHOD FOR IDENTIFYING OPTIMAL INFLUENCE CUSTOMER GROUP OF MARKETING NETWORK BASED ON HYPERGRAPH THRESHOLD MODEL

Information

  • Patent Application
  • 20250173751
  • Publication Number
    20250173751
  • Date Filed
    April 08, 2024
  • Date Published
    May 29, 2025
Abstract
The present invention discloses a method for identifying an optimal influence customer group in a marketing network based on a hypergraph threshold model. The method specifically comprises: step 1: establishing a hypergraph threshold model based on sales data first. Step 2: using the hypergraph threshold model to simulate spread of customer reputation in a marketing network, and setting a hyperedge threshold according to a specific scenario. Step 3: using an HCI-TM algorithm to screen out a most recommendable customer group for popularization, and using a hypergraph threshold rule to simulate spread of information among customers. The present invention proposes a multi-node collective influence measurement method for the hypergraph threshold model, designs the HCI-TM algorithm to efficiently select a maximum influence set, and applies the algorithm in the marketing network at the same time.
Description
TECHNICAL FIELD

The present invention belongs to the field of computer application, and particularly relates to a method for identifying a most recommendable customer group in a marketing network based on a node collective influence maximization algorithm of a hypergraph threshold model.


BACKGROUND

Identifying the user group most worth targeting for promotion in a marketing network is of great significance for reducing marketing cost and improving publicity. Spreading processes in complex networks can describe a variety of real-world phenomena, including disease transmission, cascading failure, and commodity marketing. Because network structure is heterogeneous, a small number of nodes play a key role in the spreading process. Identifying these key nodes, i.e., a seed set (a maximum influence set), is of great practical significance. The maximum influence set problem, whose purpose is to select a fixed number of seed nodes that maximize spread, is considered NP-hard and is widely applied in product recommendation, public opinion evolution, disease transmission, and other fields.


In recent years, much progress has been made in the maximum influence set problem in graphs, and algorithms such as high degree algorithm (HD), PageRank algorithm, eigenvalue based algorithm and CI algorithm have been proposed successively. However, with the continuous progress and development of complex system modeling, people gradually realize that the graphs cannot depict high-order interaction relationships. For example, in the marketing network, high-order interaction relationships exist between multiple customers and multiple products, which cannot be modeled by the graphs. Therefore, hypergraph modeling has become an important direction for complex system modeling. Currently, more and more attention has been paid to the problem of maximum influence set in hypergraphs, and heuristic algorithms such as high degree algorithm and eigenvector algorithm have been proposed successively.


Although those skilled in the art have made progress with several proposed solutions, real networks such as the marketing network exhibit a rich-club effect: individuals with great influence and strong purchasing power tend to aggregate, which makes it difficult to measure the collective influence of multiple nodes. In graphs, an effective current method is to construct a constrained self-satisfying equation of the steady-state probability by a Message Passing method, and thereby derive the influence weight of an individual on the steady state by dynamical system stability theory. The present invention therefore extends the Message Passing theoretical analytical framework from graphs to hypergraphs, uses a hypergraph threshold model to describe the information spreading rule in real scenarios such as the marketing network, proposes a multi-node collective influence measurement method, designs an HCI-TM algorithm based on a greedy strategy to select the seed set (customer group) with the maximum influence, verifies the robustness and effectiveness of the algorithm by numerical simulation, finds that the hypergraph collective influence index can be used to predict the occurrence of the cascade phenomenon, and finally applies the method to a real marketing network to screen out the most recommendable customer group.


SUMMARY

The purpose of the present invention is to propose a node collective influence measurement method based on a hypergraph threshold model, design an HCI-TM algorithm to select a seed set with the maximum influence in a hypergraph, use the hypergraph threshold model to simulate spread of reputation among customers in a marketing network, apply the HCI-TM algorithm to the marketing network to identify a most recommendable customer group, and form a method for identifying a recommendable customer group in a marketing network based on a node collective influence maximization algorithm of a hypergraph threshold model.


The technical solution to achieve the above purpose:


A method for identifying an optimal influence customer group in a marketing network based on a hypergraph threshold model, comprising the following steps:

    • Step 1: establishing a hypergraph threshold model based on sales data first.
    • Step 2: using the hypergraph threshold model to simulate spread of customer reputation in a marketing network, and setting a hyperedge threshold according to a specific scenario.
    • Step 3: using an HCI-TM algorithm to screen out a most recommendable customer group for popularization, and using a hypergraph threshold rule to simulate spread of information among customers.


Further, a process of establishing the hypergraph threshold model in step 1 is as follows:


In the hypergraph threshold model, each node represents a customer, each hyperedge represents an interaction relationship between the customer and a product, and one hyperedge represents an interaction relationship between all customers who have evaluated a certain product and the product; a specific rule of the hypergraph threshold model is that when the node activation ratio in a hyperedge exceeds the hyperedge threshold, the hyperedge will be activated; when a hyperedge is activated, all nodes in the hyperedge will be activated.
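The construction described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `(customer, product)` record layout, the function names, and the reading of "exceeds the threshold" as "reaches the threshold" are all assumptions made for the example.

```python
from collections import defaultdict


def build_hypergraph(records):
    """Group (customer, product) pairs into one hyperedge per product.

    The record layout is hypothetical; any sales or review table that links
    a customer to a product could be mapped to it.
    """
    hyperedges = defaultdict(set)  # product -> customers who evaluated it
    incident = defaultdict(set)    # customer -> products the customer evaluated
    for customer, product in records:
        hyperedges[product].add(customer)
        incident[customer].add(product)
    # The hyperdegree k_i is the number of hyperedges incident with node i.
    hyperdegree = {c: len(products) for c, products in incident.items()}
    return dict(hyperedges), dict(incident), hyperdegree


def hyperedge_activated(members, active_nodes, threshold):
    """Assumed rule: a hyperedge activates once its active share reaches the threshold."""
    return len(members & active_nodes) / len(members) >= threshold


records = [("ann", "bar_a"), ("bob", "bar_a"), ("bob", "bar_b"), ("cat", "bar_b")]
hyperedges, incident, hyperdegree = build_hypergraph(records)
```

Storing each hyperedge as a set of customers keeps the activation check a single set intersection, and the per-customer index gives the hyperdegree used later in the collective influence score.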


A Cavity method is used to establish a conditional probability self-satisfying equation based on the hypergraph threshold rule, thus to weaken strong correlation between the nodes and the hyperedge and accurately describe an information spreading rule in a hypergraph. The self-satisfying equation described in a tree-like hypergraph is expressed as formula (1), i.e., a condition for activating node i is that: the node will be activated when any hyperedge incident with the node other than a hyperedge eγ is activated, and a condition for activating the hyperedge eγ is that: the hyperedge will be activated when any mγ nodes incident with the hyperedge eγ other than the node i are activated:









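$$
\begin{cases}
v_{i\to e_\gamma} = n_i + (1-n_i)\left[1-\prod_{e_\beta\in\partial i\setminus e_\gamma}\left(1-v_{e_\beta\to i}\right)\right] \\
v_{e_\gamma\to i} = 1-\prod_{P_h\in P^{m_\gamma}_{e_\gamma\setminus i}}\left(1-\prod_{p\in P_h}v_{p\to e_\gamma}\right)
\end{cases}
\tag{1}
$$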







In formula (1), $v_{i\to e_\gamma}$ represents the probability that node $i$ is activated when hyperedge $e_\gamma$ is removed, and $v_{e_\gamma\to i}$ represents the probability that hyperedge $e_\gamma$ is activated when node $i$ is removed; $\partial i\setminus e_\gamma$ represents the set of hyperedges incident with node $i$ other than hyperedge $e_\gamma$; the set of all combinations of $m_\gamma$ nodes in hyperedge $e_\gamma$ other than node $i$ is $P^{m_\gamma}_{e_\gamma\setminus i}=\{P_1, P_2, \ldots, P_\tau\}$, where $\tau=C^{m_\gamma}_{N_\gamma-1}$, $P_h\in P^{m_\gamma}_{e_\gamma\setminus i}$, and $P_h=\{P_{h_1}, P_{h_2}, \ldots, P_{h_{m_\gamma}}\}$ represents a set composed of $m_\gamma$ nodes. $n_i$ represents whether node $i$ is a seed node: when the node is a seed node, $n_i=1$; otherwise, $n_i=0$.


Final states of the node i and the hyperedge eγ can be calculated by the following formula:









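$$
\begin{cases}
v_i = n_i + (1-n_i)\left[1-\prod_{e_\beta\in\partial i}\left(1-v_{e_\beta\to i}\right)\right] \\
v_{e_\gamma} = 1-\prod_{P_h\in P^{m_\gamma}_{e_\gamma}}\left(1-\prod_{p\in P_h}v_{p\to e_\gamma}\right)
\end{cases}
\tag{2}
$$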







In formula (2), vi represents the final state of the node i, and veγ represents the final state of the hyperedge eγ.
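The self-satisfying equations can be iterated numerically on a small hypergraph. The sketch below is an illustration under stated assumptions, not the patented procedure: the function name, the `m` dictionary holding $m_\gamma$ per hyperedge, and the brute-force enumeration of all $m_\gamma$-node combinations (which grows quickly with hyperedge size) are choices made for the example, and the result is only meaningful on tree-like hypergraphs as assumed by formula (1).

```python
from itertools import combinations
from math import prod


def message_passing(hyperedges, m, seeds, max_iter=100):
    """Iterate formula (1) to a fixed point, then evaluate formula (2)."""
    nodes = set().union(*hyperedges.values())
    incident = {i: [e for e, mem in hyperedges.items() if i in mem] for i in nodes}
    n = {i: 1.0 if i in seeds else 0.0 for i in nodes}
    v_ne = {(i, e): n[i] for i in nodes for e in incident[i]}  # v_{i -> e}
    v_en = {(e, i): 0.0 for i in nodes for e in incident[i]}   # v_{e -> i}

    def edge_prob(candidates, e, msgs):
        # 1 - prod over all m[e]-node combinations P_h of (1 - prod_{p in P_h} v_{p -> e})
        return 1 - prod(1 - prod(msgs[(p, e)] for p in P)
                        for P in combinations(candidates, m[e]))

    def node_prob(i, edges, msgs):
        # n_i + (1 - n_i) * [1 - prod over the given hyperedges of (1 - v_{e -> i})]
        return n[i] + (1 - n[i]) * (1 - prod(1 - msgs[(e, i)] for e in edges))

    for _ in range(max_iter):
        new_en = {(e, i): edge_prob(hyperedges[e] - {i}, e, v_ne) for (e, i) in v_en}
        new_ne = {(i, e): node_prob(i, [b for b in incident[i] if b != e], new_en)
                  for (i, e) in v_ne}
        converged = new_en == v_en and new_ne == v_ne
        v_en, v_ne = new_en, new_ne
        if converged:
            break

    v_node = {i: node_prob(i, incident[i], v_en) for i in nodes}
    v_edge = {e: edge_prob(mem, e, v_ne) for e, mem in hyperedges.items()}
    return v_node, v_edge


# Hypothetical example: e1 needs one active member, e2 needs two.
hyperedges = {"e1": {1, 2}, "e2": {2, 3, 4}}
m = {"e1": 1, "e2": 2}
v_node, v_edge = message_passing(hyperedges, m, seeds={1})
```

In this example seed node 1 activates e1 and therefore node 2, while e2 stays inactive because only one of its three members is active.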


A method for calculating hypergraph collective influence is as follows:


To simplify formula (1), let $V=\{v_1, v_2\}^T$, where $v_1=\{v_{i\to e_\gamma}\}_{S\times 1}$ and $v_2=\{v_{e_\gamma\to i}\}_{S\times 1}$. $S=\sum_{i=1}^{N}k_i$ represents the sum of the hyperdegrees of all nodes. Similarly, $n$ is extended to higher dimensions:










$$
n = (n_1, 0)^T = (\underbrace{\ldots, \underbrace{n_i, \ldots, n_i}_{k_i}, \ldots}_{S}, \underbrace{0, \ldots, 0}_{S})^T
\tag{3}
$$







Therefore, formula (1) can be simplified as:










$$
V = n + \mathcal{G}(V)
\quad\Longleftrightarrow\quad
\begin{cases}
v_1 = n_1 + g_1(v_2) \\
v_2 = 0 + g_2(v_1)
\end{cases}
\tag{4}
$$







Where g1(v2) represents a nonlinear function of v2, and g2(v1) represents a nonlinear function of v1;


As formula (4) is composed of complex nonlinear equations, which cannot be solved directly, linearization and a fixed point iteration method are used to solve formula (4):










$$
V^{t+1} = n + \mathcal{J}\mathcal{G}\big|_{V^t}\times V^t
\tag{5}
$$







Where Vt represents a state of V when the number of iteration steps is t;


The specific form of the Jacobian matrix $\mathcal{J}\mathcal{G}|_{V^t}$ is derived as follows:









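$$
\mathcal{J}\mathcal{G} =
\begin{pmatrix}
\dfrac{\partial g_1}{\partial v_1} & \dfrac{\partial g_1}{\partial v_2} \\
\dfrac{\partial g_2}{\partial v_1} & \dfrac{\partial g_2}{\partial v_2}
\end{pmatrix}_{2S\times 2S}
\tag{6}
$$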







A partial derivative of g1 is calculated:













$$
\frac{\partial v_{i\to e_\gamma}}{\partial v_{j\to e_\beta}} = 0
\tag{7}
$$

















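$$
\left.\frac{\partial v_{i\to e_\gamma}}{\partial v_{e_\beta\to j}}\right|_{V^t} =
\begin{cases}
1-n_i & i=j,\ e_\beta\neq e_\gamma,\ a^t_{e_\beta\to i,\,i\to e_\gamma}=0 \\
0 & \text{otherwise}
\end{cases}
\tag{8}
$$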







Where $a^t_{e_\beta\to i,\,i\to e_\gamma}=\sum_{e_\mu\in\partial i\setminus(e_\gamma,e_\beta)}v^t_{e_\mu\to i}$.
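The case distinction in formula (8) can be checked by differentiating the first line of formula (1) directly. The short derivation below is a sketch that assumes the messages at step $t$ take only the values 0 or 1, which holds when the seed indicators $n_i$ are binary; that assumption is not stated explicitly at this point in the text. For $e_\beta\in\partial i\setminus e_\gamma$:

$$
\frac{\partial v_{i\to e_\gamma}}{\partial v_{e_\beta\to i}}
= (1-n_i)\prod_{e_\mu\in\partial i\setminus(e_\gamma,e_\beta)}\left(1-v_{e_\mu\to i}\right)
$$

With binary messages, the remaining product equals 1 exactly when every $v^t_{e_\mu\to i}$ in it is 0, that is, when the sum of those messages is 0, and it equals 0 as soon as any one of them is 1. This reproduces the value $1-n_i$ in the first case and 0 otherwise.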


A partial derivative of g2 is calculated as follows:













$$
\frac{\partial v_{e_\gamma\to i}}{\partial v_{e_\beta\to j}} = 0
\tag{9}
$$







When eβ=eγ, j≠i:














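$$
\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\gamma}}
= \prod_{\substack{P_h\in P^{m_\gamma}_{e_\gamma\setminus i} \\ j\notin P_h}}\left(1-\prod_{p\in P_h}v_{p\to e_\gamma}\right)
\sum_{\substack{P_h\in P^{m_\gamma}_{e_\gamma\setminus i} \\ j\in P_h}}\left[\prod_{p\in P_h\setminus j}v_{p\to e_\gamma}
\prod_{\substack{\tilde{P}_h\in P^{m_\gamma}_{e_\gamma\setminus i} \\ \tilde{P}_h\neq P_h,\ j\in\tilde{P}_h}}\left(1-\prod_{p\in\tilde{P}_h}v_{p\to e_\gamma}\right)\right]
\tag{10}
$$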







Where $\tilde{P}_h\in P^{m_\gamma}_{e_\gamma\setminus i}$ and $\tilde{P}_h\neq P_h$.


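Let $b^t_{j\to e_\gamma,\,e_\gamma\to i}=\sum_{p\in\partial e_\gamma\setminus(i,j)}v^t_{p\to e_\gamma}$ represent the number of nodes activated at time $t$ in $e_\gamma$ after nodes $i$ and $j$ are removed. When $b^t_{j\to e_\gamma,\,e_\gamma\to i}\geq m_\gamma$, a combination $P_h$ not containing $j$ with $\prod_{p\in P_h}v_{p\to e_\gamma}=1$ certainly exists, so

$$
\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\gamma}} = 0.
$$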





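When $b^t_{j\to e_\gamma,\,e_\gamma\to i}\leq m_\gamma-2$, every product $\prod_{p\in P_h\setminus j}v_{p\to e_\gamma}$ is 0, so

$$
\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\gamma}} = 0.
$$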





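When $b^t_{j\to e_\gamma,\,e_\gamma\to i}=m_\gamma-1$, exactly one combination makes $\prod_{p\in P_h\setminus j}v_{p\to e_\gamma}=1$; then

$$
\prod_{\substack{\tilde{P}_h\in P^{m_\gamma}_{e_\gamma\setminus i} \\ \tilde{P}_h\neq P_h,\ j\in\tilde{P}_h}}\left(1-\prod_{p\in\tilde{P}_h}v_{p\to e_\gamma}\right)=1,
\qquad
\prod_{\substack{P_h\in P^{m_\gamma}_{e_\gamma\setminus i} \\ j\notin P_h}}\left(1-\prod_{p\in P_h}v_{p\to e_\gamma}\right)=1,
$$

so

$$
\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\gamma}} = 1.
$$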





Therefore:













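$$
\left.\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\beta}}\right|_{V^t} =
\begin{cases}
1 & e_\beta=e_\gamma,\ j\neq i,\ b^t_{j\to e_\gamma,\,e_\gamma\to i}=m_\gamma-1 \\
0 & \text{otherwise}
\end{cases}
\tag{11}
$$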







As a result, the specific form of the Jacobian matrix $\mathcal{J}\mathcal{G}|_{V^t}$ is:










$$
\mathcal{J}\mathcal{G}\big|_{V^t} =
\begin{pmatrix}
0 & \mathcal{M}^t \\
\mathcal{I}^t & 0
\end{pmatrix}
\tag{12}
$$







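Where

$$
\mathcal{M}^t = \left\{\mathcal{M}^t_{e_\beta\to j,\,i\to e_\gamma}\right\} = \left.\left\{\frac{\partial v_{i\to e_\gamma}}{\partial v_{e_\beta\to j}}\right\}\right|_{V^t}
$$

is a non-backtracking matrix, and

$$
\mathcal{I}^t = \left\{\mathcal{I}^t_{j\to e_\beta,\,e_\gamma\to i}\right\} = \left.\left\{\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\beta}}\right\}\right|_{V^t}
$$

is a subcritical non-backtracking matrix. For the convenience of subsequent iterative derivation, the non-backtracking matrix $\mathcal{M}^t$ and the subcritical non-backtracking matrix $\mathcal{I}^t$ are extended to higher dimensions: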









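$$
\begin{cases}
\mathcal{M}^t_{e_\beta j i e_\gamma} = (1-n_i)\,H_{ie_\gamma}H_{je_\beta}\,\delta_{ij}\left(1-\delta_{e_\beta e_\gamma}\right)\mathcal{M}^t_{e_\beta i i e_\gamma} \\
\mathcal{I}^t_{j e_\beta e_\gamma i} = H_{ie_\gamma}H_{je_\beta}\,\delta_{e_\beta e_\gamma}\left(1-\delta_{ij}\right)\mathcal{I}^t_{j e_\gamma e_\gamma i}
\end{cases}
\tag{13}
$$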







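Where $\mathcal{M}^t_{e_\beta j i e_\gamma}$ represents the element with subscript $(e_\beta, j, i, e_\gamma)$ in the 4-dimensional tensor $\{\mathcal{M}^t\}_{M\times N\times N\times M}$; when $i=j$, $\delta_{ij}=1$; otherwise, $\delta_{ij}=0$. $\mathcal{I}^t_{j e_\beta e_\gamma i}$ represents the element with subscript $(j, e_\beta, e_\gamma, i)$ in the 4-dimensional tensor $\{\mathcal{I}^t\}_{N\times M\times M\times N}$; $H=\{H_{ie_\gamma}\}_{S\times S}$ is an incidence matrix; when node $i$ is incident with hyperedge $e_\gamma$, $H_{ie_\gamma}=1$; otherwise, $H_{ie_\gamma}=0$. When $a^t_{e_\beta\to i,\,i\to e_\gamma}=0$, $\mathcal{M}^t_{e_\beta i i e_\gamma}=1$; otherwise, $\mathcal{M}^t_{e_\beta i i e_\gamma}=0$. Similarly, when $b^t_{j\to e_\gamma,\,e_\gamma\to i}=m_\gamma-1$, $\mathcal{I}^t_{j e_\gamma e_\gamma i}=1$; otherwise, $\mathcal{I}^t_{j e_\gamma e_\gamma i}=0$.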


The fixed point iteration method is used to iterate formula (5) as follows:


When $t=1$, let $V^0=n$ and $V^1=n+\mathcal{J}\mathcal{G}|_{V^0}\times n$:











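$$
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}^{1}
= \begin{bmatrix} n_1 \\ 0 \end{bmatrix}
+ \begin{bmatrix} 0 & \mathcal{M}^0 \\ \mathcal{I}^0 & 0 \end{bmatrix}
\begin{bmatrix} n_1 \\ 0 \end{bmatrix}
= \begin{bmatrix} n_1 \\ \mathcal{I}^0 n_1 \end{bmatrix}
\tag{14}
$$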







A specific form of each element in formula (14) can be expressed as:









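$$
\begin{cases}
v^1_{i\to e_\gamma} = n_i H_{ie_\gamma} \\
v^1_{e_\gamma\to i} = H_{ie_\gamma}\sum_{j} n_j H_{je_\gamma}\left(1-\delta_{ij}\right)\mathcal{I}^0_{j e_\gamma e_\gamma i}
\end{cases}
\tag{15}
$$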







A 1-norm of an activation probability v is used to measure a final activation scale in hypergraphs, and the following formula is obtained after collation:















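$$
\begin{aligned}
\|v\|_1 &= \sum_{ie_\gamma} v_{i\to e_\gamma} + \sum_{ie_\gamma} v_{e_\gamma\to i} \\
&= \sum_{ie_\gamma} n_i H_{ie_\gamma} + \sum_{ie_\gamma} H_{ie_\gamma}\sum_{j} n_j H_{je_\gamma}\left(1-\delta_{ij}\right)\mathcal{I}^0_{j e_\gamma e_\gamma i} \\
&= \sum_{i} n_i k_i + \sum_{i} n_i \sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i}\mathcal{I}^0_{i e_\gamma e_\gamma j} \\
&= \sum_{i} n_i\left(k_i + \sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i}\mathcal{I}^0_{i e_\gamma e_\gamma j}\right)
\end{aligned}
\tag{16}
$$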







By selecting the node $i$ with the maximum value of $k_i+\sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i}\mathcal{I}^0_{i e_\gamma e_\gamma j}$ as a seed, information spread can be maximized with the minimum number of seeds. Therefore, the 1-order hypergraph collective influence is defined as:











$$
\mathrm{HCI}_1(i) = k_i + \sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i}\mathcal{I}^0_{i e_\gamma e_\gamma j}
\tag{17}
$$







Using the above method, when $t=2$ and $V^2=n+\mathcal{J}\mathcal{G}|_{V^1}\times V^1$, the 2-order hypergraph collective influence can be derived:











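$$
\mathrm{HCI}_2(i) = k_i
+ \sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i}\mathcal{I}^1_{i e_\gamma e_\gamma j}
+ \sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i}\mathcal{I}^0_{i e_\gamma e_\gamma j}\sum_{e_\mu\in\partial j\setminus e_\gamma}\left(1-n_j\right)\mathcal{M}^1_{e_\gamma j j e_\mu}
\tag{18}
$$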







Similarly, n-order hypergraph collective influence can be derived as:











$$
\mathrm{HCI}_n(i) = k_i + \sum_{L\in A_n}\mathbb{O}^n_L + \sum_{L\in B_n}\mathbb{E}^n_L
\tag{19}
$$







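Where $A_n=\{x\in N^+ \mid x \bmod 2=0,\ x\leq n\}$ and $B_n=\{x\in N^+ \mid x \bmod 2=1,\ x\leq n\}$,

$$
\mathbb{O}^n_L = \sum_{e_\gamma\in\partial i}\sum_{i_1\in\partial e_\gamma\setminus i}\mathcal{I}^{n-L}_{i e_\gamma e_\gamma i_1}
\sum_{e_{\gamma_1}\in\partial i_1\setminus e_\gamma}\left(1-n_{i_1}\right)\mathcal{M}^{n-L+1}_{e_\gamma i_1 i_1 e_{\gamma_1}}
\times\cdots\times
\sum_{e_{\gamma_\ell}\in\partial i_\ell\setminus e_{\gamma_{\ell-1}}}\left(1-n_{i_\ell}\right)\mathcal{M}^{n-1}_{e_{\gamma_{\ell-1}} i_\ell i_\ell e_{\gamma_\ell}},
$$

$$
\mathbb{E}^n_L = \sum_{e_\gamma\in\partial i}\sum_{i_1\in\partial e_\gamma\setminus i}\mathcal{I}^{n-L}_{i e_\gamma e_\gamma i_1}
\sum_{e_{\gamma_1}\in\partial i_1\setminus e_\gamma}\left(1-n_{i_1}\right)\mathcal{M}^{n-L+1}_{e_\gamma i_1 i_1 e_{\gamma_1}}
\times\cdots\times
\sum_{i_{\ell+1}\in\partial e_{\gamma_\ell}\setminus i_\ell}\mathcal{I}^{n-1}_{i_\ell e_{\gamma_\ell} e_{\gamma_\ell} i_{\ell+1}}.
$$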














Where $i_1$ represents a 1-layer neighbor node of node $i$; $i_\ell$ represents an $\ell$-layer neighbor node of node $i$; $e_{\gamma_1}$ represents a 1-layer neighbor hyperedge of hyperedge $e_\gamma$; and $e_{\gamma_\ell}$ represents an $\ell$-layer neighbor hyperedge of hyperedge $e_\gamma$.


Further, a design method of the HCI-TM algorithm (a greedy algorithm for selecting a set with the maximum influence) in step 3 is as follows:

    • Step 1: initializing a seed set S=Ø, and calculating the n-order hypergraph collective influence HCIn(i) of all nodes in the hypergraph.
    • Step 2: selecting the node i with the maximum n-order hypergraph collective influence as a seed, adding the seed into a seed set S={i}∪S, and then conducting spread based on the hypergraph threshold rule;
    • Step 3: judging whether the node activation ratio in the hypergraph exceeds ar; if the node activation ratio exceeds ar, the maximum influence set in this condition is S;
    • Step 4: if the node activation ratio does not exceed ar, recalculating the n-order hypergraph collective influence HCIn(i) of the ⌈n/2⌉-layer neighbor nodes of each active node, and repeating step 2 to step 3.
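The four steps above can be sketched as a greedy loop. The code below is a simplified illustration rather than the HCI-TM algorithm itself: it scores nodes with the order-1 quantity only, recomputes the score of every inactive node instead of only the ⌈n/2⌉-layer neighbors, treats every currently active node as already counted when testing for subcritical hyperedges, and stops once the activation ratio reaches (rather than strictly exceeds) the target. The names `cascade`, `hci1`, `hci_tm`, `m` and `target_ratio` are hypothetical.

```python
def cascade(hyperedges, m, seeds):
    """Spread under the threshold rule; hyperedge e fires once m[e] members are active."""
    active, fired, changed = set(seeds), set(), True
    while changed:
        changed = False
        for e, members in hyperedges.items():
            if e not in fired and len(members & active) >= m[e]:
                fired.add(e)
                active |= members  # an activated hyperedge activates all its nodes
                changed = True
    return active


def hci1(hyperedges, m, active, i):
    """Order-1 score: hyperdegree of i plus the number of subcritical (e, j) pairs."""
    score = 0
    for e, members in hyperedges.items():
        if i not in members:
            continue
        score += 1  # e contributes 1 to the hyperdegree k_i
        for j in members - {i}:
            # e is subcritical for (i, j) if exactly m[e] - 1 other members are active
            if len((members - {i, j}) & active) == m[e] - 1:
                score += 1
    return score


def hci_tm(hyperedges, m, target_ratio):
    """Greedy seed selection: pick the top-scoring inactive node, spread, repeat."""
    nodes = set().union(*hyperedges.values())
    seeds, active = [], set()
    while len(active) / len(nodes) < target_ratio:
        candidates = sorted(nodes - active)
        if not candidates:
            break
        best = max(candidates, key=lambda i: hci1(hyperedges, m, active, i))
        seeds.append(best)
        active = cascade(hyperedges, m, active | {best})
    return seeds, active


# Hypothetical example with three overlapping hyperedges.
hyperedges = {"e1": {1, 2, 3}, "e2": {3, 4, 5}, "e3": {5, 6}}
m = {"e1": 2, "e2": 2, "e3": 1}
seeds, active = hci_tm(hyperedges, m, target_ratio=1.0)
```

Recomputing all scores after each spread keeps the sketch short; restricting the update to nearby nodes, as step 4 describes, is what would keep the cost manageable on large hypergraphs.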


Robustness and effectiveness of the HCI-TM algorithm are verified on Erdös-Rényi hypergraphs with the average hyperdegrees of 2 and 3, scale-free hypergraphs with the power-law exponents of 1.5 and 2, and uniform hypergraphs with the cardinalities of hyperdegree of 4 and 5. Specific steps are as follows:

    • Step 1: using a configuration model to generate the Erdös-Rényi hypergraphs, the scale-free hypergraphs and the uniform hypergraphs with different parameter settings.
    • Step 2: using the HCI-TM algorithm and other contrast algorithms to select seed sets on different types of hypergraphs, conducting spread based on the threshold rule, and recording the activation scale in hypergraphs at a fixed interval (ratio of seed nodes).
    • Step 3: making a comparison with other algorithms to evaluate the effectiveness of the HCI-TM algorithm: when a same ratio of seed nodes is selected, the algorithm with a larger activation scale in hypergraphs is more effective.
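The comparison in steps 2 and 3 amounts to computing, for each seed ratio q, the activation ratio Q(q) reached by a given seed ordering. The sketch below illustrates that bookkeeping only; it does not generate configuration-model hypergraphs, and the high-degree ordering stands in for any contrast algorithm. The function names and the per-hyperedge count `m` are assumptions made for the example.

```python
def cascade(hyperedges, m, seeds):
    """Spread under the threshold rule; hyperedge e fires once m[e] members are active."""
    active, fired, changed = set(seeds), set(), True
    while changed:
        changed = False
        for e, members in hyperedges.items():
            if e not in fired and len(members & active) >= m[e]:
                fired.add(e)
                active |= members
                changed = True
    return active


def high_degree_ranking(hyperedges):
    """Baseline ordering: nodes sorted by decreasing hyperdegree, ties by label."""
    degree = {}
    for members in hyperedges.values():
        for i in members:
            degree[i] = degree.get(i, 0) + 1
    return sorted(degree, key=lambda i: (-degree[i], i))


def activation_curve(hyperedges, m, ranking, ratios):
    """Return (q, Q(q)) pairs: seed the top q share of the ranking and spread."""
    nodes = set().union(*hyperedges.values())
    curve = []
    for q in ratios:
        k = round(q * len(nodes))
        active = cascade(hyperedges, m, ranking[:k])
        curve.append((q, len(active) / len(nodes)))
    return curve


hyperedges = {"e1": {1, 2, 3}, "e2": {3, 4, 5}, "e3": {5, 6}}
m = {"e1": 2, "e2": 2, "e3": 1}
ranking = high_degree_ranking(hyperedges)
curve = activation_curve(hyperedges, m, ranking, ratios=[0.0, 1 / 3, 0.5])
```

Two orderings can then be compared point by point: at the same q, the one with the larger Q(q) is the more effective under the criterion of step 3.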


By analyzing the evolution of the maximum HCI value in the hypergraphs in a seed selection process, it is verified that the index of hypergraph collective influence can be used as a criterion for predicting the occurrence of cascade phenomenon. Specific steps thereof are as follows:

    • Step 1: using the HCI-TM algorithm to select the seed nodes on the hypergraphs, and recording the HCI value of the selected seed nodes (the maximum HCI value in the hypergraphs at the moment).
    • Step 2: analyzing the figure of the evolution of the HCI value with the selection of the seed nodes; the index of HCI can be used as a criterion for predicting the occurrence of cascade phenomenon: the peak value of HCI is corresponding to a phase transformation point where the cascade phenomenon occurs; by observing the evolution of the maximum HCI value in the hypergraphs, whether and when the cascade phenomenon will occur in a hypergraph spreading process can be judged.


Compared with the prior art, the present invention has the following beneficial effects:

    • (1) The present invention popularizes a Message Passing theoretical analytical framework from the graphs to the hypergraphs, uses the hypergraph threshold model to simulate the spread of information in real scenarios such as the marketing network, and proposes a multi-node collective influence measurement method by analyzing the self-satisfying equation that satisfies the threshold rule.
    • (2) The present invention designs a greedy algorithm for selecting a set with the maximum influence (the HCI-TM algorithm) based on a hypergraph collective influence measurement method; the algorithm has strong robustness and effectiveness, and is applied in the marketing network.
    • (3) The present invention provides the criterion for predicting the occurrence of hypergraph cascade phenomenon based on the concept of hypergraph collective influence.





DESCRIPTION OF DRAWINGS


FIG. 1 is a flow chart of the present invention.



FIG. 2 is a schematic diagram of a hypergraph threshold rule of the present invention; ovals represent hyperedges, and circles represent nodes.



FIG. 3 is a schematic diagram of hypergraph collective influence of the present invention; a is a schematic diagram of 1-order hypergraph collective influence, and b is a schematic diagram of 2-order hypergraph collective influence. Circles represent nodes, triangles represent hyperedges, connected edges indicate that incidence relationships exist between the hyperedges and the nodes, and triangles with rhombuses indicate that the hyperedges are in a subcritical state, i.e., in a hyperedge eγ, mγ−1 nodes have already been activated.



FIG. 4 shows the performance of the HCI-TM algorithm of the present invention and other algorithms on generated hypergraphs under different parameter settings; each horizontal axis q is the ratio of seed nodes, and each vertical axis Q(q) is the activation ratio of nodes in the hypergraphs when seed nodes with a ratio q are selected. a-d represent the performance of the algorithms on Erdös-Rényi hypergraphs with the average hyperdegrees of 2 and 3 and the hypergraph scales of 10,000 and 100,000; e-h represent the performance of the algorithms on scale-free hypergraphs with the power-law exponents of 1.5 and 2 and the hypergraph scales of 10,000 and 100,000; i-l represent the performance of the algorithms on uniform hypergraphs with the cardinalities of hyperdegree of 4 and 5 and the hypergraph scales of 10,000 and 100,000.



FIG. 5 shows actual time complexity analysis of a HCI-TM algorithm of the present invention and other algorithms on hypergraphs. Each horizontal axis is log10 N, where N is the hypergraph scale. Each vertical axis is log10 T, where T is the operating time. a and d represent the actual time complexity analysis on Erdös-Rényi hypergraphs with the average hyperdegrees of 2 and 3, respectively; b and e represent the actual time complexity analysis on scale-free hypergraphs with the power-law exponents of 1.5 and 2; and c and f represent the actual time complexity analysis on uniform hypergraphs with the cardinalities of hyperdegree of 4 and 5.



FIG. 6 is the figure of the evolution of the maximum HCI value in hypergraphs of the present invention with the selection of seed nodes. a represents the figure of the evolution of the maximum HCI value in hypergraphs on an Erdös-Rényi hypergraph with the average hyperdegree of 3 and the hypergraph scale of 10,000; b represents the figure of the evolution of the maximum HCI value in hypergraphs on a scale-free hypergraph with the power-law exponent of 1.5 and the hypergraph scale of 10,000; c represents the figure of the evolution of the maximum HCI value in hypergraphs on a uniform hypergraph with the cardinality of hyperdegree of 5 and the hypergraph scale of 10,000.



FIG. 7 shows the performance of a HCI-TM algorithm of the present invention and other contrast algorithms on a marketing network (a Las Vegas bar review dataset).





DETAILED DESCRIPTION

In order to describe the present invention more specifically, the technical solution of the present invention will be described below in detail in combination with the drawings and specific embodiments.


Firstly, in a study process of the node collective influence measurement method of the hypergraph threshold model, the solution uses the hypergraph threshold model to simulate the information spreading rule in real scenarios such as the marketing network, each node represents a customer or individual in the real scenarios, each hyperedge represents a high-order interaction relationship between individuals, and a specific rule of the hypergraph threshold model is that when the node activation ratio in a hyperedge exceeds the hyperedge threshold, the hyperedge will be activated; when a hyperedge is activated, all nodes in the hyperedge will be activated.



FIG. 1 is taken as an example to describe a hypergraph threshold rule. All hyperedge thresholds in the hypergraph are set to 0.5; when the initial t=0, node 3 is selected as a seed node; at this time, an active node and hyperedge set U(0)={3}. When t=1, the hyperedge eγ2 and the hyperedge eγ3 reach the thresholds and are activated, and U(1)={3, eγ2, eγ3}. When t=2, the activation of the hyperedge eγ2 and the hyperedge eγ3 causes node 2 and node 6 to be activated, and U(2)={3, eγ2, eγ3, 2,6}. When t=3, the hyperedge eγ1 and the hyperedge eγ5 are activated, and U(3)={3, eγ2, eγ3, 2,6, eγ1, eγ5}. At the next moment, node 1 and node 7 are activated, and U(4)={3, eγ2, eγ3, 2,6, eγ1, eγ5, 1,7}. Finally, no new hyperedge or node can be activated, and the spreading process is stopped.
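The step-by-step sets U(t) in this example can be reproduced with a synchronous simulation that alternates hyperedge and node activation. The sketch below is illustrative only: the hypergraph is a hypothetical one (the structure shown in the figure is not reproduced here), the name `trace_spread` is an assumption, and "reaching the threshold" is read as the active share being at least the threshold.

```python
def trace_spread(hyperedges, seeds, threshold=0.5):
    """Return [U(0), U(1), ...]: the active nodes and hyperedges after each step."""
    active_nodes, active_edges = set(seeds), set()
    history = [set(seeds)]  # U(0) contains only the seed nodes
    while True:
        # Hyperedge step: activate every hyperedge whose active share reaches the threshold.
        new_edges = {e for e, members in hyperedges.items()
                     if e not in active_edges
                     and len(members & active_nodes) / len(members) >= threshold}
        if not new_edges:
            break
        active_edges |= new_edges
        history.append(active_nodes | active_edges)
        # Node step: every member of a newly activated hyperedge becomes active.
        new_nodes = set().union(*(hyperedges[e] for e in new_edges)) - active_nodes
        if not new_nodes:
            break
        active_nodes |= new_nodes
        history.append(active_nodes | active_edges)
    return history


# Hypothetical hypergraph: integer labels are nodes, string labels are hyperedges.
hyperedges = {"e1": {1, 2}, "e2": {2, 3}, "e3": {3, 4, 5, 6}}
history = trace_spread(hyperedges, seeds={3}, threshold=0.5)
```

Here e2 activates first, then node 2, then e1, then node 1; e3 never reaches the threshold because only one of its four members is active, so the process stops.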


Secondly, a Cavity method is used to establish a conditional probability self-satisfying equation based on the hypergraph threshold rule, thus to weaken strong correlation between the nodes and the hyperedges and more accurately describe an information spreading rule in a hypergraph. With respect to the self-satisfying equation, a condition for activating node i is that the node will be activated when any hyperedge incident with the node other than a hyperedge eγ is activated, and a condition for activating the hyperedge eγ is that the hyperedge will be activated when any mγ nodes incident with the hyperedge eγ are activated:









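$$
\begin{cases}
v_{i\to e_\gamma} = n_i + (1-n_i)\left[1-\prod_{e_\beta\in\partial i\setminus e_\gamma}\left(1-v_{e_\beta\to i}\right)\right] \\
v_{e_\gamma\to i} = 1-\prod_{P_h\in P^{m_\gamma}_{e_\gamma\setminus i}}\left(1-\prod_{p\in P_h}v_{p\to e_\gamma}\right)
\end{cases}
\tag{1}
$$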




Final states of the node i and the hyperedge eγ can be calculated by the following formula:









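$$
\begin{cases}
v_i = n_i + (1-n_i)\left[1-\prod_{e_\beta\in\partial i}\left(1-v_{e_\beta\to i}\right)\right] \\
v_{e_\gamma} = 1-\prod_{P_h\in P^{m_\gamma}_{e_\gamma}}\left(1-\prod_{p\in P_h}v_{p\to e_\gamma}\right)
\end{cases}
\tag{2}
$$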







In formula (2), vi represents the final state of the node i, and veγ represents the final state of the hyperedge eγ.


To simplify formula (1), let $V=\{v_1, v_2\}^T$, where $v_1=\{v_{i\to e_\gamma}\}_{S\times 1}$ and $v_2=\{v_{e_\gamma\to i}\}_{S\times 1}$. $S=\sum_{i=1}^{N}k_i$ represents the sum of the hyperdegrees of all nodes. Similarly, $n$ is extended to higher dimensions:










$$
n = (n_1, 0)^T = (\underbrace{\ldots, \underbrace{n_i, \ldots, n_i}_{k_i}, \ldots}_{S}, \underbrace{0, \ldots, 0}_{S})^T
\tag{3}
$$







Therefore, formula (1) can be simplified as:










$$
V = n + \mathcal{G}(V)
\quad\Longleftrightarrow\quad
\begin{cases}
v_1 = n_1 + g_1(v_2) \\
v_2 = 0 + g_2(v_1)
\end{cases}
\tag{4}
$$







As formula (4) is composed of complex nonlinear equations, which cannot be solved directly, linearization and a fixed point iteration method are used to solve formula (4):










$$
V^{t+1} = n + \mathcal{J}\mathcal{G}\big|_{V^t}\times V^t
\tag{5}
$$







The specific form of the Jacobian matrix $\mathcal{J}\mathcal{G}|_{V^t}$ is derived as follows:










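$$
\mathcal{J}\mathcal{G} =
\begin{pmatrix}
\dfrac{\partial g_1}{\partial v_1} & \dfrac{\partial g_1}{\partial v_2} \\
\dfrac{\partial g_2}{\partial v_1} & \dfrac{\partial g_2}{\partial v_2}
\end{pmatrix}_{2S\times 2S}
\tag{6}
$$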







A partial derivative of g1 is calculated:













$$
\frac{\partial v_{i\to e_\gamma}}{\partial v_{j\to e_\beta}} = 0
\tag{7}
$$

















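$$
\left.\frac{\partial v_{i\to e_\gamma}}{\partial v_{e_\beta\to j}}\right|_{V^t} =
\begin{cases}
1-n_i & i=j,\ e_\beta\neq e_\gamma,\ a^t_{e_\beta\to i,\,i\to e_\gamma}=0 \\
0 & \text{otherwise}
\end{cases}
\tag{8}
$$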







Where $a^t_{e_\beta\to i,\,i\to e_\gamma}=\sum_{e_\mu\in\partial i\setminus(e_\gamma,e_\beta)}v^t_{e_\mu\to i}$.


A partial derivative of g2 is calculated as follows:













$$
\frac{\partial v_{e_\gamma\to i}}{\partial v_{e_\beta\to j}} = 0
\tag{9}
$$







When eβ=eγ, j≠i:













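$$
\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\gamma}}
= \prod_{\substack{P_h\in P^{m_\gamma}_{e_\gamma\setminus i} \\ j\notin P_h}}\left(1-\prod_{p\in P_h}v_{p\to e_\gamma}\right)
\sum_{\substack{P_h\in P^{m_\gamma}_{e_\gamma\setminus i} \\ j\in P_h}}\left[\prod_{p\in P_h\setminus j}v_{p\to e_\gamma}
\prod_{\substack{\tilde{P}_h\in P^{m_\gamma}_{e_\gamma\setminus i} \\ \tilde{P}_h\neq P_h,\ j\in\tilde{P}_h}}\left(1-\prod_{p\in\tilde{P}_h}v_{p\to e_\gamma}\right)\right]
\tag{10}
$$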







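Let $b^t_{j\to e_\gamma,\,e_\gamma\to i}=\sum_{p\in\partial e_\gamma\setminus(i,j)}v^t_{p\to e_\gamma}$ represent the number of nodes activated at time $t$ in $e_\gamma$ after nodes $i$ and $j$ are removed. When $b^t_{j\to e_\gamma,\,e_\gamma\to i}\geq m_\gamma$, a combination $P_h$ not containing $j$ with $\prod_{p\in P_h}v_{p\to e_\gamma}=1$ certainly exists, so

$$
\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\gamma}} = 0.
$$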




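When $b^t_{j\to e_\gamma,\,e_\gamma\to i}\leq m_\gamma-2$, every product $\prod_{p\in P_h\setminus j}v_{p\to e_\gamma}$ is 0, so

$$
\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\gamma}} = 0.
$$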




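When $b^t_{j\to e_\gamma,\,e_\gamma\to i}=m_\gamma-1$, exactly one combination makes $\prod_{p\in P_h\setminus j}v_{p\to e_\gamma}=1$; then

$$
\prod_{\substack{\tilde{P}_h\in P^{m_\gamma}_{e_\gamma\setminus i} \\ \tilde{P}_h\neq P_h,\ j\in\tilde{P}_h}}\left(1-\prod_{p\in\tilde{P}_h}v_{p\to e_\gamma}\right)=1,
\qquad
\prod_{\substack{P_h\in P^{m_\gamma}_{e_\gamma\setminus i} \\ j\notin P_h}}\left(1-\prod_{p\in P_h}v_{p\to e_\gamma}\right)=1,
$$

so

$$
\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\gamma}} = 1.
$$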





Therefore:













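$$
\left.\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\beta}}\right|_{V^t} =
\begin{cases}
1 & e_\beta=e_\gamma,\ j\neq i,\ b^t_{j\to e_\gamma,\,e_\gamma\to i}=m_\gamma-1 \\
0 & \text{otherwise}
\end{cases}
\tag{11}
$$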







As a result, the specific form of the Jacobian matrix $\mathcal{J}\mathcal{G}|_{V^t}$ is:










$$
\mathcal{J}\mathcal{G}\big|_{V^t} =
\begin{pmatrix}
0 & \mathcal{M}^t \\
\mathcal{I}^t & 0
\end{pmatrix}
\tag{12}
$$







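Where

$$
\mathcal{M}^t = \left\{\mathcal{M}^t_{e_\beta\to j,\,i\to e_\gamma}\right\} = \left.\left\{\frac{\partial v_{i\to e_\gamma}}{\partial v_{e_\beta\to j}}\right\}\right|_{V^t}
$$

is a non-backtracking matrix, and

$$
\mathcal{I}^t = \left\{\mathcal{I}^t_{j\to e_\beta,\,e_\gamma\to i}\right\} = \left.\left\{\frac{\partial v_{e_\gamma\to i}}{\partial v_{j\to e_\beta}}\right\}\right|_{V^t}
$$

is a subcritical non-backtracking matrix. For the convenience of subsequent iterative derivation, the non-backtracking matrix $\mathcal{M}^t$ and the subcritical non-backtracking matrix $\mathcal{I}^t$ are extended to higher dimensions: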









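$$
\begin{cases}
\mathcal{M}^t_{e_\beta j i e_\gamma} = (1-n_i)\,H_{ie_\gamma}H_{je_\beta}\,\delta_{ij}\left(1-\delta_{e_\beta e_\gamma}\right)\mathcal{M}^t_{e_\beta i i e_\gamma} \\
\mathcal{I}^t_{j e_\beta e_\gamma i} = H_{ie_\gamma}H_{je_\beta}\,\delta_{e_\beta e_\gamma}\left(1-\delta_{ij}\right)\mathcal{I}^t_{j e_\gamma e_\gamma i}
\end{cases}
\tag{13}
$$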







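Where $H=\{H_{ie_\gamma}\}_{S\times S}$ is an incidence matrix; when $a^t_{e_\beta\to i,\,i\to e_\gamma}=0$, $\mathcal{M}^t_{e_\beta i i e_\gamma}=1$; otherwise, $\mathcal{M}^t_{e_\beta i i e_\gamma}=0$. Similarly, when $b^t_{j\to e_\gamma,\,e_\gamma\to i}=m_\gamma-1$, $\mathcal{I}^t_{j e_\gamma e_\gamma i}=1$; otherwise, $\mathcal{I}^t_{j e_\gamma e_\gamma i}=0$.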


The fixed point iteration method is used to iterate formula (5) as follows:


When $t=1$, let $V^0=n$ and $V^1=n+\mathcal{J}\mathcal{G}|_{V^0}\times n$:











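$$
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}^{1}
= \begin{bmatrix} n_1 \\ 0 \end{bmatrix}
+ \begin{bmatrix} 0 & \mathcal{M}^0 \\ \mathcal{I}^0 & 0 \end{bmatrix}
\begin{bmatrix} n_1 \\ 0 \end{bmatrix}
= \begin{bmatrix} n_1 \\ \mathcal{I}^0 n_1 \end{bmatrix}
\tag{14}
$$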







A specific form of each element in formula (14) can be expressed as:









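$$
\begin{cases}
v^1_{i\to e_\gamma} = n_i H_{ie_\gamma} \\
v^1_{e_\gamma\to i} = H_{ie_\gamma}\sum_{j} n_j H_{je_\gamma}\left(1-\delta_{ij}\right)\mathcal{I}^0_{j e_\gamma e_\gamma i}
\end{cases}
\tag{15}
$$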







A 1-norm of an activation probability v is used to measure a final activation scale in hypergraphs, and the following formula is obtained after collation:















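$$
\begin{aligned}
\|v\|_1 &= \sum_{ie_\gamma} v_{i\to e_\gamma} + \sum_{ie_\gamma} v_{e_\gamma\to i} \\
&= \sum_{ie_\gamma} n_i H_{ie_\gamma} + \sum_{ie_\gamma} H_{ie_\gamma}\sum_{j} n_j H_{je_\gamma}\left(1-\delta_{ij}\right)\mathcal{I}^0_{j e_\gamma e_\gamma i} \\
&= \sum_{i} n_i k_i + \sum_{i} n_i \sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i}\mathcal{I}^0_{i e_\gamma e_\gamma j} \\
&= \sum_{i} n_i\left(k_i + \sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i}\mathcal{I}^0_{i e_\gamma e_\gamma j}\right)
\end{aligned}
\tag{16}
$$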







By selecting the node $i$ with the maximum value of $k_i+\sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i}\mathcal{I}^0_{i e_\gamma e_\gamma j}$ as a seed, information spread can be maximized with the minimum number of seeds. Therefore, the 1-order hypergraph collective influence is defined as:











$$
\mathrm{HCI}_1(i) = k_i + \sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i}\mathcal{I}^0_{i e_\gamma e_\gamma j}
\tag{17}
$$







As shown in a of FIG. 2, the hyperdegree of node $i$ is $k_i=5$, and $\sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i}\mathcal{I}^0_{i e_\gamma e_\gamma j}=3$ in the figure means that there are 3 bold paths (subcritical paths) of length 2 starting from node $i$, so the 1-order hypergraph collective influence of node $i$ is $\mathrm{HCI}_1(i)=5+3=8$.


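Using the above method, when $t=2$ and $V^{2}=n+J_g^{1}\times V^{1}$, the 2-order hypergraph collective influence can be derived:

$$
\mathrm{HCI}_2(i) = k_i
+ \sum_{e_\gamma\in\partial i} \sum_{j\in\partial e_\gamma\setminus i} I^{1}_{i\to e_\gamma,\,e_\gamma\to j}
+ \sum_{e_\gamma\in\partial i} \sum_{j\in\partial e_\gamma\setminus i} I^{0}_{i\to e_\gamma,\,e_\gamma\to j}
\sum_{e_\mu\in\partial j\setminus e_\gamma} (1-n_j)\, M^{1}_{e_\gamma\to j,\,j\to e_\mu}
\tag{18}
$$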

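As shown in b of FIG. 2, the hyperdegree of node i is $k_i=5$; $\sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i} I^{0}_{i\to e_\gamma,\,e_\gamma\to j}=3$ in the figure represents that the number of bold paths (subcritical paths) starting from the node i and having a length of 2 is 3; and $\sum_{e_\gamma\in\partial i}\sum_{j\in\partial e_\gamma\setminus i} I^{0}_{i\to e_\gamma,\,e_\gamma\to j}\sum_{e_\mu\in\partial j\setminus e_\gamma}(1-n_j)\,M^{1}_{e_\gamma\to j,\,j\to e_\mu}=4$ in the figure represents that the number of bold paths (subcritical paths) starting from the node i and having a length of 3 is 4. The 2-order hypergraph collective influence of the node i is therefore $\mathrm{HCI}_2(i)=5+3+4=12$.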


Similarly, the n-order hypergraph collective influence can be derived as:

$$
\mathrm{HCI}_n(i) = k_i + \sum_{L\in A_n} \mathbb{O}^{n}_{L} + \sum_{L\in B_n} \mathbb{E}^{n}_{L}
\tag{19}
$$

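Where $A_n=\{x\in\mathbb{N}^{+}\mid x \bmod 2=0,\ x\le n\}$ and $B_n=\{x\in\mathbb{N}^{+}\mid x \bmod 2=1,\ x\le n\}$;

$$
\mathbb{O}^{n}_{L} = \sum_{e_\gamma\in\partial i} \sum_{i_1\in\partial e_\gamma\setminus i} I^{\,n-L}_{i\to e_\gamma,\,e_\gamma\to i_1}
\sum_{e_{\gamma_1}\in\partial i_1\setminus e_\gamma} (1-n_{i_1})\, M^{\,n-L+1}_{e_\gamma\to i_1,\,i_1\to e_{\gamma_1}}
\times\cdots\times
\sum_{e_{\gamma_\iota}\in\partial i_\iota\setminus e_{\gamma_{\iota-1}}} (1-n_{i_\iota})\, M^{\,n-1}_{e_{\gamma_{\iota-1}}\to i_\iota,\,i_\iota\to e_{\gamma_\iota}},
$$

$$
\mathbb{E}^{n}_{L} = \sum_{e_\gamma\in\partial i} \sum_{i_1\in\partial e_\gamma\setminus i} I^{\,n-L}_{i\to e_\gamma,\,e_\gamma\to i_1}
\sum_{e_{\gamma_1}\in\partial i_1\setminus e_\gamma} (1-n_{i_1})\, M^{\,n-L+1}_{e_\gamma\to i_1,\,i_1\to e_{\gamma_1}}
\times\cdots\times
\sum_{i_{\iota+1}\in\partial e_{\gamma_\iota}\setminus i_\iota} I^{\,n-1}_{i_\iota\to e_{\gamma_\iota},\,e_{\gamma_\iota}\to i_{\iota+1}};
$$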

Where $i_1$ represents a 1-layer neighbor node of the node i; $i_\iota$ represents an $\iota$-layer neighbor node of the node i; $e_{\gamma_1}$ represents a 1-layer neighbor hyperedge of the hyperedge $e_\gamma$; $e_{\gamma_\iota}$ represents an $\iota$-layer neighbor hyperedge of the hyperedge $e_\gamma$.
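
As a consistency check, substituting n=1 and n=2 into formula (19) should recover formulas (17) and (18). The following LaTeX sketch writes out both cases; it assumes that a product chain in the definitions of $\mathbb{O}$ and $\mathbb{E}$ stops after its first factor or factors when L is 1 or 2, and it uses $i_1$ and $e_{\gamma_1}$ in place of $j$ and $e_\mu$.

```latex
% n = 1:  A_1 = \emptyset,  B_1 = \{1\}
\mathrm{HCI}_1(i) = k_i + \mathbb{E}^{1}_{1}
  = k_i + \sum_{e_\gamma\in\partial i}\sum_{i_1\in\partial e_\gamma\setminus i}
    I^{0}_{i\to e_\gamma,\,e_\gamma\to i_1}

% n = 2:  A_2 = \{2\},  B_2 = \{1\}
\mathrm{HCI}_2(i) = k_i + \mathbb{E}^{2}_{1} + \mathbb{O}^{2}_{2}

\mathbb{E}^{2}_{1} = \sum_{e_\gamma\in\partial i}\sum_{i_1\in\partial e_\gamma\setminus i}
    I^{1}_{i\to e_\gamma,\,e_\gamma\to i_1}

\mathbb{O}^{2}_{2} = \sum_{e_\gamma\in\partial i}\sum_{i_1\in\partial e_\gamma\setminus i}
    I^{0}_{i\to e_\gamma,\,e_\gamma\to i_1}
    \sum_{e_{\gamma_1}\in\partial i_1\setminus e_\gamma}
    (1-n_{i_1})\, M^{1}_{e_\gamma\to i_1,\,i_1\to e_{\gamma_1}}
```

The odd-L term ends on a node and so counts paths of even length, while the even-L term ends on a hyperedge and counts paths of odd length, which matches the path lengths 2 and 3 described for FIG. 2.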


In the part of HCI-TM algorithm design and numerical simulation evaluation, a greedy algorithm for selecting the maximum influence set (the HCI-TM algorithm) is first designed based on the hypergraph collective influence measurement method. Specific steps are as follows:

    • Step 1: initializing a seed set S=Ø, and calculating the n-order hypergraph collective influence $\mathrm{HCI}_n(i)$ of all nodes in the hypergraph.
    • Step 2: selecting the node i with the maximum n-order hypergraph collective influence as a seed, adding the seed into the seed set, S={i}∪S, and then conducting spread based on the hypergraph threshold rule.
    • Step 3: judging whether the node activation ratio in the hypergraph exceeds ar; if the node activation ratio exceeds ar, the maximum influence set in this condition is S.
    • Step 4: if the node activation ratio does not exceed ar, recalculating the n-order hypergraph collective influence $\mathrm{HCI}_n(i)$ of the ⌈n/2⌉-layer neighbor nodes of the active nodes, updating the state of each hyperedge (active, subcritical or non-subcritical, wherein the active state indicates that the hyperedge is activated, the subcritical state indicates that the hyperedge will be activated if one more node in the hyperedge is activated, and every other hyperedge is in the non-subcritical state), and repeating step 2 to step 3.
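
The four steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the claimed implementation: it uses only the 1-order influence, it recomputes the scores of all inactive nodes in each round instead of only the ⌈n/2⌉-layer neighbors, and the helper names `needed`, `spread`, `hci1` and `hci_tm`, the set-based hypergraph representation, and the activation count ⌈θ·|e|⌉ are hypothetical choices made for this example.

```python
from math import ceil

def needed(members, theta):
    # Assumed activation count of a hyperedge: smallest integer >= theta * |e|.
    return ceil(theta * len(members) - 1e-9)

def spread(hyperedges, seeds, theta):
    """Threshold rule: a hyperedge fires once enough members are active,
    and a fired hyperedge activates all of its members."""
    active, fired, changed = set(seeds), set(), True
    while changed:
        changed = False
        for idx, members in enumerate(hyperedges):
            if idx not in fired and len(members & active) >= needed(members, theta):
                fired.add(idx)
                active |= members
                changed = True
    return active

def hci1(hyperedges, active, theta):
    """1-order collective influence of every inactive node (sketch)."""
    scores = {}
    for members in hyperedges:
        subcritical = len(members & active) == needed(members, theta) - 1
        for node in members - active:
            bonus = len(members) - 1 if subcritical else 0
            scores[node] = scores.get(node, 0) + 1 + bonus
    return scores

def hci_tm(hyperedges, theta, ar):
    """Greedy seed selection until the activation ratio exceeds ar."""
    nodes = set().union(*hyperedges)
    seeds, active = [], set()                        # step 1: S is empty
    while len(active) <= ar * len(nodes):            # step 3: ratio not above ar
        scores = hci1(hyperedges, active, theta)     # steps 1 and 4: (re)score
        if not scores:                               # nothing left to select
            break
        best = max(sorted(scores), key=scores.get)   # step 2: top node, ties by id
        seeds.append(best)
        active = spread(hyperedges, seeds, theta)    # step 2: threshold spread
    return seeds, active

toy = [{0, 1}, {0, 2, 3}, {0, 4, 5, 6}]
seeds, active = hci_tm(toy, theta=0.5, ar=0.9)
```

On this toy hypergraph the sketch selects node 0 first because of its hyperdegree, then one node in each hyperedge that node 0 has made subcritical. Recomputing every score each round is simpler than the local update of step 4 but costs more per round.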


Secondly, the robustness and effectiveness of the HCI-TM algorithm are verified on Erdős-Rényi hypergraphs with the average hyperdegrees of 2 and 3, scale-free hypergraphs with the power-law exponents of 1.5 and 2, and uniform hypergraphs with the cardinalities of hyperdegree of 4 and 5; to ensure reliability, the average of 10 independent experiments is taken as each experimental result. Specific steps are as follows:

    • Step 1: using a configuration method to generate 10 instances each of Erdős-Rényi hypergraphs with the average hyperdegrees of 2 and 3, scale-free hypergraphs with the power-law exponents of 1.5 and 2, and uniform hypergraphs with the cardinalities of hyperdegree of 4 and 5, at the node scales of 5,000, 10,000, 20,000, 30,000, 50,000 and 100,000, respectively.
    • Step 2: using the HCI-TM algorithm and other contrast algorithms to select seed sets on different scales and types of hypergraphs, conducting spread based on the threshold rule, and recording the activation scale in hypergraphs at a fixed interval (ratio of seed nodes).
    • Step 3: making a comparison with other algorithms to evaluate the effectiveness of the HCI-TM algorithm: when a same ratio of seed nodes is selected, the algorithm with a larger activation scale in hypergraphs is more effective.



FIG. 3 shows the performance of the HCI-TM algorithm and other algorithms (high degree algorithm (HD), high degree adaptive algorithm (HDA), neighbor preference algorithm (NP), neighbor preference adaptive algorithm (NPA), PageRank algorithm (PR) and Random algorithm (RA)) on Erdős-Rényi hypergraphs, scale-free hypergraphs and uniform hypergraphs of different scales and types. The HCI-TM algorithm performs better than the other algorithms: it reaches a larger activation range with fewer seed nodes. At the same time, the 2-order HCI-TM algorithm is superior to the 1-order HCI-TM algorithm, which indicates that a higher-order HCI-TM algorithm has better performance. The robustness and practicality of the HCI-TM algorithm are thus verified.


The operating time of the different algorithms is recorded on Erdős-Rényi hypergraphs with the average hyperdegrees of 2 and 3, scale-free hypergraphs with the power-law exponents of 1.5 and 2, and uniform hypergraphs with the cardinalities of hyperdegree of 4 and 5, at the node scales of 5,000, 10,000, 20,000, 30,000, 50,000 and 100,000, respectively. With log10 N as the horizontal axis (N is the hypergraph scale) and log10 T as the vertical axis (T is the operating time), the line graph shown in FIG. 4 is drawn and fitted with a first-order polynomial. It is found that the time complexity of the HCI-TM algorithm is near linear in actual operation.
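
The first-order polynomial fit on log-log axes described above can be sketched in Python as follows. A fitted slope close to 1 corresponds to near-linear time complexity. This is an illustrative sketch only: the function name `loglog_slope` is hypothetical, and the timing values are synthetic placeholders constructed to be exactly proportional to N; they are not measurements from FIG. 4.

```python
from math import log10

def loglog_slope(sizes, times):
    """Least-squares fit of log10(T) = slope * log10(N) + intercept."""
    xs = [log10(n) for n in sizes]
    ys = [log10(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Node scales used in the experiments described above.
sizes = [5000, 10000, 20000, 30000, 50000, 100000]
# Synthetic placeholder timings (NOT measured data): exactly linear in N.
synthetic_times = [2e-4 * n for n in sizes]

slope, intercept = loglog_slope(sizes, synthetic_times)
```

For measured operating times, a slope near 1 would support the near-linear reading, whereas a slope near 2 would indicate quadratic growth.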


Finally, an evolution relationship of the maximum HCI value in a seed selection process is analyzed on the Erdös-Rényi hypergraph with the average hyperdegree of 3, the scale-free hypergraph with the power-law exponent of 1.5, and the uniform hypergraph with the cardinality of hyperdegree of 5, and the node scale of 10,000. It is verified that the index of hypergraph collective influence can be used as a criterion for predicting the occurrence of cascade phenomenon. Specific steps thereof are as follows:

    • Step 1: using the HCI-TM algorithm to select the seed nodes on the hypergraphs, and recording the HCI value of the selected seed nodes (the maximum HCI value in the hypergraphs at the moment).
    • Step 2: analyzing the evolution of the HCI value as the seed nodes are selected, which shows that the HCI index can be used as a criterion for the occurrence of the cascade phenomenon: the peak value of HCI corresponds to the phase transition point where the cascade phenomenon occurs; by observing the evolution of the maximum HCI value in the hypergraphs, whether and when explosive information spread (the cascade phenomenon) will occur in a hypergraph spreading process can be judged.


FIG. 5 shows the evolution of the maximum HCI value in the hypergraphs as seeds are selected; each horizontal axis represents the number of seed nodes, and each vertical axis represents the maximum HCI value in the hypergraphs at the moment. A comparison with FIG. 3 shows that the peak value of the hypergraph collective influence (HCI) corresponds to the phase transition point where the cascade phenomenon occurs, and that the cascade phenomenon among the nodes in the hypergraphs induces an emergent behavior; therefore, the index of hypergraph collective influence can be used as an effective criterion for the occurrence of the emergent phenomenon. At the same time, the peak of the 2-order hypergraph collective influence occurs earlier than that of the 1-order hypergraph collective influence, which also confirms that the higher-order HCI-TM algorithm has a better effect.


In the part of algorithm application study, the marketing network is taken as an example, the hypergraph threshold model is used to simulate spread of reputation among customers, and the HCI-TM algorithm is applied to the real scenario of the marketing network (the Las Vegas bar review dataset) to identify a most recommendable customer set in the marketing network.

    • Step 1: modeling a hypergraph, wherein the hypergraph is constructed by extracting one month of Las Vegas bar reviews from the Yelp Kaggle competition data. The hypergraph has 1,234 nodes and 1,194 hyperedges, wherein the nodes represent customers, and the hyperedges represent different cocktail categories. If a customer reviews a cocktail, an incidence relationship exists between the customer (a node) and the cocktail (a hyperedge).
    • Step 2: using the hypergraph threshold model to simulate the spread of customer reputation in a marketing network, and setting a hyperedge threshold to 0.6 according to the principle of majority rule, i.e., among reviews of a cocktail, if 60% of customer reviews are positive, the cocktail is considered recommended, and the overall customer rating of the bar will also become positive.
    • Step 3: using the HCI-TM algorithm to screen out the most recommendable customer group for popularization, drawing a diagram of relationships between the ratio of selected recommendable groups and the ratio of customers with positive reviews, and making a comparison with other algorithms.
    • Step 4: evaluating the effect of the algorithms by selecting a same number of recommendable customers for popularization; the higher the ratio of customers with positive reviews is, the better the effect of an algorithm is and the more marketing cost can be saved.
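
The hypergraph modeling of step 1 and the threshold setting of step 2 can be sketched in Python as follows. The sketch is illustrative only: the function names `build_hypergraph` and `activation_counts`, the (customer, cocktail) pair format, and the sample records are hypothetical and are not taken from the Yelp data; the rule that a hyperedge with threshold 0.6 needs ⌈0.6·|e|⌉ positive customers is an assumption made here.

```python
from collections import defaultdict
from math import ceil

def build_hypergraph(reviews):
    """Group customers (nodes) by the cocktail they reviewed (hyperedge)."""
    edges = defaultdict(set)
    for customer, cocktail in reviews:
        edges[cocktail].add(customer)   # repeated reviews collapse into one incidence
    return dict(edges)

def activation_counts(edges, theta=0.6):
    """Assumed number of positive customers needed to activate each hyperedge."""
    return {name: ceil(theta * len(members) - 1e-9)
            for name, members in edges.items()}

# Hypothetical review records, invented for illustration.
reviews = [
    ("u1", "mojito"), ("u2", "mojito"), ("u3", "mojito"),
    ("u4", "mojito"), ("u5", "mojito"),
    ("u1", "negroni"), ("u2", "negroni"),
]

edges = build_hypergraph(reviews)
counts = activation_counts(edges)
```

Under this assumption a cocktail reviewed by five customers becomes recommended once three of them are positive, while a cocktail reviewed by two customers needs both.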


The performance of the HCI-TM algorithm and other contrast algorithms on the Las Vegas bar review dataset is shown in FIG. 6. The HCI1-TM algorithm and the HCI2-TM algorithm need to select only 82 and 49 customers, respectively, for popularization for the ratio of customers with positive reviews to reach 90%. To reach the same ratio, the HHDA algorithm and the NPA algorithm need to select 124 and 136 customers for popularization, respectively. The HCI-TM algorithm can therefore be used for accurately identifying the most recommendable customer group, which is of great significance for reducing the marketing cost.


The above-mentioned technical solution only reflects a preferred technical solution of technical solutions of the present invention. Some changes made to certain parts by those skilled in the art all reflect the principle of the present invention, and belong to the protective scope of the present invention.

Claims
  • 1. A method for identifying an optimal influence customer group in a marketing network based on a hypergraph threshold model, comprising the following steps: step 1: establishing a hypergraph threshold model based on sales data first; step 2: using the hypergraph threshold model to simulate spread of customer reputation in a marketing network, and setting a hyperedge threshold according to a specific scenario; step 3: using an HCI-TM algorithm to screen out a most recommendable customer group for popularization, and using a hypergraph threshold rule to simulate spread of information among customers.
  • 2. The method for identifying an optimal influence customer group in a marketing network based on a hypergraph threshold model according to claim 1, wherein a process of establishing the hypergraph threshold model in step 1 is as follows: in the hypergraph threshold model, a node represents a customer, a hyperedge represents an interaction relationship between the customer and a product, and one hyperedge represents an interaction relationship between all customers who have evaluated a certain product and the product; a specific rule of the hypergraph threshold model is that when the node activation ratio in the hyperedge exceeds the hyperedge threshold, the hyperedge will be activated; when the hyperedge is activated, all nodes in the hyperedge will be activated; a Cavity method is used to establish a conditional probability self-satisfying equation based on the hypergraph threshold rule, thus to weaken strong correlation between the nodes and the hyperedge and accurately describe an information spreading rule in a hypergraph; the self-satisfying equation described in a tree-like hypergraph is expressed as formula (1), i.e., a condition for activating a node i is that: the node will be activated when any hyperedge incident with the node other than a hyperedge $e_\gamma$ is activated, and a condition for activating the hyperedge $e_\gamma$ is that: the hyperedge will be activated when any $m_\gamma$ nodes incident with the hyperedge $e_\gamma$ other than the node i are activated:
  • 3. The method for identifying an optimal influence customer group in a marketing network based on a hypergraph threshold model according to claim 1, wherein a method for calculating hypergraph collective influence is as follows: to simplify formula (1), letting $\vec{V}=\{v_1, v_2\}^{T}$, where $v_1=\{v_{i\to e_\gamma}\}_{S\times 1}$, and $v_2=\{v_{e_\gamma\to i}\}_{S\times 1}$; $S=\sum_{i=1}^{N} k_i$ represents the sum of hyperdegrees of all nodes; similarly, n is popularized to higher dimensions
  • 4. The method for identifying an optimal influence customer group in a marketing network based on a hypergraph threshold model according to claim 1, wherein a design method of the HCI-TM algorithm in step 3 is as follows: step 1: initializing a seed set S=Ø, and calculating the n-order hypergraph collective influence $\mathrm{HCI}_n(i)$ of all nodes in the hypergraph; step 2: selecting the node i with the maximum n-order hypergraph collective influence as a seed, adding the seed into a seed set S={i}∪S, and then conducting spread based on the hypergraph threshold rule; step 3: judging whether the node activation ratio in the hypergraph exceeds ar; if the node activation ratio exceeds ar, the maximum influence set in this condition is S; step 4: if the node activation ratio does not exceed ar, recalculating the n-order hypergraph collective influence $\mathrm{HCI}_n(i)$ of a ⌈n/2⌉-layer neighbor node of an active node, and repeating step 2 to step 3.
Priority Claims (1)
Number Date Country Kind
202311589131.3 Nov 2023 CN national