The present invention relates to a detection device, a detection method, and a detection program.
A method of extending graph signal processing, which has been mainly applied to an undirected graph, so as to be applicable to a directed graph has been proposed (e.g., refer to Non Patent Literature 1). For example, according to the method described in Non Patent Literature 1, graph signal processing such as graph Fourier transform, graph filtering, or graph wavelet transform can be performed on a directed graph.
Moreover, a technique called Sybil detection for detecting a malicious node included in a network by analyzing graph data based on an actual network is known. For example, Sybil detection is used for detection of a botnet, detection of a spam user in a social networking service (SNS), and the like.
There are a plurality of techniques for Sybil detection, such as random walk and belief propagation (BP). On the other hand, a method of applying graph signal processing to Sybil detection has not been proposed so far. Therefore, it is difficult to interpret a plurality of Sybil detection techniques in a common signal processing framework.
For example, if Sybil detection techniques can be compared in a common signal processing framework, knowledge that cannot be obtained only by comparison in a superficial framework such as accuracy and extensibility can be obtained, and development of a new technique and improvement of an existing technique can be expected.
In order to solve the above-described problem and achieve the object, a detection device is characterized by including: an update unit that updates an evaluation value of a vertex of a graph by random walk, on the basis of a matrix generated by using an argument on a complex plane expressing a direction of a side of the graph in which at least some vertices are labeled; and an estimation unit that estimates a label of a vertex of the graph on the basis of the evaluation value.
It is possible with the present invention to interpret a Sybil detection technique in a signal processing framework.
The following description will explain an embodiment of a detection device, a detection method, and a detection program according to the present application in detail with reference to the drawings. Note that the present invention is not limited to the embodiment described below.
First, a configuration of a detection device according to a first embodiment will be described with reference to
Here, in the present embodiment, it is assumed that the detection device 10 detects Sybil. Sybil is a user created for malicious actions such as spam, click fraud, phishing, and impersonation of others, and is a security threat that degrades the quality of SNS or review sites.
In Sybil detection, attention may be paid to the relationships between users. For example, on Twitter (registered trademark), there is little motivation for authorized users to connect with Sybil who sends spam or the like. Therefore, authorized users are expected to be closely connected with each other to form a community structure. On the other hand, since the number of followers strongly affects the influence and reliability of an account, Sybil is expected to be connected with Sybil to increase the number of followers and form a community structure. Therefore, it is considered that it is possible to distinguish between an authorized user and Sybil by appropriately separating a community structure of authorized users and a community structure of Sybil.
A known Sybil detection technique sets a prior evaluation value on the basis of known labels given to a graph whose vertices are users, locally updates and propagates the evaluation values of the vertices, and determines whether a user is Sybil on the basis of the resulting evaluation value of each vertex whose label is unknown. Moreover, Non Patent Literature 1 discloses a method of applying graph signal processing to a directed graph.
Therefore, the detection device 10 of the present embodiment performs Sybil detection using graph signal processing. As a result, for example, the existing Sybil detection technique can be reinterpreted in a graph signal processing manner. Moreover, the reinterpretation result is considered to be useful for development of a new Sybil detection technique and improvement of an existing Sybil detection technique.
The Sybil detection problem described above can be regarded as a semi-supervised problem of estimating unknown vertex labels from known vertex labels. Here, it is assumed that a signal value of +1 is assigned to a vertex whose known label is Sybil, a signal value of −1 is assigned to a vertex whose known label is an authorized user (not Sybil), and 0 is assigned to a vertex having an unknown label. Then, the Sybil detection problem can be interpreted as a problem of restoring a true graph signal when a graph signal in which some signal values are missing (0) is given.
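The signal assignment described above can be sketched as follows. This is a minimal illustration only; the function name `prior_signal`, its parameters, and the 0-based vertex indices are assumptions introduced for the example and do not appear in the embodiment.

```python
import numpy as np

def prior_signal(num_vertices, sybil, authorized):
    """Build a graph signal: +1 for known Sybil, -1 for known authorized, 0 for unknown.

    `sybil` and `authorized` are hypothetical lists of 0-based vertex indices.
    """
    q = np.zeros(num_vertices)
    q[list(sybil)] = 1.0
    q[list(authorized)] = -1.0
    return q

# Five vertices: vertex 0 is labeled Sybil, vertices 3 and 4 are labeled authorized,
# and vertices 1 and 2 have unknown labels (signal value 0).
q = prior_signal(5, sybil=[0], authorized=[3, 4])
```

The zeros in `q` are the missing signal values that the detection processing attempts to restore.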
In accordance with this observation, the Sybil detection technique based on random walk is formulated as filtering in the present embodiment. As a result, the existing Sybil detection technique can be integrated and reinterpreted in a graph signal processing manner.
The graph data 20 is data representing a graph by a predetermined method. In the present embodiment, the graph data 20 is represented by an adjacency matrix. For example, an undirected graph is expressed by an adjacency matrix as illustrated in
Here, an adjacency matrix representing the graph data 20 is defined as follows. First, in a case where there is no side between vertices of the graph, the component of the adjacency matrix corresponding to the side is set to 0. Next, in a case where an undirected side exists between vertices of the graph, the component of the adjacency matrix corresponding to the side is set to 1. Moreover, in a case where there is a directed side from an arbitrary vertex i to a vertex j of the graph, the (i, j) component of the adjacency matrix is set to 1, and the (j, i) component is set to 0.
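As a minimal sketch of this definition, the adjacency matrix can be built from lists of undirected and directed sides. The function name, the edge-list representation, and the 0-based indices are assumptions made for illustration.

```python
import numpy as np

def adjacency_matrix(n, undirected_edges, directed_edges):
    """Adjacency matrix following the definition above (0-based vertex indices)."""
    A = np.zeros((n, n), dtype=int)       # no side: component stays 0
    for i, j in undirected_edges:
        A[i, j] = 1                       # undirected side: both components are 1
        A[j, i] = 1
    for i, j in directed_edges:
        A[i, j] = 1                       # directed side i -> j: (i, j) is 1
        A[j, i] = 0                       # and (j, i) is 0
    return A

# Hypothetical example: a directed side 0 -> 1 and an undirected side between 2 and 3.
A = adjacency_matrix(4, undirected_edges=[(2, 3)], directed_edges=[(0, 1)])
```

With at least one directed side, the resulting matrix is asymmetric.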
For example, in the undirected graph of
Moreover, in the directed graph of
In general, an asymmetric matrix is difficult to handle algebraically compared to a symmetric matrix, and therefore, many graph analysis techniques including graph signal processing are limited in application to an undirected graph. Note that the graph data 20 may be any data as long as the data expresses a graph.
For example, the graph data 20 may represent a follow/follower relationship (side) of a user (vertex) on Twitter (registered trademark) as a graph, or may represent a function calling relationship in a malware execution code as a graph. Moreover, since the analysis technique of the present embodiment is obtained by extending the graph analysis technique of the undirected graph to the directed graph, the analysis is also applicable to an undirected graph.
The detection device 10 can execute Sybil detection on a directed graph. For example, the analysis result 30 is a label indicating whether a user is Sybil or not for each user corresponding to each vertex of the graph data 20.
Here, each unit of the detection device 10 will be described. As illustrated in
The communication unit 11 performs data communication with other devices via a network. For example, the communication unit 11 is a network interface card (NIC). The input unit 12 accepts an input of data from a user. The input unit 12 is, for example, an input device such as a mouse or a keyboard. The output unit 13 outputs data by displaying a screen or the like. The output unit 13 is, for example, a display device such as a display.
The storage unit 14 is a storage device such as a hard disk drive (HDD), a solid state drive (SSD), or an optical disk. Note that the storage unit 14 may be a data-rewritable semiconductor memory such as a random access memory (RAM), a flash memory, or a non-volatile static random access memory (NVSRAM). The storage unit 14 stores an operating system (OS) and various programs to be executed by the detection device 10.
The control unit 15 controls the entire detection device 10. The control unit 15 is, for example, an electronic circuit such as a central processing unit (CPU) or a micro processing unit (MPU), or an integrated circuit such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA). Moreover, the control unit 15 has an internal memory for storing programs and control data that define various processing procedures, and executes each processing using the internal memory. Moreover, the control unit 15 functions as various processing units through the operation of various programs. For example, the control unit 15 has a conversion unit 151, a generation unit 152, a calculation unit 153, a signal processing unit 154, and an estimation unit 155.
The conversion unit 151 converts the direction of a side between vertices of the graph into an argument (phase) on a complex plane. For example, the conversion unit 151 converts the direction of a side into a first angle in a case where the direction of the side between vertices of the graph is the first direction, converts the direction of a side into an angle obtained by inverting the sign of the first angle in a case where the direction of the side is opposite to the first direction, and converts the direction of a side into 0 (angle) in a case where the side has no direction. Here, a method of conversion by the conversion unit 151 will be described with reference to
First, it is assumed that a point having an absolute value of 1 and an argument of 0 on a complex plane is given as a reference point. As illustrated in
As illustrated in
The operation by the conversion unit 151 mentioned above can be described as a function γ from the set of sides to the unitary group of degree 1, U(1), as expressed in Formula (1). Here, in Formula (1), i in an italic font is an index of a vertex, and i in a normal font is the imaginary unit.
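A minimal sketch of such a conversion is given below. It assumes that the direction of a side is encoded as +1 (first direction), −1 (opposite direction), or 0 (no direction); this encoding and the function name `gamma` are illustrative assumptions, and the sketch is not a reproduction of Formula (1).

```python
import cmath

def gamma(direction, theta):
    """Map a side direction to a point on the unit circle of the complex plane.

    direction: +1 for the first direction, -1 for the opposite direction,
               0 for a side with no direction (hypothetical encoding).
    theta:     the first angle (argument) assigned to the first direction.
    """
    return cmath.exp(1j * direction * theta)

theta = cmath.pi / 4                 # arbitrary example angle
forward = gamma(+1, theta)           # argument  theta
backward = gamma(-1, theta)          # argument -theta
undirected = gamma(0, theta)         # argument 0, i.e. the reference point 1
```

Every value has an absolute value of 1, and the values for opposite directions are complex conjugates of each other.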
Note that the definition of the function γ is not limited to that of Formula (1). For example, the function γ may be defined in the form of γ=α+iβ by explicitly dividing the real part and the imaginary part. Moreover, the function γ may be defined as an element of the special orthogonal group of degree 2, SO(2), that is, as γ=diag(α, β) as a 2×2 matrix.
The generation unit 152 generates a Hermitian matrix expressing the relationship between vertices of the graph by using the argument converted by the conversion unit 151. For example, the generation unit 152 generates a matrix whose rows and columns each correspond to the vertices of the graph. This matrix is obtained by subtracting, from the order matrix of the graph, a matrix in which each component corresponding to a pair of vertices joined by a side is a complex number whose argument is the one converted by the conversion unit 151 and whose absolute value is constant. In this case, the components of the matrix may be the values obtained by the function γ mentioned above.
Here, in graph signal processing, a graph is generally represented using a matrix called a graph Laplacian. The graph Laplacian can be defined using an adjacency matrix and an order matrix. The order (degree) of a vertex expresses the number of sides extending from that vertex.
The graph Laplacian will be described with reference to
The generation unit 152 generates a matrix using the converted adjacency matrix and order matrix. The converted adjacency matrix is a matrix in which each component of the adjacency matrix is represented using the argument converted by the conversion unit 151.
As illustrated in
The (1, 2) component and the (2, 1) component of the matrix 20L are −e^{iθ} and −e^{−iθ}, respectively. Moreover, since an undirected side exists between vertices 3 and 4 of the graph, both the (3, 4) component and the (4, 3) component of the matrix 20L are −1. Note that, since the direction of the side is converted into the argument on the complex plane in the conversion unit 151, the order expressed in the matrix 20D is calculated ignoring the direction of the side of the directed graph.
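The construction described above can be sketched as follows. The four-vertex graph, the function name, the 0-based indices, and the choice θ = π/4 are assumptions for illustration; the directed side is taken to run from the first vertex to the second, which is also an assumption.

```python
import numpy as np

def hermitian_laplacian(n, undirected_edges, directed_edges, theta):
    """Order matrix minus the converted adjacency matrix (0-based indices)."""
    A = np.zeros((n, n), dtype=complex)
    for i, j in undirected_edges:
        A[i, j] = A[j, i] = 1.0                 # argument 0
    for i, j in directed_edges:
        A[i, j] = np.exp(1j * theta)            # argument  theta
        A[j, i] = np.exp(-1j * theta)           # argument -theta
    # The order of each vertex counts incident sides, ignoring their direction.
    degrees = np.count_nonzero(A, axis=1)
    D = np.diag(degrees).astype(complex)
    return D - A

theta = np.pi / 4
L = hermitian_laplacian(4, undirected_edges=[(2, 3)], directed_edges=[(0, 1)], theta=theta)
```

In this sketch, `L[0, 1]` and `L[1, 0]` correspond to the (1, 2) and (2, 1) components in the 1-based numbering of the text, and `L` equals its own conjugate transpose.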
Here, a matrix in which the (i, j) component and the (j, i) component of the matrix are complex conjugates of each other is referred to as a Hermitian matrix. Apparently, the matrix 20L in
The calculation unit 153 calculates an eigenvector of the Hermitian matrix generated by the generation unit 152. Moreover, the signal processing unit 154 regards the eigenvector calculated by the calculation unit 153 as a Fourier basis of the graph Laplacian, and performs graph signal processing.
For example, the signal processing unit 154 performs graph Fourier transform, graph filtering, or graph wavelet transform using eigenvectors. Moreover, the eigenvector calculated by the calculation unit 153 may be used in Sybil detection described later.
Here, the graph Fourier transform in the undirected graph is defined by regarding an eigenvector v of the graph Laplacian Lprior as a Fourier basis. When a matrix in which eigenvectors v are arranged in columns is V, graph Fourier transform for an arbitrary graph signal f is defined by \hat{f} = V^{*} f (where * indicates the complex conjugate transpose, that is, the adjoint). Most elemental techniques of graph signal processing in an undirected graph are based on this graph Fourier transform.
The signal processing unit 154 extends the graph Fourier transform in a conventional undirected graph and applies the same to a directed graph. The signal processing unit 154 executes two procedures of spectral decomposition of the Hermitian Laplacian L and extension of graph Fourier transform to a directed graph.
First, since L is a Hermitian matrix, the signal processing unit 154 performs spectral decomposition of L using a matrix Λ in which eigenvalues λ of L are arranged in diagonal components, and a unitary matrix U in which eigenvectors u are arranged in columns as expressed in Formula (2). Note that the eigenvector u is calculated by the calculation unit 153.
[Math. 2]
L = U \Lambda U^{*} \quad (2)
Moreover, the signal processing unit 154 can perform graph Fourier transform on a directed graph for an arbitrary graph signal f as in Formula (3) by regarding the eigenvector u as a Fourier basis.
[Math. 3]
\hat{f} = U^{*} f \quad (3)
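Formulas (2) and (3) can be sketched numerically as follows, using `numpy.linalg.eigh`, which returns real eigenvalues and orthonormal eigenvectors for a Hermitian matrix. The three-vertex Laplacian, the signal values, and the function names are hypothetical examples, not data from the embodiment.

```python
import numpy as np

def graph_fourier_transform(L, f):
    """Spectral decomposition L = U diag(lam) U* and transform f_hat = U* f."""
    lam, U = np.linalg.eigh(L)          # eigenvectors are the columns of U
    f_hat = U.conj().T @ f
    return lam, U, f_hat

def inverse_graph_fourier_transform(U, f_hat):
    """Recover the vertex-domain signal; valid because U is unitary."""
    return U @ f_hat

theta = np.pi / 4
# Hypothetical graph: directed side 0 -> 1, undirected side between 1 and 2.
L = np.array([[1, -np.exp(1j * theta), 0],
              [-np.exp(-1j * theta), 2, -1],
              [0, -1, 1]], dtype=complex)
f = np.array([1.0, 0.0, -1.0])

lam, U, f_hat = graph_fourier_transform(L, f)
f_back = inverse_graph_fourier_transform(U, f_hat)
```

Because `U` is unitary, the transform preserves the norm of the signal and the inverse transform recovers `f`.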
Note that, although the extension method of graph Fourier transform has been described here, the signal processing unit 154 can also extend the elemental technology in graph signal processing such as graph filtering or graph wavelet transform to a directed graph in a similar manner.
In the present embodiment, the signal processing unit 154 performs graph signal processing for Sybil detection. Here, it is assumed that each vertex of the graph data 20 corresponds to an SNS user. Moreover, it is assumed that at least some vertices of the graph data 20 are labeled.
As described above, in the present embodiment, a Sybil detection technique based on random walk is formulated as filtering by graph signal processing. Here, the prior evaluation value of a vertex labeled as Sybil is set to +1, the prior evaluation value of a vertex labeled as an authorized user (not Sybil) is set to −1, and the prior evaluation value of a vertex whose label is unknown is set to 0.
First, in random walk, the evaluation value of each vertex is updated using the update formula of Formula (4).
where p_i^{(t)} is the evaluation value of the vertex i at step t, and q_i is the prior evaluation value. w_{ij} is the (i, j) component of the adjacency matrix, and d_j is the order (degree) of the vertex j. Moreover, α ∈ [0, 1] is a parameter. Formula (4) means that, at each update, with probability α the evaluation value is updated from the evaluation values of the adjacent vertices, and with probability 1−α it is updated with reference to the vertex's own prior evaluation value.
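For reference, an element-wise update consistent with the symbol definitions above and with the vector form of Formula (5) can be written as follows. This is a reconstruction from those definitions, offered as a sketch rather than as the exact typeset form of Formula (4).

```latex
p_i^{(t+1)} = \alpha \sum_{j} \frac{w_{ij}}{d_j}\, p_j^{(t)} + (1 - \alpha)\, q_i
```

The first term propagates the evaluation values of adjacent vertices, each divided by the order of the vertex it comes from, and the second term pulls the evaluation value back toward the prior evaluation value.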
Since random walk on the graph is one of the most fundamental graph dynamics, various analytical techniques have been established, and it is ensured that the algorithm converges (Perron-Frobenius theorem) when the graph is strongly connected. On the other hand, it is known that the accuracy is low particularly for a graph with extreme deviation in the order of vertices.
In the present embodiment, the signal processing unit 154 updates the evaluation value using Formula (5) in which graph signal processing is applied to random walk. Here, A is an adjacency matrix and D is an order matrix. Moreover, Formula (5) can be said to be obtained by rewriting the Formula (4) in a vector form.
[Math. 5]
p^{(t+1)} = \alpha A D^{-1} p^{(t)} + (1 - \alpha) q \quad (5)
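The iteration of Formula (5) can be sketched as follows for a graph regarded as undirected. The function name, the starting value p = q, the default α = 0.85, the number of steps, and the five-vertex path graph are assumptions for illustration; the sketch also assumes that no vertex is isolated.

```python
import numpy as np

def random_walk_scores(A, q, alpha=0.85, num_steps=300):
    """Iterate p <- alpha * A D^{-1} p + (1 - alpha) * q."""
    A = np.asarray(A, dtype=float)
    q = np.asarray(q, dtype=float)
    d = A.sum(axis=0)                 # order of each vertex (symmetric A assumed)
    P = A / d                         # broadcasting divides column j by d[j]: A D^{-1}
    p = q.copy()                      # assumed starting point
    for _ in range(num_steps):
        p = alpha * (P @ p) + (1 - alpha) * q
    return p

# Hypothetical path graph 0-1-2-3-4; vertex 0 is known Sybil, vertex 4 known authorized.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
q = np.array([1.0, 0.0, 0.0, 0.0, -1.0])
p = random_walk_scores(A, q)
```

For α < 1 the iteration is a contraction, so the result approaches the fixed point that also solves the linear system (I − αAD^{-1}) p = (1 − α) q.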
As described above, the signal processing unit 154 updates the evaluation value of the vertex of the graph by random walk, on the basis of the matrix generated using the argument on the complex plane expressing the direction of the side of the graph in which at least some vertices are labeled.
Here, p^{(t)} := (p_1^{(t)}, \ldots, p_N^{(t)})^{T} and q := (q_1, \ldots, q_N)^{T}. At a fixed point where Formula (5) converges, p^{(t+1)} = p^{(t)} is satisfied, and thus Formula (6) is satisfied when \tilde{p} = D^{-1/2} p.
where N := D^{-1/2} L D^{-1/2} = I − D^{-1/2} A D^{-1/2} is a normalized Laplacian. In the undirected graph, since N = V \Lambda V^{T} and (I − N)^{k} = V (I − \Lambda)^{k} V^{T} are satisfied, p is obtained as in Formula (7).
where \hat{h}(\lambda) is a filter kernel as in Formula (8).
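The chain of steps from the fixed point of Formula (5) to a filtering form can be reconstructed as follows. This derivation is a sketch based only on Formula (5) and the definitions of \tilde{p}, N, V, and Λ given above; it is not claimed to match the typeset Formulas (6) to (8) term for term, and the closed form of the kernel is an assumption that follows from summing the geometric series for α < 1.

```latex
\begin{align}
p &= \alpha A D^{-1} p + (1-\alpha)\, q
  && \text{(fixed point of Formula (5))} \\
\tilde{p} &= \alpha\, D^{-1/2} A D^{-1/2}\, \tilde{p} + (1-\alpha)\, D^{-1/2} q
  = \alpha (I - N)\, \tilde{p} + (1-\alpha)\, D^{-1/2} q
  && (\tilde{p} = D^{-1/2} p) \\
\tilde{p} &= (1-\alpha) \sum_{k=0}^{\infty} \alpha^{k} (I - N)^{k} D^{-1/2} q
  = V \Bigl[ (1-\alpha) \sum_{k=0}^{\infty} \alpha^{k} (I - \Lambda)^{k} \Bigr] V^{T} D^{-1/2} q \\
p &= D^{1/2}\, V\, \hat{h}(\Lambda)\, V^{T} D^{-1/2} q,
  \qquad
  \hat{h}(\lambda) = (1-\alpha) \sum_{k=0}^{\infty} \alpha^{k} (1-\lambda)^{k}
  = \frac{1-\alpha}{1 - \alpha (1 - \lambda)}
\end{align}
```

The series converges because the eigenvalues of N lie in [0, 2], so |α(1 − λ)| ≤ α < 1. The kernel equals 1 at λ = 0 and decreases as λ grows, which is the behavior of a low-pass (smoothing) filter.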
In this manner, the signal processing unit 154 updates the evaluation value on the basis of the graph Laplacian obtained by subtracting the adjacency matrix of the graph expressed by the argument from the order matrix of the graph. The graph Laplacian here is a Hermitian Laplacian.
Specifically, the signal processing unit 154 transforms the update formula of the evaluation value by random walk when the graph is regarded as an undirected graph into the form of filtering that uses the matrix V obtained by arranging eigenvectors v of the graph Laplacian, and updates the evaluation value using a formula in which V in the transformed update formula is replaced with the unitary matrix U obtained when the graph Laplacian is spectrally decomposed. The matrix V is an example of a first matrix, and the matrix U is an example of a second matrix.
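The replacement of V by U can be sketched as follows. Several points are assumptions made for the example: the normalized Hermitian Laplacian I − D^{-1/2} A_θ D^{-1/2} is used for the spectral decomposition, the kernel (1 − α)/(1 − α(1 − λ)) is the one obtained by summing the random-walk series for α < 1, and the function names, graph, and θ are hypothetical.

```python
import numpy as np

def filter_kernel(lam, alpha):
    """Assumed kernel from the random-walk fixed point (requires alpha < 1)."""
    return (1 - alpha) / (1 - alpha * (1 - lam))

def spectral_random_walk(A_theta, q, alpha=0.85):
    """Compute D^{1/2} U h(Lambda) U* D^{-1/2} q for a converted adjacency matrix."""
    A_theta = np.asarray(A_theta, dtype=complex)
    q = np.asarray(q, dtype=float)
    deg = np.count_nonzero(A_theta, axis=1).astype(float)   # order, ignoring direction
    d_sqrt = np.sqrt(deg)
    N = np.eye(len(deg)) - A_theta / np.outer(d_sqrt, d_sqrt)
    lam, U = np.linalg.eigh(N)                               # N is Hermitian
    q_scaled = q / d_sqrt                                    # scale by order^(-1/2)
    filtered = U @ (filter_kernel(lam, alpha) * (U.conj().T @ q_scaled))
    return d_sqrt * filtered                                 # scale by order^(+1/2)

def converted_adjacency(theta):
    """Hypothetical graph: directed side 0 -> 1, undirected sides 1-2 and 2-3."""
    A = np.zeros((4, 4), dtype=complex)
    A[0, 1], A[1, 0] = np.exp(1j * theta), np.exp(-1j * theta)
    A[1, 2] = A[2, 1] = 1.0
    A[2, 3] = A[3, 2] = 1.0
    return A

q = np.array([1.0, 0.0, 0.0, -1.0])
p_directed = spectral_random_walk(converted_adjacency(np.pi / 4), q)
p_undirected = spectral_random_walk(converted_adjacency(0.0), q)
```

With θ = 0 the result coincides with the undirected random-walk fixed point. With θ ≠ 0 the evaluation values are complex in general, and how they are reduced to a real score (for example, by taking the real part) is a design choice not fixed by this sketch.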
As illustrated in
Moreover, Formula (8) means that the signal processing unit 154 updates the evaluation value in three steps: the prior evaluation value is scaled by the order raised to the power of −½, the scaled value is filtered using the eigenvectors of the graph Laplacian, and the filtered value is further scaled by the order raised to the power of ½. As described above, Sybil detection by random walk can be interpreted in terms of signal processing in the present embodiment.
The estimation unit 155 estimates a label of a vertex of the graph on the basis of the evaluation value. For example, the estimation unit 155 can estimate a vertex whose evaluation value is equal to or larger than a predetermined threshold value larger than 0 as Sybil.
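A minimal sketch of this threshold rule is shown below. The function name, the default threshold of 0.05, and the use of the real part when the evaluation values are complex are assumptions for illustration.

```python
import numpy as np

def estimate_sybil(p, threshold=0.05):
    """Return a boolean mask that is True where a vertex is estimated as Sybil."""
    if threshold <= 0:
        raise ValueError("threshold is expected to be larger than 0")
    scores = np.real(np.asarray(p))      # assumption: use the real part of complex values
    return scores >= threshold

is_sybil = estimate_sybil([0.30, 0.01, -0.20])
```

Vertices whose evaluation value falls below the threshold are simply left unflagged here; how such vertices are labeled is not decided by this sketch.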
Next, the detection device 10 converts the direction of the side between vertices of the graph into an argument (step S102). For example, the detection device 10 converts a side in a certain direction into an angle θ, and converts a side in a direction opposite to the certain direction into an angle −θ.
The detection device 10 generates a Hermitian matrix on the basis of the argument (step S103). For example, the detection device 10 generates a Hermitian matrix by subtracting the converted adjacency matrix from the order matrix. Then, the detection device 10 executes Sybil detection (step S104).
Next, the detection device 10 transforms the update formula into a filtering form using the matrix V in which eigenvectors are arranged (step S202). That is, the detection device 10 transforms Formula (5) into Formula (7). Furthermore, the detection device 10 replaces the matrix V of the update formula with a unitary matrix U (step S203). The unitary matrix U is obtained by spectral decomposition of the Hermitian Laplacian L.
The detection device 10 updates the evaluation value of each vertex using an update formula on the basis of the prior evaluation value (step S204). Then, the detection device 10 estimates a label of each vertex on the basis of the evaluation value (step S205).
Note that the detection device 10 may update the evaluation value using the update formula obtained in step S201.
As described above, the detection device 10 of the first embodiment has the signal processing unit 154 and the estimation unit 155. The signal processing unit 154 updates the evaluation value of the vertex of the graph by random walk, on the basis of the matrix generated using the argument on the complex plane expressing the direction of the side of the graph in which at least some vertices are labeled. The estimation unit 155 estimates a label of a vertex of the graph on the basis of the evaluation value. In this manner, the detection device 10 applies graph signal processing to random walk. Therefore, it is possible with the present embodiment to interpret the Sybil detection technique in a signal processing framework.
The signal processing unit 154 updates the evaluation value on the basis of the graph Laplacian obtained by subtracting an adjacency matrix of the graph expressed by the argument from an order matrix of the graph. As a result, graph signal processing can be easily applied to random walk.
The signal processing unit 154 transforms the update formula of the evaluation value by random walk when the graph is regarded as an undirected graph into the form of filtering that uses the matrix V obtained by arranging eigenvectors of the graph Laplacian, and updates the evaluation value using a formula in which the matrix V in the transformed update formula is replaced with the unitary matrix U obtained when the graph Laplacian is spectrally decomposed. As a result, random walk by graph signal processing can be applied to a directed graph.
The signal processing unit 154 updates the evaluation value by scaling the prior evaluation value by the order raised to the power of −½, filtering the scaled value using the eigenvectors of the graph Laplacian, and further scaling the filtered value by the order raised to the power of ½. This allows random walk to be interpreted as graph signal processing.
This rescaling operation means that the evaluation value after smoothing is enlarged for a vertex having a larger order. That is, this means that the final evaluation value tends to concentrate on a vertex having a large order. In particular, in a graph with extreme deviation in the order of vertices, the effect of rescaling contributes more than the effect of smoothing. This is consistent with the known property that Sybil detection based on random walk will fail for graphs with extreme deviation in the order of vertices.
Moreover, each component of each illustrated device is functionally conceptual, and does not necessarily need to be physically configured as illustrated. That is, a specific form of distribution and integration of each device is not limited to the illustrated form, and all or a part thereof can be functionally or physically distributed or integrated in an arbitrary unit according to various loads, usage conditions, and the like. Furthermore, all or an arbitrary part of each processing function performed in each device can be implemented by a CPU and a program analyzed and executed by the CPU, or can be implemented as hardware by wired logic.
Moreover, among the processes described in the present embodiment, all or some of the processes described as being automatically performed can be manually performed, or all or some of the processes described as being manually performed can be automatically performed by a known method. In addition, the processing procedure, the control procedure, the specific name, and the information including various data and parameters illustrated in the above document or the drawings can be arbitrarily changed unless otherwise specified.
As an embodiment, the detection device 10 can be implemented by installing a detection program for executing the detection processing mentioned above as package software or online software in a desired computer. For example, by causing the information processing device to execute the detection program mentioned above, it is possible to cause the information processing device to function as the detection device 10. The information processing device mentioned here includes a desktop or notebook personal computer. Moreover, the information processing device includes mobile communication terminals such as a smartphone, a mobile phone, and a personal handyphone system (PHS), and further includes slate terminals such as a personal digital assistant (PDA).
Moreover, the detection device 10 can also be implemented as a detection server device that uses a terminal device used by a user as a client and provides the client with a service related to the detection processing mentioned above. For example, the detection server device is implemented as a server device that accepts graph data as an input and provides a vertex that is Sybil. In this case, the detection server device may be implemented as a Web server, or may be implemented as a cloud that provides a service related to the detection processing mentioned above by outsourcing.
The memory 1010 includes a read only memory (ROM) 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a basic input output system (BIOS). The hard disk drive interface 1030 is connected with a hard disk drive 1090. The disk drive interface 1040 is connected with a disk drive 1100. For example, a removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected with, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected with, for example, a display 1130.
The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, a program that defines each processing of the detection device 10 is implemented as the program module 1093 in which codes executable by a computer are described. The program module 1093 is stored in, for example, the hard disk drive 1090. For example, the program module 1093 for executing processing similar to the functional configuration in the detection device 10 is stored in the hard disk drive 1090. Note that the hard disk drive 1090 may be replaced with an SSD.
Moreover, the setting data used in the processing of the above-described embodiment is stored in, for example, the memory 1010 or the hard disk drive 1090 as the program data 1094. Then, the CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1090 to the RAM 1012 as necessary, and executes the processing of the above-described embodiment.
Note that the program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090, and may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (local area network (LAN), wide area network (WAN), etc.). Then, the program module 1093 and the program data 1094 may be read by the CPU 1020 from the other computer via the network interface 1070.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2020/035153 | 9/16/2020 | WO |