This disclosure relates to the technical field of computers, and in particular, to a data identification method and apparatus, a device and a readable storage medium.
In daily life, gambling and fraud incidents are common. In order to reduce the occurrence of such incidents, it is necessary to identify abnormal users efficiently and rapidly.
In the related art, abnormal users are identified from the behavior feature data of users: in a case that the behavior feature data of a user is consistent with the behavior feature data of abnormal users, the user is determined as an abnormal user. However, some abnormal users imitate the legitimate behavior of normal users, so that their behavior feature data is close to legitimate behavior feature data. As a result, a user who should be identified as abnormal may be identified as a normal user, and the identification accuracy is low.
Embodiments provide a data identification method and apparatus, a device, and a readable storage medium, so as to enhance the accuracy of data identification.
According to an aspect of example embodiments, a method performed by a computing device may include determining a target user set from a plurality of users, the target user set comprising at least two users having a first social relationship, wherein a first closeness of the first social relationship among the at least two users in the target user set is higher than a second closeness of a second social relationship between users in the target user set and a user not in the target user set, acquiring a default abnormal user and determining abnormal users in the target user set based on the default abnormal user, determining a status of the target user set based on the abnormal users, and identifying a diffusion-abnormal user from to-be-confirmed users based on social relationships between the abnormal users and the to-be-confirmed users in the target user set based on the status of the target user set being abnormal. The to-be-confirmed users may include users in the target user set other than the abnormal users.
According to an aspect of example embodiments, a data identification apparatus may include at least one memory configured to store computer program code and at least one processor configured to access said computer program code and operate as instructed by said computer program code, said computer program code including first determining code configured to cause the at least one processor to determine a target user set from a plurality of users, the target user set comprising at least two users having a first social relationship, wherein a first closeness of the first social relationship among the at least two users in the target user set is higher than a second closeness of a second social relationship between users in the target user set and a user not in the target user set, first acquiring code configured to cause the at least one processor to acquire a default abnormal user and determine abnormal users in the target user set based on the default abnormal user, second determining code configured to cause the at least one processor to determine a status of the target user set based on the abnormal users, and first identifying code configured to cause the at least one processor to identify a diffusion-abnormal user from to-be-confirmed users based on social relationships between the abnormal users and the to-be-confirmed users in the target user set based on the status of the target user set being abnormal. The to-be-confirmed users may include users in the target user set other than the abnormal users.
According to an aspect of example embodiments, a non-transitory computer-readable storage medium may store computer instructions that, when executed by at least one processor of a device, cause the at least one processor to determine a target user set from a plurality of users, the target user set comprising at least two users having a first social relationship, wherein a first closeness of the first social relationship among the at least two users in the target user set is higher than a second closeness of a second social relationship between users in the target user set and a user not in the target user set, acquire a default abnormal user and determine abnormal users in the target user set based on the default abnormal user, determine a status of the target user set based on the abnormal users, and identify a diffusion-abnormal user from to-be-confirmed users based on social relationships between the abnormal users and the to-be-confirmed users in the target user set based on the status of the target user set being abnormal. The to-be-confirmed users may include users in the target user set other than the abnormal users.
According to an aspect of example embodiments, a data identification apparatus is provided, including:
a target user set acquisition module, configured to acquire a target user set, the target user set including at least two users having a social relationship;
an abnormal user determination module, configured to acquire a default abnormal user, and determine abnormal users in the target user set according to the default abnormal user;
a behavior status detection module, configured to determine a status of the target user set according to the abnormal users; and
a diffusion-abnormal user identification module, configured to identify a diffusion-abnormal user from to-be-confirmed users according to social relationships between the abnormal users and the to-be-confirmed users in the target user set in a case that the status of the target user set is abnormal, the to-be-confirmed users being users in the target user set other than the abnormal users.
The abnormal user determination module includes:
an abnormal user determination unit, configured to match the users in the target user set with the default abnormal user, and determine, as the abnormal users in the target user set, users having a matching ratio reaching a matching threshold.
The behavior status detection module includes:
a total user quantity acquisition unit, configured to acquire a quantity of the abnormal users, and acquire a total quantity of the users in the target user set;
an anomaly concentration determination unit, configured to determine an anomaly concentration of the target user set according to the quantity of the abnormal users and the total quantity of the users in the target user set; and
a first status determination unit, configured to determine the status of the target user set as a normal state in a case that the anomaly concentration is less than a concentration threshold.
The first status determination unit is further configured to determine the status of the target user set as abnormal in a case that the anomaly concentration is greater than or equal to the concentration threshold.
The behavior status detection module includes:
a behavior feature acquisition unit, configured to acquire a user social behavior feature set, the user social behavior feature set including the social behavior feature of each user in a user group;
a feature distribution determination unit, configured to determine a first feature distribution of the abnormal users according to the social behavior features in the user social behavior feature set, the first feature distribution being used for representing a quantity of types of the social behavior features possessed by the abnormal users,
and further configured to determine a second feature distribution of the users in the target user set according to the social behavior features in the user social behavior feature set, the second feature distribution being used for representing a quantity of types of the social behavior features possessed by the users in the target user set;
a feature distribution difference determination unit, configured to determine a feature distribution difference between the abnormal user and the users in the target user set according to the first feature distribution and the second feature distribution; and
a second status determination unit, configured to determine the status of the target user set according to the first feature distribution and the feature distribution difference.
The second status determination unit is further configured to determine the status of the target user set as the normal state in a case that the feature distribution difference is less than a difference threshold and the first feature distribution is less than a distribution threshold.
The second status determination unit is further configured to determine the status of the target user set as the normal state in a case that the feature distribution difference is greater than or equal to the difference threshold and the first feature distribution is greater than or equal to the distribution threshold.
The second status determination unit is further configured to determine the status of the target user set as abnormal in a case that the feature distribution difference is greater than or equal to the difference threshold and the first feature distribution is less than the distribution threshold.
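The three rules of the second status determination unit can be read as a small decision function. The Python sketch below is a minimal illustration under stated assumptions: the function name `status_from_features` and its parameter names are hypothetical, and the fourth combination (feature distribution difference below its threshold while the first feature distribution is at or above its threshold), which the rules above do not cover, is returned as `None` rather than guessed.

```python
from typing import Optional


def status_from_features(first_distribution: float,
                         distribution_difference: float,
                         distribution_threshold: float,
                         difference_threshold: float) -> Optional[str]:
    """Map (first feature distribution, feature distribution difference) to a status.

    Hypothetical helper; returns None for the combination that the rules
    above leave unspecified.
    """
    concentrated = first_distribution < distribution_threshold
    differs = distribution_difference >= difference_threshold
    if not differs and concentrated:
        return "normal"    # abnormal users' features are common in the set
    if differs and not concentrated:
        return "normal"    # abnormal users are scattered, not a coordinated group
    if differs and concentrated:
        return "abnormal"  # consistent among themselves, unlike the rest of the set
    return None            # not covered by the rules above
```

For example, with a distribution threshold of 1.0 and a difference threshold of 0.3, a first feature distribution of 0.5 combined with a difference of 0.5 yields `"abnormal"`.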
The target user set acquisition module includes:
a relationship topology graph acquisition unit, configured to acquire a relationship topology graph corresponding to a user group, the relationship topology graph including N nodes k, the N nodes k being in a one-to-one correspondence with users in the user group, N being a quantity of users in the user group, and an edge weight between two nodes k being determined based on a social relationship between two users in the user group;
a sampling path acquisition unit, configured to acquire sampling paths corresponding to the nodes k from the relationship topology graph according to a quantity of the sampling paths;
a jump probability determination unit, configured to determine a jump probability between the node k and an association node in the sampling path according to the edge weight in the relationship topology graph, the association node being a node in the sampling path other than the node k; and
a target user set determination unit, configured to update the relationship topology graph according to the jump probability to obtain an updated relationship topology graph, and determine the target user set in the updated relationship topology graph.
The relationship topology graph acquisition unit includes:
a user group acquisition subunit, configured to acquire a user group, each user in the user group being used as the node k;
a weight setting subunit, configured to perform edge connection between the nodes k corresponding to the users having the social relationship, and set an initial weight for an edge between the nodes k according to social behavior records among the users having the social relationship;
a probability transformation subunit, configured to perform probability transformation on the initial weight to obtain the edge weight; and
a relationship topology graph generation subunit, configured to generate the relationship topology graph according to the nodes k corresponding to the user group and the edge weight.
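As a rough sketch of the four subunits above, the relationship topology graph could be built as follows. The text does not fix how the initial weight or the probability transformation is computed, so this sketch assumes that each social behavior record is one interaction between two users, that the initial weight is the interaction count, and that the probability transformation normalizes each node's outgoing weights to sum to 1. The name `build_topology_graph` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# node -> {neighbour node: edge weight}
Graph = Dict[str, Dict[str, float]]


def build_topology_graph(users: Iterable[str],
                         records: Iterable[Tuple[str, str]]) -> Graph:
    """Build a weighted relationship topology graph (hypothetical helper).

    Assumptions: each record is one interaction between two users, the
    initial weight is the interaction count, and the probability
    transformation row-normalizes each node's outgoing weights.
    """
    initial: Dict[str, Dict[str, float]] = {u: defaultdict(float) for u in users}
    for a, b in records:
        if a == b:
            continue  # ignore self-interactions
        # Undirected social relationship: count the interaction on both sides.
        initial.setdefault(a, defaultdict(float))[b] += 1.0
        initial.setdefault(b, defaultdict(float))[a] += 1.0
    graph: Graph = {}
    for node, neighbours in initial.items():
        total = sum(neighbours.values())
        # Probability transformation: outgoing weights of each node sum to 1.
        graph[node] = {n: w / total for n, w in neighbours.items()} if total else {}
    return graph
```

Users without any records are kept as isolated nodes with no edges, so every user in the user group still corresponds to one node k.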
The jump probability determination unit includes:
an intermediate node acquisition subunit, configured to acquire an intermediate node between the node k and the association node from the sampling path in a case that there is no edge between the node k and the association node, the node k reaching the association node through the intermediate node;
a connection node pair determination subunit, configured to use, as a connection node pair, any two nodes among the node k, the intermediate node, and the association node that are connected by an edge, and acquire an edge weight corresponding to the connection node pair; and
a jump probability determination subunit, configured to determine a jump probability between the node k and the association node according to the edge weight corresponding to the connection node pair.
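The text states only that the jump probability is determined "according to the edge weight corresponding to the connection node pair". One plausible reading, used in the sketch below as an assumption, is a random-walk style product: if node k and the association node share an edge, that edge weight is used directly; otherwise the edge weights of consecutive connection node pairs along the sampled path are multiplied. The function name `jump_probability` is hypothetical.

```python
from typing import Dict, List

# node -> {neighbour node: edge weight}
Graph = Dict[str, Dict[str, float]]


def jump_probability(graph: Graph, path: List[str], target: str) -> float:
    """Jump probability from path[0] (node k) to `target` (an association node).

    Assumption: a direct edge is used as-is; otherwise the probability is the
    product of the edge weights of the connection node pairs (consecutive
    nodes sharing an edge) on the sampled path up to the association node.
    """
    k = path[0]
    if target in graph.get(k, {}):
        return graph[k][target]
    end = path.index(target)  # raises ValueError if target is not on the path
    prob = 1.0
    for a, b in zip(path[:end], path[1:end + 1]):
        weight = graph.get(a, {}).get(b)
        if weight is None:
            return 0.0  # consecutive nodes without an edge: no valid jump
        prob *= weight
    return prob
```

For a sampled path k → m → a with edge weights 0.5 (k to m) and 0.75 (m to a) and no direct edge between k and a, this reading gives a jump probability of 0.5 × 0.75 = 0.375 through the intermediate node m.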
The target user set determination unit includes:
a node edge updating subunit, configured to update a connected edge in the relationship topology graph according to the node k and the association node to obtain a transition relationship topology graph, the node k and the association node in the transition relationship topology graph being both connected with edges;
an edge weight setting subunit, configured to set the jump probability between the node k and the association node in the transition relationship topology graph as an edge weight between the node k and the association node to obtain a target relationship topology graph; and
a target user set determination subunit, configured to determine the target user set from the target relationship topology graph.
The target user set determination subunit is further configured to perform exponential growth on the jump probability, perform probability transformation on the jump probability obtained after the exponential growth to obtain a target probability, and update the edge weight between the node k and the association node according to the target probability.
The target user set determination subunit is further configured to determine, as a vital association node of the node k, the association node having the updated edge weight greater than a weight threshold.
The target user set determination subunit is further configured to divide the target relationship topology graph into at least two community topology graphs according to the node k and the vital association node, and acquire a target community topology graph from the at least two community topology graphs as the target user set.
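The exponential growth, probability transformation, and weight-threshold steps resemble the inflation step of Markov clustering (MCL). The sketch below is a simplified, assumed reading rather than the method itself: the exponent of 2, the threshold value, and treating each community as a connected component over vital links are illustration choices, and `inflate` and `communities` are hypothetical names.

```python
from typing import Dict, List, Set

# node -> {neighbour node: jump probability / edge weight}
Graph = Dict[str, Dict[str, float]]


def inflate(graph: Graph, power: float = 2.0) -> Graph:
    """Exponential growth of each jump probability, then per-node renormalization.

    Mirrors MCL inflation; `power` is an assumed parameter.
    """
    result: Graph = {}
    for node, nbrs in graph.items():
        grown = {n: p ** power for n, p in nbrs.items()}
        total = sum(grown.values())
        result[node] = {n: g / total for n, g in grown.items()} if total else {}
    return result


def communities(graph: Graph, weight_threshold: float) -> List[Set[str]]:
    """Split the graph into community groups linked by vital association nodes.

    A neighbour is 'vital' if its updated edge weight exceeds the threshold;
    communities are taken here (an assumption) as connected components
    over vital links.
    """
    vital: Dict[str, Set[str]] = {n: set() for n in graph}
    for node, nbrs in graph.items():
        for n, w in nbrs.items():
            if w > weight_threshold:
                vital[node].add(n)
                vital.setdefault(n, set()).add(node)  # treat vital links as undirected
    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in vital:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first traversal over vital links
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(vital[node] - seen)
        groups.append(group)
    return groups
```

Inflation sharpens strong links and weakens weak ones, so after renormalization a fixed threshold separates tightly connected users from loosely connected ones more cleanly than the raw jump probabilities would. Any of the resulting groups could then serve as the target user set.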
The diffusion-abnormal user identification module includes:
a first related user determination unit, configured to determine, from the to-be-confirmed users, a user having a social relationship with the abnormal user in a case that the status of the target user set is abnormal; and
a first diffusion-abnormal user determination unit, configured to determine, as the diffusion-abnormal user, the user having the social relationship with the abnormal user.
The diffusion-abnormal user identification module includes:
a second related user determination unit, configured to determine, from the to-be-confirmed users, the user having the social relationship with the abnormal user in a case that the status of the target user set is abnormal; and
a second diffusion-abnormal user determination unit, configured to: acquire abnormal user nodes corresponding to the abnormal users, acquire association user nodes corresponding to the users having the social relationship with the abnormal users, determine, as a diffusion-abnormal node, the association user node having an edge weight with one of the abnormal user nodes greater than an association threshold, and determine the user corresponding to the diffusion-abnormal node as the diffusion-abnormal user.
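A minimal sketch of the second diffusion-abnormal user determination unit is shown below, assuming the graph is stored as a node-to-neighbour weight map and that the relevant edge weight is the one stored on the abnormal user node's side. The function name and parameters are hypothetical.

```python
from typing import Dict, Set

# node -> {neighbour node: edge weight}
Graph = Dict[str, Dict[str, float]]


def diffusion_abnormal_users(graph: Graph, members: Set[str], abnormal: Set[str],
                             association_threshold: float) -> Set[str]:
    """Return to-be-confirmed users whose edge weight with an abnormal user
    node exceeds the association threshold (hypothetical helper).

    Only users inside the target user set (`members`) are considered.
    """
    to_be_confirmed = members - abnormal
    diffused: Set[str] = set()
    for bad in abnormal:
        for user, weight in graph.get(bad, {}).items():
            if user in to_be_confirmed and weight > association_threshold:
                diffused.add(user)
    return diffused
```

Setting `association_threshold` to 0 marks every to-be-confirmed user with a positive-weight edge to an abnormal user, which corresponds to the simpler first diffusion-abnormal user determination unit.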
The data identification apparatus further includes:
a to-be-identified user set determination module, configured to determine the target user set whose status is abnormal as a to-be-identified user set;
a key text data extraction module, configured to acquire user text data of users in the to-be-identified user set, and extract key text data from the user text data;
a sensitive source data acquisition module, configured to acquire sensitive source data; and
an anomaly category determination module, configured to match the key text data with the sensitive source data, and determine an anomaly category of the to-be-identified user set according to a matching result.
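The text does not specify how key text data is extracted or how matching results map to a category. The sketch below assumes a very simple scheme purely for illustration: key text is the lower-cased word counts of the user text data minus stopwords, the sensitive source data is a hypothetical mapping from category name to sensitive terms, and the category with the most term hits is chosen. Both function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional, Set


def extract_key_text(texts: Iterable[str], stopwords: Set[str]) -> Counter:
    """Rough key text extraction: lower-cased word counts minus stopwords."""
    counts: Counter = Counter()
    for text in texts:
        for word in re.findall(r"\w+", text.lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts


def anomaly_category(key_text: Counter,
                     sensitive_source: Dict[str, Set[str]]) -> Optional[str]:
    """Pick the category whose sensitive terms occur most often in the key text.

    Returns None when no sensitive term matches at all.
    """
    scores = {cat: sum(key_text[t] for t in terms)
              for cat, terms in sensitive_source.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None
```

In practice the keyword extraction and matching would likely be replaced by stronger text processing; the sketch only shows the shape of the match-then-categorize step.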
According to an aspect of example embodiments, a computer device is provided and includes a processor and a memory.
The memory stores a computer program that, when executed by the processor, causes the processor to perform the method according to the embodiments of this disclosure.
According to an aspect of example embodiments, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program including program instructions that, when executed by a processor, cause the method according to the embodiments of this disclosure to be performed.
In order to describe the technical solutions in the example embodiments of the disclosure more clearly, the following briefly describes the accompanying drawings for describing the example embodiments. Apparently, the accompanying drawings in the following description merely show some embodiments of the disclosure, and a person of ordinary skill in the art may still derive other accompanying drawings from these accompanying drawings without creative efforts.
The technical solutions in embodiments of this disclosure are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of this disclosure. Apparently, the described embodiments are merely some rather than all of the embodiments of this disclosure. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of this disclosure without creative efforts shall fall within the protection scope of this disclosure.
As shown in
The method of the embodiments may be performed by one or more computing devices, such as one or more computing devices in the business server 1000 shown in
In the embodiments of this disclosure, one of the plurality of user terminals may be selected as a target user terminal. The target user terminal may include intelligent terminals having functions of displaying and playing data information, such as a smart phone, a tablet computer, a desktop computer, and the like. For example, in the embodiments of this disclosure, the user terminal corresponding to the back-end server 100a shown in
By using an example of determining the diffusion-abnormal user from the community topology graph, the business server 1000 may adopt the following implementations for determining the diffusion-abnormal user. The business server 1000 may select one community topology graph from the divided community topology graphs as the target user set. The target user set includes at least two users having a social relationship. The business server 1000 may acquire a default abnormal user (that is, the existing abnormal user sample). According to the default abnormal user, the business server 1000 may determine the abnormal users in the target user set. The business server 1000 may detect the status of the target user set according to the quantity of the abnormal users and the total quantity of the users in the target user set. When the target user set is in the abnormal state, the business server 1000 may identify the diffusion-abnormal user from to-be-confirmed users according to the social relationships between the abnormal users and the to-be-confirmed users in the target user set, and use the diffusion-abnormal user as the abnormal user. The to-be-confirmed users are users in the target user set other than the abnormal users. After the abnormal user (including the diffusion-abnormal user) in each relationship topology graph is determined, the business server 1000 may generate an identification result according to the abnormal user in each relationship topology graph, and return the identification result to the back-end server.
In some embodiments, the back-end server may determine the large quantity of users corresponding to the respective user terminals as the user group, divide the user group into different community topology graphs to obtain different user sets, and identify the abnormal users and the diffusion-abnormal users in the user sets. For the implementation in which the back-end server identifies the abnormal users and the diffusion-abnormal users, reference may be made to the description of how the business server identifies them.
The method provided in the embodiments of this disclosure may be performed by a computer device. The computer device includes, but is not limited to, a terminal or a server.
In operation S101, the system acquires a target user set, the target user set including at least two users having a social relationship.
In this operation, the target user set may be determined from a plurality of users. The plurality of users may be users screened according to a preset condition, users corresponding to a back-end server, or all users (also referred to as a user group) of a social application. The determined target user set satisfies the condition that the closeness of the social relationships among the users in the target user set is higher than the closeness of the social relationship between the users in the target user set and a user not in the target user set. The closeness of the social relationships among the users may be determined according to social behavior records of the users, which may include, but are not limited to, the frequency of information interaction among the users, the number of information interactions, information interaction durations, the amount of information exchanged, a transaction amount, and the like.
In the embodiment of this disclosure, the target user set may be a community topology graph. The community topology graph includes nodes corresponding to the users, edges between the nodes, and an edge weight of each edge. An edge between two nodes represents a social relationship between the corresponding users, and the edge weight represents their association degree. If there is a social relationship between two users, there is an edge between the nodes corresponding to the two users, and a closer relationship between the two users leads to a larger association degree and a larger edge weight. The community topology graph may thus indicate whether there is a social relationship between two nodes and, if so, the association degree between them. The social relationship herein may be a payment relationship, a communication friend relationship, a device relationship, and the like. For example, in a case that user a logs in to an account on a communication device (such as a smartphone) of user b, it may be determined that user a has a device relationship with user b. In addition to the payment relationship, the communication friend relationship, and the device relationship, the social relationship may further include relationships of other forms (for example, the social accounts of two users do not have a friend relationship, but the two users have had a conversation by using the social accounts). The range of the social relationship is not limited in this disclosure.
The target user set may be obtained from the relationship topology graph corresponding to the user group. Nodes in the target user set are some nodes in the relationship topology graph of the user group. According to the edge weights (that is, the association degrees among the users) among the nodes in the relationship topology graph, the relationship topology graph may be divided into at least two community topology graphs. Any of the at least two community topology graphs is selected as the target user set. The user group may be divided into at least two communities according to the social relationships and the association degrees among the users in the user group. The users in each community are closely related.
In operation S102, the system acquires a default abnormal user, and determines abnormal users in the target user set according to the default abnormal user.
In this embodiment, the default abnormal user may be a preset abnormal user sample, that is, an abnormal user detected in advance. There may be at least two default abnormal users. The default abnormal users may include attribute information (such as IDs, names, fingerprints, and the like) of the users. Taking the ID as an example of the attribute information, the ID of each user in the target user set may be matched with the IDs of the default abnormal users, and the users in the target user set whose matching ratio reaches a matching threshold may be determined as the abnormal users in the target user set.
For example, the default abnormal users include <default abnormal user 1, 1> and <default abnormal user 2, 2>, that is, default abnormal user 1 with ID 1 and default abnormal user 2 with ID 2. The target user set includes {<user A, 1>, <user B, 4>, <user C, 6>}. The IDs of the default abnormal users (that is, 1 and 2) may be matched with the IDs of the users in the target user set (that is, 1, 4, and 6), yielding the matching result that ID 1 of user A matches ID 1 of default abnormal user 1. In this way, user A may be determined as an abnormal user in the target user set.
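The matching in this example can be sketched in a few lines of Python. The text does not define the "matching ratio" precisely, so this sketch assumes it is the fraction of shared attribute fields (ID, name, and so on) whose values agree with one default abnormal user; with only IDs available, the ratio is simply 1 or 0. The function name `find_abnormal_users` is hypothetical.

```python
from typing import Dict, List, Set


def find_abnormal_users(target_set: Dict[str, Dict[str, str]],
                        default_abnormal: List[Dict[str, str]],
                        matching_threshold: float = 1.0) -> Set[str]:
    """Return users whose attributes match a default abnormal user.

    Assumption: matching ratio = share of common attribute fields whose
    values are equal; a user is abnormal if any sample reaches the threshold.
    """
    abnormal: Set[str] = set()
    for user, attrs in target_set.items():
        for sample in default_abnormal:
            shared = [key for key in sample if key in attrs]
            if not shared:
                continue  # nothing to compare against this sample
            ratio = sum(attrs[key] == sample[key] for key in shared) / len(shared)
            if ratio >= matching_threshold:
                abnormal.add(user)
                break  # one matching sample is enough
    return abnormal
```

With the data of the example, `find_abnormal_users({"A": {"id": "1"}, "B": {"id": "4"}, "C": {"id": "6"}}, [{"id": "1"}, {"id": "2"}])` identifies only user A.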
In operation S103, the system determines a status of the target user set according to the abnormal users.
The status of the target user set may be determined according to the quantity of the abnormal users and the total quantity of the users in the target user set. Specifically, an anomaly concentration of the target user set may be determined as the ratio of the quantity of the abnormal users in the target user set to the total quantity of the users in the target user set. In a case that the anomaly concentration is less than a concentration threshold, the proportion of abnormal users in the target user set is low, and the status of the target user set may be determined as the normal state. In a case that the anomaly concentration is greater than or equal to the concentration threshold, the proportion of abnormal users in the target user set is high, and the status of the target user set may be determined as the abnormal state. The anomaly concentration may be determined as shown in Equation (1):
$$C = \frac{N}{M} \tag{1}$$
where C may be used for representing the anomaly concentration of the target user set, N may be used for representing the quantity of the abnormal users in the target user set, and M may be used for representing the total quantity of the users in the target user set.
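Equation (1) and the threshold comparison reduce to a few lines. The Python sketch below is illustrative only; the function names are hypothetical, and the status is "abnormal" when C is greater than or equal to the concentration threshold, matching the rule of the first status determination unit.

```python
def anomaly_concentration(num_abnormal: int, total_users: int) -> float:
    """Equation (1): C = N / M."""
    if total_users <= 0:
        raise ValueError("the target user set must contain at least one user")
    return num_abnormal / total_users


def status_from_concentration(num_abnormal: int, total_users: int,
                              concentration_threshold: float) -> str:
    """Normal if C is below the threshold, abnormal otherwise."""
    c = anomaly_concentration(num_abnormal, total_users)
    return "abnormal" if c >= concentration_threshold else "normal"
```

For example, 3 abnormal users in a target user set of 10 users give C = 0.3, which reaches an assumed concentration threshold of 0.3, so the set is treated as abnormal; 2 abnormal users out of 10 would not.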
In some embodiments, the status of the target user set may be determined by using a user social behavior feature set. The user social behavior feature set includes the social behavior feature of each user in the user group, and may include historical data of the detected social behavior features of each user in the user group. For example, in a case that user A has been to the Central Park and the Flower Town, the two social behavior features "has been to the Central Park" and "has been to the Flower Town" of user A may be stored in the user social behavior feature set. The user social behavior feature set may also include the communication devices used by the users, the wireless networks used by the users, user behaviors (for example, frequently going to a same place), and the like. The types and quantities of the social behavior features of the abnormal users in the target user set may be counted according to the user social behavior feature set, and an information entropy may be determined according to the distribution of the social behavior features of the abnormal users. A smaller information entropy may indicate a more concentrated distribution of the abnormal users on the social behavior features. For example, the information entropy may be determined as shown in Equation (2):
$$H(x) = -\sum_{i=1}^{n} P(x_i) \log P(x_i) \tag{2}$$
where H(x) represents the information entropy, n is the quantity of social behavior features considered, and P(x_i) represents the distribution of the users on the i-th social behavior feature.
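One way to apply Equation (2), used here only as an illustration, is to compute the entropy of how users spread over the values of a single social behavior feature, estimating P(x_i) as the share of users on the i-th value. The text does not state the logarithm base, so base 2 is assumed, and the function name `information_entropy` is hypothetical.

```python
import math
from typing import Sequence


def information_entropy(counts: Sequence[int], base: float = 2.0) -> float:
    """Equation (2) over the values of one social behavior feature.

    counts[i] is the number of users sharing the i-th value (for example,
    the same wireless network); P(x_i) is estimated as counts[i] / sum(counts).
    The logarithm base is an assumption.
    """
    total = sum(counts)
    entropy = 0.0
    for c in counts:
        if c > 0:  # zero-probability terms contribute nothing
            p = c / total
            entropy -= p * math.log(p, base)
    return entropy
```

Under this reading, 50 abnormal users split 48/1/1 across three wireless networks give an entropy of about 0.28 bits, whereas 50 abnormal users split 10/5 over two shared devices plus 35 individual devices give about 4.75 bits, matching the intuition that a smaller entropy means a more concentrated distribution.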
For example, suppose the social behavior feature set includes three social behavior features: a wireless network, a user behavior, and a communication device, so that i in Equation (2) takes the values 1, 2, and 3. As an example, the wireless network is represented by x1, the user behavior by x2, and the communication device by x3. For the social behavior feature of the wireless network, the quantity of abnormal users is 50. Among the 50 abnormal users, 48 use the same wireless network A, and the remaining 2 use two other, different wireless networks B and C. Therefore, the quantity of wireless networks observed for this social behavior feature is 3 (wireless network A + wireless network B + wireless network C). Since 48 of the 50 abnormal users use the same wireless network A, the small quantity of distinct wireless networks indicates that the abnormal users are concentrated on the social behavior feature of the wireless network, and the distribution P(the wireless network) of the abnormal users on this feature can be obtained (that is, the value of P(x1) is P(the wireless network)). For the social behavior feature of the user behavior, 30 abnormal users go to the same coffee shop more than 10 times on the same day, and the other 20 abnormal users go to 20 different other places on that day. The abnormal users are therefore distributed over 21 distinct values of the user behavior feature (one coffee shop + 20 other places).
Since 30 of the 50 abnormal users go to the same coffee shop on the same day, the distribution of the abnormal users on the social behavior feature of the user behavior is relatively concentrated, and the distribution P(the user behavior) of the abnormal users on this feature can be obtained (that is, the value of P(x2) is P(the user behavior)). For the social behavior feature of the communication device, 10 abnormal users log in to their accounts on the same communication device A, 5 abnormal users log in on the same communication device B, and the remaining 35 abnormal users log in on 35 different other communication devices. The abnormal users are therefore distributed over 37 distinct communication devices (communication device A + communication device B + 35 other communication devices). Since 35 of the 50 abnormal users use different communication devices, the large quantity of distinct communication devices indicates that the distribution of the abnormal users on the social behavior feature of the communication device is dispersed (that is, its concentration is low), and the distribution P(the communication device) of the abnormal users on this feature can be obtained (that is, the value of P(x3) is P(the communication device)). According to P(the wireless network), P(the user behavior), P(the communication device), and Equation (2), a first feature distribution H(x) of the abnormal users can be obtained.
The first feature distribution H(x) herein is a total distribution value of the abnormal users on the three social behavior features of the wireless network, the user behavior, and the communication device.
Similarly, a second feature distribution of the users (including the abnormal users) in the target user set, that is, a feature distribution of the entire target user set, may be determined according to the social behavior features in the user social behavior feature set. For the implementation of determining the second feature distribution, reference may be made, for example, to the above description of determining the first feature distribution. According to the first feature distribution and the second feature distribution, a feature distribution difference between the abnormal users and the users in the target user set (that is, a difference between the first feature distribution and the second feature distribution) may be determined. In a case that the feature distribution difference is less than a difference threshold and the first feature distribution is less than a distribution threshold, the social behavior feature distribution of the abnormal users is concentrated and differs little from that of the entire target user set, which may indicate that the social behavior features of the abnormal users are normal and common in the target user set. Therefore, the target user set is in the normal state. In a case that the feature distribution difference is greater than or equal to the difference threshold and the first feature distribution is greater than or equal to the distribution threshold, the social behavior feature distribution of the abnormal users is dispersed and differs greatly from that of the entire target user set. This may indicate that the social behavior features are inconsistent both among the abnormal users and between the abnormal users and the normal users, that is, the social behavior features of the abnormal users are shared by only a minority of the target user set. Therefore, the target user set is in the normal state.
If the feature distribution difference is greater than or equal to the difference threshold and the first feature distribution is less than the distribution threshold, the social behavior feature distribution of the abnormal users is concentrated: the social behavior features among the abnormal users are relatively consistent, while they differ greatly from those of the normal users in the target user set. Therefore, the target user set is in the abnormal state. For example, the feature distribution difference may be determined as shown in Equation (3):
$$D_{KL}(P\|Q) = \sum_{i} P(i)\log\frac{P(i)}{Q(i)} \qquad (3)$$
where $D_{KL}(P\|Q)$ may be used for representing the feature distribution difference, $P(i)$ may be used for representing the first feature distribution (that is, the distribution of the social behavior features of the abnormal users), and $Q(i)$ may be used for representing the second feature distribution (that is, the distribution of the overall social behavior features of the users in the target user set).
In some embodiments, the status of the target user set may be determined by using the anomaly concentration of the target user set, by using the user social behavior features, or by combining the two. In the combined approach, the anomaly concentration is checked first, and the user social behavior features are examined only if the anomaly concentration is greater than the concentration threshold. The status of the target user set is determined as the abnormal state only when the following conditions are all satisfied: the anomaly concentration is greater than the concentration threshold, the first feature distribution is less than the distribution threshold, and the feature distribution difference is greater than or equal to the difference threshold.
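The status rules above can be sketched in Python as follows, assuming that Equation (3) is the standard Kullback-Leibler divergence with the natural logarithm and that the two distributions are given as aligned probability lists. The function names and threshold values are hypothetical. Following the combined rule, the set is abnormal only when the anomaly concentration (the quantity of abnormal users divided by the total quantity of users) exceeds its threshold, the first feature distribution is below the distribution threshold, and the difference reaches the difference threshold; every other combination is treated as normal.

```python
import math


def kl_divergence(p, q):
    """Equation (3): D_KL(P||Q) = sum_i P(i) * log(P(i) / Q(i)).

    p and q are probability lists over the same feature values. Terms with
    P(i) == 0 contribute nothing; Q(i) must be > 0 wherever P(i) > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def target_set_status(abnormal_count, total_count, first_distribution,
                      distribution_difference, *, concentration_threshold,
                      distribution_threshold, difference_threshold):
    """Combined rule: abnormal only if all three conditions hold together."""
    concentration = abnormal_count / total_count  # anomaly concentration
    if concentration <= concentration_threshold:
        return "normal"  # behavior features are not examined further
    concentrated = first_distribution < distribution_threshold
    far_from_set = distribution_difference >= difference_threshold
    return "abnormal" if concentrated and far_from_set else "normal"


# Hypothetical threshold values, for illustration only.
thresholds = dict(concentration_threshold=0.3, distribution_threshold=0.5,
                  difference_threshold=0.1)
diff = kl_divergence([0.8, 0.2], [0.5, 0.5])  # about 0.193
print(target_set_status(30, 50, 0.2, diff, **thresholds))  # abnormal
print(target_set_status(5, 50, 0.2, diff, **thresholds))   # normal
```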
In operation S104, the system identifies a diffusion-abnormal user from to-be-confirmed users according to social relationships between the abnormal users and the to-be-confirmed users in the target user set in a case that the status of the target user set is an abnormal state, the to-be-confirmed users being users in the target user set other than the abnormal users.
In some embodiments, in a case that the status of the target user set is the abnormal state, the users having social relationships with the abnormal users may be determined from the to-be-confirmed users as the diffusion-abnormal users. Here, having a social relationship may mean that, in the community topology graph in which the node corresponding to the abnormal user is located, an edge starting from the node corresponding to the abnormal user connects it to the node corresponding to the to-be-confirmed user.
Referring to
In some embodiments, in a case that the status of the target user set is the abnormal state, the users having social relationships with the abnormal users are determined from the to-be-confirmed users. Abnormal user nodes corresponding to the abnormal users are acquired, and association user nodes corresponding to the users having social relationships with the abnormal users are acquired. Each association user node whose edge weight with any one of the abnormal user nodes is greater than an association threshold is determined as a diffusion-abnormal node, and the user corresponding to the diffusion-abnormal node is determined as the diffusion-abnormal user.
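A minimal sketch of this selection step, assuming the community topology graph is available as a dictionary from (source, target) node pairs to edge weights. The user IDs, weights, and threshold are hypothetical. Choosing an association threshold below every edge weight (for example 0 when all weights are positive) reduces it to the simpler rule in which every to-be-confirmed user linked by an edge from an abnormal user is selected.

```python
def diffusion_abnormal_users(edge_weights, abnormal, members, association_threshold):
    """Users outside `abnormal` reached from an abnormal user by a strong edge.

    edge_weights maps (source, target) node pairs to edge weights in the
    community topology graph; only edges starting from abnormal users count.
    """
    abnormal = set(abnormal)
    to_be_confirmed = set(members) - abnormal
    return {
        dst
        for (src, dst), weight in edge_weights.items()
        if src in abnormal and dst in to_be_confirmed
        and weight > association_threshold
    }


members = {"u1", "u2", "u3", "u4"}
edges = {("u1", "u2"): 0.8, ("u1", "u3"): 0.2, ("u4", "u2"): 0.9}
print(diffusion_abnormal_users(edges, {"u1"}, members, 0.5))  # {'u2'}
```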
Referring to
It can be learned from the above that, after the users having social relationships are divided into the target user set, once the abnormal users in the target user set are determined and the target user set is in the abnormal state, the users having social relationships with the abnormal users may be acquired from the target user set and directly used as diffusion-abnormal users, without performing feature matching on each user. That is, the diffusion-abnormal users are identified by using the social relationships. Therefore, even if the diffusion-abnormal users have the same features as the normal users, they may still be identified because of their social relationships with the abnormal users. In this way, the accuracy of identification can be enhanced.
In the various embodiments, when the target user set is determined from the plurality of users, the plurality of users may be divided into at least two user sets according to the collected social relationships and social behaviors among them, so that the closeness of the social relationships among users within each user set is higher than the closeness of the social relationships between users in different user sets. Each of the user sets may be used as a target user set.
In some embodiments, in a case that the plurality of users are divided into the plurality of user sets, a relationship topology graph may be determined according to the social relationships and social behaviors among the plurality of users. In the relationship topology graph, each node corresponds to one of the plurality of users. An edge connecting two nodes indicates that there is a social relationship between the users corresponding to the two nodes. A closeness of the social relationship between the two users is determined according to the social relationships and the social behaviors among the plurality of users. A weight of the edge between the nodes corresponding to the two users is determined according to the closeness. The relationship topology graph is divided into at least two topology sub-graphs by using a clustering algorithm. A set of the users corresponding to the nodes in one of the at least two topology sub-graphs is used as the target user set.
In operation S201, the system acquires a relationship topology graph corresponding to a user group. The relationship topology graph includes N nodes k, which are in a one-to-one correspondence with the users in the user group, N being the quantity of the users in the user group. Here, k is a generic node index that stands for a specific node: for example, a user A may correspond to a node A, in which case 'A' is the specific value of k. An edge weight between two nodes k is determined based on the social relationship between the two corresponding users in the user group.
In some embodiments, N may be the quantity of the users in the user group. After the user group is acquired, each user in the user group may serve as a node k. For example, the user A serves as the node A, and the user B serves as the node B. One user group has N users, and each user corresponds to one node k. In a case that there is a social relationship between two users, the two nodes k corresponding to the two users are connected by an edge, and the edge weight between the two nodes k is determined according to the social relationship between the two users. According to social behavior records between the users having the social relationship, an initial weight may be set for the edge between the nodes k. Probability transformation is performed on the initial weight, and the result after the probability transformation is used as the weight of the edge between the nodes k. In this way, the relationship topology graph corresponding to the user group may be generated from the nodes k and the edge weights. The social behavior records herein may be a transfer amount, a transfer frequency, a communication frequency, and a communication duration between the users having the social relationship. A larger transfer amount, a higher transfer frequency, a higher communication frequency, or a longer communication duration between two users leads to a larger initial weight for the edge between them. The probability transformation herein may be standardization of the initial weight of each edge. For example, for a node i and a node j between which an edge exists, the edge from the node i to the node j may be expressed as Mij, and the probability transformation of Mij may be shown in Equation (4):
$$M_{ij} = \frac{W_{ij}}{\sum_{i=1}^{n} W_{ij}} \qquad (4)$$
where $W_{ij}$ represents the initial weight between the node i and the node j, and $\sum_{i=1}^{n} W_{ij}$ represents the sum of the initial weights between the n nodes and the node j.
According to the node relationship list shown in
The adjacency matrix A1 is a 4×4 matrix. A value 1 in the adjacency matrix A1 indicates that there is a social relationship between the two users (that is, the two nodes are connected by an edge), and a value 0 indicates that there is no social relationship between the two users (that is, no edge connects the two nodes). For example, there is a social relationship between the user A and the user B, so the node A and the node B are connected by an edge, and the edge weight data M12 jointly corresponding to the node A and the node B is set to 1. There is no social relationship between the user D and the user A, so the node D and the node A are not connected, and the edge weight data M41 jointly corresponding to the node D and the node A is set to 0. In addition, a self-loop is added to each node, so the edge weight data M11, M22, M33, and M44 are all set to 1.
Further, according to the social behavior records among the user A, the user B, the user C, and the user D, an initial weight can be set for each edge. For the user A and the user B, the user A transferred money to the user B twice, for a total transfer amount of 100 thousand, so the initial weight of the edge between the node A and the node B may be set to 10. For the user A and the user C, there is no social behavior record between them (that is, there is no transfer behavior or call behavior between the user A and the user C), so the initial weight of the edge between the node A and the node C may be set to 1. For the user B and the user C, the user B frequently communicates with the user C, and each call lasts more than 20 minutes, so the initial weight of the edge between the node B and the node C may be set to 8. For the user B and the user D, the user B frequently transfers money to the user D, so the initial weight of the edge between the node B and the node D may be set to 9.
The adjacency matrix A2 is a 4×4 matrix.
Probability transformation (that is, standardization) may be performed on the elements (that is, the initial weights) in the adjacency matrix A2. For example, the probability transformation may be performed as follows, using the element M12 (that is, the initial weight of the edge from the node A to the node B) as an example. The element M12 is 10, the element M22 (the self-loop on the node B) is 1, the element M32 (the initial weight of the edge from the node C to the node B) is 8, and the element M42 (the initial weight of the edge from the node D to the node B) is 9; these are the elements of the column in which the element M12 is located in the adjacency matrix A2. Adding up the values of the elements M12, M22, M32, and M42 gives 28. According to the value 10 of the element M12 and the sum 28, the result of the probability transformation on the element M12 is 10/28≈0.36, and 0.36 may be used as the edge weight from the node A to the node B. The edge weights of the other edges may be obtained similarly. According to the adjacency matrix A2 and the edge weights obtained by performing the probability transformation on each element, a probability matrix A3 representing the relationships and the degree of association among the node A, the node B, the node C, and the node D may be obtained. The probability matrix A3 is shown in the following matrix:
The probability matrix A3 is a 4×4 matrix.
The probability transformation need not be performed on the self-loop edge weights of the nodes (that is, the elements M11, M22, M33, and M44).
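The matrices A1, A2, and A3 are not reproduced in this text. The sketch below rebuilds them from the edges and initial weights described above, assuming the relationships are undirected (so each weight appears on both sides of the diagonal) and that no edge connects C and D, which is consistent with the value 0.1 quoted for M13 in A3 later on. Following the note on self-loops, it divides each off-diagonal entry by its column sum (Equation (4)) and leaves the diagonal at 1. Node and function names are illustrative.

```python
NODES = ["A", "B", "C", "D"]
# Initial weights from the example; relationships are assumed undirected.
INITIAL_WEIGHTS = {("A", "B"): 10, ("A", "C"): 1, ("B", "C"): 8, ("B", "D"): 9}


def weighted_adjacency(nodes, weights, self_loop=1.0):
    """Adjacency matrix A2: initial weights plus a self-loop on every node."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = self_loop
    for (u, v), w in weights.items():
        matrix[index[u]][index[v]] = matrix[index[v]][index[u]] = float(w)
    return matrix


def probability_transform(matrix):
    """Equation (4): divide each off-diagonal entry by its column sum.

    Self-loops on the diagonal are left unchanged, as the text allows.
    """
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] if i == j else matrix[i][j] / col_sums[j]
             for j in range(n)] for i in range(n)]


A2 = weighted_adjacency(NODES, INITIAL_WEIGHTS)
A1 = [[1 if w else 0 for w in row] for row in A2]  # 0/1 edge pattern
A3 = probability_transform(A2)
print(A1[3])               # [0, 1, 0, 1]: node D links only to B and itself
print(round(A3[0][1], 2))  # M12 = 10 / 28 -> 0.36
```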
In operation S202, the system acquires sampling paths corresponding to the nodes k from the relationship topology graph according to a quantity of sampling paths.
In some embodiments, for each node in the relationship topology graph, a jump probability that each node reaches other nodes in the relationship topology graph may be calculated by walking, so as to obtain a community of each node. For example, the calculation method may be shown in Equation (5):
$$\mathrm{Exp}_a(M_{ij}) = \sum_{k=1}^{n} M_{ik} M_{kj} \qquad (5)$$
where $\mathrm{Exp}_a(M_{ij})$ may be used for representing the jump probability from the node i to the node j, $M_{ik}$ may be used for representing the probability (the edge weight) from the node i to the node k, and $M_{kj}$ may be used for representing the probability (the edge weight) from the node k to the node j.
For example, there is no edge connection between the node A and the node D, but there is an edge connection between the node A and the node B, an edge connection between the node B and the node C, and an edge connection between the node C and the node D, which may indicate that the node A may walk 3 steps to reach the node D (that is, the node A-the node B-the node C-the node D). The edge weight from the node A to the node B is 0.2, the edge weight from the node B to the node C is 0.3, and the edge weight from the node C to the node D is 0.4. Then, the jump probability of 0.2×0.3×0.4=0.024 from the node A to the node D may be obtained according to Equation (5).
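Equation (5) is an ordinary matrix product, so one expansion step can be sketched as squaring the probability matrix. The sketch hardcodes the A3 entries implied by the example's initial weights (column sums 12, 28, 10, and 10, self-loops kept at 1), which is an assumption because A3 itself is not shown. It computes the two-step jump probability from node D to node A through node B, (9/28)×(10/12)≈0.268; with the factors rounded to 0.32 and 0.83 this becomes about 0.266, numerically close to the value used for M41 in the Equation (6) example.

```python
def expand(m):
    """Equation (5): Exp(M)_ij = sum_k M_ik * M_kj, i.e. the product M x M."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Probability matrix A3 for nodes A, B, C, D (rows = from, columns = to),
# reconstructed from the example's initial weights; an assumption.
A3 = [
    [1,       10 / 28, 1 / 10, 0],
    [10 / 12, 1,       8 / 10, 9 / 10],
    [1 / 12,  8 / 28,  1,      0],
    [0,       9 / 28,  0,      1],
]

two_step = expand(A3)
print(round(two_step[3][0], 3))  # D -> A via B: (9/28) * (10/12) -> 0.268
```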
Since the user group contains a large quantity of users, that is, the relationship topology graph contains a large quantity of nodes, calculating the jump probability from every node to every other node in the graph is expensive in both time and space. To save time and space, this solution uses a Monte-Carlo (MCL) sampling walking method: the paths starting from each node are sampled, and the jump probability is calculated only from each node to the other nodes on its sampling paths. Specifically, the paths of each node are sampled according to the quantity of the sampling paths to obtain the sampling paths of the node, the association nodes on the sampling paths are acquired according to a jump threshold, and the jump probability from the node to each association node is then calculated. Because the jump probability is calculated from each node to only some of the nodes rather than to all of the nodes in the relationship topology graph, the amount of calculation, and hence the time and space consumption, is greatly reduced. The quantity of the sampling paths and the number of jumps allowed for each node may be controlled manually, so that the result obtained after the sampling stays within an acceptable error range. In addition, because the data is sampled, the MCL sampling walking method can complete the calculation rapidly and still obtain high-accuracy results even when the user group, that is, the data scale, is huge.
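A minimal sketch of the Monte-Carlo path sampling, assuming the graph is stored as a dictionary of weighted neighbors and that a walk picks its next node with probability proportional to the edge weight. The no-revisit rule, the function names, and the example weights (loosely rounded from A3) are assumptions; the quantity of sampling paths and the jump threshold are the two parameters named in the text.

```python
import random


def sample_paths(graph, start, num_paths, jump_threshold, rng):
    """Monte-Carlo sampling of walks starting at node `start`.

    graph maps each node to {neighbor: edge_weight}. Each walk makes at most
    jump_threshold jumps, picks the next node with probability proportional
    to the edge weight, and (an assumption) never revisits a node.
    """
    paths = []
    for _ in range(num_paths):
        path, node = [start], start
        for _ in range(jump_threshold):
            options = {v: w for v, w in graph.get(node, {}).items()
                       if v not in path and w > 0}
            if not options:
                break
            node = rng.choices(list(options), weights=list(options.values()))[0]
            path.append(node)
        paths.append(path)
    return paths


def association_nodes(paths, start):
    """Nodes on the sampled paths other than the start node k."""
    return {node for path in paths for node in path if node != start}


graph = {  # hypothetical edge weights
    "A": {"B": 0.36, "C": 0.1},
    "B": {"A": 0.83, "C": 0.8, "D": 0.9},
    "C": {"A": 0.08, "B": 0.29},
    "D": {"B": 0.32},
}
paths = sample_paths(graph, "A", num_paths=3, jump_threshold=2,
                     rng=random.Random(7))
print(paths)
print(association_nodes(paths, "A"))
```

Fixing the seed of `random.Random` makes a run reproducible; in practice the quantity of sampling paths trades accuracy against cost.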
In some embodiments, the quantity of the sampling paths is a positive integer. It may be specified manually, or may be randomly generated by a server within an allowable range of values. According to the quantity of the sampling paths, the sampling paths corresponding to each node k may be acquired from the relationship topology graph corresponding to the user group; that is, from the paths that use the node k as the initial node, as many paths as the quantity of the sampling paths are extracted. According to the jump threshold, the association nodes of each node k may be determined from the sampling paths of the node k. An association node is a node in a sampling path other than the node k; for example, it may be a node that is reachable from the node k within the jump threshold (inclusive) quantity of jumps. For example, the relationship topology graph in the embodiment corresponding to
In operation S203, the system determines a jump probability between the node k and an association node in the sampling path according to the edge weight in the relationship topology graph, the association node being a node in the sampling path other than the node k.
In some embodiments, the jump probability between the node k and the association node may be determined according to the edge weights in the relationship topology graph corresponding to the user group. For example, in a case that there is no edge between the node k and the association node, an intermediate node between the node k and the association node, through which the node k reaches the association node, may be acquired from the sampling path of the node k. Among the node k, the intermediate node, and the association node, any two nodes joined by an edge form a connection node pair. According to the edge weights corresponding to the connection node pairs, the jump probability between the node k and the association node may be determined.
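Given one sampled path, the jump probability from node k to each later node can be sketched as a running product of the edge weights of the connection node pairs along the path, which reproduces the 0.2×0.3×0.4=0.024 example for the path A-B-C-D. Keeping only the first arrival at each node, and the names used, are assumptions.

```python
def jump_probabilities(path, edge_weight):
    """Jump probability from path[0] to each later node on one sampled path.

    Consecutive nodes form connection node pairs; the probability of reaching
    a node is the product of the edge weights of the pairs walked so far.
    """
    probabilities = {}
    running = 1.0
    for src, dst in zip(path, path[1:]):
        running *= edge_weight[(src, dst)]
        probabilities.setdefault(dst, running)  # keep the first arrival
    return probabilities


weights = {("A", "B"): 0.2, ("B", "C"): 0.3, ("C", "D"): 0.4}
probs = jump_probabilities(["A", "B", "C", "D"], weights)
print(probs)  # D is reached with 0.2 * 0.3 * 0.4 = 0.024 (up to float rounding)
```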
Referring to
In operation S204, the system updates the relationship topology graph according to the jump probability to obtain an updated relationship topology graph, and determines the target user set from the updated relationship topology graph.
In some embodiments, the relationship topology graph may be updated according to the jump probability. The edges in the relationship topology graph may be updated according to the node k and its association nodes: each node k is connected by a new edge to each of its association nodes that does not yet share an edge with it, so as to obtain a transition relationship topology graph. For example, by using an embodiment corresponding to
By using the embodiment corresponding to
The probability matrix A4 is a 4×4 matrix. An element 0 in the probability matrix A4 indicates that the corresponding nodes are unreachable. Take the element M13 (that is, the edge weight from the node A to the node C) as an example. In the probability matrix A3, the probability from the node A to the node C is 0.1 (the node A can reach the node C, and there is an edge between them). However, the extracted path of the node A is A-B-D, and the unextracted paths of the node A are not taken into account, so only the paths from the node A to the node B and from the node A to the node D (that is, the elements M12 and M14 in the probability matrix A4) need to be considered.
Further, convex transformation may be performed on the edge weights (the jump probabilities) in the target relationship topology graph. That is to say, exponential growth is performed on each edge weight, and probability transformation (that is, standardization) is performed on the jump probabilities obtained after the exponential growth. The convex transformation yields a target probability, according to which the edge weight between the node k and each association node of the node k may be updated. Among the association nodes of the node k, any association node whose updated edge weight is greater than or equal to a weight threshold may be determined as a vital association node of the node k. The target relationship topology graph may be divided into at least two community topology graphs according to the nodes k and their vital association nodes, and a target community topology graph is acquired from the at least two community topology graphs as the target user set.
That is, the convex transformation consists of performing exponential growth on the jump probabilities and then performing the probability transformation (standardization) on the results. For example, the target probability may be obtained as shown in Equation (6):
$$\Gamma_r(M_{ij}) = \frac{(M_{ij})^r}{\sum_{i=1}^{n} (M_{ij})^r} \qquad (6)$$
where $\Gamma_r(M_{ij})$ represents the target probability from the node i to the node j, $M_{ij}$ represents the edge weight from the node i to the node j, $(M_{ij})^r$ represents the edge weight from the node i to the node j after exponential growth r times (that is, raised to the power r), and $\sum_{i=1}^{n}(M_{ij})^r$ represents the sum of the edge weights from the n nodes to the node j after exponential growth r times.
Take the probability matrix A4 and r = 3 as an example. For the target probability Γr(M21) from the node B to the node A, M21 is first raised to the third power: 0.83×0.83×0.83=0.572. The sum of the third powers of the elements M11, M21, M31, and M41 is $0^3 + 0.83^3 + 0.08^3 + 0.266^3 = 0.591$, so Γr(M21) is 0.572/0.591=0.968. For the target probability Γr(M41) from the node D to the node A, M41 is first raised to the third power: 0.266×0.266×0.266=0.019, and with the same sum of 0.591, Γr(M41) is 0.019/0.591=0.032. Thus the element M21 of 0.83 becomes 0.968 after exponential growth and standardization, while the element M41 of 0.266 becomes 0.032. By means of exponential growth and standardization, large elements (edge weights) become larger (for example, 0.83 becomes 0.968), and small elements become smaller (for example, 0.266 becomes 0.032). That is to say, with the MCL sampling walking method and the convex transformation, strong associations between users are strengthened and weak associations are weakened, which facilitates the division of communities and makes the division result more accurate.
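The convex transformation of Equation (6) can be sketched as an element-wise power followed by column normalization. Only the first column (M11, M21, M31, M41 = 0, 0.83, 0.08, 0.266) comes from the example, with M31 = 0.08 inferred from the sum of cubes; the other columns are placeholders. Unrounded arithmetic gives about 0.967 for Γr(M21), slightly below the 0.968 obtained above from the rounded intermediates 0.572 and 0.591.

```python
def convex_transform(matrix, r):
    """Equation (6): raise every entry to the power r, then normalize columns."""
    n = len(matrix)
    powered = [[matrix[i][j] ** r for j in range(n)] for i in range(n)]
    col_sums = [sum(powered[i][j] for i in range(n)) for j in range(n)]
    return [[powered[i][j] / col_sums[j] if col_sums[j] else 0.0
             for j in range(n)] for i in range(n)]


# Column for node A uses the A4 values quoted in the text (M11, M21, M31, M41);
# the remaining columns are hypothetical placeholders.
m = [
    [0.0,   0.0, 0.0, 0.0],
    [0.83,  1.0, 0.0, 0.0],
    [0.08,  0.0, 1.0, 0.0],
    [0.266, 0.0, 0.0, 1.0],
]
gamma = convex_transform(m, r=3)
print(round(gamma[1][0], 3), round(gamma[3][0], 3))  # 0.967 0.032
```

Larger r sharpens the contrast further; r is a tuning parameter.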
In some embodiments, before the community topology graphs are divided, a quantity of iterations may be set, so that the steps from acquiring the sampling paths to calculating the target probability are repeated a plurality of times. That is to say, random sampling is performed on each node k for the first time, the target probability between the nodes is calculated, and the target probability is used as the edge weight between the nodes. Then, random sampling is performed for the second time, and along the second sampling paths, the target probability from the previous round is used as the edge weight to calculate a new target probability between the nodes. These steps are repeated until the quantity of iterations is reached, so that the final target probability may be regarded as a stable probability, and the community topology graphs are then divided by using the stable target probability.
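Dividing the graph into communities by the weight threshold amounts to taking connected components over the edges whose weight is at least the threshold. The sketch below uses union-find for that, which is an implementation choice and not necessarily the one used by the business server 1000, and checks it against the Table 1 jump probabilities listed later in this section (the twelve values of at least 0.5 plus two weaker ones). It assumes edge direction can be ignored when grouping.

```python
def divide_communities(jump_probs, weight_threshold):
    """Group nodes joined by an edge whose weight is >= weight_threshold.

    jump_probs maps (from_node, to_node) to the jump probability used as the
    edge weight; communities are the connected components of the kept edges.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), weight in jump_probs.items():
        if weight >= weight_threshold:
            parent[find(u)] = find(v)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=lambda g: sorted(g))


table1 = {
    ("a", "b"): 0.35, ("a", "i"): 0.7, ("a", "k"): 0.28, ("b", "i"): 0.5,
    ("c", "d"): 0.56, ("c", "e"): 0.7, ("d", "c"): 0.56, ("d", "e"): 0.8,
    ("e", "d"): 0.8, ("e", "g"): 0.6, ("g", "k"): 0.5, ("i", "a"): 0.7,
    ("j", "a"): 0.7, ("j", "i"): 0.8,
}
print(divide_communities(table1, 0.5))
```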
It can be learned from the above that, after the users having social relationships are divided into the target user set, once the abnormal users in the target user set are determined and the target user set is in the abnormal state, the users having social relationships with the abnormal users may be acquired from the target user set and directly used as diffusion-abnormal users, without performing feature matching on each user. That is, the diffusion-abnormal users are identified by using the social relationships. Therefore, even if a diffusion-abnormal user has the same features as a normal user, the diffusion-abnormal user can still be identified because of its social relationship with the abnormal user, thereby improving the accuracy of identification.
In Table 1, the columns represent the initial nodes, and the rows represent the arrival nodes. Taking the node a as an example, the jump probability from the node a to the node b is 0.35, from the node a to the node i is 0.7, and from the node a to the node k is 0.28. It can be determined from Table 1 that the edge weights greater than or equal to the weight threshold of 0.5 are as follows: from the node a to the node i, 0.7; from the node b to the node i, 0.5; from the node c to the node d, 0.56; from the node c to the node e, 0.7; from the node d to the node c, 0.56; from the node d to the node e, 0.8; from the node e to the node d, 0.8; from the node e to the node g, 0.6; from the node g to the node k, 0.5; from the node i to the node a, 0.7; from the node j to the node a, 0.7; and from the node j to the node i, 0.8. The business server 1000 may then use the jump probabilities as the edge weights to obtain a target relationship topology graph 20b (after sampling). Nodes joined by edge weights greater than or equal to the weight threshold may be divided into one community: the business server 1000 may divide the node c, the node e, the node d, the node g, and the node k into one community, and the node i, the node j, the node a, and the node b into another. Therefore, a community topology graph 200a and a community topology graph 200b (that is, the two communities) may be obtained from the target relationship topology graph 20b (after sampling). As shown in
In operation S301, the system determines the target user set in the abnormal state as a to-be-identified user set.
In operation S302, the system acquires user text data of users in the to-be-identified user set, and extracts key text data from the user text data.
In some embodiments, the user text data may be note information of a user during a transfer, conversation information of the user during a call, and the like. Keyword identification may be performed on the user text data to extract the key text data. For example, the note information of the user during the transfer is “gambling debt repayment”, so that a keyword “gambling debt” may be extracted.
In operation S303, the system acquires sensitive source data.
In some embodiments, the sensitive source data is a preset anomaly category set. The sensitive source data may include anomaly categories such as gambling, cashing, fraud, robbery, theft, and the like.
In operation S304, the system matches the key text data with the sensitive source data, and determines an anomaly category of the to-be-identified user set according to a matching result.
In some embodiments, the key text data may be matched with the sensitive source data. For example, the key text data is “gambling debt”, and after the key text data is matched with the sensitive source data, a matching ratio of “gambling debt” to “gambling” may reach 90%. In this way, the anomaly category of the to-be-identified user set may be determined as “gambling”.
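The text does not say how the matching ratio is computed. As a hypothetical sketch, key text is extracted with a small keyword lexicon, and each word of the key text is compared with each anomaly category using the ratio of Python's `difflib.SequenceMatcher`. The lexicon, the scoring rule, and the 0.9 threshold are assumptions, and the 90% figure in the text is not reproduced by this score.

```python
from difflib import SequenceMatcher

# Preset anomaly categories (the sensitive source data) from the text.
SENSITIVE_SOURCE = ["gambling", "cashing", "fraud", "robbery", "theft"]
# Hypothetical keyword lexicon used to pull key text out of free-form notes.
KEYWORD_LEXICON = ["gambling debt", "cash out", "scam"]


def extract_key_text(user_text, lexicon=KEYWORD_LEXICON):
    """Return the lexicon keywords that occur in the user text."""
    text = user_text.lower()
    return [keyword for keyword in lexicon if keyword in text]


def anomaly_category(key_texts, categories=SENSITIVE_SOURCE, threshold=0.9):
    """Best-matching category, or None if no score reaches the threshold.

    Score (an assumption): the best SequenceMatcher ratio between any word
    of a key text and the category name.
    """
    best, best_score = None, 0.0
    for key_text in key_texts:
        for word in key_text.split():
            for category in categories:
                score = SequenceMatcher(None, word, category).ratio()
                if score > best_score:
                    best, best_score = category, score
    return best if best_score >= threshold else None


keys = extract_key_text("Gambling debt repayment")
print(keys, anomaly_category(keys))  # ['gambling debt'] gambling
```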
The target user set acquisition module 11 is configured to acquire a target user set. The target user set includes at least two users having a social relationship.
The abnormal user determination module 12 is configured to acquire a default abnormal user, and determine abnormal users in the target user set according to the default abnormal user.
The behavior status detection module 13 is configured to determine a status of the target user set according to the abnormal users.
The diffusion-abnormal user identification module 14 is configured to identify a diffusion-abnormal user from to-be-confirmed users according to social relationships between the abnormal users and the to-be-confirmed users in the target user set in a case that the status of the target user set is an abnormal state. The to-be-confirmed users are users in the target user set other than the abnormal users.
For the implementations of the target user set acquisition module 11, the abnormal user determination module 12, the behavior status detection module 13, and the diffusion-abnormal user identification module 14, for example, reference may be made to the descriptions of operation S101 to operation S104 in the embodiment corresponding to
Referring to
The abnormal user determination unit 121 is configured to match the users in the target user set with the default abnormal user, and determine, as the abnormal users in the target user set, the users in the target user set whose matching ratio reaches a matching threshold.
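The matching rule of the abnormal user determination unit 121 can be sketched as follows. The description does not state how the matching ratio is computed. Purely for illustration, this sketch assumes that each user is represented by a set of behavior feature labels and that the matching ratio is the Jaccard similarity between a user's feature set and the feature set of the closest default abnormal user. The function names and the threshold value of 0.8 are hypothetical.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_abnormal_users(
    target_set: Dict[str, Set[str]],
    default_abnormal: Dict[str, Set[str]],
    matching_threshold: float = 0.8,  # hypothetical value
) -> Set[str]:
    """Return users of the target set whose best matching ratio against any
    default abnormal user reaches the matching threshold (assumed metric)."""
    abnormal: Set[str] = set()
    for user, features in target_set.items():
        # Assumed matching ratio: similarity to the closest default abnormal user.
        best = max(
            (jaccard(features, ref) for ref in default_abnormal.values()),
            default=0.0,
        )
        if best >= matching_threshold:
            abnormal.add(user)
    return abnormal
```

A user that shares only part of a known abnormal user's features falls below the threshold. This reflects the weakness described in the background: an imitating abnormal user may evade pure feature matching.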
For the implementation of the abnormal user determination unit 121, for example, reference may be made to the description of operation S102 in the embodiment corresponding to
Referring to
The total user quantity acquisition unit 131 is configured to acquire a quantity of the abnormal users, and acquire a total quantity of the users in the target user set.
The anomaly concentration determination unit 132 is configured to determine an anomaly concentration of the target user set according to the quantity of the abnormal users and the total quantity of the users in the target user set.
The first status determination unit 133 is configured to determine the status of the target user set as a normal state in a case that the anomaly concentration is less than a concentration threshold.
The first status determination unit 133 is further configured to determine the status of the target user set as an abnormal state in a case that the anomaly concentration is greater than or equal to the concentration threshold.
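The rule applied by units 131 to 133 can be sketched as follows. The description does not define the anomaly concentration precisely. This sketch assumes it is the share of abnormal users among all users in the target user set. The function names and the default threshold of 0.3 are hypothetical.

```python
def anomaly_concentration(num_abnormal: int, total_users: int) -> float:
    """Share of abnormal users in the target user set (assumed definition)."""
    if total_users <= 0:
        raise ValueError("the target user set must contain at least one user")
    if not 0 <= num_abnormal <= total_users:
        raise ValueError("num_abnormal must lie between 0 and total_users")
    return num_abnormal / total_users


def status_by_concentration(
    num_abnormal: int, total_users: int, concentration_threshold: float = 0.3
) -> str:
    """Normal below the concentration threshold, abnormal at or above it."""
    concentration = anomaly_concentration(num_abnormal, total_users)
    return "abnormal" if concentration >= concentration_threshold else "normal"
```

For example, with 4 abnormal users out of 10 and a threshold of 0.3, the concentration is 0.4 and the set is treated as abnormal.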
For the implementations of the total user quantity acquisition unit 131, the anomaly concentration determination unit 132, and the first status determination unit 133, for example, reference may be made to the description of operation S103 in the embodiment corresponding to
Referring to
The behavior feature acquisition unit 134 is configured to acquire a user social behavior feature set. The user social behavior feature set includes a social behavior feature of each user in a user group.
The feature distribution determination unit 135 is configured to determine a first feature distribution of the abnormal users according to the social behavior features in the user social behavior feature set. The first feature distribution is used for representing a quantity of types of the social behavior features possessed by the abnormal users.
The feature distribution determination unit 135 is further configured to determine second feature distributions of the users in the target user set according to the social behavior features in the user social behavior feature set. The second feature distribution is used for representing a quantity of types of the social behavior features possessed by the users in the target user set.
The feature distribution difference determination unit 136 is configured to determine a feature distribution difference between the abnormal users and the users in the target user set according to the first feature distribution and the second feature distribution.
The second status determination unit 137 is configured to determine the status of the target user set according to the first feature distribution and the feature distribution difference.
The second status determination unit 137 is further configured to determine the status of the target user set as the normal state in a case that the feature distribution difference is less than a difference threshold and the first feature distribution is less than a distribution threshold.
The second status determination unit 137 is further configured to determine the status of the target user set as the normal state in a case that the feature distribution difference is greater than or equal to the difference threshold and the first feature distribution is greater than or equal to the distribution threshold.
The second status determination unit 137 is further configured to determine the status of the target user set as the abnormal state in a case that the feature distribution difference is greater than or equal to the difference threshold and the first feature distribution is less than the distribution threshold.
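The three cases handled by the second status determination unit 137 can be sketched as follows. This sketch rests on three assumptions that the description does not state:

1. The first feature distribution is the number of distinct feature types held by the abnormal users.
2. The second feature distribution is the number of distinct feature types held by all users in the target user set.
3. The difference is the absolute gap between the two counts.

The combination that the description leaves unspecified, a small difference together with a large first distribution, returns None. All names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def feature_type_count(users: Iterable[str], features: Dict[str, Set[str]]) -> int:
    """Number of distinct social behavior feature types possessed by the users."""
    types: Set[str] = set()
    for user in users:
        types |= features.get(user, set())
    return len(types)


def status_by_feature_distribution(
    abnormal_users: Iterable[str],
    target_users: Iterable[str],
    features: Dict[str, Set[str]],
    difference_threshold: int,
    distribution_threshold: int,
) -> Optional[str]:
    first = feature_type_count(abnormal_users, features)  # first feature distribution
    second = feature_type_count(target_users, features)   # second (assumed: pooled over the set)
    difference = abs(second - first)                      # assumed difference measure
    if difference < difference_threshold and first < distribution_threshold:
        return "normal"
    if difference >= difference_threshold and first >= distribution_threshold:
        return "normal"
    if difference >= difference_threshold and first < distribution_threshold:
        return "abnormal"
    return None  # combination not specified by the description
```

Under these assumptions, a set is flagged when its abnormal users concentrate on few behavior types that differ markedly from the behavior of the set as a whole.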
For the implementations of the behavior feature acquisition unit 134, the feature distribution determination unit 135, the feature distribution difference determination unit 136, and the second status determination unit 137, for example, reference may be made to the description of operation S103 in the embodiment corresponding to
Referring to
The relationship topology graph acquisition unit 111 is configured to acquire a relationship topology graph corresponding to a user group. The relationship topology graph includes N nodes k. The N nodes k are in a one-to-one correspondence with users in the user group. N is a quantity of the users in the user group. An edge weight between two nodes k is determined based on a social relationship between two users in the user group.
The sampling path acquisition unit 112 is configured to acquire sampling paths corresponding to the nodes k from the relationship topology graph according to a quantity of sampling paths.
The jump probability determination unit 113 is configured to determine a jump probability between the node k and an association node in the sampling path according to the edge weight in the relationship topology graph. The association nodes are nodes in the sampling path other than the node k.
The target user set determination unit 114 is configured to update the relationship topology graph according to the jump probability to obtain an updated relationship topology graph, and determine the target user set from the updated relationship topology graph.
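The description does not say how the sampling paths of unit 112 are drawn. One common choice, assumed here, is a weighted random walk that starts at the node k and moves to each neighbor with probability proportional to the edge weight. The association nodes of a path are then the visited nodes other than the node k. The names `sample_paths`, `association_nodes`, `path_length`, and `seed` are hypothetical.

```python
import random
from typing import Dict, List, Optional, Set

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: edge weight}


def sample_paths(
    graph: Graph,
    start: str,
    num_paths: int,
    path_length: int,
    seed: Optional[int] = None,
) -> List[List[str]]:
    """Draw num_paths weighted random walks of at most path_length nodes from start."""
    rng = random.Random(seed)
    paths: List[List[str]] = []
    for _ in range(num_paths):
        path = [start]
        current = start
        for _ in range(path_length - 1):
            neighbors = graph.get(current, {})
            if not neighbors:
                break  # dead end: stop this walk early
            nodes = list(neighbors)
            # Next node chosen in proportion to the edge weight (assumed sampling rule).
            current = rng.choices(nodes, weights=[neighbors[n] for n in nodes], k=1)[0]
            path.append(current)
        paths.append(path)
    return paths


def association_nodes(path: List[str]) -> Set[str]:
    """Nodes on a sampling path other than its starting node k."""
    return set(path[1:]) - {path[0]}
```

The quantity of sampling paths corresponds to `num_paths`. Fixing `seed` makes the sampled paths reproducible.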
For the implementations of the relationship topology graph acquisition unit 111, the sampling path acquisition unit 112, the jump probability determination unit 113, and the target user set determination unit 114, for example, reference may be made to the description of operation S101 in the embodiment corresponding to
Referring to
The user group acquisition subunit 1111 is configured to acquire a user group. Each user in the user group is used as the node k.
The weight setting subunit 1112 is configured to perform an edge connection between the nodes k corresponding to the users having the social relationship, and set an initial weight for an edge between the nodes k according to social behavior records among the users having the social relationship.
The probability transformation subunit 1113 is configured to perform probability transformation on the initial weight to obtain the edge weight.
The relationship topology graph generation subunit 1114 is configured to generate the relationship topology graph according to the nodes k corresponding to the user group and the edge weight.
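Subunits 1111 to 1114 can be sketched as follows. This sketch makes three assumptions:

1. Each social behavior record is a pair of users who interacted once, for example one chat or one transfer.
2. The initial weight of an edge is the number of such records.
3. The probability transformation divides each weight by the total weight leaving the node, so that the outgoing edge weights of every node sum to 1.

The function name and record format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Graph = Dict[str, Dict[str, float]]


def build_topology_graph(users: Iterable[str], records: Iterable[Tuple[str, str]]) -> Graph:
    """Build a relationship topology graph from pairwise social behavior records."""
    # One node k per user in the user group.
    initial: Dict[str, Dict[str, float]] = {u: defaultdict(float) for u in users}
    for a, b in records:
        if a == b or a not in initial or b not in initial:
            continue  # ignore self-loops and users outside the user group
        initial[a][b] += 1.0  # edge connection and initial weight (record count)
        initial[b][a] += 1.0
    graph: Graph = {}
    for user, neighbors in initial.items():
        total = sum(neighbors.values())
        # Probability transformation: normalize outgoing weights per node.
        graph[user] = {v: w / total for v, w in neighbors.items()} if total else {}
    return graph
```

Because the normalization is per node, the resulting edge weights are directional. For example, an edge from a to b can carry a different weight than the edge from b to a.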
For the implementations of the user group acquisition subunit 1111, the weight setting subunit 1112, the probability transformation subunit 1113, and the relationship topology graph generation subunit 1114, for example, reference may be made to the description of operation S101 in the embodiment corresponding to
Referring to
The intermediate node acquisition subunit 1131 is configured to acquire an intermediate node between the node k and the association node from the sampling path in a case that there is no edge between the node k and the association node. The node k reaches the association node through the intermediate node.
The connection node pair determination subunit 1132 is configured to use, as a connection node pair, any two nodes among the node k, the intermediate node, and the association node that are connected by an edge, and acquire an edge weight corresponding to the connection node pair.
The jump probability determination subunit 1133 is configured to determine a jump probability between the node k and the association node according to the edge weight corresponding to the connection node pair.
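Subunits 1131 to 1133 can be sketched as follows, under the assumption (not stated in the description) that the jump probability is the product of the edge weights of consecutive connection node pairs along the sampling path from the node k to the association node. This is the probability of following that particular route. When the two nodes share an edge directly, the route has no intermediate node and the result is that edge weight. The names `jump_probability` and `route_to` are hypothetical.

```python
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]


def jump_probability(graph: Graph, route: List[str]) -> float:
    """Multiply the edge weights of consecutive connection node pairs on route.

    route = [node_k, intermediate nodes..., association_node].
    """
    if len(route) < 2:
        raise ValueError("route must contain node k and an association node")
    probability = 1.0
    for a, b in zip(route, route[1:]):  # each consecutive pair is a connection node pair
        weight = graph.get(a, {}).get(b)
        if weight is None:
            raise ValueError(f"no edge between {a!r} and {b!r}")
        probability *= weight
    return probability


def route_to(path: List[str], association_node: str) -> List[str]:
    """Prefix of a sampling path ending at the first visit of association_node."""
    return path[: path.index(association_node) + 1]
```

For example, if node k reaches the association node d only through an intermediate node b, the route is `[k, b, d]` and the jump probability is the weight of k-b times the weight of b-d.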
For the implementations of the intermediate node acquisition subunit 1131, the connection node pair determination subunit 1132, and the jump probability determination subunit 1133, for example, reference may be made to the description of operation S101 in the embodiment corresponding to
Referring to
The node edge updating subunit 1141 is configured to update the connected edges in the relationship topology graph according to the node k and the association node, to obtain a transition relationship topology graph. In the transition relationship topology graph, the node k and the association node are connected by an edge.
The edge weight setting subunit 1142 is configured to set the jump probability between the node k and the association node as the edge weight between the node k and the association node in the transition relationship topology graph, to obtain a target relationship topology graph.
The target user set determination subunit 1143 is configured to determine the target user set from the target relationship topology graph.
The target user set determination subunit 1143 is further configured to perform exponential growth on the jump probability, perform probability transformation on the jump probability obtained after the exponential growth, to obtain a target probability, and update the edge weight between the node k and the association node according to the target probability.
The target user set determination subunit 1143 is further configured to determine, as a vital association node of the node k, the association node having the updated edge weight greater than a weight threshold.
The target user set determination subunit 1143 is further configured to divide the target relationship topology graph into at least two community topology graphs according to the node k and the vital association node, and acquire a target community topology graph from the at least two community topology graphs as the target user set.
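The behavior of subunit 1143 can be sketched as follows. The sketch reads "exponential growth" as raising each jump probability to a power greater than 1 and renormalizing per node. This is an assumption that resembles the inflation step of Markov clustering, not a confirmed detail of this disclosure. The sketch also assumes that communities are the connected components formed by vital edges only. The names `inflate_and_normalize` and `divide_communities`, and the default values, are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, Dict[str, float]]


def inflate_and_normalize(weights: Graph, power: float = 2.0) -> Graph:
    """Exponential growth (assumed: raise to `power`), then per-node normalization."""
    result: Graph = {}
    for node, neighbors in weights.items():
        grown = {v: p ** power for v, p in neighbors.items()}
        total = sum(grown.values())
        result[node] = {v: g / total for v, g in grown.items()} if total else {}
    return result


def divide_communities(weights: Graph, power: float = 2.0,
                       weight_threshold: float = 0.3) -> List[Set[str]]:
    """Split the graph into communities joined only by vital association edges."""
    target = inflate_and_normalize(weights, power)
    vital: Dict[str, Set[str]] = {node: set() for node in target}
    for node, neighbors in target.items():
        for other, w in neighbors.items():
            if w > weight_threshold:  # other is a vital association node of node
                vital[node].add(other)
                vital.setdefault(other, set()).add(node)
    seen: Set[str] = set()
    communities: List[Set[str]] = []
    for start in vital:  # connected components over vital edges only
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in vital[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        communities.append(component)
    return communities
```

Raising probabilities to a power strengthens strong links and weakens weak ones, so weak cross-group edges drop below the weight threshold. Any resulting community with at least two users could then serve as a target user set.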
For the implementations of the node edge updating subunit 1141, the edge weight setting subunit 1142, and the target user set determination subunit 1143, for example, reference may be made to the description of operation S101 in the embodiment corresponding to
Referring to
The first related user determination unit 141 is configured to determine, from the to-be-confirmed users, the user having a social relationship with the abnormal user in a case that the status of the target user set is the abnormal state.
The first diffusion-abnormal user determination unit 142 is configured to determine, as the diffusion-abnormal user, the user having the social relationship with the abnormal user.
For the implementations of the first related user determination unit 141 and the first diffusion-abnormal user determination unit 142, for example, reference may be made to the description of operation S104 in the embodiment corresponding to
Referring to
The second related user determination unit 143 is configured to determine, from the to-be-confirmed users, the user having a social relationship with the abnormal user in a case that the status of the target user set is the abnormal state.
The second diffusion-abnormal user determination unit 144 is configured to acquire abnormal user nodes corresponding to the abnormal users, acquire association user nodes corresponding to the users having the social relationship with the abnormal users, determine, as a diffusion-abnormal node, an association user node whose edge weight with any one of the abnormal user nodes is greater than an association threshold, and determine the user corresponding to the diffusion-abnormal node as the diffusion-abnormal user.
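Both variants of the diffusion-abnormal user identification module 14 can be sketched in one function. With no association threshold, every to-be-confirmed user linked to an abnormal user is flagged, as in units 141 and 142. With a threshold, the edge weight must also exceed it, as in units 143 and 144. The sketch also assumes that, when edge weights are stored separately per direction, the larger of the two is used. The function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set

Graph = Dict[str, Dict[str, float]]


def diffusion_abnormal_users(
    graph: Graph,
    target_users: Iterable[str],
    abnormal_users: Iterable[str],
    association_threshold: Optional[float] = None,
) -> Set[str]:
    """Flag to-be-confirmed users who have a social relationship with an abnormal user."""
    abnormal = set(abnormal_users)
    to_be_confirmed = set(target_users) - abnormal
    flagged: Set[str] = set()
    for user in to_be_confirmed:
        for bad in abnormal:
            # Edge weights may be stored per direction; take the larger (assumption).
            weights = [w for w in (graph.get(bad, {}).get(user),
                                   graph.get(user, {}).get(bad)) if w is not None]
            if not weights:
                continue  # no social relationship with this abnormal user
            if association_threshold is None or max(weights) > association_threshold:
                flagged.add(user)
                break
    return flagged
```

No feature matching is performed on the flagged users. They are identified solely through their links to abnormal users inside an abnormal target user set.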
For the implementations of the second related user determination unit 143 and the second diffusion-abnormal user determination unit 144, for example, reference may be made to the description of operation S104 in the embodiment corresponding to
Referring to
The to-be-identified user set determination module 15 is configured to determine the target user set in the abnormal state as a to-be-identified user set.
The key text data extraction module 16 is configured to acquire user text data of users in the to-be-identified user set, and extract key text data from the user text data.
The sensitive source data acquisition module 17 is configured to acquire sensitive source data.
The anomaly category determination module 18 is configured to match the key text data with the sensitive source data, and determine an anomaly category of the to-be-identified user set according to a matching result.
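Modules 17 and 18 can be sketched as follows. This sketch makes three assumptions:

1. The sensitive source data maps each anomaly category to a list of keywords.
2. The matching ratio of a category is the share of key text items containing one of its keywords, using case-insensitive substring matching.
3. The category with the highest ratio is chosen if that ratio reaches a minimum.

The description does not specify the matching method, so the ratios produced here need not reproduce the 90% figure of the earlier example. The names and the default minimum are hypothetical.

```python
from typing import Dict, List, Optional


def anomaly_category(
    key_texts: List[str],
    sensitive_source: Dict[str, List[str]],
    min_ratio: float = 0.5,  # hypothetical value
) -> Optional[str]:
    """Return the category whose keywords match the largest share of key texts."""
    if not key_texts:
        return None
    best_category, best_ratio = None, 0.0
    for category, keywords in sensitive_source.items():
        hits = sum(
            1 for text in key_texts
            if any(kw.lower() in text.lower() for kw in keywords)  # substring match
        )
        ratio = hits / len(key_texts)  # assumed matching ratio
        if ratio > best_ratio:
            best_category, best_ratio = category, ratio
    return best_category if best_ratio >= min_ratio else None
```

Plain substring matching is deliberately simple. Short keywords such as "bet" can also match unrelated words, so a practical system would likely use tokenization or a stronger text-matching method.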
For the implementations of the to-be-identified user set determination module 15, the key text data extraction module 16, the sensitive source data acquisition module 17, and the anomaly category determination module 18, for example, reference may be made to the descriptions of operation S201 to operation S204 in the embodiment corresponding to
According to the embodiments of this disclosure, the target user set is acquired, and the target user set includes at least two users having a social relationship. The default abnormal user is acquired, and the abnormal users in the target user set are determined according to the default abnormal user. The status of the target user set is determined according to the abnormal users. In a case that the status of the target user set is the abnormal state, the diffusion-abnormal user is identified from the to-be-confirmed users according to the social relationships between the abnormal users and the to-be-confirmed users in the target user set. The to-be-confirmed users are the users in the target user set other than the abnormal users. It can be learned from the above that, by dividing users having social relationships into the target user set, once the abnormal users in the target user set are determined and the target user set is in the abnormal state, the users having a social relationship with an abnormal user may be acquired from the target user set and directly determined as diffusion-abnormal users, without performing feature matching on each user. In other words, diffusion-abnormal users are identified by means of social relationships. Therefore, even if a diffusion-abnormal user has the same features as a normal user, the diffusion-abnormal user can still be identified because it has a social relationship with an abnormal user, thereby improving the accuracy of identification.
In the computer device 1000 shown in
It is to be understood that the computer device 1000 described in this embodiment of this disclosure can implement the descriptions of the data identification method in the foregoing embodiment corresponding to
In addition, embodiments of this disclosure further provide a computer-readable storage medium. The computer-readable storage medium stores a computer program executed by the computer device 1000 mentioned above, and the computer program includes program instructions. When executing the program instructions, the processor can perform the descriptions of the data identification method in the foregoing embodiments corresponding to
The computer-readable storage medium may be an internal storage unit of the data identification apparatus according to any one of the foregoing embodiments or of the foregoing computer device, for example, a hard disk or an internal memory of the computer device. The computer-readable storage medium may also be an external storage device equipped on the computer device, for example, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card. Further, the computer-readable storage medium may include both the internal storage unit and the external storage device of the computer device. The computer-readable storage medium is configured to store the computer program and other programs and data required by the computer device, and may further be configured to temporarily store data that has been outputted or is to be outputted.
In the specification, claims, and accompanying drawings of the embodiments of this disclosure, the terms “first” and “second” are intended to distinguish between different objects but do not indicate a particular order. In addition, the term “include” and any variant thereof are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes a step or unit that is not listed, or optionally further includes another step or unit that is intrinsic to the process, method, apparatus, product, or device.
A person of ordinary skill in the art may further realize that the units and algorithm steps of the examples described with reference to the embodiments herein can be implemented by electronic hardware, computer software, or a combination thereof. To clearly describe the interchangeability between hardware and software, the compositions and steps of each example have been described above generally in terms of their functions. Whether the functions are executed in hardware or software depends on the particular application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation is not to be considered as going beyond the scope of this disclosure.
The method and the related apparatus provided in the embodiments of this disclosure are described with reference to the method flowcharts and/or schematic structural diagrams provided in the embodiments of this disclosure. For example, each flow and/or block in the method flowchart and/or schematic structural diagram and a combination of processes and/or blocks in the flowchart and/or block diagram may be implemented by computer program instructions. These computer program instructions may be provided to a general-purpose computer, a special-purpose computer, an embedded processor, or a processor of another programmable data processing device to generate a machine, so that an apparatus configured to implement functions specified in one or more procedures in the flowcharts and/or one or more blocks in the schematic structural diagrams is generated by using instructions executed by the general-purpose computer or the processor of another programmable data processing device. These computer program instructions may alternatively be stored in a computer-readable memory that can instruct a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory generate an artifact that includes an instruction apparatus. The instruction apparatus implements a specific function in one or more procedures in the flowcharts and/or in one or more blocks in the schematic structural diagrams. 
These computer program instructions may also be loaded into a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or another programmable data processing device to generate computer-implemented processing, and the instructions executed on the computer or another programmable data processing device provide steps for implementing functions specified in one or more procedures in the flowcharts and/or one or more blocks in the schematic structural diagrams.
According to the example embodiments, the target user set is acquired, and the target user set includes at least two users having a social relationship. The default abnormal user is acquired, and the abnormal users in the target user set are determined according to the default abnormal user. The status of the target user set is determined according to the abnormal users. In a case that the status of the target user set is the abnormal state, the diffusion-abnormal user is identified from the to-be-confirmed users according to the social relationships between the abnormal users and the to-be-confirmed users in the target user set. The to-be-confirmed users are the users in the target user set other than the abnormal users. According to example embodiments of the disclosure, by dividing users having social relationships into the target user set, once the abnormal users in the target user set are determined and the target user set is in the abnormal state, the users having a social relationship with an abnormal user may be acquired from the target user set and directly determined as diffusion-abnormal users, without performing feature matching on each user. In other words, diffusion-abnormal users may be identified by means of social relationships. Therefore, even if a diffusion-abnormal user has features similar to those of a normal user, the diffusion-abnormal user may still be identified because it has a social relationship with an abnormal user, thereby improving the accuracy of identification.
The foregoing discloses merely exemplary embodiments of this disclosure, which are not intended to limit the scope of the claims of this disclosure. Therefore, equivalent variations made in accordance with the claims of this disclosure shall fall within the scope of this disclosure.
Number | Date | Country | Kind |
---|---|---|---|
202010086855.6 | Feb 2020 | CN | national |
This application is a continuation application of International Application No. PCT/CN2020/126055, filed on Nov. 3, 2020, which is based on and claims priority to Chinese Patent Application No. 202010086855.6, filed with the China National Intellectual Property Administration on Feb. 11, 2020, the entire contents of which are incorporated by reference herein.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/CN2020/126055 | Nov 2020 | US |
Child | 17672814 | | US |