Embodiments of the present disclosure relate to the field of computer science technologies, and in particular, to a method and an apparatus for mining a social relationship based on financial data.
Currently, competition in the banking industry is fierce, and continuously increasing the number of customers is the only way for a bank to survive. The rapid development of Internet finance has had a great impact on conventional banks. For example, Yu'e Bao, a financial product platform developed by Alibaba, raised 5.7 billion renminbi (RMB) in only 18 days and over 50 billion RMB within three months of its launch. How to retain existing customers, attract new customers, and identify high-quality customers has become key to improving bank profits.
Conventionally, social relationships between bank customers are discovered mainly from the content that a customer writes in the application form when applying for a bank card. For example, a colleague relationship is found through a common collecting person, and a family relationship is found through a primary credit card and its supplementary credit card, or through a loan guarantee.
However, determining social relationships between bank customers in this way is inefficient.
Embodiments of the present disclosure provide a method and an apparatus for mining a social relationship based on financial data, to overcome the problem in the prior art that identifying social relationships between bank customers based on simple rules is inefficient.
A first aspect of the present disclosure provides a method for mining a social relationship based on financial data, including acquiring financial transaction data of a client user; determining a financial transaction network according to the financial transaction data; determining a network topology attribute of the client user and a non-network topology attribute of the client user according to the financial transaction network; and determining, according to a topology attribute of the financial transaction network and the non-network topology attribute, a social relationship corresponding to the client user.
With reference to the first aspect, in a first possible implementation manner of the first aspect, the financial transaction data of the client user includes an attribute of the client user, a transaction behavior of the client user, a fund flow of the client user, a fund amount of the client user, and a transaction time, a transaction type, and a transaction memo of the client user; and the determining a financial transaction network according to the financial transaction data includes determining nodes of the financial transaction network according to the client user, determining a node attribute of the financial transaction network according to the attribute of the client user, determining edges of the financial transaction network according to the transaction behavior of the client user, where the nodes are connected using the edges, determining directions of the edges according to the fund flow of the client user, determining weights of the edges of the financial transaction network according to the fund amount of the client user, and determining attributes of the edges of the financial transaction network according to the transaction time, the transaction type, and the transaction memo of the client user.
With reference to the first aspect or the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the financial transaction data includes first data and second data, where the first data refers to a client user whose social relationship is annotated and the second data refers to a client user whose social relationship is not annotated; and the determining, according to a topology attribute of the financial transaction network and the non-network topology attribute, a social relationship corresponding to the client user includes determining a classification model according to a network topology attribute and a non-network topology attribute of the first data; and acquiring, according to the classification model, a social relationship of a client user corresponding to the second data.
With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, the determining a classification model according to a network topology attribute and a non-network topology attribute that correspond to the first data includes selecting an attribute according to the network topology attribute of the financial transaction network and the non-network topology attribute; determining a training data set and a test data set according to the first data; constructing the classification model according to the training data set and the attribute using a data mining classification algorithm; and testing, according to the test data set, whether the classification model passes a model assessment.
With reference to the third possible implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, the testing, according to the test data set, whether the classification model passes a model assessment includes acquiring a social relationship of data in the test data set using the classification model, and calculating a match rate between the acquired social relationship of the data in the test data set and an annotated social relationship of the data in the test data set; and if the match rate is higher than a first threshold, determining that the classification model passes the model assessment; or if the match rate is not higher than the first threshold, continuing to train the classification model.
With reference to the first aspect or any one of the first to the fourth possible implementation manners of the first aspect, in a fifth possible implementation manner of the first aspect, the determining, according to a topology attribute of the financial transaction network and the non-network topology attribute, a social relationship corresponding to the client user includes performing network clustering according to the topology attribute of the financial transaction network and the non-network topology attribute, to acquire the social relationship of the client user.
A second aspect of the present disclosure provides an apparatus for mining a social relationship based on financial data, including an acquiring module configured to acquire financial transaction data of a client user; a first determining module configured to determine a financial transaction network according to the financial transaction data acquired by the acquiring module; a second determining module configured to determine a network topology attribute of the client user and a non-network topology attribute of the client user according to the financial transaction network determined by the first determining module; and a third determining module configured to determine, according to a topology attribute of the financial transaction network and the non-network topology attribute that is determined by the second determining module, a social relationship corresponding to the client user.
With reference to the second aspect, in a first possible implementation manner of the second aspect, the financial transaction data of the client user includes an attribute of the client user, a transaction behavior of the client user, a fund flow of the client user, a fund amount of the client user, and a transaction time, a transaction type, and a transaction memo of the client user; and the first determining module is configured to determine nodes of the financial transaction network according to the client user, determine a node attribute of the financial transaction network according to the attribute of the client user, determine edges of the financial transaction network according to the transaction behavior of the client user, where the nodes are connected using the edges, determine directions of the edges according to the fund flow of the client user, determine weights of the edges of the financial transaction network according to the fund amount of the client user, and determine attributes of the edges of the financial transaction network according to the transaction time, the transaction type, and the transaction memo of the client user.
With reference to the second aspect or the first possible implementation manner of the second aspect, in a second possible implementation manner of the second aspect, the financial transaction data includes first data and second data, where the first data refers to a client user whose social relationship is annotated and the second data refers to a client user whose social relationship is not annotated; and the third determining module includes a model determining unit and a relationship determining unit, where the model determining unit is configured to determine a classification model according to a network topology attribute and a non-network topology attribute of the first data; and the relationship determining unit is configured to acquire, according to the classification model determined by the model determining unit, a social relationship of a client user corresponding to the second data.
With reference to the second possible implementation manner of the second aspect, in a third possible implementation manner of the second aspect, the model determining unit is configured to select an attribute according to the network topology attribute of the financial transaction network and the non-network topology attribute; determine a training data set and a test data set according to the first data; construct the classification model according to the training data set and the attribute using a data mining classification algorithm; and test, according to the test data set, whether the classification model passes a model assessment.
With reference to the third possible implementation manner of the second aspect, in a fourth possible implementation manner of the second aspect, the model determining unit is configured to acquire a social relationship of data in the test data set using the classification model, and calculate a match rate between the acquired social relationship of the data in the test data set and an annotated social relationship of the data in the test data set; and if the match rate is higher than a first threshold, determine that the classification model passes the model assessment; or if the match rate is not higher than the first threshold, continue training the classification model.
With reference to the second aspect or any one of the first to the fourth possible implementation manners of the second aspect, in a fifth possible implementation manner of the second aspect, the third determining module is configured to perform network clustering according to the topology attribute of the financial transaction network and the non-network topology attribute, to acquire the social relationship of the client user.
According to the method and the apparatus for mining a social relationship based on financial data in the embodiments of the present disclosure, a financial transaction network is constructed from financial transaction data, and a network topology attribute and a non-network topology attribute of a client user are determined according to the financial transaction network. A classification model is constructed according to the network topology attribute and the non-network topology attribute and is used to determine the colleague/non-colleague and family/non-family relationships of the client user, and cluster analysis is performed on the calculated network topology attribute and non-network topology attribute to determine the friend relationships of the client user. This resolves the problems in the prior art that social relationships between client users are determined inefficiently and are not fully discovered.
To describe the technical solutions in the embodiments of the present disclosure more clearly, the following briefly describes the accompanying drawings required for describing the embodiments. The accompanying drawings in the following description show some embodiments of the present disclosure, and persons of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the following clearly describes the technical solutions in the embodiments of the present disclosure with reference to the accompanying drawings in the embodiments of the present disclosure. The described embodiments are some rather than all of the embodiments of the present disclosure. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present disclosure without creative efforts shall fall within the protection scope of the present disclosure.
Step 101: Acquire financial transaction data of a client user.
The financial transaction data of the client user is acquired from transaction records of the client user. A transaction record may be a transfer transaction or a consumption transaction of the client user. Besides the time of the transaction, the financial transaction data acquired from a transaction record includes transaction attributes such as the transaction location and the transaction amount. The transaction record further records personal information of the client user involved in the transaction. The financial transaction data includes data in which the social relationship of the client user, such as a colleague or family relationship, is annotated and data in which the social relationship is not annotated.
Step 102: Determine a financial transaction network according to the financial transaction data.
An overall process in which a server constructs the financial transaction network according to the financial transaction data mainly includes the following steps. First, big data storage: a large number of transaction records are stored in a Hive database. Second, client user address mapping: an address may be a network identifier (ID) or an external ID of the client user, and secondary mapping is performed on the client user ID according to database data, such as Hive data, which ensures that each client user ID is unique during network construction and also reduces the space occupied by the network file. Third, characteristic selection: characteristics are selected according to the financial transaction data, to determine the time interval over which the network is constructed and the attribute information that needs to be reflected in the network. Fourth, weight calculation: the weights of the edges in the financial transaction network are calculated according to the result of the characteristic selection; for example, if the quantity of transactions is selected as the weight, the transaction records are analyzed in Hive to obtain the quantity of transactions between client users. Fifth, the external IDs are sorted using the Hive data, the sorted data is used as the input for network construction, and a network construction program generates a network file in the general .net format. Using sorted data as the input file reduces the time complexity of the construction process. Because constructing a network from a large amount of data takes a long time, in this embodiment the sorting and mapping for network construction are completed in the big data database, which improves the overall construction efficiency.
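The mapping, weight calculation, and sorting steps above can be illustrated with the following minimal Python sketch. It assumes, for illustration only, that transaction records are available in memory as (payer external ID, payee external ID) pairs rather than as Hive tables, that the selected weight is the quantity of transactions from one client user to another, and that the output follows the Pajek .net layout (a *Vertices section followed by an *Arcs section). The function name build_pajek_net is hypothetical.

```python
from collections import Counter


def build_pajek_net(records):
    """Build a Pajek-style .net string from (payer_id, payee_id) records.

    Hypothetical helper: in the embodiment these steps run on Hive data;
    here they run in memory for illustration only.
    """
    # Secondary mapping: give each external ID a unique, compact integer ID
    # (Pajek vertex numbers start at 1), which also shrinks the output file.
    id_map = {}
    for payer, payee in records:
        for ext_id in (payer, payee):
            if ext_id not in id_map:
                id_map[ext_id] = len(id_map) + 1

    # Weight calculation: count transactions per directed (payer -> payee) pair.
    weights = Counter((id_map[payer], id_map[payee]) for payer, payee in records)

    # Sorting: emit vertices and arcs in ascending order so that the network
    # construction step can consume the input sequentially.
    lines = [f"*Vertices {len(id_map)}"]
    for ext_id, num in sorted(id_map.items(), key=lambda item: item[1]):
        lines.append(f'{num} "{ext_id}"')
    lines.append("*Arcs")
    for (src, dst), weight in sorted(weights.items()):
        lines.append(f"{src} {dst} {weight}")
    return "\n".join(lines)


records = [("ext-A", "ext-B"), ("ext-A", "ext-B"), ("ext-B", "ext-C"), ("ext-C", "ext-A")]
print(build_pajek_net(records))
```

Here the two transfers from ext-A to ext-B collapse into a single arc "1 2 2" with weight 2. Directed arcs are used because the direction of an edge follows the fund flow.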
Step 103: Determine a network topology attribute of the client user and a non-network topology attribute of the client user according to the financial transaction network.
Network data in the financial transaction network reflects well the relationships and the closeness between client users, and different relationships have clearly different network topology attributes in the financial transaction network. For example, nodes in a colleague relationship share common neighboring nodes, and the directions and weights of the edges between nodes in a family relationship differ clearly from those of general transaction records; all of these can be reflected by network attributes. The network topology attributes calculated in this embodiment mainly include the Adamic-Adar index, common neighbors, the clustering coefficient, distance, degree, PageRank, volume, the Jaccard coefficient, and the like between two nodes. A process of calculating the network topology attribute is shown in
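As an illustration of several of the pairwise attributes listed above, the following Python sketch computes common neighbors, the Jaccard coefficient, the Adamic-Adar index, degree, the local clustering coefficient, and the hop distance for two nodes. It assumes an undirected, unweighted view of the financial transaction network stored as a mapping from each client user to the set of its neighbors, with no self-loops; PageRank and volume are omitted for brevity, and all function names are hypothetical.

```python
import math
from collections import deque


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in adj[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))


def hop_distance(adj, source, target):
    """Breadth-first shortest-path length in hops (math.inf if unreachable)."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in adj[node]:
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return math.inf


def pair_topology_features(adj, u, v):
    """Pairwise topology attributes for two distinct client users u and v."""
    common = adj[u] & adj[v]
    union = adj[u] | adj[v]
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar: each common neighbor w contributes 1 / log(degree(w));
        # a common neighbor of two distinct nodes has degree >= 2, so log > 0.
        "adamic_adar": sum(1.0 / math.log(len(adj[w])) for w in common),
        "degree_u": len(adj[u]),
        "degree_v": len(adj[v]),
        "clustering_u": local_clustering(adj, u),
        "clustering_v": local_clustering(adj, v),
        "distance": hop_distance(adj, u, v),
    }


adj = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}
print(pair_topology_features(adj, "A", "D"))
```

In this toy network, A and D are not directly connected but share both of their neighbors, so their Jaccard coefficient is 1.0 and their distance is 2. This is the kind of pattern the text associates with a colleague relationship.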
The non-network topology attributes between client users in the financial transaction network are designed from the perspective of transaction attributes and are calculated according to characteristics of the financial transaction data. They mainly include a time dimension, a space dimension, a transaction amount, a transaction flow, and the like. The time dimension has two parts: a week rule and a day rule. Under the week rule, the quantity of transactions on each of the seven days of a week forms seven non-network attribute characteristics; under the day rule, the quantity of transactions in each of the 24 hours of a day forms 24 non-network attribute characteristics. The space dimension measures the degree to which the activity locations of the two transacting client users overlap. The transaction amount refers to the amount involved in transactions between two client users, and may be measured as a yearly total transaction amount, a monthly average transaction amount, or the difference between income and expenditure. The transaction flow measures the fund flow in the transaction records between two client users. For example, if client user A transfers money to client user B five times and client user B transfers money to client user A once, the transaction flow attribute value between client user A and client user B is four (five transfers minus one).
The non-network topology attributes in this embodiment are effective both at grouping client users with similar backgrounds and at distinguishing client users with different backgrounds. For example, in terms of transaction location, most client users in the same area make transactions at the same online store; in terms of transaction time, client users who make transactions during working hours are mainly office workers.
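A minimal Python sketch of the time-dimension, space-dimension, transaction-amount, and transaction-flow attributes for one pair of client users follows. It assumes, as hypothetical inputs, a list of (payer, payee, timestamp, amount) tuples for the transactions between the two users within the chosen time interval and a set of activity locations per user. The overlap of activity locations is measured here with a Jaccard ratio, which is one possible choice rather than the measure prescribed by the embodiment.

```python
from datetime import datetime


def pair_non_network_features(txns, a, b, locations_a, locations_b):
    """Non-network attributes for client users a and b (hypothetical helper)."""
    weekday_counts = [0] * 7   # week rule: Monday = 0, ..., Sunday = 6
    hourly_counts = [0] * 24   # day rule: one counter per hour of the day
    a_to_b = b_to_a = 0
    paid_by_a = paid_by_b = 0.0
    for payer, payee, ts, amount in txns:
        weekday_counts[ts.weekday()] += 1
        hourly_counts[ts.hour] += 1
        if (payer, payee) == (a, b):
            a_to_b += 1
            paid_by_a += amount
        elif (payer, payee) == (b, a):
            b_to_a += 1
            paid_by_b += amount
    # Space dimension (assumed measure): Jaccard overlap of activity locations.
    union = locations_a | locations_b
    overlap = len(locations_a & locations_b) / len(union) if union else 0.0
    return {
        "weekday_counts": weekday_counts,            # 7 week-rule characteristics
        "hourly_counts": hourly_counts,              # 24 day-rule characteristics
        "location_overlap": overlap,
        "total_amount": paid_by_a + paid_by_b,       # over the chosen interval
        "net_amount_a_to_b": paid_by_a - paid_by_b,  # income/expenditure difference
        "transaction_flow": a_to_b - b_to_a,         # net number of transfers
    }


# Example from the text: A pays B five times, B pays A once (10 March 2014 is a Monday).
ts = datetime(2014, 3, 10, 9, 30)
txns = [("A", "B", ts, 100.0)] * 5 + [("B", "A", ts, 50.0)]
features = pair_non_network_features(txns, "A", "B", {"mall", "office"}, {"office"})
print(features["transaction_flow"])  # 4
```

With the example from the text (five transfers from A to B and one from B to A), the transaction flow value is 4. The amounts and locations used here are illustrative only.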
Step 104: Determine, according to a topology attribute of the financial transaction network and the non-network topology attribute, a social relationship corresponding to the client user.
In this embodiment, there are two methods for determining, according to the topology attribute of the financial transaction network and the non-network topology attribute, the social relationship corresponding to the client user.
The financial transaction data includes first data and second data, where the first data refers to data whose social relationship is annotated and the second data refers to data whose social relationship is not annotated.
Optionally, the determining, according to a topology attribute of the financial transaction network and the non-network topology attribute, a social relationship corresponding to the client user includes determining a classification model according to a network topology attribute and a non-network topology attribute of the first data; and acquiring, according to the classification model, a social relationship of a client user corresponding to the second data.
Optionally, the determining, according to a topology attribute of the financial transaction network and the non-network topology attribute, a social relationship corresponding to the client user includes performing network clustering according to the topology attribute of the financial transaction network and the non-network topology attribute, to acquire the social relationship of the client user.
Further, the determining a classification model according to a network topology attribute and a non-network topology attribute that correspond to the first data includes selecting an attribute according to the network topology attribute of the financial transaction network and the non-network topology attribute; determining a training data set and a test data set according to the first data; constructing the classification model according to the training data set and the attribute using a data mining classification algorithm, where common data mining classification algorithms include a decision tree algorithm, a random forest algorithm, and the like; and testing, according to the test data set, whether the classification model passes a model assessment.
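The following Python sketch illustrates the split into a training data set and a test data set and the construction of a classifier on the selected attribute. For brevity it uses a depth-one decision tree (a decision stump) as a simplified stand-in for the decision tree or random forest algorithms mentioned above; the 30% test fraction, the fixed random seed, the example data, and all function names are assumptions.

```python
import random
from collections import Counter


def split_dataset(samples, test_fraction=0.3, seed=0):
    """Shuffle annotated samples and split them into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(train, feature_indices):
    """Fit a depth-one decision tree on the selected attribute indices."""
    best = None
    for f in feature_indices:
        for threshold in sorted({x[f] for x, _ in train}):
            left = [y for x, y in train if x[f] <= threshold]
            right = [y for x, y in train if x[f] > threshold]
            if not left or not right:
                continue
            left_label, right_label = majority(left), majority(right)
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # No usable split (e.g. a single distinct value): predict the majority label.
        label = majority([y for _, y in train])
        return lambda x: label
    _, f, threshold, left_label, right_label = best
    return lambda x: left_label if x[f] <= threshold else right_label


# Hypothetical annotated data: (common_neighbors, transaction_flow) -> label.
samples = [((cn, 0), "colleague" if cn >= 3 else "non-colleague") for cn in range(10)]
train, test = split_dataset(samples)
model = train_stump(train, feature_indices=[0])   # index of the selected attribute
print([(x, model(x)) for x, _ in test])
```

A real implementation would more likely call an existing decision tree or random forest library; the stump is used here only so that the sketch stays self-contained.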
Further, the testing, according to the test data set, whether the classification model passes a model assessment includes acquiring a social relationship of data in the test data set using the classification model, and calculating a match rate between the acquired social relationship of the data in the test data set and an annotated social relationship of the data in the test data set; and if the match rate is higher than a first threshold, determining that the classification model passes the model assessment; or if the match rate is not higher than the first threshold, continuing training the classification model.
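The model assessment loop can be sketched in Python as follows. The sketch assumes that a classification model is any callable that maps an attribute vector to a relationship label, and that train_step is a hypothetical callback that returns a (re)trained model for each round. The upper bound max_rounds is an added assumption, because the embodiment does not state when training stops if the first threshold is never exceeded.

```python
def assess_model(model, test_set, first_threshold):
    """Return (passed, match_rate) of `model` on an annotated test set."""
    matches = sum(1 for features, annotated in test_set if model(features) == annotated)
    match_rate = matches / len(test_set)
    # The model passes only if the match rate is strictly higher than the threshold.
    return match_rate > first_threshold, match_rate


def train_until_pass(train_step, test_set, first_threshold, max_rounds=10):
    """Continue training until the model passes the assessment or rounds run out."""
    rate = 0.0
    for round_index in range(max_rounds):
        model = train_step(round_index)
        passed, rate = assess_model(model, test_set, first_threshold)
        if passed:
            return model, rate, round_index + 1
    # Hypothetical stopping rule: give up after max_rounds rounds.
    return None, rate, max_rounds


# Hypothetical test set and a training step whose model improves each round.
test_set = [((1,), "family"), ((2,), "family"), ((8,), "non-family"), ((9,), "non-family")]
step = lambda r: (lambda x: "family" if x[0] <= 2 * r else "non-family")
model, rate, rounds = train_until_pass(step, test_set, first_threshold=0.8)
print(rate, rounds)
```

In the first round every sample is labeled "non-family", so the match rate is 0.5 and training continues; in the second round all four labels match, the rate 1.0 exceeds the threshold, and the model passes.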
Using the classification model, the server determines the colleague/non-colleague and family/non-family relationships of the client user according to the calculated topology attribute of the financial transaction network and the non-network topology attribute, and acquires the friend relationships of the client user by means of network clustering. The classification model is determined according to the data set obtained after the network topology attribute of the financial transaction network and the non-network topology attribute are calculated. A process of constructing the classification model of this embodiment is shown in
The network clustering method is a community discovery method. Communities are a common phenomenon in complex networks, in which multiple individuals share community characteristics, and a community discovery method is used to mine the community characteristics shared by these individuals. First, the constructed financial transaction network is used as the input of a community discovery calculation model. Then, the server processes the network and performs preliminary clustering of communities using large-scale network analysis software. Finally, secondary analysis is performed on the preliminary clustering result to acquire the community structure of the client user. The community structure is the friend circle of the client user, and friend relationships between client users are annotated according to the friend circle.
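The embodiment does not specify which community discovery algorithm the large-scale network analysis software applies. Purely as an illustration, the following Python sketch substitutes asynchronous label propagation, a common community discovery algorithm, on an undirected weighted view of the financial transaction network with edge directions dropped. Visiting nodes in sorted order and breaking ties by the smallest label are simplifications that make the result deterministic; practical implementations usually randomize both. The secondary analysis step is not modeled, and the function name is hypothetical.

```python
from collections import Counter


def label_propagation(adj, max_iter=100):
    """Asynchronous label propagation on an undirected weighted graph.

    adj maps each node to a dict {neighbor: weight}; weights must be symmetric.
    """
    labels = {node: node for node in adj}   # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            if not adj[node]:
                continue
            # Sum the edge weights behind each label among the node's neighbors.
            scores = Counter()
            for nbr, weight in adj[node].items():
                scores[labels[nbr]] += weight
            top = max(scores.values())
            best = min(label for label, score in scores.items() if score == top)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:   # converged: no node changed its label in a full pass
            break
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, set()).add(node)
    return sorted(communities.values(), key=min)


# Two tightly connected groups joined by one weak edge (3-4).
adj = {
    1: {2: 3, 3: 3},
    2: {1: 3, 3: 3},
    3: {1: 3, 2: 3, 4: 1},
    4: {3: 1, 5: 3, 6: 3},
    5: {4: 3, 6: 3},
    6: {4: 3, 5: 3},
}
print(label_propagation(adj))
```

On this toy network the sketch returns the two groups {1, 2, 3} and {4, 5, 6}, which would correspond to two candidate friend circles.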
Further, determining, by the server, a financial transaction network according to the financial transaction data includes determining nodes of the financial transaction network according to the client user, determining a node attribute of the financial transaction network according to the attribute of the client user, determining edges of the financial transaction network according to the transaction behavior of the client user, where the nodes are connected using the edges, determining directions of the edges according to the fund flow of the client user, determining weights of the edges of the financial transaction network according to the fund amount of the client user, and determining attributes of the edges of the financial transaction network according to the transaction time, the transaction type, and the transaction memo of the client user.
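The mapping from transaction data to the network elements described above can be sketched with the following Python data structures. The class names, field names, and the example node attribute occupation are hypothetical. Accumulating the fund amounts of all transactions on an edge into its weight is one assumed way of determining the weight according to the fund amount.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Transaction:
    payer: str          # client user the funds flow out of
    payee: str          # client user the funds flow into
    amount: float       # fund amount
    time: datetime      # transaction time
    txn_type: str       # transaction type, e.g. "transfer" or "consumption"
    memo: str = ""      # transaction memo


@dataclass
class TransactionNetwork:
    node_attrs: dict = field(default_factory=dict)   # client user -> node attributes
    edges: dict = field(default_factory=dict)        # (payer, payee) -> edge data

    def add_client(self, user_id, **attrs):
        """Create a node for a client user and record its attributes."""
        self.node_attrs.setdefault(user_id, {}).update(attrs)

    def add_transaction(self, txn):
        # A transaction behavior connects two nodes; the edge is directed
        # along the fund flow (payer -> payee).
        for user in (txn.payer, txn.payee):
            self.node_attrs.setdefault(user, {})
        edge = self.edges.setdefault((txn.payer, txn.payee), {"weight": 0.0, "records": []})
        edge["weight"] += txn.amount                                # weight from fund amount
        edge["records"].append((txn.time, txn.txn_type, txn.memo))  # edge attributes


net = TransactionNetwork()
net.add_client("A", occupation="teacher")
net.add_transaction(Transaction("A", "B", 200.0, datetime(2014, 3, 10, 9, 0), "transfer", "rent"))
net.add_transaction(Transaction("A", "B", 50.0, datetime(2014, 3, 11, 18, 0), "transfer"))
print(net.edges[("A", "B")]["weight"])   # 250.0
```

Keeping the per-transaction time, type, and memo on the edge lets the non-network attributes described in Step 103 be computed later from the same structure.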
In this embodiment, experiments were performed on the financial transaction data to construct a colleague/non-colleague classification model and a family relationship classification model of client users. The experiment results are shown in Table 1.
As shown in
In the foregoing embodiment, the financial transaction data includes first data and second data, where the first data refers to a client user whose social relationship is annotated and the second data refers to a client user whose social relationship is not annotated; and the third determining module includes a model determining unit 105 configured to determine a classification model according to a network topology attribute and a non-network topology attribute of the first data; and a relationship determining unit 106 configured to acquire a social relationship of a client user corresponding to the second data according to the classification model determined by the model determining unit.
The model determining unit 105 is configured to select an attribute according to the network topology attribute of the financial transaction network and the non-network topology attribute; determine a training data set and a test data set according to the first data; construct the classification model according to the training data set and the attribute using a data mining classification algorithm; and test, according to the test data set, whether the classification model passes a model assessment.
The model determining unit 105 is configured to acquire a social relationship of data in the test data set using the classification model, and calculate a match rate between the acquired social relationship of the data in the test data set and an annotated social relationship of the data in the test data set; and if the match rate is higher than a first threshold, determine that the classification model passes the model assessment; or if the match rate is not higher than the first threshold, continue training the classification model.
The third determining module 104 is configured to perform network clustering according to the topology attribute of the financial transaction network and the non-network topology attribute, to acquire the social relationship of the client user.
The financial transaction data of the client user includes an attribute of the client user, a transaction behavior of the client user, a fund flow of the client user, a fund amount of the client user, and a transaction time, a transaction type, and a transaction memo of the client user; and the first determining module 102 is configured to determine nodes of the financial transaction network according to the client user, determine a node attribute of the financial transaction network according to the attribute of the client user, determine edges of the financial transaction network according to the transaction behavior of the client user, where the nodes are connected using the edges, determine directions of the edges according to the fund flow of the client user, determine weights of the edges of the financial transaction network according to the fund amount of the client user, and determine attributes of the edges of the financial transaction network according to the transaction time, the transaction type, and the transaction memo of the client user.
The apparatus in this embodiment may be used to execute the technical solution of the method embodiment shown in
The bus 204 may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an inter-integrated circuit (I2C) bus, or the like. The bus 204 may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, the bus in
The memory 203 is configured to store executable program code, where the program code includes a computer operation instruction. The memory 203 may be a volatile memory, such as a random-access memory (RAM), or may be a non-volatile memory (NVM), such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
The processor 201 may be a central processing unit (CPU).
The processor 201 may invoke the operation instruction and the program code that are stored in the memory 203, to execute the processing method provided in this embodiment of the present disclosure. The method includes acquiring, by the processor 201, financial transaction data of a client user; determining, by the processor 201, a financial transaction network according to the financial transaction data; determining, by the processor 201, a network topology attribute of the client user and a non-network topology attribute of the client user according to the financial transaction network; and determining, by the processor 201 according to a topology attribute of the financial transaction network and the non-network topology attribute, a social relationship corresponding to the client user.
The processor 201 determines nodes of the financial transaction network according to the client user, determines a node attribute of the financial transaction network according to the attribute of the client user, determines edges of the financial transaction network according to the transaction behavior of the client user, where the nodes are connected using the edges, determines directions of the edges according to the fund flow of the client user, determines weights of the edges of the financial transaction network according to the fund amount of the client user, and determines attributes of the edges of the financial transaction network according to the transaction time, the transaction type, and the transaction memo of the client user.
The processor 201 determines a classification model according to a network topology attribute and a non-network topology attribute of the first data; and the processor 201 acquires, according to the classification model, a social relationship of a client user corresponding to the second data.
The processor 201 selects an attribute according to the network topology attribute of the financial transaction network and the non-network topology attribute; the processor 201 determines a training data set and a test data set according to the first data; the processor 201 constructs the classification model according to the training data set and the attribute using a data mining classification algorithm; and the processor 201 tests, according to the test data set, whether the classification model passes a model assessment.
The processor 201 acquires a social relationship of data in the test data set using the classification model, and calculates a match rate between the acquired social relationship of the data in the test data set and an annotated social relationship of the data in the test data set stored in the memory 203; and if the match rate is higher than a first threshold, determines that the classification model passes the model assessment; or if the match rate is not higher than the first threshold, continues training the classification model.
The processor 201 performs network clustering according to the topology attribute of the financial transaction network and the non-network topology attribute, to acquire the social relationship of the client user.
The apparatus in this embodiment may be used to execute the technical solution of the method embodiment shown in
Persons of ordinary skill in the art may understand that all or some of the steps of the method embodiments may be implemented by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium. When the program runs, the steps of the method embodiments are performed. The foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the foregoing embodiments are merely intended for describing the technical solutions of the present disclosure, but not for limiting the present disclosure. Although the present disclosure is described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still make modifications to the technical solutions described in the foregoing embodiments or make equivalent replacements to some or all technical features thereof, without departing from the scope of the technical solutions of the embodiments of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
201410085416.8 | Mar 2014 | CN | national |
This application is a continuation of International Application No. PCT/CN2014/089034, filed on Oct. 21, 2014, which claims priority to Chinese Patent Application No. 201410085416.8, filed on Mar. 10, 2014, both of which are hereby incorporated by reference in their entireties.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/CN2014/089034 | Oct 2014 | US |
Child | 15251000 | | US |