The present disclosure relates to a method and a system for behavior vectorization of information de-identification, and more particularly to a method for representing a network user in a de-identified and vectorized form, so as to vectorize and group the behavior of the network user.
With the emergence of the Internet information age, user data can be obtained from multiple sources, and it is no longer necessary to spend great effort searching for available resources. However, this convenience also raises problems, notably the protection of personal information, especially personally identifiable information. For example, a user's name, phone number, email address, and home address can easily leak onto the Internet through careless use or incorrect operation and can then be used illegally by interested parties. Many network users therefore refuse to disclose their personal information and basic details in order to protect themselves. For advertising companies and online marketers, however, marketing efficiency drops significantly if the personal information or basic data of network users cannot be obtained: advertisements are placed less accurately, and sales cannot be targeted precisely at similar customer groups. How to analyze network users, and to perform follow-up operations on the analyzed information, without violating the protection of personal information has therefore become a technical threshold that must be crossed. TWI611362B (Title: “Personalized internet marketing recommendation method”) discloses that the process a user has experienced can be used for analysis, and that similar groups can be found through quick grouping. CN109583920A (Title: “Method and management system for generating personalized consumption information”) likewise discloses that quick grouping can be achieved using the process the user has experienced, so that similar groups can be searched on that basis, and that machine learning methods such as deep learning can be used to improve the system. Other disclosures of the prior art are as follows:
(1) TW202020771A “System and method for analyzing the network user behavior and presenting the result thereof”
(2) TW202025039A “Smart marketing advertising classification system”
(3) US20200160388A1 “Cryptographic anonymization for Zero-Knowledge Advertising Methods, Apparatus, and System”
(4) US20140122493A1 “Ecosystem method of aggregation and search and related techniques”
(5) JPA 2019219764 “Information Search System”
(6) JPA 2020184198 “Information processing equipment and information processing program”
According to the above-mentioned prior art, in order to avoid the problem of personal information, marketers and online user behavior analysts have started to collect users' browsing paths on the Internet and on websites, to analyze, classify, and group those paths, and finally to use the classification and grouping results for advertising, marketing, and similar purposes. However, network users follow many different paths, and slight differences in website stay time, click behavior, operations, trigger events, etc., may change the analysis results. Furthermore, when machine learning is used for path analysis, the results are likely to be distorted and useless if the path is not clearly defined. How to make the path represent the network user more clearly, or even describe the network user by the path, is a problem to be solved.
It is a primary object of the present disclosure to provide a method and a system for behavior vectorization of information de-identification that can de-identify information and convert the paths of network users into a vectorized form for grouping purposes.
According to the present disclosure, a server retrieves data that is not personal information, such as the browsing traces, paths, course, trigger events, and click operations of network users on the Internet. The large amount of data is stacked, integrated, and then converted into a vector matrix. The vector matrix is used to represent the profile, characteristics, identification code, consumption characteristics, etc., of the network users, and can thus stand in for the data of the network users. The server can quickly group and classify the vector matrix and then find similar groups to quickly identify network users. In addition, the vector conversion, grouping, and classification are defined by the data provider, which pre-defines and classifies the network usage paths of past network users. The server is trained by machine learning based on the supervised learning method. After the machine learning is completed, the retrieved data can be stacked and vectorized, and the resulting vector matrix can be classified. The aforementioned vectorization can also be performed on the client side, such as in browsers, web pages, mobile devices, wearable devices, in-car appliances, Internet of Things devices, or POS terminals, or on an Edge Server, or by any combination of conversion calculations and aggregation, so that the server can save costs and perform the subsequent classification quickly. The server uses the supervised learning method as a base method, with pre-defined network behaviors as training data. Semi-supervised or unsupervised learning can also be used as another base method, in which the degree of correlation is inferred from continuous behavior for training.
Also, the semi-supervised or unsupervised learning method can be used to feed back the operations and usage of the network users with respect to undefined network behaviors, so that the model can be retrained and revised to better conform to the profile description of the network users.
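The retrieval and stacking of non-personal data described above can be sketched as follows. This is only a minimal illustration: the field names, the `PERSONAL_FIELDS` set, and the helper functions are hypothetical, since the disclosure does not define a record schema.

```python
# Hypothetical field names; the disclosure does not define a record schema.
PERSONAL_FIELDS = {"name", "phone", "email", "home_address"}


def strip_personal_fields(event: dict) -> dict:
    """Return a copy of the event without personally identifiable fields."""
    return {k: v for k, v in event.items() if k not in PERSONAL_FIELDS}


def stack_events(events: list, columns: list) -> list:
    """Stack events row by row into a matrix, one column per behavior field."""
    return [[float(e.get(c, 0)) for c in columns] for e in events]


raw_events = [
    {"email": "user@example.com", "stay_seconds": 330, "clicks": 3},
    {"name": "A. User", "stay_seconds": 45, "clicks": 0},
]
columns = ["stay_seconds", "clicks"]

clean_events = [strip_personal_fields(e) for e in raw_events]
matrix = stack_events(clean_events, columns)
print(matrix)
```

Only the behavior columns survive in `matrix`; the identifying fields are dropped before any stacking takes place.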
Referring to
The server 11 establishes an information link with the data provider device 12 and the client device 13. The server 11 can receive a learning training sample provided by the data provider device 12 and build a machine learning model based on the learning training sample provided by the data provider device 12. The model can mainly retrieve network usage paths of the client device 13 for stacking and vectorization, and then group and classify the vectorized data.
The data provider device 12 can be a search engine database or another database. Any device that enables the server 11 to obtain the required learning and training samples can be employed.
The client device 13 can be one of a mobile phone, a tablet computer, a personal computer, etc. Any device that enables the server 11 to obtain the required samples to be tested, can be employed.
The client device 13 is operated by a client. The client can use the Internet through the client device 13, and the server 11 can retrieve the Internet path used by the client device 13. The client of the client device 13 mainly refers to a network user, but it is not limited thereto.
The server 11 mainly includes a data processing module 111, a data storage module 112, a vectorization module 113, and a grouping/classifying module 114 which establish an information link with each other. The data processing module 111 is used to run the server 11 and to drive the modules connected thereto. The data processing module 111 fulfills functions such as logic operations, temporary storage of operation results, and storage of execution instruction positions. It can be, for example, a CPU, but is not limited thereto.
The data storage module 112 can store electronic data and can be, for example, a Solid State Disk or Solid State Drive (SSD), a Hard Disk Drive (HDD), a Static Random Access Memory (SRAM), or a Dynamic Random Access Memory (DRAM), etc. The data storage module 112 mainly stores path vector learning data and vector grouping learning data transmitted by the data provider device 12, path data transmitted by the client device 13, and data calculated and processed by the server 11.
The vectorization module 113 mainly performs training and learning on the path vector learning data provided by the data provider device 12. After the training and learning are completed, the vectorization module 113 can convert the path data transmitted by the client device 13 into vectorized data. The training and learning of the vectorization module 113 mainly use machine learning such as supervised learning, semi-supervised learning, reinforcement learning, unsupervised learning, self-supervised learning, or heuristic algorithms, but are not limited thereto. The above-mentioned path vector learning data can be a plurality of past path data and a plurality of past vectorized data. The past path data and the path data can be any data of a website trigger event, a website click event, a website operation behavior, a website stay time, or a combination thereof. Any data referring to visiting traces on the Internet is applicable. The past vectorized data mainly correspond to the past path data and are used for training and learning by the vectorization module 113. The vectorized data can be one of a two-dimensional matrix vector, a three-dimensional matrix vector, or a multi-dimensional matrix vector. The vectorization module 113 mainly stacks each one-dimensional datum in the path data and converts the result into the vectorized data. For example, a network user of the client device 13 stays on a website A for 5 minutes and 30 seconds, clicks on three products, each of which links to an external website corresponding to that product, and then returns to the website A. Meanwhile, the network user watches advertisements A, B, and C on the website A for 15 seconds each. In this case, a matrix of the client device 13 can be provided by the vectorization module 113 and defined to be: [0.33, 3, 0.45] ([total stay time, number of products clicked, total time to watch advertisements]). The above-mentioned case is only an example, and the disclosure is not limited thereto.
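The worked example above can be reproduced with a short sketch. The `PathData` record and the two scale constants are assumptions: the disclosure gives the resulting values [0.33, 3, 0.45] but does not state how the stay time (330 seconds) and advertisement time (45 seconds) are normalized, so the divisors below are chosen only to match those figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PathData:
    """Hypothetical record of one visit; not a schema from the disclosure."""
    stay_seconds: float
    products_clicked: int
    ad_watch_seconds: List[float]


# Assumed scaling, picked only so the output matches the example values.
STAY_SCALE = 1000.0  # 330 s -> 0.33
AD_SCALE = 100.0     # 45 s  -> 0.45


def vectorize(path: PathData) -> List[float]:
    """Convert path data into [stay time, products clicked, ad watch time]."""
    return [
        round(path.stay_seconds / STAY_SCALE, 2),
        float(path.products_clicked),
        round(sum(path.ad_watch_seconds) / AD_SCALE, 2),
    ]


example = PathData(
    stay_seconds=5 * 60 + 30,          # 5 minutes 30 seconds on website A
    products_clicked=3,                # three products clicked
    ad_watch_seconds=[15, 15, 15],     # advertisements A, B, C
)
vector = vectorize(example)
print(vector)  # [0.33, 3.0, 0.45]
```

In the disclosed system this mapping would be learned from the path vector learning data rather than hard-coded; the fixed constants here merely stand in for a trained model.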
After the vectorization module 113 converts the path data into the vectorized data, it can be stored in the data storage module 112 or transmitted to the subsequent grouping/classifying module 114.
The grouping/classifying module 114 can perform training and learning on the vector grouping learning data provided by the data provider device 12. After the training and learning are completed, the grouping/classifying module 114 can group and classify the vectorized data transmitted by the vectorization module 113 and assign a grouping result to it. The training and learning of the grouping/classifying module 114 mainly use machine learning such as supervised learning, semi-supervised learning, reinforcement learning, unsupervised learning, self-supervised learning, or heuristic algorithms, but are not limited thereto. The vector grouping learning data mainly include a plurality of the past vectorized data and past grouping data. The past grouping data can include a plurality of the past vectorized data of the aforementioned past network users for training and learning by the grouping/classifying module 114. Moreover, the grouping result can be a group or set containing a plurality of vectorized data representing network users.
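One simple way to assign a grouping result to a vector is a nearest-centroid classifier trained on labeled past vectors. The sketch below uses that technique purely as an illustration; the disclosure does not name a specific algorithm, and the group labels and sample vectors are invented for the example.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fit_centroids(labeled):
    """Train: compute one centroid per group from past vectorized data."""
    return {group: centroid(vs) for group, vs in labeled.items()}


def assign_group(vector, centroids):
    """Return the group whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda g: math.dist(vector, centroids[g]))


# Invented past grouping data: group label -> past vectorized data.
past_groups = {
    "browsers": [[0.05, 0.0, 0.10], [0.08, 1.0, 0.05]],
    "shoppers": [[0.30, 3.0, 0.40], [0.35, 4.0, 0.50]],
}
centroids = fit_centroids(past_groups)
group = assign_group([0.33, 3.0, 0.45], centroids)
print(group)  # shoppers
```

Because the features are on different scales, the click count dominates the distance here; a practical system would likely normalize the features or learn a weighting.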
As illustrated in
(1) Step S1 of providing data by a data provider:
As shown in
(2) Step S2 of training a model:
After the vectorization module 113 receives the path vector learning data D1 and the grouping/classifying module 114 receives the vector grouping learning data D2, both transmitted by the data provider device 12, the vectorization module 113 uses the path vector learning data D1 as the past data to perform a first machine learning, and the grouping/classifying module 114 uses the vector grouping learning data D2 as the past data to perform a second machine learning. The first and the second machine learning mainly refer to machine learning such as supervised learning, semi-supervised learning, reinforcement learning, unsupervised learning, self-supervised learning, or heuristic algorithms, but are not limited thereto.
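As a rough illustration of the first machine learning, the sketch below fits one scale factor per feature from pairs of past path data and past vectorized data by least squares through the origin. This is a deliberately simple supervised stand-in, not the method of the disclosure; the function names and training values are hypothetical.

```python
def fit_scales(past_paths, past_vectors):
    """Least-squares scale per feature for y ~ s * x (fit through the origin)."""
    scales = []
    for xs, ys in zip(zip(*past_paths), zip(*past_vectors)):
        denom = sum(x * x for x in xs)
        numer = sum(x * y for x, y in zip(xs, ys))
        scales.append(numer / denom if denom else 0.0)
    return scales


def apply_scales(path, scales):
    """Convert new path data into vectorized data with the learned scales."""
    return [x * s for x, s in zip(path, scales)]


# Hypothetical D1: past path data paired with past vectorized data.
past_paths = [[100.0, 1.0, 20.0], [200.0, 2.0, 40.0]]
past_vectors = [[0.1, 1.0, 0.2], [0.2, 2.0, 0.4]]

scales = fit_scales(past_paths, past_vectors)
new_vector = apply_scales([330.0, 3.0, 45.0], scales)
print(scales, new_vector)
```

The second machine learning would be trained analogously on D2, with past vectorized data as inputs and past grouping data as labels.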
(3) Step S3 of retrieving path data of the network users:
Following the above-mentioned steps and referring to
(4) Step S4 of vectorizing path data:
Referring to
(5) Step S5 of vectorizing and grouping:
Following the above-mentioned steps and referring to
Referring to
In the step S3 of retrieving path data of the network users and in the step S4 of vectorizing path data, the server 11 may further transmit the result of the first machine learning to the client device 13. After receiving the result of the first machine learning, the client device 13 can retrieve the path data D3 of the client device 13 in real time. Meanwhile, the path data D3 are converted into vectorized data D4, and then the vectorized data D4 are transmitted to the server 11.
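The client-side variant can be sketched as an exchange of model parameters and vectors. The JSON message layout, the function names, and the linear scale model are all assumptions made for illustration; the disclosure only states that the result of the first machine learning is sent to the client device 13 and that the vectorized data D4 are sent back.

```python
import json


def server_export_model(scales):
    """Server side: serialize the learned parameters for the client."""
    return json.dumps({"scales": scales})


def client_vectorize(model_json, path_data):
    """Client side: convert raw path data D3 into vectorized data D4 locally."""
    scales = json.loads(model_json)["scales"]
    return [round(x * s, 4) for x, s in zip(path_data, scales)]


# Hypothetical learned parameters sent from the server 11.
model_json = server_export_model([0.001, 1.0, 0.01])

# Raw path data D3 stay on the client; only D4 is uploaded.
d3 = [330.0, 3.0, 45.0]
d4 = client_vectorize(model_json, d3)
payload = json.dumps({"vector": d4})
print(payload)
```

With this arrangement the raw path data never leave the client device, and the server receives only the vector it needs for grouping.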
Referring to
In summary, the present disclosure is mainly based on machine learning. Without the need to obtain the personal information of the network users, the paths of the network users on the Internet are vectorized and grouped, and the network users are identified according to the grouping results to facilitate subsequent processing and use. The present disclosure can thus provide a behavior vectorization method that de-identifies information, converts the paths of network users into a vectorized form, and then groups the de-identified information.
Number | Date | Country | Kind |
---|---|---|---|
110113471 | Apr 2021 | TW | national |