The invention is in the field of communication platforms. More specifically, the invention relates to a system for generating fabricated pattern data records.
In the modern world, the use of communication networks is almost inevitable, whether for cellular communication, computer communication, or any other communication platform. Communication companies therefore hold records about every user. A great deal of information can be derived from these records; however, using real data from any communication network has always raised a serious privacy problem. Generally speaking, the main problem stems from the need to protect users from unsupervised monitoring. Nevertheless, knowledge of the nature of the data can yield significant value for many applications (marketing, sales, and customer service).
Other systems are also capable of collecting various types of user data and other data. These systems may include social web sites, search engines and wearable computing. The huge amount of collected data may be used for many kinds of analysis. Commercial data may also be beneficial for such analysis. However, privacy and security must be preserved while using this data.
It is therefore an object of the present invention to provide a system for generating fabricated pattern data records, based on modeling of actual networks of various types.
It is another object of the present invention to provide a system for generating fabricated pattern data records, which keeps the users' privacy and the security of the collected data.
Further purposes and advantages of this invention will appear as the description proceeds.
The present invention is directed to a system for generating fabricated pattern data records (XDRs) based on data from accessible data sources, which may comprise:
The modeling and pattern creation modules may use Model and Patterns Creation algorithms (MPCs), which can discover patterns that reflect the relationships, conditions and constants of the available data.
The modeling tasks may include:
The synthetic data generation modules may use Syntactic Data Production (SDP) algorithms to generate new and fabricated data samples utilizing the models learned by the MPCs.
The system may further comprise a Query API and a Query Processer to receive and process data-generation queries, as well as a query cache for caching queries and query results and a User Interface for allowing interaction with the XDR core module and server-side components. The data sources may be located locally on the computerized device that runs the data fabrication system, or on an external computerized device.
The data splitting module may split the data into training and testing sets by using random-based or time-based splitting. Data may be aggregated and prepared for further usage.
All the above and other characteristics and advantages of the invention will be further understood through the following illustrative and non-limitative description of embodiments thereof, with reference to the appended drawings. In the drawings the same numerals are sometimes used to indicate the same elements in different drawings.
The present invention relates to a system which is capable of fabricating synthetic data (hereinafter called X Data Records or XDRs) from any available data source, while providing a high degree of similarity between the original data and the synthetic data. An XDR may be any type of data record, such as Call Data Record (CDR), data regarding operations performed by a user, purchasing records of users at points of sale, traveling records etc.
System 10 includes an XDR core module 13, which is the main component and contains the following sub-components:
System 10 also includes XDR agents 14, which are software components that communicate with the data sources and access the relevant data. Each XDR agent 14 knows the unique APIs of each data-store, as well as the data-structures. It also knows how to transform these data structures into a unified input structure that is compatible with the algorithms used by the XDR core module 13. A specific XDR agent 14 must be implemented for each target data source 11.
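The role of an XDR agent described above can be sketched as follows. This is a minimal illustration only: the class names `XDRAgent`, `UnifiedRecord` and `InMemoryCDRAgent`, the field names, and the in-memory list standing in for a data store are all assumptions introduced for the example and are not defined by the specification.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass(frozen=True)
class UnifiedRecord:
    """Hypothetical unified input structure consumed by the core module."""
    user_id: str
    timestamp: float
    attributes: Dict[str, Any]


class XDRAgent(ABC):
    """Hypothetical base class; one concrete agent per target data source."""

    @abstractmethod
    def fetch(self) -> Iterable[Dict[str, Any]]:
        """Read raw records using the data store's own API."""

    @abstractmethod
    def to_unified(self, raw: Dict[str, Any]) -> UnifiedRecord:
        """Map one source-specific record to the unified structure."""

    def collect(self) -> List[UnifiedRecord]:
        """Fetch all raw records and convert each to the unified structure."""
        return [self.to_unified(raw) for raw in self.fetch()]


class InMemoryCDRAgent(XDRAgent):
    """Toy agent over a list of call records; stands in for a real data store."""

    def __init__(self, rows: Iterable[Dict[str, Any]]):
        self._rows = list(rows)

    def fetch(self) -> Iterable[Dict[str, Any]]:
        return iter(self._rows)

    def to_unified(self, raw: Dict[str, Any]) -> UnifiedRecord:
        return UnifiedRecord(
            user_id=str(raw["caller"]),
            timestamp=float(raw["start"]),
            attributes={"callee": raw["callee"], "duration_s": raw["duration"]},
        )
```

In this sketch, supporting a new data source would only require a new subclass that implements `fetch` and `to_unified`; the core module would see `UnifiedRecord` objects regardless of the source.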
A data-store communication module 15 is responsible for mediating between the XDR agents 14 and the XDR core module 13 by using a flexible method of transforming data from the XDR agents 14 into the XDR storage database 13d.
Splitting the data by the data splitting module 13c is required in order to evaluate the generated models in the next section. Models are trained solely on the training set, while the testing set is utilized to evaluate the generated models using maximum likelihood estimation. The data splitting module 13c is controlled by the splitting criteria (i.e., random-based or time-based) and the splitting ratio (typically a ratio of 90:10).
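The two splitting criteria can be illustrated with a short sketch. The function names `split_random` and `split_by_time`, the `seed` parameter and the `timestamp_of` accessor are assumptions made for the example; only the random-based and time-based criteria and the typical 90:10 ratio come from the description.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_random(
    records: Sequence[T], train_ratio: float = 0.9, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Random-based split: shuffle a copy, then cut at the requested ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def split_by_time(
    records: Sequence[T],
    timestamp_of: Callable[[T], float],
    train_ratio: float = 0.9,
) -> Tuple[List[T], List[T]]:
    """Time-based split: the oldest records train, the newest records test."""
    ordered = sorted(records, key=timestamp_of)
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]
```

A time-based split keeps every test record later than every training record, which may be preferable when the models are meant to capture temporal behavior.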
The Model and Patterns Creation (MPC) algorithms are sets of algorithms (used by the modeling and pattern creation modules 13a) that are capable of discovering patterns that reflect the relationships, conditions and constants of the available data. The modeling tasks may be context-aware learning, learning of state-transitions of a system or an individual, or learning probabilistic cause-effect conditions among a given set of random variables. All the learning models are capable of updating the generated models incrementally, using batch updates.
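As an illustration of learning state-transitions with incremental batch updates, the sketch below uses a first-order transition-count model. This particular model, the class name `TransitionModel` and its methods are assumptions chosen for brevity; the description does not specify which algorithms the MPCs use.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Sequence


class TransitionModel:
    """Hypothetical first-order state-transition model built from counts."""

    def __init__(self) -> None:
        # counts[current][next] = number of observed current -> next moves
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, sequences: Iterable[Sequence[Hashable]]) -> None:
        """Fold one batch of state sequences into the existing counts."""
        for seq in sequences:
            for current, nxt in zip(seq, seq[1:]):
                self.counts[current][nxt] += 1

    def probabilities(self, state: Hashable) -> Dict[Hashable, float]:
        """Return the estimated next-state distribution for a given state."""
        successors = self.counts.get(state, {})
        total = sum(successors.values())
        if not total:
            return {}
        return {s: c / total for s, c in successors.items()}
```

Because the model stores only counts, a later batch can be merged by calling `update` again, without retraining on the earlier data.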
Examples of data modeling tasks include:
Examples of learning cause-effect relations between random variables include:
The purpose of the Data Market Place 21 is to provide a platform for creating, offering and obtaining synthetic data. It is targeted at Business Partners and at interested parties in general. Users can request synthetic data samples in different volumes from multiple domains according to specific queries. Synthetic records can be marked as public or as private for certain user groups.
The dashboard 22 is a graphical user interface component, which exposes the functionality of the Data Market Place 21 to both admin users and clients.
The Syntactic Data Production (SDP) algorithms (used by synthetic data generation modules 13b) are used to generate new and fabricated data samples utilizing the models learned by the MPCs. The syntactic data production algorithm store is required to support several types of algorithms, since each of the pattern creation models may have a unique model representation.
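A minimal sketch of producing fabricated samples from a learned model is given below. It assumes, purely for illustration, that the learned model is a mapping from each state to a next-state probability distribution; the function name `generate_sequence` and this model representation are hypothetical, and other model types would need their own generator.

```python
import random
from typing import Dict, Hashable, List

Transitions = Dict[Hashable, Dict[Hashable, float]]


def generate_sequence(
    transitions: Transitions,
    start: Hashable,
    length: int,
    rng: random.Random,
) -> List[Hashable]:
    """Sample a fabricated state sequence by walking the learned transitions."""
    sequence = [start]
    while len(sequence) < length:
        successors = transitions.get(sequence[-1])
        if not successors:
            break  # no learned continuation from this state
        states = list(successors)
        weights = [successors[s] for s in states]
        sequence.append(rng.choices(states, weights=weights, k=1)[0])
    return sequence


if __name__ == "__main__":
    # Toy model standing in for the output of a pattern creation algorithm.
    model = {"home": {"work": 1.0}, "work": {"home": 0.5, "gym": 0.5}}
    print(generate_sequence(model, "home", 6, random.Random(0)))
```

The generated sequence contains only transitions that the model allows, yet it is not a copy of any original record, which is the property the fabricated data is meant to have.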
A Query API 16 and a Query Processer 17 receive data-generation queries, process them, request the generation from the XDR server (which runs the XDR core module 13), aggregate the data if needed, and finally return it to the client. The query cache 18 allows queries and query results to be cached in order to accelerate response times.
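The effect of the query cache can be sketched as follows. The class name `QueryCache`, the JSON-based cache key, the capacity limit and the least-recently-used eviction policy are all assumptions made for the example; the description states only that queries and their results are cached.

```python
import json
from collections import OrderedDict
from typing import Any, Callable, Dict


class QueryCache:
    """Hypothetical cache keyed by a canonical form of the query."""

    def __init__(self, capacity: int = 128) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[str, Any]" = OrderedDict()

    @staticmethod
    def _key(query: Dict[str, Any]) -> str:
        # Sorting keys makes logically equal queries share one cache entry.
        return json.dumps(query, sort_keys=True)

    def get_or_generate(
        self, query: Dict[str, Any], generate: Callable[[Dict[str, Any]], Any]
    ) -> Any:
        """Return a cached result, or call `generate` and cache its result."""
        key = self._key(query)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        result = generate(query)
        self._entries[key] = result
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return result
```

Here `generate` stands in for the request that the query processer sends to the XDR server; a repeated query is answered from the cache without triggering a new generation.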
The queried data can also be reformatted, filtered and further processed after the query stage by the Synth Agent 20 (in a sense, the Synth Agent on the output end is analogous to the XDR Agent on the input end), which is adapted to perform conversion and aggregation tasks on the fabricated data. For example, it can calculate the estimated number of users on a specific cell tower at a given time-stamp according to the fabricated data of user transitions, which has been modeled by the system.
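The cell-tower example can be sketched as a simple aggregation over fabricated transition records. The record layout `(user_id, timestamp, tower_id)`, the function name `users_per_tower`, and the rule that a user stays on the last tower reached are assumptions made for the illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple

# Hypothetical fabricated record: (user_id, timestamp, tower_id)
Transition = Tuple[Hashable, float, Hashable]


def users_per_tower(transitions: Iterable[Transition], at_time: float) -> Counter:
    """Estimate how many users are on each tower at `at_time`.

    Each user is assumed to remain on the tower of their most recent
    transition at or before `at_time`.
    """
    last_seen = {}
    for user_id, timestamp, tower_id in sorted(transitions, key=lambda r: r[1]):
        if timestamp > at_time:
            break  # records are time-ordered, so later ones are irrelevant
        last_seen[user_id] = tower_id
    return Counter(last_seen.values())
```

Users whose first fabricated transition occurs after `at_time` are not counted in this sketch, since their location at that moment is unknown.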
The XDR administration console 19 is a User Interface (such as a GUI) that enables interaction with the XDR core module 13 and other server-side components.
Although embodiments of the invention have been described by way of illustration, it will be understood that the invention may be carried out with many variations, modifications, and adaptations, without exceeding the scope of the claims.
Number | Date | Country | Kind |
---|---|---|---|
236557 | Jan 2015 | IL | national |