After more than two decades of electronic data automation, and with an improved ability to capture data from a variety of communication channels and media, even small enterprises now regularly process terabytes of data. Moreover, mining, analyzing, and processing that data have become extremely complex.
Updating, mining, analyzing, reporting on, and accessing enterprise information can still be problematic, both because of the sheer volume of that information and because it is often dispersed across many different file systems, databases, and applications. In fact, the data and its processing can be dispersed geographically across the entire globe. When the data is processed, communication may need to reach every node, or only selected nodes, dispersed across the network.
Collecting, indexing, and managing data from a variety of sources and formats is challenging for any enterprise, because fields in one source may differ from those in another, or several fields in one source may correspond to a single field in another source. To deal with this, enterprises often spend considerable time and resources manually analyzing their data sources and then converting those sources into a normalized format.
Even when an enterprise performs this work, the managed data may still not be consolidated into comprehensive records that are free of duplication. Duplication degrades the accuracy of the data and of the results obtained from mining it. Some enterprises devote additional resources to detecting and correcting data duplication; these resources may work full time cleaning the data that the enterprise receives and processes each day.
In various embodiments, techniques for data integration are presented. According to an embodiment, a method for data integration is provided.
Specifically, source data is identified and source data attributes present in the source data are mapped to target data attributes in target data. Finally, a profile is created for the source data that defines actions of the mapping.
Initially, it is noted that specific embodiments and sample implementations for various aspects of the invention are provided in detail in the provisional filing (Provisional Application No. 61/788,712), which is incorporated by reference in its entirety herein.
At 110, the data mapper identifies source data. In an embodiment, the source data is identified by a user accessing an interface, such as the interface discussed below.
At 120, the data mapper maps source data attributes present in the source data to target data attributes in target data. The target data can likewise be selected by the user via an interface, such as an interactive interface. In some cases, an automated service can select both the source data and the target data.
The attributes and/or fields associated with the source data and the target data can be identified based on schemas or delimiters in the native data.
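A minimal sketch of identifying attributes follows, assuming either delimited native data whose first line is a header row, or a schema supplied as a dictionary with a `fields` list. The function name `identify_attributes` and the schema shape are hypothetical illustrations, not part of the described method.

```python
import csv
import io
from typing import Optional


def identify_attributes(native: str, schema: Optional[dict] = None,
                        delimiter: str = ",") -> list:
    """Return the attribute names of a data source.

    If a schema is given (hypothetical shape: {"fields": [{"name": ...}, ...]}),
    names come from the schema; otherwise the first line of the delimited
    native data is treated as a header row.
    """
    if schema is not None:
        return [field["name"] for field in schema["fields"]]
    reader = csv.reader(io.StringIO(native), delimiter=delimiter)
    header = next(reader, [])  # empty input yields no attributes
    return [name.strip() for name in header]


source_attrs = identify_attributes("cust_id|full_name|zip\n17|Ada Lovelace|10001\n",
                                   delimiter="|")
target_attrs = identify_attributes("", schema={"fields": [{"name": "customer_id"},
                                                          {"name": "name"},
                                                          {"name": "postal_code"}]})
print(source_attrs)  # ['cust_id', 'full_name', 'zip']
print(target_attrs)  # ['customer_id', 'name', 'postal_code']
```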
At 130, the data mapper creates a profile for the source data that defines actions of the mapping. The profile can be interpreted and used to drive execution that transforms the source data attributes into the target data attributes. In an embodiment, the profile is an executable script.
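One way to picture a profile is as plain data that an interpreter walks to drive the transformation. The `Profile` and `MappingAction` classes, their fields, and the optional per-attribute `transform` hook below are hypothetical; the description only requires that the profile define the actions of the mapping (and, in one embodiment, that it be an executable script).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class MappingAction:
    source_attr: str
    target_attr: str
    transform: Optional[Callable[[Any], Any]] = None  # optional per-value conversion


@dataclass
class Profile:
    source_id: str
    target_id: str
    actions: list = field(default_factory=list)

    def apply(self, record: dict) -> dict:
        """Transform one source record into a target-shaped record."""
        out = {}
        for action in self.actions:
            if action.source_attr not in record:
                continue  # attribute missing from this record; nothing to map
            value = record[action.source_attr]
            out[action.target_attr] = action.transform(value) if action.transform else value
        return out


profile = Profile("crm_export", "customer_master", [
    MappingAction("cust_id", "customer_id", int),
    MappingAction("full_name", "name", str.strip),
    MappingAction("zip", "postal_code"),
])
print(profile.apply({"cust_id": "17", "full_name": " Ada Lovelace ", "zip": "10001"}))
# {'customer_id': 17, 'name': 'Ada Lovelace', 'postal_code': '10001'}
```

Because the profile here is data rather than code tied to one file, the same object could be applied to every record of a new instance of the source data, which is one way to realize the reuse described at 160.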
According to an embodiment, at 140, the data mapper and its processing (110-130) are provided as an interactive user interface (also discussed below).
In another case, at 150, the data mapper evaluates selection rules when multiple records from the source data appear to be a single record, that is, when groupings of data within the source data appear to be the same or similar.
Continuing with the embodiment of 150 and at 151, the data mapper applies custom user-defined selection rules. The interface discussed above and below can let the user define these selection rules interactively.
In another case of 150 and at 152, the data mapper applies predefined selection rules. That is, a predefined set of selection rules can be chosen based on the type of source data, the type of profile, and/or the type of target data.
For example, at 153, the data mapper applies the predefined selection rules as one of: selection of the record with the highest score among the multiple records, selection of the most recently created record, selection of the most recently modified record, and selection based on a prioritization of the multiple records.
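The four predefined selection rules can be sketched as a table of functions that each pick one surviving record from a group of apparent duplicates. The field names `score`, `created`, `modified`, and `priority`, the rule names, and the convention that a lower `priority` number is preferred are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical record fields: "score", "created", "modified", "priority".
# Assumption: a lower "priority" number marks the preferred record.
PREDEFINED_SELECTION_RULES = {
    "highest_score": lambda recs: max(recs, key=lambda r: r["score"]),
    "most_recent_created": lambda recs: max(recs, key=lambda r: r["created"]),
    "last_modified": lambda recs: max(recs, key=lambda r: r["modified"]),
    "priority": lambda recs: min(recs, key=lambda r: r["priority"]),
}


def select_record(candidates: list, rule: str) -> dict:
    """Choose one surviving record from records that appear to be the same record."""
    if not candidates:
        raise ValueError("no candidate records")
    return PREDEFINED_SELECTION_RULES[rule](candidates)


candidates = [
    {"id": "A", "score": 0.9, "created": datetime(2013, 1, 5),
     "modified": datetime(2013, 3, 1), "priority": 2},
    {"id": "B", "score": 0.7, "created": datetime(2013, 2, 1),
     "modified": datetime(2013, 2, 15), "priority": 1},
]
for rule in PREDEFINED_SELECTION_RULES:
    print(rule, select_record(candidates, rule)["id"])
# highest_score A
# most_recent_created B
# last_modified A
# priority B
```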
In an embodiment, at 160, the data mapper reuses the profile when a new instance of the source data is processed. In other words, once the profile is established, the entire source data, or any new instance of it, can be processed automatically via the profile.
According to an embodiment, at 170, the data mapper processes the profile against the source data and the target data.
Continuing with the embodiment of 170 and at 171, the data mapper merges records in the source data into the target data.
In an embodiment, a cross-reference lineage between the source data and the target data is maintained in separate storage, such as database storage.
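Processing a profile that merges source records into target data, while keeping a cross-reference lineage apart from the target records, might look like the sketch below. The function `merge_into_target`, the match-on-key approach, and the merge policy (incoming non-empty values overwrite existing ones, empty values do not) are assumptions; the description does not fix merge semantics, and the in-memory `lineage` list merely stands in for separate database storage.

```python
def merge_into_target(source_records: list, mapping: dict, key_attr: str,
                      target: dict, lineage: list, source_id: str) -> None:
    """Merge mapped source records into target and log lineage separately.

    mapping:  {source attribute: target attribute}
    key_attr: target attribute used to match incoming records to existing ones
    target:   {key: target record}; updated in place
    lineage:  list of (source_id, source row index, target key), kept apart
              from the target records as a stand-in for separate storage
    """
    for row_index, record in enumerate(source_records):
        mapped = {t: record[s] for s, t in mapping.items() if s in record}
        key = mapped.get(key_attr)
        if key in (None, ""):
            continue  # a record without its key cannot be matched
        merged = dict(target.get(key, {}))
        # Incoming non-empty values overwrite existing ones; empty values do not.
        merged.update({k: v for k, v in mapped.items() if v not in (None, "")})
        target[key] = merged
        lineage.append((source_id, row_index, key))


target = {"17": {"customer_id": "17", "name": "Ada", "postal_code": "10001"}}
lineage = []
merge_into_target(
    [{"cust_id": "17", "full_name": "Ada Lovelace", "zip": ""},
     {"cust_id": "18", "full_name": "Alan Turing", "zip": "02139"}],
    {"cust_id": "customer_id", "full_name": "name", "zip": "postal_code"},
    "customer_id", target, lineage, "crm_export")
print(target["17"])  # {'customer_id': '17', 'name': 'Ada Lovelace', 'postal_code': '10001'}
print(lineage)       # [('crm_export', 0, '17'), ('crm_export', 1, '18')]
```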
The data integration interface manager presents processing from the perspective of an interface (manually controlled by a user and/or operated by an automated application that runs autonomously of any user) that utilizes the data mapper presented above.
At 210, the data integration interface manager presents source attributes for source data and target attributes defined in target data to a user.
In an embodiment, at 211, the data integration interface manager uses a source schema for presenting the source attributes and a target schema for presenting the target attributes.
At 220, the data integration interface manager records mappings between the source attributes and the target attributes.
According to an embodiment, at 221, the data integration interface manager tracks interface selections or associations made by the user between the source attributes and the target attributes as the mappings.
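Tracking the user's interface selections as mappings could be sketched as an event recorder that the interface calls whenever the user links or unlinks a pair of attributes. The class `MappingRecorder`, its `on_link`/`on_unlink` callbacks, and the rule that a later selection for the same source attribute replaces an earlier one are hypothetical design choices.

```python
class MappingRecorder:
    """Records user associations between source and target attributes."""

    def __init__(self, source_attrs, target_attrs):
        self.source_attrs = set(source_attrs)
        self.target_attrs = set(target_attrs)
        self.mappings = {}  # source attribute -> target attribute

    def on_link(self, source_attr: str, target_attr: str) -> None:
        # Called when the user associates a source attribute with a target attribute.
        if source_attr not in self.source_attrs or target_attr not in self.target_attrs:
            raise KeyError(f"unknown attribute: {source_attr!r} -> {target_attr!r}")
        self.mappings[source_attr] = target_attr  # a later selection replaces an earlier one

    def on_unlink(self, source_attr: str) -> None:
        # Called when the user removes an association.
        self.mappings.pop(source_attr, None)


recorder = MappingRecorder(["cust_id", "full_name", "zip"],
                           ["customer_id", "name", "postal_code"])
recorder.on_link("cust_id", "customer_id")
recorder.on_link("zip", "name")          # user first picks the wrong target...
recorder.on_link("zip", "postal_code")   # ...then corrects it
recorder.on_link("full_name", "name")
print(recorder.mappings)  # {'cust_id': 'customer_id', 'zip': 'postal_code', 'full_name': 'name'}
```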
At 230, the data integration interface manager receives rules for resolving conflicts when transforming the source data to the target data.
In an embodiment, at 231, the data integration interface manager permits the user to define custom rules for merging, selection, and duplication of records associated with the source data.
In another case, at 232, the data integration interface manager provides predefined rules for merging, selection, and duplication based on a profile type associated with the profile.
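Combining predefined rules with user-defined ones can be sketched as a lookup of defaults by profile type, followed by custom overrides. The profile types (`customer`, `product`) and the rule names are invented for illustration only; the description does not enumerate them.

```python
from typing import Optional

# Hypothetical profile types and rule names, for illustration only.
PREDEFINED_RULES = {
    "customer": {"merge": "prefer_non_empty", "selection": "last_modified",
                 "duplication": "match_on_email"},
    "product": {"merge": "prefer_target", "selection": "highest_score",
                "duplication": "match_on_sku"},
}


def resolve_rules(profile_type: str, custom: Optional[dict] = None) -> dict:
    """Start from the predefined rules for a profile type, then apply user-defined overrides."""
    rules = dict(PREDEFINED_RULES.get(profile_type, {}))  # copy so defaults stay unchanged
    rules.update(custom or {})
    return rules


print(resolve_rules("customer", {"selection": "priority"}))
# {'merge': 'prefer_non_empty', 'selection': 'priority', 'duplication': 'match_on_email'}
```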
At 240, the data integration interface manager houses the mappings and the rules with an identifier for the source data and another identifier for the target data as a profile for the transforming.
According to an embodiment, at 241, the data integration interface manager augments the profile with metadata and control data associated with the processing of the profile.
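Housing the mappings and rules with source and target identifiers, augmented with metadata and control data, could be sketched as a serializable document. The function `build_profile` and every key under `metadata` and `control` are hypothetical examples; JSON is chosen here only because it round-trips cleanly to a file or database column.

```python
import json
from datetime import datetime, timezone


def build_profile(source_id: str, target_id: str, mappings: dict, rules: dict,
                  created_by: str, batch_size: int = 1000) -> dict:
    """House mappings and rules with source and target identifiers.

    The "metadata" and "control" sections and their keys are hypothetical
    examples of data about, and settings for, processing the profile.
    """
    return {
        "source_id": source_id,
        "target_id": target_id,
        "mappings": mappings,
        "rules": rules,
        "metadata": {"created_by": created_by,
                     "created_at": datetime.now(timezone.utc).isoformat()},
        "control": {"batch_size": batch_size, "on_error": "skip_record"},
    }


profile = build_profile("crm_export", "customer_master",
                        {"cust_id": "customer_id", "zip": "postal_code"},
                        {"selection": "last_modified"}, created_by="analyst1")
stored = json.dumps(profile, indent=2)  # e.g. written to a file or a table column
print(sorted(json.loads(stored)))
# ['control', 'mappings', 'metadata', 'rules', 'source_id', 'target_id']
```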
In an embodiment, at 250, the data integration interface manager provides the processing as a graphical user interface to the user.
The data integration system 300 implements, inter alia, the methods 100 and 200 presented above.
The data integration system 300 includes a data mapper 301 and a data integration interface manager 302.
The data integration system 300 includes a non-transitory computer-readable storage medium having executable instructions for the data mapper 301 that executes on one or more processors of the network. Example processing associated with the data mapper 301 was presented above.
The data mapper 301 is configured to create a mapping between source attributes for source data and target attributes for target data by monitoring actions of a user accessing an interface presented by the data integration interface manager 302.
The data integration system 300 includes a non-transitory computer-readable storage medium having executable instructions for the data integration interface manager 302 that executes on one or more processors of the network. Example processing associated with the data integration interface manager 302 was presented above.
The data integration interface manager 302 is configured to create and record a profile for the mappings that, when processed, transforms the source attributes into the target attributes.
According to an embodiment, the data integration interface manager 302 is also configured to associate merge rules, duplication rules, and selection rules for records when processing the profile.
Continuing with the embodiment above, the data integration interface manager 302 is configured to receive custom rules for some of the merge rules, some of the duplication rules, and/or for some of the selection rules.
The above description is illustrative, and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of embodiments should therefore be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
The present application claims priority to, and is a non-provisional application of Provisional Application No. 61/788,712 entitled: “Techniques for Scalable Database Integration and Processing in a Database Environment,” filed on Mar. 15, 2013; the disclosure of which is hereby incorporated by reference in its entirety herein and below.
Prior publication data:

Number | Date | Country
---|---|---
20140280218 A1 | Sep 2014 | US

Provisional application:

Number | Date | Country
---|---|---
61788712 | Mar 2013 | US