1. Field of the Invention
Provided are a method, system, and program for determining compliance of a database architecture to an enterprise data standard.
2. Description of the Related Art
An enterprise may maintain its databases in a specific format using an enterprise wide data standard for database elements, such as tables, fields, indexes, keys, views, functions, definitions, aliases, nicknames, etc. At certain points, the enterprise may acquire or may have previously developed legacy databases that define database elements, such as tables, fields (columns), indexes, etc., using a data standard different from the enterprise data standard. For instance, a company that maintains a customer and product database may acquire another related company that also maintains a customer and product database, but uses a different data standard for its database elements than the acquiring enterprise.
In the current art, the company wanting to integrate a source database into its enterprise database may have to hire a database engineer to manually alter the architecture of the source/legacy database to conform to the data standard and naming convention of the enterprise database. The process to conform another database to the enterprise naming convention may be quite extensive and time consuming, and require analysis of the source and target database architectures, developing translating terminology, forming dictionaries, matching the names from the source database to the target enterprise database, applying transformations such as join and filters to restructure the information, etc.
There is a need in the art for improved techniques for determining the extent to which a legacy database complies with an enterprise data standard during the process of integrating the legacy database with an enterprise database.
Provided are a method, system, and program for determining compliance of a database architecture to an enterprise data standard. A physical model is generated defining database elements in a database. A logical model is provided representing a definition of elements and their relationships. The logical model is used to generate a mapping of database element names in the physical model to corresponding elements in the logical model. The mapping and the logical model are processed to determine an extent to which the database elements and relationships in the physical model violate rules of the logical model.
Each entity may define data elements or attributes, having a name, definition, data type, length, etc. The conversion program 8 receives user input from an input device 12 (e.g., mouse, keyboard, pen stylus, voice activated input, touch sensitive display monitor, etc.) and renders a user interface 14 on a display monitor 16 to render information about the compliance and mapping of the legacy physical model 20 to the enterprise data standard 10. The legacy 18 and enterprise 26 databases may be implemented in a storage device in communication with the computer 2. The computer 2 may interface with the databases 18 and 26 over a bus interface, network interface, wireless connection or other suitable data communication interfaces known in the art.
The conversion program 8 is capable of processing the legacy database 18 implementation to generate a legacy physical model 20 that defines the database elements implemented in the legacy database 18, such as table names, field names in the tables, primary and alternate indexes on the tables, relationships between fields (e.g., primary and foreign keys), views, functions, definitions, aliases, nicknames, etc.
The conversion program 8 may determine a mapping 30 from the database elements and relationships in the legacy physical model 20 to the database as defined in the logical models 12 of the enterprise data standard 10 to ascertain to what extent the legacy physical model 20 is compliant with and maps to logical models 12 of the enterprise data standard 10. A physical model 20 and 24 may be converted to statements in a database programming language, such as the Structured Query Language (SQL), to code the database elements defined in the physical model 24 in the enterprise database 26.
The conversion program 8 may further generate a report 28 providing information on the results of the mapping 30 of the database elements, relationships, invariants, etc., in the legacy physical model 20 to the logical model 12 defining the enterprise data standard. This report 28 may indicate database elements of the legacy physical model 20 that did not map to the requirements of the enterprise data standard 10.
If (at block 108) the database element name does not comply (match) with one element name abbreviation 92, then the conversion program 8 uses (at block 110) a combination of automated mapping techniques and user interfaces to manually create mappings 30 between the logical and physical data model elements. The automated mapping techniques may comprise additional comparison operations to determine whether the database elements from the physical model 20 match or correspond to one database element defined in the logical model 12. The copending and commonly assigned application entitled “TOLERANT AND EXTENSIBLE DISCOVERY OF RELATIONSHIPS IN DATA USING STRUCTURAL INFORMATION AND DATA ANALYSIS”, by Mauricio Antonio Hernandez, Ching-Tien H, Mary Ann Roth, and Lingling Yan, filed on Jun. 10, 2005, having attorney docket no. SVL920050003US1, and which application is incorporated herein by reference in its entirety, provides techniques for mapping database elements in a physical model to corresponding elements defined in logical model, such as an enterprise logical model 12. Further, if the automated techniques still fail to find correspondence, then the user may manually indicate a mapping 30 of one database element in the legacy physical model 20 to one element defined in the enterprise logical model 12. If (at block 108) the database element in the physical model 20 does comply, then the conversion program 8 automatically generates (at block 112) mappings 30 between the logical and physical data model elements based on name discovery and matching, using the naming standard defined in the naming dictionary 1.
After determining the mapping 30 at block 110 or 112, the conversion program 8 proceeds (at block 114) back to block 106 to determine a mapping for a further database element in the physical model 20. After determining the mappings 30 for all database elements in the physical model 20 to consider, the conversion program 8 traverses (at block 116) the mapping from the logical data elements to their physical data model counterparts and ensures the invariants of the logical model hold on the physical data model elements. The conversion program 8 may determine whether a database element in the physical model has relationships to other database elements in the physical model 20 that match the relationships the corresponding database element in the logical model 12 has with database elements in the logical model 12. For instance, if the physical model 20 has database element A having relationships with database elements C and D, the conversion program 8 determines whether the element A and its relationships C and D have correspondence to an element A′ in the logical model 12 and that A′ in the logical model 12 have a relationship with elements C′ and D′ that correspond/map to elements C and D in the physical model 20. The conversion program 8 may further check the extent to which the legacy physical model 20 elements and relationships comply with the rules and invariants defined in the enterprise logical model 12. The conversion program 8 further generates (at block 118) a report 28 for the physical data model elements that violate any invariants.
With the described embodiments, the conversion program 8 determines a mapping 30 and correspondence of the legacy physical model 20 database elements and relationships to the elements, relationships and rules defined in the enterprise logical model 12. The compliance of database elements in the physical model may be determined by using the naming dictionary 11. This information on the extent to which the physical model 12 does not comply with the enterprise data standard 10 may be presented to a database engineer to use when integrating the data from a legacy database 18 into the enterprise database 26.
At blocks 206 through 220 a loop is performed for each database element in the enterprise logical model 12. At block 108, the conversion program 8 uses the logical model 12 and related data structures (naming dictionary 11) to determine for each data element in the enterprise standard model 10 the maximum n best fitting database elements in the legacy physical model 20 for the database element in the enterprise logical model 12 being considered. Alternatively, the conversion program 8 may use the model and data structure (naming dictionary 11) to determine for each database element in the legacy physical model 20 the (user-configurable) maximum n best fits of database elements in the enterprise logical model 12. If (at block 210) there are zero or more than one best fits in the legacy physical model 20 for that database element in the enterprise logical model 12, then a user interface 14 is generated (at block 212) indicating that the determined element in the enterprise standard model 10 (table, view, field, logical relationship, invariant, etc.) has either 0 or more than 1 possible matches in the legacy physical model 20. The conversion program 8 then receives (at block 214) user selection of the best fit and/or modifications to include filters and transformation functions that map the determined database element in the legacy physical model 20 to an element or set of elements in the enterprise logical model 12. The conversion program 8 further indicates (at block 216) enterprise elements in the enterprise logical model 12 that do not map to one or more database elements in the legacy physical model 20. Mappings and indication of no mappings may be made in the mapping 30.
If (at block 210) there is one best fit for the database element in the enterprise logical model 12, then the conversion program 8 generates (at block 218) a mapping in the mapping 30 for determined database element name in the legacy physical model 20 to its corresponding elements in the enterprise logical model 12. From block 216 or 218, control proceeds (at block 220) back to block 206 for a next element in the enterprise logical model 12. The conversion program 8 then generates (at block 222) a physical model 24 containing table, alias, function, and view statements that rename and restructure the legacy model to be compliant with the enterprise standard model. In this way, the generated mappings in the mapping 30 are used to generate the enterprise physical model 24 from the legacy physical model 20.
The operations at blocks 254 through 258 provide an embodiment of operations to generate the mappings. The conversion program 8 renders (at block 254 and 256) in the user interface 300 a first list 302 of database element names, such as from the legacy physical model 20, and a second list 304 of database element names, such as from the enterprise physical model 26. Further, lines are rendered (at block 258) connecting the rendered database element names in the first list 302 to at least one rendered database element name in the second list 304, wherein the lines illustrate the mapping of the rendered database element names in the first list to the rendered database element names in the second list. For instance, in
The conversion program 8 may further enable the user to select one illustrated mapping, i.e., the lines connecting the database element names, to access information on the correspondence of one database element in the different physical models representing the different databases.
The described operations may be implemented as a method, apparatus or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The term “article of manufacture” as used herein refers to code or logic implemented in a medium, where such medium may comprise hardware logic (e.g., an integrated circuit chip, Programmable Gate Array (PGA), Application Specific Integrated Circuit (ASIC), etc.) or a computer readable medium, such as magnetic storage medium (e.g., hard disk drives, floppy disks,, tape, etc.), optical storage (CD-ROMs, optical disks, etc.), volatile and non-volatile memory devices (e.g., EEPROMs, ROMs, PROMs, RAMs, DRAMs, SRAMs, firmware, programmable logic, etc.). Code in the computer readable medium is accessed and executed by a processor. The computer readable medium in which the code or logic is encoded may also comprise transmission signals propagating through space or a transmission media, such as an optical fiber, copper wire, etc. The transmission signal in which the code or logic is encoded may further comprise a wireless signal, satellite transmission, radio waves, infrared signals, Bluetooth, etc. The transmission signal in which the code or logic is encoded is capable of being transmitted by a transmitting station and received by a receiving station, where the code or logic encoded in the transmission signal may be decoded and stored in hardware or a computer readable medium at the receiving and transmitting stations or devices. Additionally, the “article of manufacture” may comprise a combination of hardware and software components in which the code is embodied, processed, and executed. Of course, those skilled in the art will recognize that many modifications may be made to this configuration without departing from the scope of the present invention, and that the article of manufacture may comprise any information bearing medium known in the art.
In described embodiments, a legacy database 18 was checked for compliance with an enterprise data standard 10. In additional embodiments, the source database subject to the compliance checking may not comprise a legacy database, but any type of source database that needs to be checked. Further, the logical model used for the compliance checking may be part of any data standard, not just an enterprise data standard.
The terms “an embodiment”, “embodiment”, “embodiments”, “the embodiment”, “the embodiments”, “one or more embodiments”, “some embodiments”, and “one embodiment” mean “one or more (but not all) embodiments of the present invention(s)” unless expressly specified otherwise.
The terms “including”, “comprising”, “having” and variations thereof mean “including but not limited to”, unless expressly specified otherwise.
The enumerated listing of items does not imply that any or all of the items are mutually exclusive, unless expressly specified otherwise.
The terms “a”, “an” and “the” mean “one or more”, unless expressly specified otherwise.
A description of an embodiment with several components in communication with each other does not imply that all such components are required. On the contrary a variety of optional components are described to illustrate the wide variety of possible embodiments of the present invention.
Further, although process steps, method steps, algorithms or the like may be described in a sequential order, such processes, methods and algorithms may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
When a single device or article is described herein, it will be readily apparent that more than one device/article (whether or not they cooperate) may be used in place of a single device/article. Similarly, where more than one device or article is described herein (whether or not they cooperate), it will be readily apparent that a single device/article may be used in place of the more than one device or article or that a different number of devices may be used than the multiple number shown.
The functionality and/or the features of a device may be alternatively embodied by one or more other devices which are not explicitly described as having such functionality/features. Thus, other embodiments of the present invention need not include the device itself.
The illustrated operations of
The foregoing description of various embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto. The above specification, examples and data provide a complete description of the manufacture and use of the composition of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.