As applications provided over the Internet have developed, so has the need for efficient data processing. Many data stores use extract, transform and load (ETL) functionality with servers such as SQL servers to process and manage data. ETL functionality may be implemented by sets of code called “packages” which retrieve source data from a source, process the data, and store the data to as destination data at a chosen destination.
When the format of source data or destination data changes, the packages must be changed as well. For example, source data may have a column that is split into multiple columns, a column name may change, or the destination data may change, for example, a column or server may be renamed or moved to a different location. When source data or destination data formats change, the packages must be changed as well. Typically, packages are changed manually. This requires a large amount of user bandwidth and subjects the resulting package changes to a higher possibility of error.
The present technology, roughly described, creates and manages packages that provide extraction, transformation and loading (ETL) functionality. The packages are ultimately generated from templates that include configurable and static parameters. The templates are created from user input or stored data, and package instances having values for the template properties are created from the templates. The instances are then transformed into a package used to implement ETL functionality.
In some embodiments, a package template may include one or more components for retrieving data from a source, transformation of the retrieved data before it is stored, and storage of the data at a destination. The template may specify one or more static or configurable properties for each package component. Configurable properties are configured with values in each instance created from a template.
Once an instance is created, the instance may be translated into a package. In some embodiments, generating a package from an instance includes generating a parameter file from the instance using the parameter file to translate the template into a package. In addition to generating new packages, the present system may be used to modify existing packages.
One embodiment ultimately generates a package from a template. A template is generated for package which is to be used with a destination, such as but not limited to an SQL server. The template has a source component, data transformation component and a destination component. At least one of the components includes one or more configurable properties which are designated as configurable by a user. Next, an instance is generated from the template and includes values for the one or more configurable properties. A package is then generated from the instance. The package is configured to retrieve data from one or more sources identified by the source component, store the data in one or more destinations identified by the destination component, and includes a mapping between the one or more sources and the one or more destinations.
One embodiment creates a package and is able to modify the package. A first package is created from a template instance. The template instance contains a source component configured to retrieve source data, a data transformation component configured to process the retrieved data, and a destination component configured to store destination data at an SQL server. The first package also includes at least one source component property, destination component property, and a mapping associated with the source data and destination data. An instance associated with the first package is accessed and one or more component properties of the accessed instance are changed. A modified package is then generated based on the changed instance.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
The present system creates and manages packages that provide ETL functionality. The packages are ultimately generated from templates. The templates may include configurable and static parameters and are created from user input or stored data. Package instances having values for the template properties are then created from the templates. The instances are then transformed into a package used to implement ETL functionality.
In some embodiments, a template (and eventually the resulting package) includes a source component, data transformation component and destination component. A source component handles retrieving data from a source. A data transformation component handles any transformation of the retrieved data before it is stored. A destination component handles storage of retrieved and optionally processed data at a destination.
The template may specify one or more properties for each package component. The properties may be static or configurable, as indicated by a user creating or modifying the package through the present system. An instance is created from the template and may contain different values for the template properties. Thus, configurable properties are configured with values in each instance created from a template. Different instances may contain different values for a configurable property. Some components may also include mappings. For example, a destination component may include a mapping which indicates which source column data is to be placed in a particular destination column. Thus, if the name or other information between source column data and destination column data is different, configurable mapping components are used to decide the mapping.
Once an instance is created, the instance may be translated into a package. In some embodiments, generating a package from an instance includes generating a parameter file from the instance. The parameter file indicates the differences between the instance and the template and is used to translate the template into a package. The generated template includes identification information for the instance and template from which it was ultimately generated from.
In addition to generating new packages, the present system may be used to modify existing packages. Packages can be modified simply by importing an existing package into the system and translating the package into an instance. Accordingly, when a package is imported, it is associated with a template. The instance can then be modified by the user, such as the values for one or more configurable component properties. A modified package can then be generated from the modified instance.
Client 110 may be implemented as a machine, mobile device or some other device which may communicate with Network Server 130 over Network 120. In some embodiments, a user of client 110 may run an application on client 110 to access Network Server 130 or Application Server 140. Network 120 may be implemented as any public network, private network, a WAN, LAN, intranet, extranet, or other network or networks. In some embodiments, Network 120 may be implemented as the Internet.
Network Server 130 provides a service over Network 120 and may communicate with Client 110 and Application Server 140. In some embodiments, Network Server 130 may provide content as part of a network service to Client 110 in response to requests received from Client 110 over Network 120. When Network 120 is implemented as the Internet, Network Server 130 may be implemented as a Web server and provide a Web service over Network 120.
Application Server 140 includes Package Template Application 142 and is in communication with Network Server 130 and Data Store 150. Application Service 140 may process requests received from Network Server 130 and make requests to Data Store 150. Package Template Application 142 may create, modify and manage templates, instances, and packages. Package Template Application 142 may also be stored on Data Store 150 or some other location. In some embodiments, a user of Package Template Application 142 may ultimately create and manage packages from a template. Generally speaking, references to a user herein are intended to reference a user of package template application 142. In some embodiments, package template application 142 is implemented as “SQL Server Integration Service” (SSIS) by Microsoft Corporation, of Redmond, Wash. Package Template Application 142 is discussed in more detail below with respect to
User Interface 144 is provided by Package Template Application 142 and may be used as a tool to implement the functionality of Application 142, in particular to create and manage package templates, instances and packages. Examples of User Interface 144 are discussed in more detail below with respect to
Data Store 150 includes one or more packages 150 and is in communication with Application Server 140 and Data Bases 160-170. Data Store 150 may include data, an interface, a business logic layer, and other modules. In some embodiments, Data Store 150 may be implemented as an SQL server.
Data Bases 160-170 may be used to store data which may be retrieved or provided by Package 152. For example, Package 152 may retrieve data from Data Base 160 and store data in Data Base 170. Thus, Data Bases 160-170 may serve as data sources or data destinations for data handled by Package 152. Other sources or destinations of data processed by Package 152 may include a file, documents, or some other data at some other location.
Template Manager 230 may be used to create and delete templates from Data Repository 240 or any other location in which templates are stored. The template manager provides template components and component properties to requesting entities. The template data created or deleted may be part of Template Data 252. Instance Manager 232 may be used to create, update and delete template instance data such as Instance Data 254. Instance Manager 232 may create instances from templates described within Template Data 252.
Package Manager 234 may update, delete, import or export package data such as Package Data 256 from Repository 240. Package Generation 236 may generate packages by applying changes indicated in parameter files to a template. Thus, Package Generation 236 generates Package Data 256 based on Parameter Files 258 and Template Data 252. User Interface Engine 220 provides Interface 144 to leverage the functionality provided by Package Template Application 142. The interface engine is used to aid in the managing of templates, instances and packages by the application.
Template Data 252 is data which describes one or more templates. The data includes a skeleton framework for the general format and interface of a package. In one embodiment, a template may be generated for a particular format of package desired, have components which may or may not be marked as configurable, and have one or more properties within each component. Examples of package formats include packages for transferring employee records from one database to another, packages for retrieving sales data and translating the data into a different currency, and other general data processing functions.
Instance Data 254 is created from Template Data 252. The instance data has values for configurable properties specified in the template from which the instance was created. Instance data can be entered by a user through Interface 144 provided by Interface Engine 210 or retrieved from a file or some other source.
Parameter Files 258 are generated from Instance Data 254 and include changes contained in an instance with respect to a particular template. Package Data 256 is created from Parameter Files 258 and Template Data 252 and specifies information for one or more packages created by Application 142.
Data Transformation Component 320 indicates any transformations that are to be performed on data retrieved by Source Component 310. Properties 322 of the data transformation component may indicate what transformations are to be done to the retrieved data, if any. For example, data transformation component properties 322 may specify a format to display a date, an operation to be performed on a set of data (such as sum), or some other data processing operation.
Destination Component 330 handles storing the retrieved and optionally-transformed data at a particular destination. Destination Component 330 includes Properties 332 and Mappings 334. Properties 332 may indicate the name of the server, table, column name or some other location at which the data should be stored. Mappings 334 may indicate a mapping of a source column to a destination column, or some other mapping to relate the source and destination of data.
The template scalar property table includes columns of scalar property I.D., component I.D., property name and property type. The scalar property I.D. is a unique identifier for a property of a package, the component I.D. is an identifier for a particular component, and property name may be a name of a particular property. In some embodiments, a component identifier may be made up of a package name, executable name, component name, and optionally other information. The property type can assume a value of unknown, integer, stream, date, time or bullion.
The template component table of the package template schema may contain columns of component I.D., component class I.D., component name, template I.D., component URI, component type, row set descriptor type, Contains Mappings and Contains Collections. A component URI is a path to locate the component within the template body. A component type is the type of component and can have values of package, task host, data flow task, sequence for loop, for each loop, DTS vent handler, component, variable, connection manager, precedence constraint or some other value. The row set descriptor-type column may have different types of data values. The “contains mapping” column may define whether the component contains custom mappings and the “contains collections” column defines whether the component contains configurable column collections.
The instance column collection properties table has columns of collection I.D., property name and property value. The instance columns table has columns of column I.D., collection I.D. and collection name. The instance column properties table has columns of column I.D., property name and property value.
A template instance is created from the template based on user input received through User Interface 144 at Step 730. The template instance is generated based on user input received to create the instance, including values provided for configurable components and properties of the template created at Step 720. More detail for creating a template instance from a template is discussed below with respect to
A package is created from an instance based on user input received through User Interface 144 at Step 740. Creating a package from an instance may begin with creating a parameter file from the instance. The package itself may then be created from the parameter file and the template created at Step 720. Creating a package is discussed in more detail below with respect to the method of
After the package is created, the package is stored on an SQL server at Step 750. In some embodiments, the package may be stored on Data Store 150. The stored package may then be executed at the SQL server at Step 760. Once executed, the package may perform extraction, transformation and/or loading functionality for the SQL server, for example, for a SQL server integration service application.
At some point, a determination is made as to whether changes to destination data or source data for the package are detected at Step 770. The changes may be detected based on a notification to a user of the system, a manual detection by the user, or in some other manner. For example, changes to destination data may include addition of a new column, removal of a column, or some other change. If no change is detected, package execution continues at Step 760. If a change is detected in source data or destination data, the package is updated based on the detected changes to the destination data at Step 780. Updating the package may include importing the package to generate an instance for the package, modifying the instance, and updating the package with the modifications. Updating a package is discussed in more detail below with respect to
Data transformation components and related component properties are added to the template file at Step 830. A data transformation component may handle transformation of data retrieved from a source before the data is stored at a destination. Adding a data transformation component and related component properties is discussed in more detail below with respect to
Destination components and related component properties and mappings are then added to the template file at Step 840. A destination component handles where retrieved data is to be stored by the package. The component properties of a destination component may indicate the details of where to store the retrieved data. The mappings of the destination component may indicate where the retrieved data was retrieved from. Adding a destination component and related component properties and mappings is discussed in more detail below with respect to
It should be apparent that steps 820-850 may be performed in any order, and that the order of the steps illustrated in the method of
A determination is made as to whether another source component property should be added to the source component at Step 912. In some embodiments, additional component properties may be added to the source component if data is to be retrieved from more than one source, additional data is required from the particular source, or for some other reason. If another property should be added to the source component, the method of
A determination is then made as to whether the data transformation component property should be made configurable at Step 926. If the property should not be made configurable, the method of
A determination is then made as to whether another property should be added to the data transformation component at Step 932. Additional properties may be added to the data transformation component if there is more than one transformation or operation to perform on a particular data, two or more types of data need to be transformed, or for some other reason. If another property should be added to the data transformation component, the method of
A determination is then made as to whether the destination component property should be made configurable at Step 946. If the destination component property is not made configurable, the method at
A determination is made as to whether another property should be added to the destination component at Step 950. Additional destination component properties may be added if more than one type of data is stored, a particular set of data is to be stored in more than one location, and/or for other reasons. If another destination component property is to be added to the destination component, the method at
A selection is received to add a mapping to the destination component at Step 952. In some embodiments, each destination component will contain at least one mapping. A mapping indicates what source data is linked to a particular destination data. Thus, adding a mapping may involve adding a parameter for a source and a parameter for a destination. For example, the mapping may link columns names from a data source to a column of a data destination. The mapping is added to the destination component at Step 954. A determination is then made as to whether the destination component mapping should be made configurable at Step 956. If the mapping for the destination component should not be made configurable, the method of
A Determination is made as to whether another mapping should be configured for the component at Step 960. If no further mapping should be added to the destination component, then the method of
Configurable components are provided through User Interface 144 at Step 1025. The configurable components may be provided in a list of components within an interface such as that of
Data is received to configure the property and/or mapping at Step 1040. The data may indicate a value for a property, such as a particular column from which to retrieve data or store data, or data for a mapping or some other input. Next, a determination is made as to whether additional properties or mappings exist to be configured for the particular component at Step 1045. If additional properties exist to be configured for the selected component, the method of
After receiving a selection of one or more instances, parameter files are generated for each selected instance at Step 1150. The parameter file may include components, properties, mappings, template I.D. from which the instance was created, instance I.D., and any other data particular to the instance which differs from the template from which the instance was created. In some embodiments, the parameter file is in .xml format. After the parameter file is generated, the template from which the instance was created is translated into a package based on the parameter file at Step 1160. Thus, changes to the template indicated in the parameter file are made to a copy of the template to generate a package from the template. The generated package is then stored as Package Data 256 at Step 1170.
As discussed above, a change may be detected in the source data, destination data or other data associated with a package at some point during execution of the package, or after the package is created. If such a change is detected, the package may be modified in order to reflect these changes. The package may be modified by importing the package, making modifications to an instance associated with the package, and creating a modified package from the modified instance.
A selection for a package to be imported is received at step 1210. The selected package is one associated with data in which a change was detected at step 770. The package may be selected through a file menu or listing provided by package template application 142. The selected package is then imported at step 1220. Importing a package may include loading the components of the package, the component properties and mappings, and property and mapping values into application 142.
An instance is retrieved or created from the imported package at step 1230. In some embodiments, the template includes an instance ID from which the package was generated. The instance ID can be used to retrieve the corresponding instance from instance data 254. In some embodiments, the instance may be created from the imported package itself. For example, application 142 may have functionality to re-generate an instance from a loaded package 152, including configurable and static properties and mappings as well as all components of the package. The configurable properties may be indicated so within the package. In some embodiments, the package may include a template ID from which it was ultimately created, and the instance may be generated based on the differences between the package and the corresponding template.
The template data created or retrieved is provided through user interface 144 at step 1240. Modifications to the instance are then received through the interface from a user at step 1250. The template instances may be modified with respect to their configurable properties, such as column names and other data. Modifying the instance may include repeating one or more of steps 1025-1055 of the method of
A parameter file is then created for the modified template instance at Step 1270. Creating the parameter file based on the modified template instance may be similar to creation of the parameter file discussed above with respect to Step 1150 of the method of
Computing environment 1700 of
The technology described herein is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the technology herein include, but are not limited to, personal computers, server computers, hand-held or laptop devices, mobile phones or devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The technology herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth that perform particular tasks or implement particular abstract data types. The technology herein may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
With reference to
Computer 1710 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 1710 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can accessed by computer 1710. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of the any of the above should also be included within the scope of computer readable media.
The system memory 1730 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 1731 and random access memory (RAM) 1732. A basic input/output system 1733 (BIOS), containing the basic routines that help to transfer information between elements within computer 1710, such as during start-up, is typically stored in ROM 1731. RAM 1732 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 1720. By way of example, and not limitation,
The computer 1710 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media discussed above and illustrated in
The computer 1710 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 1780. The remote computer 1780 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 1710, although only a memory storage device 1781 has been illustrated in
When used in a LAN networking environment, the computer 1710 is connected to the LAN 1771 through a network interface or adapter 1770. When used in a WAN networking environment, the computer 1710 typically includes a modem 1772 or other means for establishing communications over the WAN 1773, such as the Internet. The modem 1772, which may be internal or external, may be connected to the system bus 1721 via the user input interface 1760, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 1710, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
The foregoing detailed description of the technology herein has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the technology to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. The described embodiments were chosen in order to best explain the principles of the technology and its practical application to thereby enable others skilled in the art to best utilize the technology in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the technology be defined by the claims appended hereto.