This invention relates generally to the processing of digital data. More particularly, this invention relates to techniques for creating a customized virtual data source.
Creating a customized virtual data source, especially from a text file or Extensible Markup Language (XML) file, is problematic. Prior art techniques require the user to select an entire data element from the data source when only a subset of the data properties of a data element is needed. Relationships between data elements are not provided, requiring the user to be familiar with these relationships in order to select the correct set of data elements. Furthermore, when the data source is a text file or XML file, the user must be familiar with the structure of the file to specify how the file should be parsed.
In view of the foregoing, it would be highly desirable to develop an application that automatically extracts the data elements of a data source, provides the data element relationships and gives the user more flexibility in customizing the virtual data source.
The invention includes a computer readable storage medium with executable instructions to translate a data source into a set of data elements, where a data element in the set of data elements includes a set of data properties. The set of data elements is displayed using a visualization. A group of data elements selected from the set of data elements is received. A group of data properties selected from the set of data properties associated with each data element in the group of data elements is received. A table schema for data elements in the group of data elements is provided. The group of data elements is converted into a target data source.
The invention also includes a computer enabled method for specifying aspects of a target virtual data source. The method includes receiving a data source with a set of data elements and receiving a group of data elements from the set of data elements via a visualization, where a data element in the set of data elements comprises a set of data properties. A subset of data properties from the set of data properties is received for each data element in the group of data elements via the visualization. A table schema is provided for data elements in the group of data elements. The group of data elements is converted into a target data source.
The invention also includes a computer readable storage medium with executable instructions to receive an XML file, display the XML file using a visualization, where the XML file expresses a set of concepts, and receive a group of concepts selected from the set of concepts, where a concept in the set of concepts includes a set of attributes. A group of attributes selected from the set of attributes is received for each concept in the group of concepts. A table schema is specified for each concept in the group of concepts. The group of concepts is converted into a virtual relational database.
The invention is more fully appreciated in connection with the following detailed description taken in conjunction with the accompanying drawings, in which:
Like reference numerals refer to corresponding parts throughout the several views of the drawings.
The following terminology is used while disclosing embodiments of the invention:
A cascading drop down list is a series of dependent drop down lists. The content of the first drop down list in the series is determined independently. The content of each succeeding drop down list in the series is dependent on the selection made in the drop down list immediately preceding it.
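As a minimal sketch of the cascading drop down list defined above, the following Python models the content of each list as a function of the selections made in the lists preceding it. The sample catalog and the function name cascade_options are hypothetical illustrations and are not part of the disclosure.

```python
# Hypothetical nested choices: each level's options depend on the choice above it.
CATALOG = {
    "Library": {"Book": ["ISBN", "Price", "Title"], "Author": ["Name"]},
    "Store": {"Order": ["Id", "Total"]},
}


def cascade_options(choices, selections):
    """Return the options for the next drop down list, given the prior selections."""
    node = choices
    for selected in selections:
        # A selection is only valid if it appears in the list that preceded it.
        if not isinstance(node, dict) or selected not in node:
            raise KeyError(f"invalid selection: {selected!r}")
        node = node[selected]
    return sorted(node) if isinstance(node, dict) else list(node)
```

With no selections the function returns the independently determined first list; each further selection narrows the content of the list that follows.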
A data element is an object in a data source (e.g., a concept in an XML file, an entity in a relational database). A data element comprises a set of one or more data properties.
A data property is a characteristic or measure associated with a data element (e.g., an attribute in an XML file or a relational database).
An entity-relationship diagram is a visualization that illustrates correlations between objects. In particular, an entity-relationship diagram illustrates which objects comprise other objects. An entity-relationship diagram can be used to illustrate relationships between data structures and data elements, and between data elements and data properties, and the like.
A tree is a visualization that illustrates a hierarchical relationship amongst a set of objects.
A virtual data source or target data source is a system that facilitates direct access to data from one or more discrete data sources. A virtual data source comprises one or more data elements. A virtual data source can be consulted in a similar way to a conventional data source of the same type.
A memory 110 is also connected to the bus 106. In an embodiment, the memory 110 stores one or more of the following modules: an operating system module 112, a data processing module 114 and a graphical user interface (GUI) module 116.
The operating system module 112 may include instructions for handling various system services, such as file services, or for performing hardware-dependent tasks. The data processing module 114 includes executable instructions to receive a data source, to analyze the data source and extract data elements and data properties from it, to define or accept specifications for a target data source and to create the target data source. The GUI module 116 may rely upon standard techniques to produce graphical components of a user interface, e.g., windows, icons, buttons and menus.
The executable modules stored in memory 110 are exemplary. It should be appreciated that the functions of the modules may be combined. In addition, the functions of the modules need not be performed on a single machine. Instead, the functions may be distributed across a network, if desired. Indeed, the invention is commonly implemented in a client-server environment with various components being implemented at the client-side and/or the server-side. It is the functions of the invention that are significant, not where they are performed or the specific manner in which they are performed.
In the first processing operation 202, the data processing module 114 receives a predetermined, user specified or default data source. The data source is then analyzed and parsed into the identified data elements and data properties 203. In an embodiment, the data source is an XML file, the data elements are XML concepts and the data properties are XML attributes. In other embodiments, the data source is a relational database, an On-line Analytical Processing (OLAP) cube, a text file, a data warehouse and the like. The data elements and data properties are then displayed using a visualization (e.g., a tree hierarchy, list, entity-relationship diagram) 204. The data processing module 114 receives a data element 206 selected by the user or specified by a default value, then receives a group of selected data properties 208 that are associated with the data element. In an embodiment, the target data source is a virtual database. In this case, the user optionally provides a table schema, or a default table schema is provided, for the data element 210; the selected data properties equate to the table columns. In other embodiments, the target data source is a virtual data warehouse, a virtual XML file, a virtual OLAP cube and the like. In the next processing operation, the data processing module 114 waits for an action from the user 212. If the user selects another data element (212—Select Data Element), the data processing module 114 returns to the processing operation 206 to receive the selected data element. If the user chooses to create the target data source (212—Create Data Source), the data processing module 114 constructs a target data source 214 based on the supplied data elements and data properties.
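The processing operations described above can be sketched in Python as follows, assuming an XML data source in which element tags stand for data elements and XML attributes stand for data properties. The sample document, the function names extract_elements and create_target, and the use of an in-memory SQLite database as a stand-in for the target data source are all assumptions made for illustration; the disclosure does not prescribe them.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data source.
SAMPLE_XML = """
<Library>
  <Book ISBN="111" Price="9.99" Title="First"/>
  <Book ISBN="222" Price="19.50" Title="Second"/>
  <Author Name="A. Writer"/>
</Library>
"""


def extract_elements(xml_text):
    """Map each data element (tag) to the sorted list of its data properties."""
    root = ET.fromstring(xml_text)
    elements = {}
    for node in root.iter():
        elements.setdefault(node.tag, set()).update(node.attrib)
    return {tag: sorted(props) for tag, props in elements.items()}


def create_target(xml_text, selection):
    """Build one table per selected data element; selected properties become columns."""
    available = extract_elements(xml_text)
    root = ET.fromstring(xml_text)
    conn = sqlite3.connect(":memory:")  # stand-in for the target data source
    for element, props in selection.items():
        if not props:
            raise ValueError(f"{element} needs at least one data property")
        unknown = set(props) - set(available.get(element, []))
        if unknown:
            raise ValueError(f"{element} has no properties {sorted(unknown)}")
        columns = ", ".join(f'"{p}" TEXT' for p in props)
        conn.execute(f'CREATE TABLE "{element}" ({columns})')
        placeholders = ", ".join("?" for _ in props)
        rows = [tuple(node.get(p) for p in props) for node in root.iter(element)]
        conn.executemany(f'INSERT INTO "{element}" VALUES ({placeholders})', rows)
    return conn
```

Calling create_target(SAMPLE_XML, {"Book": ["ISBN", "Title"]}) yields a Book table with only the two selected columns. Note that this sketch copies the data into SQLite, whereas a virtual data source as defined above would provide direct access to the underlying source.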
The list view tab 302 displays both the data elements 312 and the data properties 314 as lists. In an embodiment, when the user highlights a data element (e.g., Book 309), the data property list 314 is populated with the applicable data properties for the highlighted data element (e.g., ISBN, Price and Title 318). In an embodiment, all the data properties for a selected data element are selected by default. In an embodiment, a sorting order selection button (e.g., the arrow 307) is used to select a sorting order (e.g., alphabetical order, selection order, data type order) by which to sort the properties. In an embodiment, the user can search the data elements and data properties for a word or phrase entered in the search box 307. The user can indicate whether to search both the data elements and data properties, just the data elements, or just the data properties using the “Look for” drop down list 306. In an embodiment, the search function finds the first instance of the word or phrase, and the user is able to step through subsequent instances using a “Find Next” link or button. In an embodiment, the search is performed on names of the data elements and data properties. In an embodiment, the search is performed on the data stored in the data source. In an embodiment, the “Display” drop down list 303 is used to alternate between viewing all the data elements and viewing search results. Once the user has specified the aspects of the target data source, it can be created by clicking the “Create Tables” button 316.
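The name search with its “Look for” scope and “Find Next” stepping could be sketched as a Python generator, as shown below. The function name find_matches, the scope values and the case-insensitive substring matching are assumptions for illustration; the disclosure does not specify how matches are determined.

```python
def find_matches(elements, phrase, look_for="both"):
    """Yield (kind, element, name) for each name that contains the phrase.

    elements maps a data element name to a list of its data property names.
    look_for is "both", "elements" or "properties" (hypothetical scope values).
    """
    if look_for not in ("both", "elements", "properties"):
        raise ValueError(f"unknown scope: {look_for!r}")
    needle = phrase.lower()  # assumption: matching ignores case
    for element, props in elements.items():
        if look_for in ("both", "elements") and needle in element.lower():
            yield ("element", element, element)
        if look_for in ("both", "properties"):
            for prop in props:
                if needle in prop.lower():
                    yield ("property", element, prop)
```

The first call to next() on the generator returns the first instance, and each later call plays the role of “Find Next”; passing a default to next() signals that no further instances remain.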
In an embodiment, validation checks are performed before creating the target data source. The validation checks include, but are not limited to: checking for unique keys, checking that linked data elements contain valid data, checking that the data of a data property complies with the specified data type and length, checking that the data contains a specified number of distinct values and checking that a data element contains at least one data property.
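A few of the validation checks listed above can be sketched in Python as follows. The function name validate_element, its parameters and the representation of data as a list of dictionaries are hypothetical; only the unique key, data type, length and minimum property checks are covered, and the other checks are omitted.

```python
def validate_element(name, properties, rows, key=None, types=None, max_len=None):
    """Return a list of problem descriptions; an empty list means the checks passed."""
    problems = []
    # A data element must contain at least one data property.
    if not properties:
        problems.append(f"{name}: no data properties selected")
    # Key values must be unique across rows.
    if key is not None:
        keys = [row.get(key) for row in rows]
        if len(set(keys)) != len(keys):
            problems.append(f"{name}: key {key!r} is not unique")
    # Each value must convert to the specified data type.
    for prop, expected in (types or {}).items():
        for row in rows:
            try:
                expected(row[prop])
            except (KeyError, TypeError, ValueError):
                problems.append(f"{name}: {prop!r} does not convert to {expected.__name__}")
                break
    # Each value must fit within the specified length.
    for prop, limit in (max_len or {}).items():
        if any(len(str(row.get(prop, ""))) > limit for row in rows):
            problems.append(f"{name}: {prop!r} exceeds length {limit}")
    return problems
```

Returning a list of problems rather than raising on the first failure lets all findings be reported to the user together before the target data source is created.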
In an embodiment, the user can specify that the selected data elements and their immediate parents and children be displayed. The head node 800 of the entire data element tree is displayed whether or not it has been selected or is the immediate parent of a selected data element.
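One way to compute which nodes such a filtered view displays is sketched below in Python. The tree is assumed to be given as a mapping from each data element to its children; the function name visible_nodes and the sample tree are hypothetical.

```python
# Hypothetical data element tree: parent -> list of children.
TREE = {
    "Library": ["Shelf", "Staff"],
    "Shelf": ["Book"],
    "Book": ["Chapter"],
    "Chapter": ["Page"],
}


def visible_nodes(children, head, selected):
    """Return the head node, the selected nodes, and their immediate parents and children."""
    parent = {child: node for node, kids in children.items() for child in kids}
    visible = {head}  # the head node is always displayed
    for node in selected:
        visible.add(node)
        if node in parent:
            visible.add(parent[node])
        visible.update(children.get(node, []))
    return visible
```

For the sample tree, selecting Book displays Library, Shelf, Book and Chapter, while Staff and Page are hidden.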
In an embodiment, the “Explorer View” automatically truncates the tree when there are too many data elements to display in the provided area. In an embodiment, truncation is indicated by a linked data element. When the linked data element is clicked, the “Explorer View” is updated to display the linked data element and one or more child branches. If the new tree is truncated, the user can view a truncated portion of the new tree in the same way. In an embodiment, there are options to return to the previously displayed tree or to the original tree.
In an embodiment, when a user views an existing target data source they are notified if the data source has been altered. Furthermore, they will receive detailed notification regarding which data elements have been removed from the data source and which data elements have been added to the data source.
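The detailed notification described above amounts to comparing the data elements recorded when the target data source was created with those currently found in the data source. A minimal Python sketch, assuming data elements are identified by name and using the hypothetical function name diff_elements:

```python
def diff_elements(saved, current):
    """Return (removed, added) data element names, each as a sorted list."""
    saved_set, current_set = set(saved), set(current)
    removed = sorted(saved_set - current_set)
    added = sorted(current_set - saved_set)
    return removed, added
```

If both returned lists are empty, the set of data elements is unchanged; this sketch does not detect changes to the data properties within an element.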
In an embodiment, the user can edit any aspects of an existing target data source (e.g., selected data elements, selected data properties, table schemas). In an embodiment, when editing a target data source, the user can opt to view only the concepts already existing in the target data source. In an embodiment, when editing an existing target data source, changes to data element selection and data property selection are tracked by a visual indicator such as highlighting, font formatting or the like.
An embodiment of the present invention relates to a computer storage product with a computer-readable medium having computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those having skill in the computer software arts. Examples of computer-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs, DVDs and holographic devices; magneto-optical media; and hardware devices that are specially configured to store and execute program code, such as application-specific integrated circuits (“ASICs”), programmable logic devices (“PLDs”) and ROM and RAM devices. Examples of computer code include machine code, such as produced by a compiler, and files containing higher-level code that are executed by a computer using an interpreter. For example, an embodiment of the invention may be implemented using Java, C++, or other object-oriented programming language and development tools. Another embodiment of the invention may be implemented in hardwired circuitry in place of, or in combination with, machine-executable software instructions.
The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that specific details are not required in order to practice the invention. Thus, the foregoing descriptions of specific embodiments of the invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications; they thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the following claims and their equivalents define the scope of the invention.