One of the challenges with allowing users to arbitrarily request data elements from a database is the ability to present different elements in a related fashion. For example, consider Microsoft Amalga™ UIS, a unified intelligence system/service that provides client users with access to clinical and other patient-related data. If a client user wants to view data about a medical procedure that was done on a patient in conjunction with that patient's financial data, e.g., whether the insurance company paid and when, the only way to do this is through a system administrator or the like creating such a custom view. This is because in general, such disparate information is maintained in different tables, each table having its own corresponding baseview, and a user can only create a userview based on a single baseview.
What is desirable is for users to be able to use all available stored data of such a system/service to create views and/or identify answers to questions, without needing the assistance of system administrative support, including manual administrator efforts to create and manage custom views for individual users. It is also desirable for users to be able to gather and collect relevant data in a generally automated fashion and share such data.
This Summary is provided to introduce a selection of representative concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in any way that would limit the scope of the claimed subject matter.
Briefly, various aspects of the subject matter described herein are directed towards a technology by which data from different (e.g., disparate) data sources may be joined via metadata to create a single view (userview) of the data from the sources. In one aspect, an administrator or the like creates baseviews corresponding to database tables and associates metadata with the baseviews, including primary key metadata for the baseviews and meta-tags for one or more of the columns of each baseview. A user selects fields (corresponding to table columns) from a starting (recipient) baseview, along with fields from any other (donor) baseview that has metadata that matches the recipient baseview's metadata.
In one implementation, a join mechanism comprising join logic evaluates the metadata to determine if a join may be automatically created to provide the desired userview. If not, the join mechanism may provide the user with suggestions of fields that may be joined to facilitate manual selection of a field. The join mechanism may allow the user to override automatic joining and/or manually select another field to join.
In one aspect, a user interface is provided to assist the user in selecting fields and adding them to the userview. Once created, the userview may be saved, e.g., in a library, for re-use and/or sharing.
Other advantages may become apparent from the following detailed description when taken in conjunction with the drawings.
The present invention is illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:
Various aspects of the technology described herein are generally directed towards inferring rules on relating disparate data sources, and expressing those rules as database joins. In one implementation, joins for disparate tables are inferred using a sequence of priorities with respect to metadata associated with each of the data sources (e.g., database tables). In one implementation, a join mechanism looks for joins in disparate data sources based upon any configured matching columns that are unique keys in a system, such as different identifiers in Microsoft Amalga™.
Further, for each data source, an administrator and/or automated process may specify an arbitrary “tag” for join fields, which provides a way to resolve the difficult task of joining disparate tables for access by a broad category of users. Thus, if the first look (for known matching columns) does not provide an appropriate join, the join mechanism secondarily looks for a matching set of tags between two or more tables to provide the desired join. In the event that the first or second looks fail to provide a meaningful join, the join mechanism allows the user to specify explicit join criteria.
While Microsoft Amalga™ UIS is used as an example herein of a set of data sources in which the join rules may be used, it should be understood that other data sources may benefit from the technology described herein, and that any of the examples described herein are non-limiting examples. As such, the present invention is not limited to any particular embodiments, aspects, concepts, structures, functionalities or examples described herein. Rather, any of the embodiments, aspects, concepts, structures, functionalities or examples described herein are non-limiting, and the present invention may be used various ways that provide benefits and advantages in computing and data processing in general.
To set up such rules given a script engine services component 110 or the like that provides a number of tables 112, an administrator or the like creates a baseview (block 114) for each table. As described herein, the administrator and/or automated process also associates each baseview with metadata (block 116). For example, the administrator may decide that a patient ID is a suitable unique identifier for each row in a patient table, and thus uses that information as metadata for matching this table with a similar key of another table, as described below.
More particularly, a baseview key comprises a primary key (single or a composite) that defines the uniqueness of a baseview. For example, a baseview key may correspond to one row per patient, one row per visit, one row per lab test (combination of accession ID plus account), or one row per pathologist. Any field or combination of fields that uniquely identifies a row in the baseview may represent the “row atom” for that baseview. Note that there may be more than one field capable of identifying the key in a baseview. By way of example, in a view that has one row per pathologist, a PathologistID field or a PathologistUPIN field, or possibly a PathologistCellPhoneNumber field may be chosen to uniquely identify the row atom.
In addition to a primary key, the administrator and/or an automated process may include tag columns in the table with a nomenclature meta-tag. Note that via the automated process, tags can also be populated by automated methods such as semantic vocabulary scanning, published meta-thesauruses which associate known ontologies to each other, or other automated methods; that is, particular logic may be created such that based on reading the available data or metadata of a particular table or column, additional meta-tags may be generated and annotated to a particular field. A similar meta-tag may be used with other tables that the administrator or the like decides is likely useful to be able to join with the table. For example, in a “procedure” table, the patient ID column may be tagged with a “patient” meta-tag, while a payerID column of a “financial” table may be tagged with a matching “patient” meta-tag. As described below, because both tables have a column tagged with this “patient” meta-tag, a user can join columns from the “procedure” table to columns from the “financial” table to create a view that shows the desired procedure and payment status.
Thus, a nomenclature tag is a global value (within a namespace such as “Amalga” or “Research”) value that provides a matching mechanism. This is a configurable setting that can be updated.
When the baseviews are set up, including with associated metadata, the administrator and/or a process may organize them as desired (block 120), e.g., into groups. The administrator and/or process may also set up relationships between baseviews, e.g., suggestions/hints that an end user can use, as described below. For example, automated processes may reference known or proprietary ontologies, natural language processing engines, or other methods to programmatically create relationships between baseviews.
For example, an administrator may interact to create a new metadata attribute, which can be of type text or multiple choice, with the multiple choice option used to add or edit values for the metadata attribute. The multiple choice metadata attributes present their values as multiple choices nomenclature configuration, showing default metadata attributes configured at the column level. The administrator is able to select one or more default attributes to add to the nomenclature lists, and also may sort the selected nomenclature attributes, which enables the join rules to prioritize the metadata matching
In general, as represented in
As generally represented by step 204 of
The user can select a field by suitable user interaction (e.g., by moving a highlighted selection box 551), and then click on the “Add Field >” button 552 to add a field to the selected fields region 554 on the right. Note that in one implementation, a user may see a field's related details (module, text description and so forth), e.g., in a detail region 553.
As can be readily appreciated, other ways to find and/or select fields, including by searching (block 556), are feasible. For example, a user may enter a field name in the search field or select an additional flier (metadata attribute) to search the fields in baseviews.
When the user has selected fields from baseview1, the user may also add one or more fields to the selected fields region from another baseview. For example, in
As represented by step 304 of
If there is a primary key, step 310 compares the primary key of the donor baseview with the recipient baseview key. If at step 312 the baseview key's meta-tags are the same, then the system automatically joins the baseviews on the baseview key (step 318). At this time, the selected fields region 554 of
If a donor primary key exists but no match is found with the recipient baseview key at step 312, then step 314 compares the donor baseview key's meta-tag with meta-tags of the remaining columns in the recipient baseview, e.g., traversing one at a time to look for a match. If there is a match, the join mechanism branches to step 318 to automatically join the baseviews on the donor baseview key and recipient, and update the selected fields region 554 of
The user may select more columns from the baseview2. The user may then select a save option (button 558), and if so, the system creates SQL joins based on the matching columns. The save may be to a library, to facilitate reuse and/or sharing by other users.
In another scenario, the user adds additional fields from another baseview, “baseview 3” to an existing advance view. To this end, the user may continue to work on the baseview (fields of baseview1 and baseview2) created in the above base flow scenario above. The user may also select a set of columns from baseview3 to add. If so, the system identifies the baseview key met-tag of the donor baseview and then compares it with the baseview key meta-tags of the baseview1 and baseview2. If a match is not found then the system will traverse all the columns (meta-tags) in baseview1 and baseview2 to find a match.
Turning to another scenario, consider that the system is unable to make the automatic join at step 318, that is, no match is found on the baseview key (step 312) or the nomenclature meta-tag (step 316). If so, the user may be provided with a prompt, possibly including hints. This is represented by step 320, which connects to step 206 of
In another scenario, a user may edit the relationships among the views on which the userview was based. For example, the user may choose to override the baseview matching columns by selecting an override option on an existing advance baseview. The system prompts the user with recommended columns based on the meta-tags, and provides an option to choose any matching column. The user may select a recommended column or any other column to make a join.
A user may also preview the joins that the user creates, e.g., by selecting a preview option. In one implementation, the user is shown some number (e.g., the first twenty-five) results a in a grid like interface. The number of results shown may be user-configurable, and the columns shown may be sortable. Filtering of the results is also provided.
In this manner, administrators and/or an automated process may apply metadata to baseviews/columns and organize baseviews into groups. Users may browse or search for fields, and add them to a custom userview, which is created automatically when possible, based on content meta-tags on baseview columns. If automatic joins are not possible, the system may provide the end-users with recommendations (suggest “join-able” columns), e.g., based on metadata, column names and/or type relationships. Users may accept the recommended matching columns or override the matching column by selecting other columns to form a join.
The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to: personal computers, server computers, hand-held or laptop devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in local and/or remote computer storage media including memory storage devices.
With reference to
The computer 610 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 610 and includes both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can accessed by the computer 610. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of the any of the above may also be included within the scope of computer-readable media.
The system memory 630 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 631 and random access memory (RAM) 632. A basic input/output system 633 (BIOS), containing the basic routines that help to transfer information between elements within computer 610, such as during start-up, is typically stored in ROM 631. RAM 632 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 620. By way of example, and not limitation,
The computer 610 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media, described above and illustrated in
The computer 610 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 680. The remote computer 680 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 610, although only a memory storage device 681 has been illustrated in
When used in a LAN networking environment, the computer 610 is connected to the LAN 671 through a network interface or adapter 670. When used in a WAN networking environment, the computer 610 typically includes a modem 672 or other means for establishing communications over the WAN 673, such as the Internet. The modem 672, which may be internal or external, may be connected to the system bus 621 via the user input interface 660 or other appropriate mechanism. A wireless networking component such as comprising an interface and antenna may be coupled through a suitable device such as an access point or peer computer to a WAN or LAN. In a networked environment, program modules depicted relative to the computer 610, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
An auxiliary subsystem 699 (e.g., for auxiliary display of content) may be connected via the user interface 660 to allow data such as program content, system status and event notifications to be provided to the user, even if the main portions of the computer system are in a low power state. The auxiliary subsystem 699 may be connected to the modem 672 and/or network interface 670 to allow communication between these systems while the main processing unit 620 is in a low power state.
While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.