1. Field of the Invention
The present invention generally relates to data processing, and more particularly to analysis of query results.
2. Description of the Related Art
As the information accessible to IT solutions becomes more distributed and diversified, it will become increasingly necessary to access information from multiple data sources and integrate the information retrieved into a representation which meets the needs of the application and end users of the application. This information not only needs to be displayed for a user to view or update, but may require advanced analysis techniques to develop knowledge and insights from the data.
Currently, a variety of methods, techniques and products are available to perform these types of analyses. Unfortunately, these conventional solutions require that the data be in specific formats (e.g., Comma Separated Values, SQL selections from database tables, text files, XML representations, etc.) prior to executing their respective analysis algorithms, and in most cases require some knowledge of what the specific fields in the data sources contain (e.g., maximum lengths of character data).
The problem is further complicated by the fact that the input data (i.e., the data input to the analysis algorithms) is typically a subset of the entire data available from the data sources. The user attempting to gain insight from the data frequently cannot predict which combinations of data will be fed into which analysis algorithms. As a result, custom programming is needed both to convert the specific input data retrieved from the data sources into a format suitable for the analysis algorithms and, in the analysis application, to accept this specific data. Accordingly, for each query, the fields returned from that query need to be known in advance by the analysis routine. This means that for every query specified by a user, and for every analysis needed, a custom program configured to accept the inputs returned by that specific query must be written. Therefore, a new query specified by a user and containing different or additional fields requires a new analysis program. Further, if this same data were to be analyzed by multiple algorithms, a new program would need to be developed for each of those algorithms. These custom analysis algorithms are tied to a specific set of input data and are not available for use with new queries containing different fields.
Therefore, there is a need for a mechanism for dynamically generating input to an analysis environment.
The present invention generally provides methods, apparatus and articles of manufacture directed to dynamically generating input to an analysis environment.
In one embodiment, a user selection is received of an analysis routine configured to perform an analysis on selected data in an analysis environment. The user-selected analysis routine has a predefined association with a code portion configured to provide the input to the analysis environment. In response to the user selected analysis routine, parameter values are displayed in one or more fields; wherein the one or more fields are predefined for the user selected analysis routine and wherein the parameter values are made available from the selected data. A user selection of one or more of the parameter values is then received. Based on the user selections, the code portion generates information necessary to perform the analysis on the selected data. Subsequently, the code portion outputs the input to the analysis environment; wherein the input includes at least the selected data and the information necessary to perform the analysis on the selected data.
Another embodiment provides a method of dynamically generating input for an analysis environment to perform data analysis on selected data, in which an analysis routine selection screen containing a plurality of analysis routines for user selection is displayed. Based on a user-selected analysis routine, a plurality of parameter values are then displayed. The input is generated using the selected data, the user-selected analysis routine and one or more user-selected parameter values, and then provided to the analysis environment.
Yet another embodiment provides a computer readable medium containing a program which, when executed, performs an operation for dynamically generating input for an analysis environment to perform data analysis. The operation includes outputting a plurality of analysis routine selections, each associated with a separate analysis routine configured to perform an analysis on selected data in the analysis environment, and wherein each analysis routine has a predefined association with a code portion configured to provide the input to the analysis environment; receiving a user selection of an analysis routine having a predefined relationship with a particular code portion; populating one or more fields with parameter values; wherein the one or more fields are predefined for the user selected analysis routine and wherein the parameter values are made available from the selected data; receiving a user selection of one or more of the parameter values; based on the user selections, generating, by the code portion, information necessary to perform the analysis on the selected data; and outputting, by the code portion, the input for the analysis environment; wherein the input includes at least the selected data and the information necessary to perform the analysis on the selected data.
Still another embodiment provides a computer system, comprising a framework configured to dynamically generate input for an analysis environment to perform data analysis on selected data. The framework comprises a plurality of code portions for providing the input to the analysis environment; analysis routines metadata specifying a plurality of user-selectable analysis routines to be displayed via a user interface and, for each of the plurality of user-selectable analysis routines, a code portion to run the analysis routine; and a separate portion of parameters metadata for each of the plurality of user-selectable analysis routines; wherein each separate portion of parameters metadata specifies parameter values to be displayed via the user interface.
So that the manner in which the above recited features, advantages and objects of the present invention are attained and can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments thereof which are illustrated in the appended drawings.
It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
The present invention provides for methods, apparatus and articles of manufacture directed to dynamically generating input to an analysis environment. Varying input data is linked with analysis routines by provision of a well defined but general purpose input structure for selected data, which is used as input to analysis routines. Metadata is used to describe the multiple analysis routines and their capabilities, and a framework both automatically formats the input data and customizes the analysis routine to accept the specific dynamic fields available in the input. In this manner, data (e.g., a query) containing new input fields is dynamically made available to multiple existing analysis techniques.
Aspects of the invention achieve particular advantage in the area of medical services (e.g., managing patient records). Accordingly, embodiments will be described in this context. However, the invention is more generally applicable to any data, regardless of type or content and, therefore, not limited to the particular applications described herein, which are provided by way of illustration only.
One embodiment of the invention is implemented as a program product for use with a computer system such as, for example, the environment 100 shown in
In general, the routines executed to implement the embodiments of the invention may be part of an operating system or a specific application, component, program, module, object, or sequence of instructions. The computer program of the present invention typically comprises a multitude of instructions that will be translated by the native computer into a machine-readable format and hence executable instructions. Also, programs comprise variables and data structures that either reside locally to the program or are found in memory or on storage devices. In addition, various programs described hereinafter may be identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature that follows is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.
The client application 102 is shown as being configured with, or having access to, a user interface 108. Preferably, the user interface 108 is a graphical user interface. In a particular embodiment, the user interface 108 is a network browser (e.g., a Web browser) allowing for navigation of network addresses. The client application 102 and the user interface 108 may allow users to formulate and issue queries for execution against one or more databases 103. In addition, the client application 102 and the user interface 108 facilitate customization of input subsequently provided to the analysis applications 106 for analysis. The input generally includes some data to be analyzed, as well as analysis instructions (e.g., executable code or control information) needed to perform the analysis. The data to be analyzed may originate from any of a variety of sources such as, for example, the database 103. In one embodiment, the data to be analyzed are query results. The analysis instructions needed to perform the analysis are provided by the analysis framework 104, and the particular nature of the analysis instructions depends upon the analysis to be performed by the analysis applications 106.
Customization of the data to be analyzed and the analysis instructions may be performed through a series of user selections made via screens of the user interface 108. In one embodiment, the user interface screens are populated with information from the analysis framework 104. Specifically, the analysis framework 104 includes an abstract analysis model 110 containing information used to populate the user interface screens with a plurality of analysis routine selections, and subsequently with parameter selections needed for the selected analysis routine. The user-selected parameters and the data to be analyzed are then provided to an appropriate plug-in 114 specified by the abstract analysis model 110 according to the user-selected analysis routine. The plug-ins 114 take and format the user-selected parameters and the data to be analyzed, and then provide the results of their operations to the analysis applications 106. In some cases, the plug-ins 114 populate templates 112 with the user-selected parameters and the data to be analyzed. The populated templates are then the input provided to an analysis application 106. In any case, the appropriate analysis application 106 then runs the selected analysis routine and returns any results to the user interface 108 for display to the user.
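By way of illustration only, the flow just described, in which a routine selection determines the plug-in that formats the data and parameters, can be sketched in Python as follows. The names `ANALYSIS_MODEL`, `RoutineEntry`, `tabulate_plugin` and `generate_input`, and the output format of the plug-in, are hypothetical and are not taken from the described embodiments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation: query results as a list of rows keyed by field name.
Row = Dict[str, object]
PlugIn = Callable[[List[Row], Dict[str, object]], str]


def tabulate_plugin(rows: List[Row], params: Dict[str, object]) -> str:
    """Hypothetical plug-in: formats the data and parameters into analysis input."""
    header = f"routine=tabulate class={params['class_variable']}"
    body = "\n".join(",".join(str(v) for v in row.values()) for row in rows)
    return header + "\n" + body


@dataclass
class RoutineEntry:
    label: str       # text shown on the analysis routine selection screen
    plugin: PlugIn   # code portion associated with the routine


# Stand-in for the abstract analysis model: one entry per selectable routine.
ANALYSIS_MODEL: Dict[str, RoutineEntry] = {
    "tabulate": RoutineEntry("Tabulate", tabulate_plugin),
}


def routine_choices() -> List[str]:
    """Labels used to populate the routine selection screen."""
    return [entry.label for entry in ANALYSIS_MODEL.values()]


def generate_input(routine_id: str, rows: List[Row], params: Dict[str, object]) -> str:
    """Look up the plug-in for the selected routine and let it build the input."""
    return ANALYSIS_MODEL[routine_id].plugin(rows, params)
```

In this sketch, adding a new analysis routine amounts to adding one entry to the model; the code that handles the query results is unchanged.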
Further, each plug-in 114 is defined to accept predefined parameters (or more particularly, parameter values). Accordingly, the abstract analysis model 110 includes a parameter definitions portion 210 (also referred to herein as the parameter metadata 210) which includes parameter definition sets 208_1, 208_2 . . . 208_N (collectively, parameter definition set(s) 208), where each parameter definition set 208 is specific to a particular routine selection 206. The parameter metadata 210 does not contain parameter values themselves, but rather defines an interface for receiving the parameter values. At least in part, the parameter values are user selected from a user interface screen 212 populated by the data to be analyzed (e.g., the query results) and, in some cases, from hidden fields specified in the parameter definition sets 208. Thus, those parameter values made available to the user for selection from the screen 212 are directly dependent on the data to be analyzed (e.g., the query results).
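As a non-limiting sketch, a parameter definition set that defines an interface without holding values, and a selection screen populated from the fields of the data to be analyzed, might look as follows in Python. The dictionary layout, the type names and the functions `build_screen` and `hidden_values` are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical parameter definition set: it names the inputs and their kinds,
# but holds no user-selected values.
PARAM_DEFS: List[Dict[str, str]] = [
    {"field": "F1", "label": "Class variable", "type": "field"},
    {"field": "F2", "label": "List of variables", "type": "field_list"},
    {"field": "F3", "label": "Title", "type": "text"},
    {"field": "RUN_AT", "type": "hidden", "value": "local"},
]


def build_screen(param_defs: List[Dict[str, str]],
                 result_fields: List[str]) -> List[Dict[str, Optional[object]]]:
    """Describe the visible inputs; field-typed inputs offer the result fields."""
    screen = []
    for definition in param_defs:
        if definition["type"] == "hidden":
            continue  # hidden fields are not rendered for the user
        offers_fields = definition["type"] in ("field", "field_list")
        screen.append({
            "field": definition["field"],
            "label": definition["label"],
            "choices": list(result_fields) if offers_fields else None,
        })
    return screen


def hidden_values(param_defs: List[Dict[str, str]]) -> Dict[str, str]:
    """Values carried by hidden fields rather than selected by the user."""
    return {d["field"]: d["value"] for d in param_defs if d["type"] == "hidden"}
```

Because the choices are taken from `result_fields` at the time the screen is built, a query returning different fields yields a different screen without any change to the definition set.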
In one embodiment, one or more of the user-selected parameter values (which may include any default values the user did not change) are used to populate one of a plurality of templates 214_1, 214_2 . . . 214_N (collectively, template(s) 214). Whether a template 214 is needed is determined by the analysis application 106 to be run. A template allows for ease in building the analysis instructions where the majority of the instructions are fixed, but a portion is based on the user-selected parameters. If the analysis technique has only a single fixed format required for its analysis instructions, then the plug-in may provide those directly without needing to look them up from a template. If a template 214 is to be invoked, the given parameter definition set 208 specifies a specific template 214. Further, if a template 214 is specified, the given parameter definition set 208 assigns a marker name to each of the various parameter values used to populate the specified template 214. The markers can be used subsequently by the appropriate plug-in 114 to populate a template 214.
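One possible way for a plug-in to populate a template using marker names is sketched below in Python. The `@F1@` marker syntax, the template text and the function `populate_template` are assumptions made for illustration; the actual marker syntax of the templates 214 is not specified here.

```python
import re
from typing import Dict, Sequence, Union

Value = Union[str, Sequence[str]]

# Hypothetical template: mostly fixed instructions with three markers.
TEMPLATE = (
    "proc tabulate data=work.results;\n"
    "  class @F1@;\n"
    "  var @F2@;\n"
    '  title "@F3@";\n'
    "run;\n"
)


def populate_template(template: str, values: Dict[str, Value]) -> str:
    """Replace each @NAME@ marker with the parameter value assigned to NAME."""
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no parameter value supplied for marker {name}")
        value = values[name]
        # A list of fields is written as space-separated names.
        return value if isinstance(value, str) else " ".join(value)

    return re.sub(r"@(\w+)@", substitute, template)
```

Raising an error for a marker with no value, rather than leaving the marker in place, keeps an incomplete set of selections from reaching the analysis application.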
Thus, the plug-ins 114 take as input the data to be analyzed (e.g., the query results) and the parameter values, which may include a template specification. Again, the particular plug-in 114 taking the input is contingent upon the user's analysis routine selection made from the user interface screen 202. The plug-in 114 then generates input 218 to the appropriate analysis applications 106. Generally, this input may be executable code or non-executable information, depending upon the particular application 106 to be invoked. The analysis routine is then run and the results from the application 106 are then displayed to the user via an output screen 220.
Particular aspects of the invention will now be illustrated with respect to exemplary user interface screens and corresponding metadata, as well as other elements of
As noted above, following display of query execution results in the screen 202, the user may elect to perform analysis. An illustrative screen 202 of the user interface 108 from which analysis may be initiated is shown in
The screen 202 is further configured with a variety of buttons which a user may click to invoke a desired function. For example, clicking on an “OK” button causes the query results screen to be dismissed and the user returned to the query selection screen. Clicking on a “Save Results” button 308 allows the user to save the query results. The results may be analyzed according to the selected analysis routine (in the present example “SAS Tabulate”) by clicking a “Go” button 310.
As noted above, each analysis routine selection 206 has a fixed and predefined association with a parameter definition set 208. Thus, for the SAS Tabulate analysis routine, the parameters definition set 208 is provided at lines 079-104 of Table I. In particular, the parameters include template parameters at lines 080-087, a control parameter specifying where the routine is run at lines 088-091, a series of user-selectable parameters populated with values from the query results at lines 092-096, and a parameter to render a text box that the user can fill in at line 100. Note that a separate template parameter for a particular template may be given for each operating system having different file system references. In the present illustration, a pair of template parameters is provided: one for Windows at lines 080-083 and one for AIX at lines 084-087. It is contemplated that the specified plug-in for the parameters definition set can execute the analysis application on the same server, or make a Web Services call to another server. Accordingly, a control parameter is provided at lines 088-091 to specify where the application is executed. In the present example, the parameters metadata for the SAS Tabulate analysis routine specifies three user selectable parameters: a “class variable”, a “list of variables” and a “title”. Note that these three user selectable parameters are given field names F1, F2, and F3, respectively. The field names correspond to markers in the template specified by the template parameter. The template and the markers will be described in more detail below.
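The per-operating-system template parameters and the control parameter described above can be sketched in Python as follows. This is not a reproduction of Table I: the dictionary layout, the file system paths and the function `resolve_template` are hypothetical, and only the template name “Tabulate.txt” and the Windows/AIX pairing come from the description.

```python
import platform
from typing import Dict, Optional

# Hypothetical stand-in for part of a parameter definition set.
TABULATE_PARAMS: Dict[str, object] = {
    # One template parameter per operating system (paths are assumptions).
    "templates": {
        "Windows": r"C:\analysis\templates\Tabulate.txt",
        "AIX": "/opt/analysis/templates/Tabulate.txt",
    },
    # Control parameter: where the analysis application is executed.
    "run_location": "local",  # assumed alternative: "web_service"
}


def resolve_template(params: Dict[str, object], os_name: Optional[str] = None) -> str:
    """Pick the template reference that matches the current operating system."""
    # platform.system() is assumed to report "Windows" or "AIX" on those systems.
    os_name = os_name or platform.system()
    templates = params["templates"]
    if os_name not in templates:
        raise LookupError(f"no template parameter defined for {os_name}")
    return templates[os_name]
```

Keeping one template parameter per operating system lets the same definition set be deployed unchanged on servers with different file system conventions.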
Therefore, having selected an analysis routine selection 206 from the screen 202, the user is presented with the parameter selection screen 212 populated according to the corresponding parameter definitions set 208. An illustrative parameter selection screen 212 is shown in
In particular, clicking the “Execute” button 414 causes the query results and the parameter values to be input to the appropriate plug-in 114. The plug-in 114 is responsible for formatting the input and generating additional information needed to run the selected analysis routine. For example, additional information generated by the plug-ins 114 includes information needed to read the data to be analyzed into a particular analysis application 106.
As noted above, the plug-ins 114 may substitute the user-selected parameter values into a template 214. However, it is also contemplated that, in some cases, the plug-ins 114 are sufficiently coded to generate all the information needed to run the selected analysis routine without the use of templates 214. Again, whether or not a template 214 is invoked depends upon the analysis routine to be run.
If the parameter definitions set 208 specifies a template 214, then the plug-in 114 operates to merge the user-selected parameter values with the specified template 214 from the templates database 112. Therefore, since a particular parameter definitions set 208 (and a plug-in 114) is user selected according to the selected analysis routine (in this case “SAS Tabulate”), it follows that the template 214 is implicitly selected by the user's explicit selection of an analysis routine from the screen 202. In the present illustration, the parameters metadata for the SAS Tabulate analysis routine specifies a template having the name “Tabulate.txt”, as can be seen at lines 080-083 for the Windows operating system, and at lines 084-087 for the AIX operating system. An illustration of this template is shown in Table II.
In the present example, the template of Table II conforms to the SAS programming language, since the user-selected analysis routine is a SAS routine. At line 001, “proc tabulate” refers to a well-known SAS procedure to build a table. Lines 003, 005, 007, 008 and 010 correspond to those portions where parameter values are substituted, as specified according to a marker corresponding to the field name of the parameter. Recall that the field names of the parameters are specified in the parameter metadata. In the present example, the field names/markers are F1, F2, and F3. In one embodiment, the markers may be of the following types:
Field—Any of the fields returned from the query (or other data to be analyzed).
List of Fields—A list of one or more fields returned from the query.
Text string—A simple text entry capability where the user enters free form text.
List of pre-defined text values—The user selects from a list of pre-defined choices.
Integer—The user enters an integer number.
Float—The user enters a decimal or floating point number.
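The six marker types listed above suggest a simple check on each value before it is substituted into a template. A minimal Python sketch follows; the type names (`field`, `field_list`, `text`, `choice`, `integer`, `float`) and the function `coerce_marker_value` are hypothetical labels chosen for illustration.

```python
from typing import Sequence


def coerce_marker_value(marker_type: str, raw: object,
                        result_fields: Sequence[str] = (),
                        choices: Sequence[str] = ()) -> object:
    """Validate a raw user entry against its marker type and return the value."""
    if marker_type == "field":
        if raw not in result_fields:
            raise ValueError(f"{raw!r} is not a field of the data to be analyzed")
        return raw
    if marker_type == "field_list":
        names = list(raw)
        if not names or any(name not in result_fields for name in names):
            raise ValueError("expected one or more fields of the data to be analyzed")
        return names
    if marker_type == "text":
        return str(raw)  # free-form text is accepted as entered
    if marker_type == "choice":
        if raw not in choices:
            raise ValueError(f"{raw!r} is not one of the pre-defined values")
        return raw
    if marker_type == "integer":
        return int(raw)  # raises ValueError for non-integer text
    if marker_type == "float":
        return float(raw)  # raises ValueError for non-numeric text
    raise ValueError(f"unknown marker type {marker_type!r}")
```

For example, `coerce_marker_value("field", "age", result_fields=["gender", "age"])` returns the field name unchanged, while a field name absent from the query results is rejected.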
As can be seen by the user selections in
The plug-in 114 then supplements the template 214 with additional information needed to run the analysis routine. An exemplary program generated by the plug-in 114 for SAS is shown in Table IV.
Line 001 specifies a destination for the output generated by the plug-in. Lines 002-022 make up the dynamically generated information needed to read the data into SAS. The DATA statement (line 002) and the INFILE statement (line 003) are standard to SAS. The LENGTH statement (line 004) specifies the length of the fields. Note that only the length for character fields needs to be specified. The INPUT statement (line 005) specifies the list of fields returned from the query. The data values of the query results are provided at lines 008-021. Lines 023-036 are the populated template shown in Table III.
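To illustrate how the information needed to read the data might be generated from the query results themselves, the following Python sketch emits a DATA step with INFILE, LENGTH and INPUT statements followed by the data values. It is not a reproduction of Table IV: the function `sas_data_step`, the dataset name `work.results`, the use of in-line comma-delimited data and the rule of sizing each character field to its longest value are all assumptions.

```python
from typing import Dict, List


def sas_data_step(rows: List[Dict[str, object]], dataset: str = "work.results") -> str:
    """Build SAS statements that read the given rows as in-line, comma-delimited data."""
    if not rows:
        raise ValueError("no query results to analyze")
    fields = list(rows[0])
    # Only character fields need a LENGTH; size each to its longest value.
    char_fields = [f for f in fields if any(isinstance(r[f], str) for r in rows)]
    lengths = {f: max(1, max(len(str(r[f])) for r in rows)) for f in char_fields}

    lines = [f"data {dataset};", "  infile datalines dsd;"]
    if lengths:
        declared = " ".join(f"{name} $ {size}" for name, size in lengths.items())
        lines.append(f"  length {declared};")
    inputs = " ".join(f"{f} $" if f in lengths else f for f in fields)
    lines.append(f"  input {inputs};")
    lines.append("  datalines;")
    # Assumption: values contain no commas or quotes, so no escaping is done.
    lines += [",".join(str(r[f]) for f in fields) for r in rows]
    lines += [";", "run;"]
    return "\n".join(lines)
```

Since the field list and the character lengths are derived from the rows passed in, a query returning different fields produces a different preamble with no change to the generating code.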
The exemplary program of Table IV is representative of the input 218 shown in
For the present example, the results of running the SAS Tabulate routine with the input of Table IV are shown in the output screen 220 illustrated in
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
This application is a continuation of co-pending U.S. patent application Ser. No. 10/345,918, filed Jan. 16, 2003, which is herein incorporated by reference in its entirety.
| | Number | Date | Country |
|---|---|---|---|
| Parent | 10345918 | Jan 2003 | US |
| Child | 12372117 | | US |