1. Field of the Invention
Embodiments of the invention generally relate to computer database systems. More particularly, the invention relates to techniques for applying value constraints and record set constraints to entity based conditions, and for building a query from those conditions.
2. Description of the Related Art
Databases are well known systems for storing, searching, and retrieving information stored in a computer. The most prevalent type of database used today is the relational database, which stores data using a set of tables that may be reorganized and accessed in a number of different ways. Users access information in relational databases using a relational database management system (DBMS).
Each table in a relational database includes a set of one or more columns. Each column typically specifies a name and a data type (e.g., integer, float, string, etc), and may be used to store a common element of data. For example, in a table storing data about patients treated at a hospital, each patient might be referenced using a patient identification number stored in a “patient ID” column. Reading across the rows of such a table would provide data about a particular patient. Tables that share at least one attribute in common are said to be “related.” Further, tables without a common attribute may be related through other tables that do share common attributes. A path between two tables is often referred to as a “join,” and columns from tables related through a join may be combined to from a new table returned as a set of query results.
Queries of a relational database may specify which columns to retrieve data from, how to join the columns together, and conditions (predicates) that must be satisfied for a particular data item to be included in a query result table. Current relational databases require that queries be composed in complex query languages. Today, the most widely used query language is Structured Query Language (SQL). However, other query languages are also used. An SQL query is composed from one or more clauses set off by a keyword. Well-known SQL keywords include the SELECT, WHERE, FROM, HAVING, ORDER BY, and GROUP BY keywords. Composing a proper SQL query requires that a user understand both the structure and content of the relational database as well as the complex syntax of the SQL query language (or other query language).
Embodiments of the invention generally provide techniques for applying value constraints and record set constraints to entity based conditions, while reducing the amount of time and errors associated with manually composing query code.
One embodiment of the invention includes a method of processing an abstract query composed from a plurality of logical fields specified by a data abstraction model constructed for an underlying physical database. The method generally includes receiving an abstract query composed from one or more of the plurality of logical fields. The abstract query includes a selection of a model entity, and the model entity specifies a logical focus for the abstract query. The abstract query may further include (i) an entity based condition specifying a condition which must be satisfied for each value included in a collection of values related to a given instance of the model entity in order for the given instance of the model entity to be included in query output; and (ii) a record set constraint specifying a condition to constrain the collection of records against which the entity based condition is evaluated. The method may generally include generating a resolved query of the underlying physical database; executing the resolved query to retrieve a set of query output that includes a set of instances of the model entity that satisfy the entity based condition; and returning the query output to a user.
Another embodiment of the invention includes a computer-readable storage medium containing a program which, when executed, performs an operation for processing an abstract query composed from a plurality of logical fields specified by a data abstraction model constructed for an underlying physical database. The operation may generally include receiving an abstract query composed from one or more of the plurality of logical fields. The abstract query includes a selection of a model entity, and the model entity specifies a logical focus for the abstract query. The abstract query may further include (i) an entity based condition specifying a condition which must be satisfied for each value included in a collection of values related to a given instance of the model entity in order for the given instance of the model entity to be included in query output, and (ii) a record set constraint specifying a condition to constrain the collection of records against which the entity based condition is evaluated. The operation may further include generating a resolved query of the underlying physical database, executing the resolved query to retrieve a set of query output that includes a set of instances of the model entity that satisfy the entity based condition, and returning the query output to a user.
Yet another embodiment of the invention includes a system having a processor and a memory containing a program configured to perform an operation for processing an abstract query composed from a plurality of logical fields specified by a data abstraction model constructed for an underlying physical database The operation may generally include receiving an abstract query composed from one or more of the plurality of logical fields. The abstract query includes a selection of a model entity, and the model entity specifies a logical focus for the abstract query. The abstract query may further include (i) an entity based condition specifying a condition which must be satisfied for each value included in a collection of values related to a given instance of the model entity in order for the given instance of the model entity to be included in query output, and (ii) a record set constraint specifying a condition to constrain the collection of records against which the entity based condition is evaluated. The operation may further include generating a resolved query of the underlying physical database, executing the resolved query to retrieve a set of query output that includes a set of instances of the model entity that satisfy the entity based condition, and returning the query output to a user.
So that the manner in which the above recited features, advantages and objects of the present invention are attained and can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments thereof which are illustrated in the appended drawings.
It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
The complexity of constructing an SQL statement makes it difficult for average users to compose queries of a relational database, particularly when applying record set constraints to entity based conditions. An entity based condition is a value constraint (for example, “hemoglobin_test>15”) which is evaluated against all rows of data for an instance of a given entity (for example, a patient). The rows of data are used to include (or exclude) instances of the entity depending on whether they satisfy the value constraint. For example, if the entity based condition is “hemoglobin_test>15,” and if a particular patient meets that condition for every measurement, then data regarding all hemoglobin tests may included in query output. Otherwise, data regarding that patient is not included in query output, even if only one row of data violates the condition (for example, if only one hemoglobin measurement was less than 15). Thus, the entity based condition is applied to the entity as a whole.
Applying record set constraints to the entity based condition makes a query difficult to construct. A record set constraint allows a user to specify a subset of records to which the entity based condition is applied. For example, a researcher may want to look at all of the records for patients who always had high hemoglobin (“hemoglobin_test>15”) while they were children. The record set constraint would be “AgeAtTest<18.” If a particular patient always had high hemoglobin as a child, then all of the patient's records would be returned in the output (including data taken when the patient was an adult). Even if the patient had low hemoglobin when he was 20 years old, he would be selected, because the entity based condition (“hemoglobin_test>15”) is satisfied for the record subset (AgeAtTest<18).
One embodiment of the invention provides a method that allows a user to compose value constraints and/or record set constraints for an entity based condition, and that builds a query from those conditions. To create value constraints and record set constraints for an entity based condition, the user first selects a logical field to include in a query condition. Next, the user specifies the value constraining condition. Then, the user designates the condition as an entity based condition. For example, the user may specify that either all records related to a given entity must meet the condition, or no records must meet the condition. Finally, the user specifies one or more record set constraints for the entity based condition. A query is then generated that applies the entity based condition to the subset of data specified by the record set constraints, for each instance of the entity.
In the following, reference is made to embodiments of the invention. However, it should be understood that the invention is not limited to specific described embodiments. Instead, any combination of the following features and elements, whether related to different embodiments or not, is contemplated to implement and practice the invention. Furthermore, in various embodiments the invention provides numerous advantages over the prior art. However, although embodiments of the invention may achieve advantages over other possible solutions and/or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the invention. Thus, the following aspects, features, embodiments and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s). Likewise, reference to “the invention” shall not be construed as a generalization of any inventive subject matter disclosed herein and shall not be considered to be an element or limitation of the appended claims except where explicitly recited in a claim(s).
One embodiment of the invention is implemented as a program product for use with a computer system. The program(s) of the program product defines functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable media. Illustrative computer-readable media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM disks readable by a CD-ROM drive) on which information is permanently stored; (ii) writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive) on which alterable information is stored. Other media include communications media through which information is conveyed to a computer, such as through a computer or telephone network, including wireless communications networks. The latter embodiment specifically includes transmitting information to/from the Internet and other networks. Such computer-readable media, when carrying computer-readable instructions that direct the functions of the present invention, represent embodiments of the present invention.
In general, the routines executed to implement the embodiments of the invention, may be part of an operating system or a specific application, component, program, module, object, or sequence of instructions. The computer program of the present invention typically is comprised of a multitude of instructions that will be translated by the native computer into a machine-readable format and hence executable instructions. Also, programs are comprised of variables and data structures that either reside locally to the program or are found in memory or on storage devices. In addition, various programs described hereinafter may be identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature that follows is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.
The server system 110 may include hardware components similar to those used by the client system 105. Accordingly, the server system 110 generally includes a CPU, a memory, and a storage device, coupled by a bus (not shown). The server system 110 is also running an operating system, (e.g., a Linux® distribution, Microsoft Windows®, IBM's OS/400® or AIX®, FreeBSD, and the like).
The network environment 100 illustrated in
In one embodiment, users interact with the server system 110 using a graphical user interface (GUI) provided by a user interface 115. In a particular embodiment, GUI content may comprise HTML documents (i.e., web-pages) rendered on a client computer system 1051 using web-browser 122. In such an embodiment, the server system 110 includes a Hypertext Transfer Protocol (HTTP) server 118 (e.g., a web server such as the open source Apache web-server program or IBM's Web Sphere® program) configured to respond to HTTP requests from the client system 105 and to transmit HTML documents to client system 105. The web-pages themselves may be static documents stored on server system 110 or generated dynamically using application server 112 interacting with web-server 118 to service HTTP requests. Alternatively, client application 120 may comprise a database front-end, or query application program running on client system 105N. The web-browser 122 and application 120 may be configured to allow a user to compose an abstract query, and to submit the query to the runtime component 114 for processing.
As illustrated in
In one embodiment, the runtime component 114 may be configured to receive an abstract query, and in response, to generate a “resolved” or “concrete” query that corresponds to the schema of underlying physical databases 214. For example, the runtime component 114 may be configured to generate one or more Structured Query Language (SQL) statements from an abstract query. The resolved queries generated by the runtime component 114 are supplied to DBMS 116 for execution. Additionally, the runtime component 114 may be configured to modify the resolved query with additional restrictions or conditions, based on the focus of the abstract query, i.e., based on the model entity specified for a given query.
In one embodiment, the database abstraction model 148 provides definitions for a set of logical fields 208 and model entities 225. Users compose an abstract query 202 by specifying logical fields 208 to include in selection criteria 203 and results criteria 204. An abstract query 202 may also identify a model entity 201 from the set of model entities 225. The resulting query is generally referred to herein as an “abstract query” because it is composed using logical fields 208 rather than direct references to data structures in the underlying physical databases 214. The model entity 225 may be used to indicate a logical focus of the abstract query 202 (e.g., a query focused on a “patient”, a “person”, an “employee”, a “test”, a “facility” etc). The data abstraction model 148 may include a model entity defining the model entity relative to data in the underlying physical database. For example, instances of the “patient” model entity may be defined relative to a “patient ID” value stored in a “demographics” of the underlying physical database. Of course, the exact definition for any model entity may be tailored according to the circumstances of a particular case.
Illustratively, abstract query 202 includes an indication that query 202 is directed to instances of a “patient” model entity 201, and further includes selection criteria 203 indicating that patients with a “hemoglobin_test>20” should be retrieved. The selection criteria 203 are composed by specifying a condition evaluated against the data values corresponding to a logical field 208 (in this case the “hemoglobin_test” logical field. The operators in a condition typically include comparison operators such as =, >, <, >=, or, <=, and logical operators such as AND, OR, and NOT. Results criteria 204 indicates that data retrieved for this abstract query 202 includes data for the “name,” “age,” and “hemoglobin_test” logical fields 208.
As stated, in one embodiment, an abstract query may specify a type of model entity being queried (e.g., a patient, an employee or a test). The model entity defines a logical focus, or central concept, for an abstract query. Thus, rather than compose a query data based on the structure of an underlying database (e.g., an SQL schema), users compose a query about a model entity (e.g., about a patient) by specifying which logical fields should be used to evaluate whether a given instance of the model entity (i.e., data related to a particular patient) should be included in the query results. Doing so allows users to compose complex queries in a straightforward and intuitive manner. Numerous examples of model entities used to provide a focus for an abstract query are described in commonly assigned U.S. Pat. No. 7,054,877 (the '877 patent) entitled “Dealing with Composite Data through Data Model Entities.”
In one embodiment, runtime component 114 (also referred to as a query builder) may be configured to retrieve data from physical database 214 by generating a resolved query (e.g., an SQL statement) from abstract query 202. Because database abstraction model 148 is tied to neither the schema of physical database 214 nor the syntax of a particular query language, additional capabilities may be provided by database abstraction model 148 without having to modify the underlying database. Further, depending on the access method specified for a logical field, runtime component 114 may transform abstract query 202 into an XML query that queries data from database 2141, an SQL query of relational database 2142, or other query composed according to another physical storage mechanism using other data representation 2143, or combinations thereof (whether currently known or later developed).
An illustrative abstract query corresponding to abstract query 202 is shown in Table I below. In this example, the abstract query 202 is represented using extensible markup language (XML). In one embodiment, query interface 115 may be configured to enable a user to compose an abstract query, and to generate an XML document to represent the finished abstract query. Those skilled in the art will recognize that XML is a well known markup language used to facilitate the sharing of structured text and information. Of course, other markup languages may be used.
The XML markup shown in Table I includes the selection criteria 203 (lines 004-008) and the results criteria 204 (lines 009-013). Selection criteria 203 includes a field name (for a logical field), a comparison operator (=, >, <, etc) and a value expression (what the field is being compared to). In one embodiment, the results criteria 204 include a set of logical fields for which data should be returned. The actual data returned is consistent with the selection criteria 203. Lines 14-18 identify the model entity selected by a user, in this example, a “Patient” model entity. Thus, the query results returned for abstract query 202 are instances of the “Patient” model entity and data for the results criteria specified in the query (lines 009-013). Line 15 indicates the identifier in the physical database 214 used to identify instances of the model entity. In this case, instances of the “Patient” model entity are identified using values from the “Patient ID” column of a patient table.
After composing an abstract query, a user may submit it to runtime component 114 for processing. In one embodiment, runtime component 114 may be configured to process abstract query 202 by generating an intermediate representation of abstract query 202, such as an abstract query plan. In one embodiment, an abstract query plan is composed from a combination of abstract elements from the data abstraction model and physical elements relating to the underlying physical database. For example, an abstract query plan may identify which relational tables and columns are referenced by which logical fields included in abstract query 202, and further identify how to join columns of data together. Runtime component 114 may then parse the intermediate representation in order to generate a physical query of the underlying physical database (e.g., an SQL statement(s)).
A simple access method specifies a direct mapping to a particular entity in the underlying physical database. Field specifications 2081, 2082, and 2085 each provide a simple access method, 2121, 2122, and 2125, respectively. For a relational database, the simple access method maps a logical field to a specific database table and column. For example, the simple field access method 2121 shown in
Logical field specification 2083 exemplifies a filtered field access method 2123. Filtered access methods identify an associated physical database and provide rules defining a particular subset of items within the underlying database that should be returned for the filtered field. Consider, for example, a relational table storing test results for a plurality of different medical tests. Logical fields corresponding to each different test may be defined, and a filter for each different test is used to associate a specific test with a logical field. For example, logical field 2083 illustrates a hypothetical “Hemoglobin Test.” The access method for this filtered field 2123 maps to the “Test_Result” column of a “Tests” tests table and defines a filter “Test_ID=‘1243’.” Only data that satisfies the filter is returned for this logical field. Accordingly, the filtered field 2083 returns a subset of data from a larger set, without the user having to know the specifics of how the data is represented in the underlying physical database, or having to specify the selection criteria as part of the query building process.
Field specification 2084 exemplifies a composed access method 2124. Composed access methods generate a return value by retrieving data from the underlying physical database and performing operations on the data. In this way, information that does not directly exist in the underlying data representation may be computed and provided to a requesting entity. For example, logical field access method 2124 illustrates a composed access method that maps the logical field “age” 2084 to another logical field 2085 named “birthdate.” In turn, the logical field “birthdate” 2085 maps to a column in a demographics table of relational database 2142. In this example, data for the “age” logical field 2084 is computed by retrieving data from the underlying database using the “birthdate” logical field 2085, and subtracting a current date value from the birth date value to calculate an age value returned for the logical field 2084. Another example includes a “name” logical filed (not shown) composed from the first name and last name logical fields 2081 and 2082.
By way of example, the field specifications 208 shown in
Table I, below, illustrates an SQL query composed according to method 1000, using the example of
As shown, the query on lines 001-005 and 030-031 selects hemoglobin values for patients that satisfy the entity based condition (“hemoglobin_test>15”) with the record set constraints on the entity based condition of “AgeAtTest<18” and ∂date<Jan. 1, 2000.”
Lines 014-015 of Table II show the value constraining condition, which is built using the database column that contains the numeric value of the hemoglobin test (step 1008 of method 1000). Lines 16-18 show the record set constraining conditions (step 1020 of method 1000), built using the corresponding database columns. Specifically, line 016 constrains the record set to values that were recorded before Jan. 1, 2000. Lines 017-018 constrains the record set to values that were recorded while the patient was less than 18 years old (step 1020 of method 1000). Of course, many other types of constraining conditions may be used to constrain the record set. Line 019 illustrates the query language generated at step 1018 of method 1000, where condition for the field filter is added. In this case, assume the condition of “LOINC_CODE”=‘20570-8” is used to specify a hemoglobin test. Lines 030-031 are used to test which patients did or did not match the entity based condition specified by the abstract query.
Advantageously, embodiments provide a convenient way for users to apply value constraints and record set constraints to entity based conditions, and to build a query from those conditions, without the need for spending time to write complicated queries, like the query of Table II. In one embodiment, a query builder uses the conditions to generate sub-tables and query code for execution. The sub-tables are used to create query language statements (e.g., SQL statements) that limit the record set against which entity based conditions are evaluated. Thus, the present invention provides an efficient method to create record constraints for entity based conditions, while reducing the amount of time and errors associated with manually composing query code.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.