The present invention relates to data storage repository systems, and in particular to systems for querying a data storage repository.
The number of sources or repositories of data is increasing. These sources may be electronic instruments generating real time data, computer systems gathering and storing data, or remote systems returning data in response to requests from a user. It is often required to integrate and/or combine data retrieved from the different data sources. Typically each data source is developed and/or maintained independently from the others, possibly by different vendors. This results in different methods for querying the data source, and different formats for both the query to the data source and the data retrieved from the data source. Further, new data sources frequently become available, and access to these data sources is desired by a user.
For example, in medical content management systems, diverse sources of medical data are available, and new ones become available. Data from the diverse sources are combined to derive useful information. For example, in the diagnosis and treatment of cancer, metabolic information derived from PET or SPECT studies may be correlated with the anatomical information derived from high resolution CT studies. Further data may be available from molecular imaging which is also combined with the data described above. Each additional source of data requires that the querying system for accessing this data, and the formats for communicating queries and data, be adapted to the new sources of data.
The different medical data systems, such as picture archiving and communication systems (PACS), radiology information systems (RIS), laboratory information systems (LIS) and other department information systems, are not individually configured to accommodate the diversity of data which is available now and will be available in the future. This is because current data storage repository query systems use a fixed data schema, and different data storage repositories use different fixed query systems. Further, different applications use different query schemas and data formats for querying data storage repositories. A system for querying a data storage repository which is flexible and dynamic in nature is desirable.
In accordance with principles of the present invention, a system adaptively queries a data storage repository. An input processor receives a plurality of different first query messages in a corresponding plurality of different formats. A repository includes stored data elements in a first storage data structure. An intermediary processor automatically: parses the plurality of first query messages to identify requested data elements; maps the identified requested data elements to stored data elements in the first storage data structure of the repository; generates a plurality of second query messages in a format compatible with the repository for acquiring the stored data elements; acquires the stored data elements from the repository using the generated plurality of second query messages; and processes the acquired stored data elements in the plurality of second query messages for output in a format compatible with the corresponding plurality of different formats of the first query messages.
Such a system enables different applications, each implementing a different data model, to access the same data stored in the same storage repository. In a special case of this situation, the same application may implement different data models to access the same data. In addition, such a system permits adding a new data type or replacing a data element with a new data element, possibly being stored in a different location or on a different storage repository. Such a system also permits dynamically changing the storage data model, i.e. the model of the data within the storage repository, without affecting the applications. That is, the applications do not need to know how the data is stored on the repository. Similarly, such a system permits dynamically changing the data storage repository itself. That is, a change may be made in the data storing devices holding the storage data structure. These changes may be made without requiring a change in the executable application or executable procedures implementing either the applications or client, or the data storage repository. This means that no recoding and no retesting of executable application code is necessary to provide the various changes described above.
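The decoupling described above can be illustrated with a minimal Python sketch. The names used here (`fetch`, `make_repository`, `MAPPING_V1`, `MAPPING_V2`, and the table and column names) are hypothetical and are not taken from the system described. The sketch only shows that a request phrased in terms of the application's data model keeps working when the storage data model changes, provided the mapping is updated accordingly.

```python
import sqlite3

# Hypothetical mappings from an application-level element name to a
# (table, column) pair in the storage data structure.
MAPPING_V1 = {"PatientName": ("patients", "name")}
MAPPING_V2 = {"PatientName": ("person", "full_name")}


def fetch(conn, mapping, element):
    """Return all stored values for an application-level element name."""
    table, column = mapping[element]
    # Identifiers come from the trusted mapping, never from the caller.
    rows = conn.execute(f'SELECT "{column}" FROM "{table}" ORDER BY 1').fetchall()
    return [row[0] for row in rows]


def make_repository(table, column):
    """Build an in-memory repository that uses the given storage layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f'CREATE TABLE "{table}" ("{column}" TEXT)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?)', [("Doe",), ("Roe",)])
    return conn


# The application-side request ("PatientName") is identical in both cases;
# only the repository layout and the mapping differ.
old_result = fetch(make_repository("patients", "name"), MAPPING_V1, "PatientName")
new_result = fetch(make_repository("person", "full_name"), MAPPING_V2, "PatientName")
```

In this sketch the application code that calls `fetch` is unchanged between the two storage layouts; only the mapping data is replaced.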
In the drawing:
A processor, as used herein, operates under the control of an executable application to (a) receive information from an input information device, (b) process the information by manipulating, analyzing, modifying, converting and/or transmitting the information, and/or (c) route the information to an output information device. A processor may use, or comprise the capabilities of, a controller or microprocessor, for example. The processor may operate with a display processor or generator. A display processor or generator is a known element for generating signals representing display images or portions thereof. A processor and a display processor comprise any combination of hardware, firmware, and/or software.
An executable application, as used herein, comprises code or machine readable instructions for conditioning the processor to implement predetermined functions, such as those of an operating system, a system for adaptively querying a data storage repository, or other information processing system, for example, in response to user command or input. An executable procedure is a segment of code or machine readable instruction, sub-routine, or other distinct section of code or portion of an executable application for performing one or more particular processes. These processes may include receiving input data and/or parameters, performing operations on received input data and/or performing functions in response to received input parameters, and providing resulting output data and/or parameters.
A data repository as used herein comprises a source of data records. A data repository may be one or more storage devices containing the data records and may be located local to or remote from the processor. If located remote from the processor, data may be communicated between the processor and the data repository through a communications channel, such as a dedicated data link, a computer network, i.e. a local area network (LAN) and/or wide area network such as the Internet, or any combinations of such communications channels. A data repository may also be sources of data records which do not include storage devices, such as live feeds, e.g. news feeds, stock tickers or other such real-time data sources. A record as used herein may comprise one or more documents and the term “record” may be used interchangeably with the term “document”.
The World Wide Web Consortium (W3C) has defined a standard called XML schema. An XML schema provides a means for defining the structure, content and semantics of XML documents. An XML schema is used to define a metadata structure. For example, the metadata may define or mirror the structure of a collection of nested tables. The respective tables contain a collection of fields (that cannot be nested). The respective fields contain a collection of data elements.
The term abstraction refers to the practice of reducing or factoring out details so that broader, more important concepts may be concentrated on. The term data abstraction refers to abstraction of the structure and content of data, such as data stored in data repositories, from the meaning of the data itself. For example, a user may be interested in an X-Ray image, but not where data representing that image is stored, how it is stored, or the mechanism required to access and retrieve that data. A data abstraction layer refers to an executable application or executable procedure which maintains a data abstraction between a user and the storage of data important to the user. In particular, as used herein, a data abstraction layer is a system for obtaining data from a repository without prior knowledge of the repository structure using predetermined information supporting parsing, analyzing and querying the repository.
The term “Schema” is used herein in different contexts. When it is used in relation to XML (e.g. “XML schema”), a normal XML schema file conforming to the W3C definition is meant. When it is used in relation to a database, the database schema (e.g. tables, rows, fields, or hierarchy, etc.) as part of the real database is meant. When it is used in relation to a term of the data-abstraction layer (e.g. “output schema”), the XML schema file containing the information is meant (described in more detail below). An XML file which describes information used by the data abstraction layer and adheres to one of the data abstraction layer schemas is referred to as “<data abstraction layer term>” plus “file”, e.g. “Mapping file” (also described in more detail below).
In operation, the input processor 10 receives a plurality of different first query messages in a corresponding plurality of different formats. The repository 20 contains stored data elements in a first storage data structure. The input processor 10 sends the plurality of first query messages to the intermediary processor 30 which automatically performs the following activities. It parses the plurality of first query messages to identify requested data elements. It maps the identified requested data elements to stored data elements in the first storage data structure in the repository 20. It generates a plurality of second query messages in a format compatible with the repository 20 for acquiring the stored data elements. The plurality of second query messages are sent to the repository 20. The intermediary processor 30 acquires the stored data elements from the repository 20 using the generated plurality of second query messages. Further, it processes the stored data elements acquired in response to the plurality of second query messages for output in a format compatible with the corresponding plurality of different formats of the first query messages.
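The sequence of activities above (parse, map, generate, acquire, process for output) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the intermediary processor 30: the two first-query formats (a small XML form and a JSON form), the `MAPPING` table, the `study` table and all function names are hypothetical, and an in-memory SQLite database stands in for the repository 20.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping: requested element name -> column in table "study".
MAPPING = {"Modality": "modality", "StudyDate": "study_date"}


def parse(message):
    """Identify requested element names and remember the message format."""
    if message.lstrip().startswith("<"):
        root = ET.fromstring(message)
        return "xml", [e.text for e in root.findall("element")]
    return "json", json.loads(message)["elements"]


def generate_sql(elements):
    """Build a second query message executable by the repository."""
    # An unmapped element name raises KeyError here.
    columns = ", ".join(MAPPING[name] for name in elements)
    return f"SELECT {columns} FROM study ORDER BY rowid"


def format_output(fmt, elements, rows):
    """Return the acquired data in the format of the first query message."""
    records = [dict(zip(elements, row)) for row in rows]
    if fmt == "json":
        return json.dumps(records)
    root = ET.Element("result")
    for record in records:
        item = ET.SubElement(root, "record")
        for name, value in record.items():
            ET.SubElement(item, name).text = value
    return ET.tostring(root, encoding="unicode")


def handle(conn, message):
    """Run one first query message through the whole pipeline."""
    fmt, elements = parse(message)
    rows = conn.execute(generate_sql(elements)).fetchall()
    return format_output(fmt, elements, rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE study (modality TEXT, study_date TEXT)")
conn.execute("INSERT INTO study VALUES ('CT', '2006-06-02')")

xml_reply = handle(conn, "<query><element>Modality</element></query>")
json_reply = handle(conn, '{"elements": ["StudyDate", "Modality"]}')
```

Both requesters read the same stored row, and each receives its answer in the format in which it asked.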
More specifically, the input processor 10 receives at least one first query message including a request for information and an instruction determining a data format for providing the information. The instruction is alterable to adaptively change the information and the data format for providing the information. The instruction determining the data format for providing the information may be in a markup language output schema. For example, the markup language output schema may be an extensible markup language (XML) schema. This query message is sent to the intermediary processor 30. The intermediary processor 30 parses the at least one first query message to identify requested data elements. It maps the identified requested data elements to stored data elements in the first storage data structure of the repository 20. It then generates at least one second query message in a format compatible with the repository 20 for acquiring the stored data elements, which is sent to the repository 20. It acquires the stored data elements from the repository 20 using the generated at least one second query message. Further, it processes the stored data elements acquired in response to the at least one second query message for output in a format compatible with the data format determined by the instruction in the at least one first query message.
In the system of
The first query messages comprise files conforming to a query schema and the second query messages comprise queries executable by the repository 20. The first query messages are in a format determined by the query schema. The query schema determines: (a) the query search depth of hierarchical data elements in the repository 20, and/or (b) restrictions on searching the repository 20. The query schema may comprise (a) an SQL compatible query format, and/or (b) an XQuery compatible format.
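As an illustration of a first query message that carries a search depth and search restrictions, the following Python sketch parses a small XML query. The element and attribute names (`query`, `depth`, `restriction`, `element`, `operator`, `value`) are assumptions made for this example and do not reproduce the actual query schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical first query message: search two levels deep, with two
# restrictions on the data elements to be returned.
QUERY_XML = """
<query depth="2">
  <restriction element="Modality" operator="eq" value="CT"/>
  <restriction element="StudyDate" operator="ge" value="2006-01-01"/>
</query>
"""


@dataclass
class Query:
    depth: int
    restrictions: list = field(default_factory=list)


def parse_query(text):
    """Extract the search depth and the restriction triples from a query."""
    root = ET.fromstring(text)
    query = Query(depth=int(root.get("depth", "1")))
    for node in root.findall("restriction"):
        query.restrictions.append(
            (node.get("element"), node.get("operator"), node.get("value"))
        )
    return query


query = parse_query(QUERY_XML)
```

The parsed `Query` object is repository-independent; translating it into a query executable by a particular repository is a separate step.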
As described above, the intermediary processor 30 processes stored data elements acquired from the repository 20 for output in a format compatible with the corresponding plurality of different formats of the first query messages. The format compatible with the corresponding plurality of different formats of the first query messages is determined by an output schema. The system of
The data abstraction component 204 further accesses the information in the information model mapper 206 to generate second query messages in a format compatible with the repository 20 to request the identified stored data elements. The second query messages are in a format executable by the repository 20. For example, in the case of a computer database, the second query messages may be in an SQL compatible query format or an XQuery compatible query format. The second query messages are supplied to the repository 20. In response, the repository 20 returns the requested stored data elements. The data abstraction component 204 acquires the stored data elements from the repository 20 in response to the second query messages. The data abstraction component 204 again accesses information in the information model mapper 206 to process the acquired stored data elements to place them in a format compatible with the corresponding first query received from the input processor 10 (
In
The information model mapper 206 further includes one or more output schemas 302 (described in more detail below). An output schema 302 specifies the relationship among the available requested data elements defined in the scope 303 of an application (e.g. core schema 304 and extension schemas 306). More specifically, the output schema 302 defines an output hierarchy by specifying levels in the information model. The combination of the scope 303 of an application and one output schema 302 defines the information model 305 for either a whole application, or a part of it (e.g. one client).
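The idea of an output hierarchy defined by levels can be illustrated with a short Python sketch. It assumes, purely for illustration, that an output schema can be reduced to an ordered list of level names and that retrieved data arrives as flat records; the function `build_hierarchy` and the sample field names are hypothetical.

```python
def build_hierarchy(rows, levels):
    """Nest flat records according to an ordered list of level names.

    Each level groups the records by that key; below the last level the
    remaining fields of each record are kept as a list of dictionaries.
    """
    if not levels:
        return rows
    level, rest = levels[0], levels[1:]
    groups = {}
    for row in rows:
        remainder = {k: v for k, v in row.items() if k != level}
        groups.setdefault(row[level], []).append(remainder)
    return {key: build_hierarchy(members, rest) for key, members in groups.items()}


rows = [
    {"patient": "P1", "study": "S1", "series": "A"},
    {"patient": "P1", "study": "S1", "series": "B"},
    {"patient": "P1", "study": "S2", "series": "C"},
]

# Two hypothetical output hierarchies applied to the same retrieved data.
by_patient = build_hierarchy(rows, ["patient", "study"])
by_study = build_hierarchy(rows, ["study"])
```

Changing the list of levels changes the shape of the result without touching the retrieved records, which mirrors the way different clients can use different output schemas within one scope.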
A mapping schema 308 (described in more detail below) defines the contents and structure of a mapping file 309. A mapping file 309 specifies the correspondence among data elements defined in the information model 305 and the storage data structure of the repository 20 (
The information model mapper 206 further includes a query schema 310 (described in more detail below). In order to retrieve data from the repository 20, the data abstraction layer 204 processes query data 202 received from the input processor 10 (
The data abstraction component 204 further includes a resource schema 312 (described in more detail below). The resource schema 312 defines the content and structure of a resource file 313. The resource file 313 serves as a repository of data specifying external data sources in the repository 20. These data sources may be queried by the data abstraction layer 204 or data may be returned to the requester so that the external data sources may be queried by the requester outside of the data abstraction layer 204. Examples of the schemas and files illustrated in
In more detail, a core schema 304 describes the basic elements that an output schema 302 in the same scope 303 may use to build up an output model. The multiple output schemas 302 include the schema data contained in the core schema 304 in order to have access to its elements. In the present embodiment, in which the core schema and output schema are XML schemas, the term ‘includes’ means a textual copying of the contents of the core schema 304 into the multiple output schemas 302. This may be done by placing a textual reference to the core schema 304 in the multiple output schemas 302. The core schema 304 does not define any relation between the provided elements and is not used as a schema for actual XML files. Common data types and element groups for convenient reference may be defined in a core schema 304. Its main use is to unify the declaration of commonly used elements in one scope. The basic structure is:
A core schema 304 also defines which elements can provide additional external links. An external link is a reference to a resource, defined in the resource file 313, combined with an identifier that specifies the requested information. A requestor can use this information to access that data source directly to retrieve the objects stored there.
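An external link of this kind can be modeled as a pair of a resource reference and an identifier. The following Python sketch is one possible representation; the class names, the `RESOURCES` table, the connection string and the identifier value are all hypothetical and serve only to show how a requestor could resolve a link into what it needs for direct access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    type: str        # kind of data source, e.g. "PACS"
    connection: str  # hypothetical connection information


@dataclass(frozen=True)
class ExternalLink:
    resource_id: str  # key of an entry in the resource table
    identifier: str   # identifies the requested object at that source


# Hypothetical stand-in for the contents of a resource file.
RESOURCES = {"pacs-1": Resource(type="PACS", connection="pacs.example.org:104")}


def resolve(link, resources):
    """Return (type, connection, identifier) for direct access by a requestor."""
    resource = resources[link.resource_id]
    return resource.type, resource.connection, link.identifier


target = resolve(ExternalLink("pacs-1", "1.2.3.4"), RESOURCES)
```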
In more detail, an extension schema 306 provides the ability to extend the core schema 304 with application-specific or implementation-specific common elements. One or more extension schemas 306 may be defined which have substantially the same structure as the core schema 304, but do not have to be used by every output schema 302. The extension schemas 306, together with the core schema 304, define the scope 303 of an application. The scope 303 represents the basic framework within which different information models may be implemented.
In more detail, an output schema 302 describes the data model on which a requesting application bases its requests (e.g. an output model). It includes a core schema 304 and optionally one or more extension schemas 306 to access the basic elements that make up the scope 303. An output schema 302 specifies a hierarchy that defines the context in which the data elements are represented. The queried results from the repository 20 are formatted based on the specified hierarchy before they are returned to the requestor. Besides the usage of the common elements, an output schema 302 may also introduce new elements that are only specific to that single output model. Such elements are typically levels, which include nested elements, e.g. levels that reflect real database levels or auxiliary levels that do not exist in the real database data model. Other elements may be defined in either the core or the extension schema, 304, 306. One output schema 302 together with the core and the extension schemas 304, 306 make up an information model 305, which describes the semantics of the current data model without referencing anything in the real database. The link between the currently used information model defined by the output schema 302 and the actual representation in the database is defined in a mapping schema 308. An output schema 302 describes a complete hierarchy. A query can narrow a requested depth down or request only certain parts of the output model. The following is the general layout of an output schema 302:
In more detail, a mapping schema 308 describes the structure of an XML file, which defines how elements used in the output schema 302 correspond to tables, fields or other entities in the repository 20. An actual XML mapping file 309 maps the data specified in one output schema 302. A different mapping file 309 is needed if another output schema 302 is used in the same scope 303 and this output schema 302 introduces new levels. Otherwise the same mapping file 309 may be used. A mapping file 309 consists of the following primary elements:
The children used in the primary elements are:
Referring in more detail to a query schema 310, an application can submit multiple queries to request data from the data abstraction layer 204. The respective queries are expressed in an XML file, which conforms to the query schema 310. One query XML file may contain one query at a time. The result of each query is formatted according to the output model, as defined by an output schema 302, regarding the query depth and restrictions. The query may be defined in a standard query language such as SQL or XQuery. In this way a widely known language is used and a requester is not required to learn a new query language. It is possible that not all the possible operators and query elements of a particular query language are supported by the data abstraction layer 204. In such a case, a restricted subset of applicable query operations and relations may be defined. The query language itself is the database independent way of describing a query. Each query is parsed by the data abstraction layer 204 according to the currently used database in the repository 20.
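The notion of a restricted subset of query operations can be sketched in Python as follows. The operator names, the `SUPPORTED_OPERATORS` table and the element-to-column mapping are assumptions made for this example; the sketch only shows one way to reject unsupported operators and to translate the supported ones into a parameterized SQL condition for a relational repository.

```python
# Hypothetical restricted subset of operators supported by the layer,
# with their SQL equivalents.
SUPPORTED_OPERATORS = {"eq": "=", "ge": ">=", "le": "<=", "like": "LIKE"}


def to_where_clause(restrictions, mapping):
    """Translate (element, operator, value) triples into SQL text and parameters."""
    clauses, params = [], []
    for element, operator, value in restrictions:
        if operator not in SUPPORTED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
        # Column names come from the mapping; values are bound as parameters.
        clauses.append(f"{mapping[element]} {SUPPORTED_OPERATORS[operator]} ?")
        params.append(value)
    return " AND ".join(clauses), params


mapping = {"Modality": "study.modality", "StudyDate": "study.study_date"}
where, params = to_where_clause(
    [("Modality", "eq", "CT"), ("StudyDate", "ge", "2006-01-01")], mapping
)
```

Binding values as parameters rather than concatenating them into the query text is a design choice of this sketch and is not stated in the description above.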
Referring in more detail to a resource schema 312, possible data sources, which the data abstraction layer 204 or the requester may access in order to retrieve data, are defined in the resource schema 312. A certain resource is specified by its type and its actual connection information. The type describes what kind of data source it is, e.g. “PACS”. There may be one or more instances of a type. Each instance describes an actual connection to a data source of that type. In the resource schema 312, the possible types are defined. A resource XML file 313, which adheres to the resource schema 312, is as follows:
Referring to
Referring to
In
With the core schema 304, output schema 302, and mapping file 309 defined, the adaptive query system operates as illustrated in
In step 402 an output schema 302 (
In step 410, the data in the mapping file 309 (
Although not shown in the present example, the data abstraction component 204 (
The data elements retrieved from the repository 20 are typically in a different format from that requested by the first query. In step 412, when the requested data has been retrieved from the repository 20 (i.e. a database and/or external data source), the data abstraction component 204 (
In
In step 414, the retrieved data (
In a system as illustrated in
This is a non-provisional application of provisional application Ser. No. 60/803,750 by S. F. Owens et al. filed Jun. 2, 2006.
Number | Date | Country
---|---|---
60803750 | Jun 2006 | US