1. Field of Invention
The present invention relates to the field of data management, and in particular to a flexible ontology data import/export method and apparatus which can import/export ontologies from known standard formats to private formats and enable an ontology management system to be applied to more applications.
2. Description of Prior Art
Ontology is one of the important methods of representing data semantics, and the aim of ontology data management is to enable applications to make better use of ontology data and thereby behave more intelligently. Import and export are significant functions in ontology data management: the import function stores ontology data from different sources into an ontology management system, and the export function represents the ontology data held in the ontology management system in various formats for use by different applications.
Ontology information can be organized in a plurality of forms, including standard formats recognized in the industry, in-house formats used widely within a corporation, and private formats utilized in certain applications.
The difficult and crucial problems to be solved in the current ontology data management are how to handle the import of an unknown private format in terms of ontology import, and how to export a private format based on different conditions in terms of ontology export.
Currently, there have been some efforts and methods related to ontology import/export.
For example, the system in U.S. Pat. No. 0,156,253 can import ontologies in a known format and export ontology data conforming to a query condition expressed in XQL (XML Query Language); the exported format is also a fixed known format.
Meanwhile, the system in U.S. Pat. No. 0,163,450 can import ontologies in a fixed format 1 and export ontologies in a fixed format 2, while the ontology is not stored, and the import and export are completed in one pipeline.
The method proposed in JP 10-333952 can separate automatically with a separator the data inside some application into data formed as a table or a list, and then export these data to other applications for further use.
Further, the method in JP 08-190479 performs data conversion based on value range and length of database column type during database conversion, and decides the correspondence between the columns of conversion source and destination.
In summary, the existing methods of ontology import/export can import/export ontologies in fixed known formats, and export ontology data conforming to query conditions based on query language at the time of export. The following problems, however, have not been overcome.
(1) Ontologies in an arbitrary unknown private format cannot be imported. Importing ontologies in an arbitrary private unknown format includes data separation and determination of property value range and type, whereas the current methods can handle only data separation and have no ability to determine property value range and type.
(2) It is impossible to export ontologies in any known and unknown format according to search condition for a keyword.
(3) It is impossible to export ontologies in any known and unknown format in the manner of the combination of keywords and query language.
The import/export methods based on text and database can execute format conversion according to data type. Text and database, however, differ considerably from ontology data: the former are merely data carriers, whereas in the latter there exist concepts, properties and instances, each of which must be considered separately during the conversion of ontology data. Therefore, considerable changes need to be made to a data conversion method based on text and database before it can be applied to ontology import.
In view of the above problems, the present invention provides a flexible ontology data import/export method and apparatus, which can handle ontology data import/export of a known format and an unknown format, export a portion of condition-consistent ontology data in conjunction with keywords and query sentences and store them in a known or unknown format. Due to its excellent adaptability to various formats, the ontology data import/export method and apparatus according to the present invention can meet the requirements from more types of applications.
According to the first aspect of the present invention, an ontology data import apparatus is provided which comprises: an ontology data format analyzer for analyzing a format of input ontology data, and performing format determination by utilizing an ontology format database which stores formats of those successfully imported ontologies if it is a private unknown format; and an unknown format data importer for separating and importing the data into an ontology database which stores ontologies, in accordance with a result of the format determination by the ontology data format analyzer.
According to the second aspect of the present invention, an ontology data import method is provided which comprises steps of: analyzing a format of input ontology data, and performing format determination by utilizing an ontology format database which stores formats of those successfully imported ontologies if it is a private unknown format; and separating and importing the data into an ontology database which stores ontologies, in accordance with a result of the format determination.
According to the third aspect of the present invention, an ontology data export apparatus is provided which comprises: a keyword-based and query-language-combined ontology exporter for receiving a keyword and/or an export format, after processing into a query language, querying ontology data stored in an ontology database which stores ontologies, and requesting an export format to an ontology data export format analyzer; an ontology data export format analyzer for directly returning the export format or obtaining the export format by a query utilizing an ontology format database which stores formats of those successfully imported ontologies, in accordance with the request from the keyword-based and query-language-combined ontology exporter; and the keyword-based and query-language-combined ontology exporter is further for outputting a query result returned from the ontology database in accordance with the export format returned from the ontology data export format analyzer.
According to the fourth aspect of the present invention, an ontology data export method is provided which comprises steps of: receiving a keyword and/or an export format, after processing into a query language, querying ontology data stored in an ontology database which stores ontologies, and requesting an export format; directly returning the export format or obtaining the export format by a query utilizing an ontology format database which stores formats of those successfully imported ontologies, in accordance with the request for the export format; outputting a query result returned from the ontology database in accordance with the returned export format.
According to the fifth aspect of the present invention, an ontology data import/export apparatus is provided which comprises an ontology data import apparatus and an ontology data export apparatus as described above.
The above objects, advantages and features of the present invention will be apparent from the following detailed description of the preferred embodiments taken in conjunction with the drawings, in which:
FIGS. 4a and 4b are schematic diagrams showing two input cases of unknown format ontology data, respectively;
a shows a flowchart for the unknown format import operation of an ontology data format analyzer 130;
b is a schematic diagram for elaborating an example of step S504 in
FIGS. 8a-8c show schematic diagrams of a query input interface;
Now, a detailed explanation will be made of the preferred embodiment of the present invention with reference to the figures. Any detail or function unnecessary to the present invention is omitted so as not to obscure the understanding of the present invention.
As shown in
Hereafter, a concrete description will be given of the respective components of the ontology data import/export apparatus 100 of the present invention and of their operation.
All concepts must be instances belonging to rdfs:class. That is, for any concept x, there is one triple (x rdf:type rdfs:class) in the ontology database. The inheritance relationship between concepts is denoted by rdfs:subClassOf. In other words, if x belongs to the subclass of y, there exists a triple (x rdfs:subClassOf y). As an example, the entry 200 is the definition for concept Company in
Properties define the characteristics of concepts as well as the relationships between concepts. A property denoting a characteristic of a concept is referred to as a numerical-type property, and a property denoting a relationship between concepts is referred to as an objective-type property. Both numerical-type and objective-type properties have a definition domain and a value domain. The definition domain specifies which concepts the property applies to; for example, the definition domain of the property “age” is the concept “human”. That is, “age” is a property of the concept “human” rather than of any other concept. The value domain defines the range of values the property can take. If the property is of numerical type, the value domain is a data type, such as integer, real number or character string; if the property is of objective type, the value domain is a concept. As an example, the definition domain of the objective-type property “friend” is “human”, and its value domain is also “human”. In
Instances are specific matters under a concept, for example, “Jack” is an instance of “human”. Instances of a concept have values of properties whose definition domains lie in the concept, say, “Jack” has the value “35” of the property “age”. In
At the time of import, each concept, property and instance in the input ontology data is converted into corresponding triples, and these triples are stored in the ontology database. During the export process, the triples that fulfill certain conditions are exported.
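The triple representation described above can be illustrated with a minimal in-memory sketch. The `TripleStore` class and its method names are hypothetical and are not part of the described apparatus; only the triple pattern (x rdf:type rdfs:class) and the “Jack”, “human” and “age” examples are taken from the text.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


class TripleStore:
    """Hypothetical in-memory stand-in for the ontology database."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def add(self, subject: str, prop: str, obj: str) -> None:
        """Import one triple, skipping exact duplicates."""
        triple = (subject, prop, obj)
        if triple not in self.triples:
            self.triples.append(triple)

    def match(self, subject: Optional[str] = None,
              prop: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Export the triples that fulfill the given condition.

        A value of None acts as a wildcard for that position.
        """
        return [
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (prop is None or t[1] == prop)
            and (obj is None or t[2] == obj)
        ]


store = TripleStore()
store.add("human", "rdf:type", "rdfs:class")  # concept
store.add("Jack", "rdf:type", "human")        # instance of the concept
store.add("Jack", "age", "35")                # property value of the instance

jack_triples = store.match(subject="Jack")    # conditioned export
```

In this sketch an unconditioned export corresponds to calling `match()` with no arguments, which returns every stored triple.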
The ontology format database 120 is employed to analyze formats, and, as shown in
The private unknown formats 320 are generally organized as triples whose parts are delimited by specific separators, and the separated parts correspond to the subject, the property and the object of a triple in the ontology data, respectively. An example of a record in a private format is #NEC$$rdf:type$$Company, and the separated triple pattern corresponding to this private unknown format is Subject$$Property$$Object. While storing the private unknown formats, the ontology format database 120 also stores how many times each separator has been used in these formats. For example, the usage frequency of “$” will be increased by 1 after the above format is placed into the ontology format database (referring to statistics information 3320 in statistics information 330).
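As a minimal sketch of the above, the following splits the example record with its separator and updates a separator usage count. The names `split_record`, `record_format_usage` and `separator_usage` are hypothetical, and counting each distinct separator character once per stored format is an assumption made to agree with the statement that the frequency of “$” increases by 1.

```python
from collections import Counter
from typing import Tuple

# Hypothetical stand-in for the separator statistics kept in the
# ontology format database.
separator_usage: Counter = Counter()


def split_record(record: str, separator: str) -> Tuple[str, str, str]:
    """Split one private-format record into (subject, property, object)."""
    parts = record.split(separator)
    if len(parts) != 3:
        raise ValueError(f"record does not split into a triple: {record!r}")
    subject, prop, obj = parts
    return subject, prop, obj


def record_format_usage(separator: str) -> None:
    """Increase the usage count of each distinct separator character by 1.

    Assumption: the count is kept per character, not per separator string.
    """
    for ch in set(separator):
        separator_usage[ch] += 1


triple = split_record("#NEC$$rdf:type$$Company", "$$")
record_format_usage("$$")
```

The leading “#” of the record is kept as part of the subject here, since the text does not state how it should be treated.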
The ontology data format analyzer 130 is responsible for analyzing a format of input ontology data, and selecting the known format data importer 140 or the unknown format data importer 150 to import ontology data, depending on the different formats of input ontology data. The ontology data format analyzer 130 operates as follows.
More specifically, the unknown format import operation flow of the ontology data format analyzer 130 is illustrated in
At step S501, if the input of the unknown format contains format information as shown in
At step S502, if the input of the unknown format does not contain any format information, as shown in
At step S503, if none of the historical unknown formats in the ontology format database 120 meets the requirement, the n most frequently used separators, such as “$”, “#”, “;”, “*”, “%” and the like, are extracted from a separator database in the ontology format database.
At step S504, the following operations are repeated for each of the n separators:
in some row (corresponding to a single instance) of the input contents, a search for this separator is performed; if an occurrence position corresponding to this separator can be found, then forward and backward separator detections are continuously performed from this position; to be more specific, in the example shown in
At step S505, the format that has been extracted, analyzed or generated by format detection is returned.
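Steps S503 to S505 can be sketched as follows, under stated assumptions. The function names are hypothetical. The forward and backward detection is interpreted here as extending an occurrence of a candidate separator in both directions for as long as the neighbouring characters are also candidate separators, and a row is accepted only when exactly two separator runs are found, so that it splits into subject, property and object.

```python
from typing import Optional, Sequence, Set, Tuple


def expand_run(row: str, pos: int, candidate_set: Set[str]) -> Tuple[int, int]:
    """Extend a separator occurrence backward and forward from pos."""
    start = pos
    while start > 0 and row[start - 1] in candidate_set:
        start -= 1                      # backward detection
    end = pos + 1
    while end < len(row) and row[end] in candidate_set:
        end += 1                        # forward detection
    return start, end


def detect_format(row: str, candidates: Sequence[str]) -> Optional[str]:
    """Try each candidate separator in turn (step S504).

    Returns a pattern such as 'Subject$$Property$$Object', or None if no
    candidate splits the row into three parts.
    """
    candidate_set = set(candidates)
    for sep in candidates:
        pos = row.find(sep)
        runs = []
        while pos >= 0:
            start, end = expand_run(row, pos, candidate_set)
            runs.append(row[start:end])
            pos = row.find(sep, end)    # continue after the detected run
        if len(runs) == 2:
            return "Subject" + runs[0] + "Property" + runs[1] + "Object"
    return None


# Candidate separators, assumed to be ordered by usage frequency (step S503).
most_used = ["$", "#", ";", "*", "%"]
detected = detect_format("Versa1100$#hasPrice#; 10000", most_used)
```

The candidate list is passed in most-frequent-first order, so a separator that has been used often in earlier imports is tried before rarer ones.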
Each known format registers an import module in the known format data importer 140, and thus the known format data importer 140 only needs to invoke a corresponding import module for importing in accordance with the input format, as shown in
Since the standard has been published, the known format data importer 140 can extract the concept, the property and the instance from the input ontology contents, generate corresponding triples and then import them into the ontology database 110.
After receiving the format and content information from the ontology data format analyzer 130, the unknown format data importer 150 needs to analyze the subject, the predicate and the object so as to determine an import approach. Different processing methods are employed for the subject, the predicate and the object and described particularly as follows.
1. The subject is processed into rdf:resource, since the subject must be an instance.
2. As for the predicate, the definition domain is processed into rdf:resource to correspond to the subject, while the following judgment should be performed at the time of the determination of value domain:
The following analysis is conducted to determine which of the above two cases the value of the predicate belongs to:
3. The triple is directly imported if the predicate is identified as a property of numerical type (including a numerical-type property having an integer or real number value domain as well as a numerical-type property having a character-string value domain); the triple is directly imported if the predicate is identified as a property of objective type and the URI of the object exists in the ontology database; the above rdf:resource needs to be created and then the triple is imported if the URI of the object is not present in the ontology database.
After the completion of the above steps, the unknown ontology format is imported into the ontology database 110.
Then, the predicate “hasProduct” is acquired for type determination. At this point, the object “Versa1100” is acquired. Since “Versa1100” also appears as a subject, “hasProduct” is of objective type, and a corresponding objective-type property is generated as (hasProduct rdf:type owl:ObjectProperty).
Next, the object “Versa1100” is acquired, and a corresponding rdf:resource (Versa1100 rdf:type rdf:resource) is created due to the absence of “Versa1100”.
Finally, the property value of NEC (NEC hasProduct Versa1100) is imported.
On the other hand, for the ontology data of an unknown format Versa1100$#hasPrice#; 10000, the subject “Versa1100” is first acquired, and there is no need to create an rdf:resource since the subject is already present.
Then, the predicate “hasPrice” is acquired for type determination. At this point, the object “10000” is acquired. Since “10000” is of numerical type, a numerical-type property is created as (hasPrice rdf:type owl:DatatypeProperty).
Next, the object “10000” is acquired.
Finally, the property value of Versa1100 (Versa1100 hasPrice 10000) is imported.
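The two worked examples above can be reproduced with the following sketch. The function name and the list-based database are hypothetical. The type determination rule is an assumption inferred from the examples only: a predicate is treated as objective type when its object also appears as a subject in the input, and as numerical type (integer, real number or character string) otherwise. The standard OWL spelling owl:DatatypeProperty is used.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def import_unknown_format(records: List[Triple], database: List[Triple]) -> None:
    """Import already separated (subject, predicate, object) records."""
    subjects: Set[str] = {subject for subject, _, _ in records}

    def add_once(triple: Triple) -> None:
        if triple not in database:
            database.append(triple)

    for subject, predicate, obj in records:
        # 1. The subject must be an instance, so create its resource if absent.
        add_once((subject, "rdf:type", "rdf:resource"))

        # 2. Determine the property type (assumed rule, see lead-in).
        is_objective = obj in subjects
        prop_type = "owl:ObjectProperty" if is_objective else "owl:DatatypeProperty"
        add_once((predicate, "rdf:type", prop_type))

        # 3. For an objective-type property, create the object resource
        #    if it is not yet present, then import the triple itself.
        if is_objective:
            add_once((obj, "rdf:type", "rdf:resource"))
        add_once((subject, predicate, obj))


ontology_db: List[Triple] = []
import_unknown_format(
    [("NEC", "hasProduct", "Versa1100"), ("Versa1100", "hasPrice", "10000")],
    ontology_db,
)
```

With these two records the resource for “Versa1100” is created while the first record is processed, so the second record finds the subject already present, as in the description above.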
There are two types of ontology data export: exporting all ontology data, which can be referred to as unconditioned export, and exporting part of the ontology data, which can be referred to as conditioned export. A condition can be formed by use of keywords and query sentences, and thus definitions are first given for keywords and query sentences.
Keywords are one or more words of natural language. The input of keywords is relatively simple and thus suitable for novice users. The input interface of keywords in the system is shown in
Query sentences mean a query approach which has the syntax of some query language, can execute complex condition specification and can query data of specific structures. The input interface of query sentences in the system is shown in
The query target of this sentence is all the triples in the ontology.
Keyword query has the advantage of being simple and accessible to ordinary users, while its shortcoming is that the query target cannot be specified precisely enough, and therefore it is impossible to make full use of the semantics in ontology data. For example, when “Tsinghua University” is queried, it cannot be specified at which position “Tsinghua University” should appear in an instance. A query sentence, on the other hand, can clearly specify the concrete semantics of “Tsinghua University”.
Now, considering the following two instances:
Instance 1,
Instance 2,
When a keyword is used in exporting and “Tsinghua University” is inputted, both instances will be exported since both of them match the condition. As such, keywords cannot make good use of the semantics of ontology data, since they cannot express the export target: an instance named “Tsinghua University”. Query language, however, can fulfill this task, since the instance named “Tsinghua University” can be represented as the following query language:
Query language can also perform complex conditional operations, such as AND, OR and NOT, and thus make full use of the semantics in ontology data.
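The difference between the two kinds of condition can be illustrated by the following sketch. The two instances, their identifiers and the property names “name”, “locatedIn” and “graduatedFrom” are hypothetical and serve only to show a keyword matching at any position, whereas a property-specific condition matches only the intended instance.

```python
from typing import Dict, List

# Hypothetical instances; each maps property names to values.
instances: Dict[str, Dict[str, str]] = {
    "inst1": {"name": "Tsinghua University", "locatedIn": "Beijing"},
    "inst2": {"name": "Zhang San", "graduatedFrom": "Tsinghua University"},
}


def keyword_export(keyword: str) -> List[str]:
    """Keyword condition: match the keyword in any property value."""
    return [
        inst_id for inst_id, props in instances.items()
        if any(keyword in value for value in props.values())
    ]


def property_export(prop: str, value: str) -> List[str]:
    """Structured condition: match a value on one specific property."""
    return [
        inst_id for inst_id, props in instances.items()
        if props.get(prop) == value
    ]


by_keyword = keyword_export("Tsinghua University")
by_name = property_export("name", "Tsinghua University")
```

Here the keyword condition selects both instances, while the condition on “name” selects only the first.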
The present invention integrates the advantages of keyword and query language and provides a query method on the basis of the combination of keywords and query sentences (see
At step S901, all domain ontologies in the ontology database are acquired and listed for a user to select.
At step S902, after the user has selected a domain ontology, all concepts within the domain ontology are acquired and listed for the user to select a query target.
At step S903, when the user has selected a concept as the query target, all properties of the concept are acquired for the user to add/delete/edit query conditions.
At step S904, the user starts to add/delete/edit query conditions.
At step S905, the query conditions are added: the user selects a property and chooses from 7 forms of condition setting:
Then, keywords or numerical values are inputted with respect to the selected properties and conditions.
At step S906, the query conditions that have been added can be deleted or edited.
At step S907, query can be executed if all the query conditions have been edited; otherwise, the flow returns to step S904.
At step S908, ontology data conforming to the conditions are acquired and exported.
As an example, the query of an instance named “Tsinghua University” can be conducted through the above steps:
In this way, the user needs to just select and input the keyword during the above procedure. It is unnecessary for the user to know about ontology query language, and thus the user group of ontology semantics is expanded.
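The selections made in steps S901 to S908 could be turned into a query sentence along the following lines. This is only a sketch: the text does not name the query language that the system generates, so SPARQL is used here purely for illustration, and the `Condition` class, the three condition forms shown and the concept name “University” are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Condition:
    prop: str   # property selected by the user (step S905)
    form: str   # hypothetical form: "equals", "contains" or "greater_than"
    value: str  # keyword or numerical value entered by the user


def build_query(concept: str, conditions: List[Condition]) -> str:
    """Generate a SPARQL-style query sentence for the selected concept.

    Values are inserted without escaping, which a real system would need.
    """
    lines = [f"?x rdf:type :{concept} ."]
    for i, cond in enumerate(conditions):
        var = f"?v{i}"
        lines.append(f"?x :{cond.prop} {var} .")
        if cond.form == "equals":
            lines.append(f'FILTER ({var} = "{cond.value}")')
        elif cond.form == "contains":
            lines.append(f'FILTER (CONTAINS(STR({var}), "{cond.value}"))')
        elif cond.form == "greater_than":
            lines.append(f"FILTER ({var} > {cond.value})")
        else:
            raise ValueError(f"unsupported condition form: {cond.form}")
    return "SELECT ?x WHERE {\n  " + "\n  ".join(lines) + "\n}"


query = build_query(
    "University",
    [Condition(prop="name", form="equals", value="Tsinghua University")],
)
```

The user supplies only the concept, the property, the condition form and the keyword; the query text itself is assembled by the program.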
After receiving the query conditions and keywords, the system generates corresponding query sentences to conduct query in the ontology database 110, acquires all the ontology data satisfying the query conditions and inputs them into the keyword-based and query-language-combined ontology exporter 160. Returning to the above example, the system generates the following query sentence as the final result according to a series of actions by the user:
Those ontology data which satisfy the query conditions can be obtained through this query sentence and then exported. The export format is determined by the ontology data export format analyzer 170, and the exported data have two types as shown in
As to the first format, the exported data need not carry any format information, since the export command issued by the user already specifies the export format. For the second format, since the user has not given any format in the export command, it is necessary to provide the user with the format recommended by the system so that the user can operate on the ontology data by utilizing that format.
The ontology data consistent with the export conditions may be exported with a known industrial standard format (e.g. OWL) or a known standard format within a corporation (e.g. MISP), and may be exported with an unknown format. At the time of data export, the export command includes the export condition and export format.
As an example, the export command is
This is an unconditioned export command which requires the export format to be OWL. Therefore, the ontology data will be exported as an OWL file.
Another example of the export command is
This is an export command which has only export condition but no export format.
When receiving this kind of command, the ontology data export format analyzer 170 operates in the flow as shown in
At step S1101, the ontology data export format analyzer 170 determines whether the export command requires a certain export format, and if a certain export format is required, the flow turns to step S1103 and the format is returned directly.
At step S1102, if no export format is specified in the export command, a format recommendation needs to be made, which can be based on the most recently and most frequently used export formats. Thus, two parameters are required, one being a time period n, and the other being a threshold. The details are given as:
Naturally, the operation suggested above is intended to be exemplary, and those ordinarily skilled in the art can make adaptations as necessary. For example, in order to reduce the operational complexity, the format corresponding to the maximum usage rate ratemax can be selected directly as the export format, instead of the cyclic operation as demonstrated above.
At step S1103, the export format contained in the export command or that selected at step S1102 is returned.
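A minimal sketch of steps S1101 to S1103 follows, using the simplified variant in which the format with the maximum usage rate is selected directly. The function name, the representation of the export history as (time, format) pairs and the integer time units are assumptions. The threshold parameter is left out because it is not needed for the simplified variant.

```python
from collections import Counter
from typing import List, Optional, Tuple


def choose_export_format(requested: Optional[str],
                         history: List[Tuple[int, str]],
                         now: int,
                         n: int) -> Optional[str]:
    """Return the export format for one export command.

    history holds (time, format) pairs of earlier exports (assumed layout).
    """
    # S1101 / S1103: a format given in the export command is returned directly.
    if requested is not None:
        return requested

    # S1102 (simplified): consider only exports within the time period n ...
    recent = [fmt for timestamp, fmt in history if now - timestamp <= n]
    if not recent:
        return None  # nothing to recommend from

    # ... and recommend the format with the maximum usage rate.
    counts = Counter(recent)
    return max(counts, key=lambda fmt: counts[fmt] / len(recent))


export_history = [(1, "MISP"), (2, "MISP"), (8, "MISP"), (9, "OWL"), (10, "OWL")]
recommended = choose_export_format(None, export_history, now=10, n=5)
```

Because only exports within the period n are counted, a format that was popular long ago does not outweigh one that has been used more often recently.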
The present invention has been described in conjunction with the preferred embodiments. It should be understood that those ordinarily skilled in the art can make various changes, substitutions and additions within the principle and scope of the present invention. Therefore, the scope of the present invention is not limited to the above particular embodiments but should be defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
200710162924.1 | Sep 2007 | CN | national |