The present disclosure relates generally to ontologies and in particular, to the creation and maintenance of ontologies.
An ontology is a formal description of concepts and their interrelationships within some domain and is typically used by a software application (e.g., to provide a smart search capability). The primary semantic relation captured in an ontology is the class-subclass relation, usually called subsumption. For example, the more general concept “vehicle” subsumes the subclass “truck.” This relationship is usually represented either as a set-subset relation, perhaps with a Venn diagram, or as a subsumption tree in which a parent node (e.g., “vehicle”) represents the more general concept and its child nodes (e.g., “truck”) represent its subclasses.
Ontologies are generally defined by a trained knowledge engineer who understands the complexities of conceptual semantics. The trained knowledge engineer uses specialized development packages (e.g., Protégé) and languages (e.g., Resource Description Framework (RDF) and Web Ontology Language (OWL)) to define an ontology, and consults with a domain expert in order to properly understand and represent the concepts in the domain of interest. Ontology development is therefore a time-consuming and costly activity. Further, this high cost continues indefinitely, because an ontology must be maintained over time in order to capture new concepts and incorporate them into the existing, interconnected framework that was initially developed.
One aspect of the invention is a method for creating and maintaining an ontology. The method includes transmitting a two dimensional table to a requestor. The table includes domain concepts and semantic primitives related to a domain. A data value in the table at an intersection of a domain concept and a semantic primitive indicates that the domain concept is characterized by the semantic primitive. A command to modify the table is received from the requestor. The table is updated in response to the command. The updated table is stored as an ontology.
In another aspect, a system for creating and maintaining an ontology includes an output mechanism for transmitting a two dimensional table to a requestor. The table includes domain concepts and semantic primitives related to a domain. A data value in the table at an intersection of a domain concept and a semantic primitive indicates that the domain concept is characterized by the semantic primitive. The system also includes an input mechanism for receiving a command from the requestor to modify the table. The system further includes a processor in communication with the input mechanism and the output mechanism. The processor includes instructions for updating the table in response to the command and for storing the updated table as an ontology.
In a further aspect, a computer program product for creating and maintaining an ontology includes a storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method. The method includes transmitting a two dimensional table to a requestor. The table includes domain concepts and semantic primitives related to a domain. A data value in the table at an intersection of a domain concept and a semantic primitive indicates that the domain concept is characterized by the semantic primitive. A command to modify the table is received from the requestor. The table is updated in response to the command. The updated table is stored as an ontology.
Referring to the exemplary drawings wherein like elements are numbered alike in the several FIGURES:
Exemplary embodiments provide for the creation of a base ontology by trained knowledge engineers without the use of specialized ontology software and languages. In addition, the ontology may be maintained by domain experts or end users, who do not require training in the specialized ontology software and languages. Exemplary embodiments provide a representational framework and a development tool that allow domain experts to maintain the ontology without being trained in ontology engineering.
In a subsumption relationship, the parent (containing) concept represents a fundamental semantic notion, a “semantic primitive,” that the subsumed concept possesses in some sense. Thus, it could be said that the concept “truck” possesses the feature “vehicle.” A truck possesses other features as well, and “truck” could be distinguished from “car,” for example, by listing all of the features possessed by each and finding at least one feature that is listed for one concept but not the other. This observation leads to a new method for creating ontologies without the use of specialized software.
In exemplary embodiments, the creation of a base ontology entails the following steps, performed by trained knowledge engineers but without specialized ontology software. The domain of vehicle assembly is used herein to illustrate the examples, but the processes described herein may be applied to any domain. First, a trained knowledge engineer determines the relevant domain concepts (e.g., the concept “A-gap” in an assembly plant body shop). Next, a set of semantic primitives is determined that uniquely characterizes every domain concept in the set. A semantic primitive is a basic descriptor (e.g., A-gap has the descriptors surface, misalignment, gap, and orientation). A table is then created that has the semantic primitives along one axis and the domain concepts along the other axis. Finally, for each domain concept, every applicable semantic primitive is marked with a data value (e.g., “X”) at the intersection of that semantic primitive and that domain concept.
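The table-creation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the table is held as a dictionary of rows keyed by domain concept, “X” is the data value, and every domain concept other than A-gap (with its primitives surface, misalignment, gap, and orientation) is hypothetical. The helper names build_table, is_uniquely_characterized, and render are likewise illustrative.

```python
from typing import Dict, List, Set, Tuple

Table = Dict[str, Dict[str, str]]
MARK = "X"  # data value placed at a concept/primitive intersection

# A-gap and its four descriptors come from the example above; "dent" and
# "scratch" and their extra primitives are hypothetical rows for illustration.
CONCEPTS: Dict[str, Set[str]] = {
    "A-gap": {"surface", "misalignment", "gap", "orientation"},
    "dent": {"surface", "deformation"},
    "scratch": {"surface", "abrasion"},
}


def build_table(concepts: Dict[str, Set[str]]) -> Tuple[List[str], Table]:
    """Place semantic primitives on one axis and domain concepts on the other."""
    primitives = sorted(set().union(*concepts.values()))
    table = {
        concept: {p: (MARK if p in prims else "") for p in primitives}
        for concept, prims in concepts.items()
    }
    return primitives, table


def is_uniquely_characterized(table: Table) -> bool:
    """True if every concept has at least one primitive and no two share the same set."""
    signatures = [frozenset(p for p, v in row.items() if v == MARK) for row in table.values()]
    return all(signatures) and len(set(signatures)) == len(signatures)


def render(primitives: List[str], table: Table) -> str:
    """Format the table as plain text, one row per domain concept."""
    width = max(len(c) for c in table)
    lines = [" " * width + " | " + " | ".join(primitives)]
    for concept, row in table.items():
        cells = [row[p].center(len(p)) for p in primitives]
        lines.append(concept.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


primitives, table = build_table(CONCEPTS)
print(render(primitives, table))
print(is_uniquely_characterized(table))
```

Holding each concept's marked primitives as a set makes the uniqueness requirement a simple comparison: two concepts are indistinguishable exactly when their sets are equal.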
Each domain concept 102 is uniquely determined by the semantic primitives 104 that characterize it. By using binary (e.g., surface) or symbolic-valued (e.g., distance=too narrow or too wide) semantic primitives, a domain ontology can be deployed that will permit maintenance to be performed by end-users through a simple graphical user interface. In exemplary embodiments, the matrix (or table 100) depicted in
At block 204, a command to modify the table 100 is received (e.g., via the network) from the requestor and at block 206 the table 100 is updated in response to the command. The command may indicate that a selected domain concept 102 is characterized by a selected semantic primitive 104. As a result of the command, the table 100 is updated to add the data value (e.g., “X”) at the intersection of the selected domain concept 102 and the selected semantic primitive 104. Alternatively, the command may indicate that a selected domain concept 102 is no longer characterized by a selected semantic primitive 104. As a result of the command, the table 100 is updated to remove the data value (e.g., “X”) at the intersection of the selected domain concept 102 and the selected semantic primitive 104.
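A minimal sketch of how the two kinds of cell-level command described above might be applied, assuming the same dictionary-of-rows representation with “X” as the data value. The Command record and its "set"/"clear" actions are hypothetical, since the description does not define a command format. The function returns an updated copy rather than editing in place, so the unmodified table remains available if a proposed change is later rejected.

```python
from dataclasses import dataclass
from typing import Dict

Table = Dict[str, Dict[str, str]]
MARK = "X"


@dataclass(frozen=True)
class Command:
    """Hypothetical command: mark ("set") or unmark ("clear") one table cell."""
    action: str
    concept: str
    primitive: str


def apply_command(table: Table, cmd: Command) -> Table:
    """Return an updated copy of the table; the input table is left unchanged."""
    if cmd.concept not in table:
        raise KeyError(f"unknown domain concept: {cmd.concept}")
    if cmd.primitive not in table[cmd.concept]:
        raise KeyError(f"unknown semantic primitive: {cmd.primitive}")
    updated = {concept: dict(row) for concept, row in table.items()}
    if cmd.action == "set":
        updated[cmd.concept][cmd.primitive] = MARK  # concept is characterized by primitive
    elif cmd.action == "clear":
        updated[cmd.concept][cmd.primitive] = ""    # concept no longer characterized by it
    else:
        raise ValueError(f"unsupported action: {cmd.action}")
    return updated


table = {"A-gap": {"surface": "X", "gap": "", "orientation": "X"}}
table = apply_command(table, Command("set", "A-gap", "gap"))
print(table["A-gap"])
```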
The command could also specify that a new domain concept 102 or a new semantic primitive 104 should be added to the table 100. See
In exemplary embodiments, the updating in block 206 is performed only if the update will leave every domain concept 102 in the updated table uniquely characterized by one or more semantic primitives 104. The requestor is alerted to possible errors in a proposed modification via standard GUI methods (e.g., use of color and sound). Depending on the implementation requirements, the requestor may simply be warned of possible errors or may be prevented from making the changes that would result in the errors. In addition, a requestor may be prevented from deleting (or warned against deleting) a semantic primitive 104 that is being used to characterize one or more domain concepts 102.
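The validation described above can be sketched as a check run on a proposed table before it replaces the current one. This is an illustrative assumption, not the disclosed logic: find_problems, try_update, and can_delete_primitive are hypothetical names, and the strict flag stands in for the choice between rejecting a change and merely warning the requestor.

```python
from typing import Dict, FrozenSet, List, Tuple

Table = Dict[str, Dict[str, str]]
MARK = "X"


def signature(row: Dict[str, str]) -> FrozenSet[str]:
    """The set of semantic primitives marked for one domain concept."""
    return frozenset(p for p, v in row.items() if v == MARK)


def find_problems(table: Table) -> List[str]:
    """List concepts with no primitive, or with the same primitives as an earlier concept."""
    problems: List[str] = []
    seen: Dict[FrozenSet[str], str] = {}
    for concept, row in table.items():
        sig = signature(row)
        if not sig:
            problems.append(f"{concept} has no semantic primitive")
        elif sig in seen:
            problems.append(f"{concept} is indistinguishable from {seen[sig]}")
        else:
            seen[sig] = concept
    return problems


def try_update(current: Table, proposed: Table, strict: bool = True) -> Tuple[Table, List[str]]:
    """Return the table to keep plus any problems found.

    strict=True rejects a proposal that has problems (the current table is kept);
    strict=False accepts it and returns the problems only as warnings.
    """
    problems = find_problems(proposed)
    if problems and strict:
        return current, problems
    return proposed, problems


def can_delete_primitive(table: Table, primitive: str) -> bool:
    """A primitive may be deleted only if no domain concept is marked with it."""
    return not any(row.get(primitive) == MARK for row in table.values())


current = {"A-gap": {"surface": "X", "gap": "X"}, "dent": {"surface": "X", "gap": ""}}
# Clearing "gap" for A-gap would make it identical to "dent".
proposed = {"A-gap": {"surface": "X", "gap": ""}, "dent": {"surface": "X", "gap": ""}}
kept, warnings = try_update(current, proposed)
print(kept is current, warnings)
```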
At block 208, the updated table 100 is stored as an ontology. The ontology may be stored as a two dimensional table or the storing may include converting the table into another ontology format. In exemplary embodiments, the ontology is represented as a directed graph or as a hierarchical graph. In exemplary embodiments, the ontology is stored in a format that is compatible with a specialized ontology language (e.g., RDF and OWL) for further editing and/or for input into a specialized ontology development package (e.g., Protégé). In exemplary embodiments, the ontology is used for providing a smart search capability. For example, the ontology may be utilized to identify similar problem/solution pairs in a problem reporting system based on the words utilized to describe a problem. The semantic primitives 104 could be utilized to categorize the domain concepts 102 in order to pull up related problems.
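One possible conversion into an ontology language is sketched below, emitting RDF in Turtle syntax. It assumes, following the subsumption reading given earlier, that each semantic primitive becomes an OWL class and that each domain concept becomes a class declared as an rdfs:subClassOf every primitive marked in its row. This mapping and the http://example.org/vehicle-assembly# namespace are assumptions for illustration; the description does not specify the conversion.

```python
from typing import Dict
from urllib.parse import quote

Table = Dict[str, Dict[str, str]]
MARK = "X"
BASE = "http://example.org/vehicle-assembly#"  # hypothetical namespace


def iri(name: str) -> str:
    """Full IRI in angle brackets; names are percent-encoded so spaces are safe."""
    return f"<{BASE}{quote(name, safe='')}>"


def table_to_turtle(table: Table) -> str:
    """Primitives become classes; each concept is a subclass of its marked primitives."""
    lines = [
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "",
    ]
    primitives = sorted({p for row in table.values() for p in row})
    for p in primitives:
        lines.append(f"{iri(p)} a owl:Class .")
    for concept, row in table.items():
        parents = sorted(p for p, v in row.items() if v == MARK)
        stmt = f"{iri(concept)} a owl:Class"
        if parents:
            stmt += " ; rdfs:subClassOf " + " , ".join(iri(p) for p in parents)
        lines.append(stmt + " .")
    return "\n".join(lines) + "\n"


print(table_to_turtle({"A-gap": {"surface": "X", "gap": "X", "orientation": ""}}))
```

Text in this form could then be opened, for example, in Protégé for further editing, since Protégé reads Turtle-serialized ontologies.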
The network 406 may be any type of known network including, but not limited to, a wide area network (WAN), a local area network (LAN), a global network (e.g., the Internet), a virtual private network (VPN), and an intranet. The network 406 may be implemented using a wireless network or any kind of physical network implementation known in the art. A user system 402 may be coupled to the host system 404 through multiple networks (e.g., an intranet and the Internet) so that not all user systems 402 are coupled to the host system 404 through the same network. One or more of the user systems 402 and the host system 404 may be connected to the network 406 in a wireless fashion. In exemplary embodiments, the network is an intranet and one or more user systems 402 execute a user interface application (e.g., a web browser) to contact the host system 404 through the network 406, while another user system 402 is directly connected to the host system 404. In other exemplary embodiments, the user system 402 is connected directly (i.e., not through the network 406) to the host system 404, and the host system 404 is connected directly to, or contains, the storage device 408. In other exemplary embodiments, the user system 402 includes a stand-alone application program to perform ontology creation and maintenance, as well as the application data such as the table 100. In this embodiment, the application program and data are updated on a periodic basis.
The storage device 408 may be implemented using a variety of devices for storing electronic information. It is understood that the storage device 408 may be implemented using memory contained in the host system 404 or it may be a separate physical device. The storage device 408 is logically addressable as a consolidated data source across a distributed environment that includes a network 406. Information stored in the storage device 408 may be retrieved and manipulated via the host system 404. The storage device 408 includes one or more tables 100, and ontologies (e.g., ontology databases). The storage device 408 may also include other kinds of data such as information concerning the updating of the table 100 (e.g., a user identifier, date, and time of update). In exemplary embodiments, the host system 404 operates as a database server and coordinates access to application data including data stored on storage device 408.
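A minimal sketch of the update-history record mentioned above (user identifier, date, and time of update). The UpdateRecord and UpdateLog names, and the extra action, concept, and primitive fields, are hypothetical; in practice such records would presumably be written to the storage device rather than kept in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class UpdateRecord:
    """One change to the table: who made it, when, and which cell it touched."""
    user_id: str
    timestamp: datetime
    action: str      # hypothetical, e.g. "set" or "clear"
    concept: str
    primitive: str


@dataclass
class UpdateLog:
    records: List[UpdateRecord] = field(default_factory=list)

    def record(self, user_id: str, action: str, concept: str, primitive: str,
               when: Optional[datetime] = None) -> UpdateRecord:
        """Append a record, stamping it with the current UTC time if none is given."""
        rec = UpdateRecord(user_id, when or datetime.now(timezone.utc), action, concept, primitive)
        self.records.append(rec)
        return rec

    def by_user(self, user_id: str) -> List[UpdateRecord]:
        """All changes made by one user, in the order they were recorded."""
        return [r for r in self.records if r.user_id == user_id]


log = UpdateLog()
log.record("user-1", "set", "A-gap", "gap")
log.record("user-2", "clear", "A-gap", "surface")
print(len(log.by_user("user-1")))
```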
The host system 404 depicted in
The host system 404 may also operate as an application server. The host system 404 executes one or more computer programs to perform ontology creation and maintenance functions. Processing may be shared by the user system 402 and the host system 404 by providing an application (e.g., a Java applet) to the user system 402. Alternatively, the user system 402 can include a stand-alone software application for performing some or all of the processing described herein. As previously described, it is understood that separate servers may be used to implement the network server functions and the application server functions. Alternatively, the network server, the firewall, and the application server may be implemented by a single server executing computer programs to perform the requisite functions.
Exemplary embodiments allow ontologies to be maintained by knowledgeable end users who need not be conversant in specialized ontology development packages or languages. Exemplary embodiments represent ontologies as two-dimensional tables with domain concepts 102 on one axis and semantic primitives 104 on the other. Most end users are familiar with using and interpreting two-dimensional tables or spreadsheets that represent data and the relationships among the data. Representing ontologies as tables therefore removes the need for a knowledge engineer to update the ontology on a continuous basis. During routine maintenance, a knowledge engineer may need to be brought back into the process only when a new domain concept 102 is introduced that cannot be uniquely distinguished from the other domain concepts 102 using the existing semantic primitives 104; in other words, when a new semantic primitive 104 must be added to the table as a new row.
As described above, the embodiments of the invention may be embodied in the form of computer-implemented processes and apparatuses for practicing those processes. Embodiments of the invention may also be embodied in the form of computer program code containing instructions embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other computer-readable storage medium, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. An embodiment of the present invention can also be embodied in the form of computer program code, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. When implemented on a general-purpose microprocessor, the computer program code segments configure the microprocessor to create specific logic circuits.
While the invention has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but that the invention will include all embodiments falling within the scope of the appended claims. Moreover, the use of the terms first, second, etc. does not denote any order or importance; rather, the terms first, second, etc. are used to distinguish one element from another.