The present invention relates to methods and systems for defining and handling user/computer interactions. In particular, the present invention relates to systems that resolve user input into a command or entity.
In typical computer systems, user input has been limited to a rigid set of user responses having a fixed format. For example, with a command line interface, user input must be of a specific form that uniquely identifies a single command and selected arguments from a limited and specific domain of possible arguments. Similarly, with a graphical user interface, only a limited set of options is presented to the user, and it is relatively straightforward for a developer to define a user input domain consisting of a limited set of commands or entities for each specific user input in the limited set of user inputs.
By limiting a user to a rigid set of allowed inputs or responses, computer systems have required a significant level of skill from the user or operator. It has traditionally been the responsibility of the user to mentally translate the desired task to be performed into the specific input recognized by the applications running on the computer system. In order to expand the usability of computer systems, there has been an ongoing effort to provide applications with a natural language (NL) interface. The natural language interface extends the functionality of applications beyond their limited input set and opens the computer system to inputs in a natural language format. The natural language interface is responsible for performing a translation from the relatively vague and highly context-based realm of natural language into the precise and rigid set of inputs required by a computer application.
Natural language interfaces utilize semantic objects and various actions to translate natural language inputs into information used by an application. When authoring applications that interact with natural language interfaces, application developers can use procedural and declarative programming languages to implement the semantic objects and actions. Procedural programming languages such as C, C++, C# and Fortran can define various actions performed on data objects during operation of an application. Declarative programming languages such as XML, LISP and Prolog can define the semantic objects of an application.
However, integration between declarative language code and procedural language code has been difficult for authors to develop. In one approach, a semantic object is represented by a declarative language, but an author is required to duplicate the semantic object declaration for each action using the semantic object. In another approach, an obscured declaration of a semantic object is used in a procedural language, which requires the author to track and maintain relationships of semantic objects. As a result, an authoring tool to integrate procedural logic modules and declarative logic modules would be useful.
The present invention relates to a computer readable medium having instructions that, when implemented on a computer, cause the computer to process information. The instructions include a declarative logic module adapted to define a semantic object having at least one semantic slot and a procedural logic module adapted to define actions to be performed on the semantic object with reference to the declarative logic module.
Another aspect of the present invention relates to a method for compiling an application. The method includes identifying a designation within a procedural logic module corresponding to a declarative logic module. A semantic object within the declarative logic module is accessed to perform actions in the procedural logic module.
Still another aspect relates to a procedural logic module for processing a natural language input. The module includes a designation corresponding to a declarative logic module defining a semantic object having at least one slot and procedural code adapted to perform actions on the declarative logic module using the semantic object.
Yet another aspect of the present invention relates to a computer readable medium for processing natural language input from a user. The computer readable medium includes a plurality of procedural logic modules. Each procedural logic module includes a designation corresponding to a declarative logic module defining a plurality of semantic objects arranged in a hierarchical structure. Additionally, each semantic object includes a plurality of slots that can be populated by the natural language input.
The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, telephony systems, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices. Tasks performed by the programs and modules are described below and with the aid of figures. Those skilled in the art can implement the description and figures as processor executable instructions, which can be written on any form of a computer readable medium.
With reference to
Computer 110 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation,
The computer 110 may also include other removable/non-removable volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media discussed above and illustrated in
A user may enter commands and information into the computer 110 through input devices such as a keyboard 162, a microphone 163, and a pointing device 161, such as a mouse, trackball or touch pad. Other input devices (not shown) may include a joystick, game pad, satellite dish, scanner, or the like. For natural user interface applications, a user may further communicate with the computer using speech, handwriting, gaze (eye movement), and other gestures. To facilitate a natural user interface, a computer may include microphones, writing pads, cameras, motion sensors, and other devices for capturing user gestures. These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 190.
The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a hand-held device, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110. The logical connections depicted in
When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
Typically, application programs 135 have interacted with a user through a command line or a Graphical User Interface (GUI) through user input interface 160. However, in an effort to simplify and expand the use of computer systems, inputs have been developed which are capable of receiving natural language input from the user. In contrast to natural language or speech, a graphical user interface is precise. A well designed graphical user interface usually does not produce ambiguous references or require the underlying application to confirm a particular interpretation of the input received through the interface 160. For example, because the interface is precise, there is typically no requirement that the user be queried further regarding the input, e.g., “Did you click on the ‘ok’ button?” Typically, an object model designed for a graphical user interface is very mechanical and rigid in its implementation.
In contrast to an input from a graphical user interface, a natural language query or command will frequently translate into not just one, but a series of function calls to the input object model. In contrast to the rigid, mechanical limitations of a traditional command line input or graphical user interface, natural language is a communication means in which human interlocutors rely on each other's intelligence, often unconsciously, to resolve ambiguities. In fact, natural language is regarded as “natural” exactly because it is not mechanical. Human interlocutors can resolve ambiguities based upon contextual information and cues regarding any number of domains surrounding the utterance. With human interlocutors, the sentence, “Forward the minutes to those in the review meeting on Friday” is perfectly understandable without any further explanation. However, from the mechanical point of view of a machine, specific details must be specified, such as exactly what document and which meeting are being referred to, and exactly to whom the document should be sent.
Logic modules 202 include procedural and declarative programming code to drive applications within interface 200. For example, the code can be written using XML, LISP, Prolog, C, C++, C#, Java and/or Fortran. The applications use semantic objects within the logic modules 202 to access information in a knowledge base 204. As used herein, “semantic” refers to a meaning of natural language expressions. Semantic objects can define properties, methods and event handlers that correspond to the natural language expressions. In one embodiment of the present invention, a semantic object provides one way of referring to an entity that can be utilized by one or more of the logic modules 202. A specific domain entity pertaining to a particular domain application can be identified by any number of different semantic objects with each one representing the same domain entity phrased in different ways. The term semantic polymorphism can be used to mean that a specific entity may be identified by multiple semantic objects. The richness of the semantic objects, that is the number of semantic objects, their interrelationships and their complexity, corresponds to the level of user expressiveness that an application would enable in its natural language interface. As an example of polymorphism “John Doe”, “VP of NISD”, and “Jim's manager” all refer to the same person (John Doe) and are captured by different semantic objects PersonByName, PersonByJob, and PersonByRelationship, respectively.
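By way of illustration only, semantic polymorphism can be sketched in Python as follows. The `PEOPLE` list is a hypothetical stand-in for knowledge base 204, and the `resolve` method and the record fields are assumptions made for this sketch; only the three semantic object names come from the description above.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base (stand-in for knowledge base 204).
PEOPLE = [
    {"id": 1, "name": "John Doe", "job": "VP of NISD", "manager_id": None},
    {"id": 2, "name": "Jim", "job": "Engineer", "manager_id": 1},
]


@dataclass
class PersonByName:
    name: str

    def resolve(self):
        # Look the person up by exact name.
        return next(p for p in PEOPLE if p["name"] == self.name)


@dataclass
class PersonByJob:
    job: str

    def resolve(self):
        # Look the person up by job title.
        return next(p for p in PEOPLE if p["job"] == self.job)


@dataclass
class PersonByRelationship:
    person: object   # another semantic object with a resolve() method
    relation: str

    def resolve(self):
        # Only the "manager" relation is modeled in this sketch.
        if self.relation != "manager":
            raise ValueError("unsupported relation: " + self.relation)
        base = self.person.resolve()
        return next(p for p in PEOPLE if p["id"] == base["manager_id"])


def resolve_all():
    """Resolve three different phrasings and return the entity id of each."""
    phrasings = [
        PersonByName("John Doe"),
        PersonByJob("VP of NISD"),
        PersonByRelationship(PersonByName("Jim"), "manager"),
    ]
    return [s.resolve()["id"] for s in phrasings]
```

In this sketch, three distinct semantic objects resolve to the same domain entity, which is the sense in which one entity may be identified by multiple semantic objects.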
Semantic objects can also be nested and interrelated to one another including recursive interrelations. In other words, a semantic object may have constituents that are themselves semantic objects. For example, “Jim's manager” corresponds to a semantic object having two constituents: “Jim” which is a “Person” semantic object and “Jim's Manager” which is a “PersonByRelationship” semantic object. These relationships are defined by a semantic schema that declares relationships among semantic objects. In one embodiment, the schema is represented as a parent-child hierarchical tree structure. For example, a “SendMail” semantic object can be a parent object having a “recipient” property referencing a particular person that can be stored in knowledge base 204. Two example child objects can be represented as a “PersonByName” object and a “PersonByRelationship” object that are used to identify a recipient of a mail message from knowledge base 204.
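One possible way to picture such a schema, offered only as a sketch, is a mapping from each semantic object type to its slots and to the object types permitted to fill each slot. The `SCHEMA` dictionary, the tuple encoding of objects, and the `conforms` helper below are all hypothetical names introduced for illustration.

```python
# Hypothetical schema: each semantic object type lists its slots and the
# semantic object types (or plain "text") allowed to fill each slot.
SCHEMA = {
    "SendMail": {"recipient": ["PersonByName", "PersonByRelationship"]},
    "PersonByName": {"name": ["text"]},
    "PersonByRelationship": {
        "person": ["PersonByName", "PersonByRelationship"],  # recursive
        "relation": ["text"],
    },
}


def conforms(node):
    """Return True if a (type, slots) tree obeys SCHEMA, recursing into children."""
    obj_type, slots = node
    allowed = SCHEMA.get(obj_type)
    if allowed is None:
        return False
    for slot, value in slots.items():
        if slot not in allowed:
            return False
        if isinstance(value, str):
            # A plain string may only fill a slot that accepts text.
            if "text" not in allowed[slot]:
                return False
        elif value[0] not in allowed[slot] or not conforms(value):
            return False
    return True


# "Jim's manager" nested inside a SendMail parent object.
jims_manager = (
    "PersonByRelationship",
    {"person": ("PersonByName", {"name": "Jim"}), "relation": "manager"},
)
send_mail = ("SendMail", {"recipient": jims_manager})
```

Here `conforms(send_mail)` accepts the nested tree, while a `SendMail` whose recipient is a bare string is rejected because the assumed schema requires a person object in that slot.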
Using logic modules 202, knowledge base 204 can be accessed based on actions to be performed and/or the semantic objects of the logic modules 202. As appreciated by those skilled in the art, knowledge base 204 can include various types and structures of data that can manifest themselves in a number of forms such as, but not limited to, relational or object-oriented databases, Web Services, local or distributed programming modules or objects, XML documents or other data representation mechanisms with or without annotations, etc. Specific examples include contacts, appointments, audio files, video files, text files, databases, etc. Natural understanding interface 200 can then provide an output to the user based on the data in knowledge base 204 and actions performed according to one or more logic modules 202.
One aspect of the present invention allows logic modules 202 to be written using various languages including both procedural and declarative languages. Thus, application developers can write source files that utilize different languages to best represent a particular task to be performed and thereby take advantage of features provided by each language. For example, an XML source file can include a semantic object declaration, while a C# source file can include actions to be performed on the semantic object declared in the XML source file. Thus, a class definition can be “distributed” (i.e. accessible) across several source files that are authored in different languages. In one embodiment, source files can be implemented in a shared runtime environment such as Common Language Runtime (CLR).
In one embodiment of the present invention, procedural logic module 224 includes a “partial class” designation. The partial class designation or reference notifies compiler 222 that the particular class is defined across multiple source files. As a result, properties, methods and/or event handlers declared in declarative logic module 226 do not have to be repeated in the procedural logic module 224, and compiler 222 will not suspend compiling merely because such properties, methods, event handlers, etc. are not present in procedural logic module 224.
The programming code below, written in XML, declares a class “Class1” of type “foo1” that includes at least one slot “Slot1” of type “foo2”. Other slots and declarations can also be applied in this code depending on a particular application using “Class1”.
Given that “Class1” has been declared above, a procedural programming module including code such as that provided below can then be written to access slots in “Class1”. For example, the code below declares a partial class “Class1”, which notifies compiler 222 that another source file contains the declaration for “Class1”. The code below includes a placeholder “noDoubt” for holding data that is used by the “Class1” procedures. As an example, “Slot1” in “Class1” is accessed by the procedure “Slot1.Count( )” provided below.
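By way of illustration only, and not as a reproduction of the listings referred to above, the arrangement can be approximated in Python. The XML element and attribute names (`class`, `slot`, `name`, `type`), the `declare` helper, and the meaning given to `no_doubt` (exactly one value in the slot) are assumptions made for this sketch. Python has no partial classes, so attaching a method to the generated class is used here as a loose analogue.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarative module; the element and attribute names are
# assumptions, not the original XML listing.
DECLARATION = """
<class name="Class1" type="foo1">
  <slot name="Slot1" type="foo2"/>
</class>
"""


def declare(xml_text):
    """Build a Python class from the XML declaration, one list-valued slot per <slot>."""
    root = ET.fromstring(xml_text)
    slot_names = [s.get("name") for s in root.findall("slot")]

    def __init__(self):
        # Every declared slot starts out empty.
        for name in slot_names:
            setattr(self, name, [])

    return type(
        root.get("name"),
        (object,),
        {
            "__init__": __init__,
            "semantic_type": root.get("type"),
            "slot_names": slot_names,
        },
    )


Class1 = declare(DECLARATION)


# Procedural module: adds behavior to Class1 without redeclaring Slot1.
def has_no_doubt(self):
    # Assumed meaning for this sketch: the slot holds exactly one value.
    no_doubt = len(self.Slot1) == 1
    return no_doubt


Class1.has_no_doubt = has_no_doubt
```

The point of the sketch is that `Slot1` is named once, in the declarative text, and the procedural code refers to it without repeating the declaration.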
Semantic object 256 can be accessed by each of the procedural logic modules 250, 252 and 254 by using the partial class designation. Thus, application developers need only develop one instance of travel semantic object 256. Code written in the airline reservation module 250, hotel reservation module 252 and rental car reservation module 254 can be executed to perform actions on any or all of the data elements in travel semantic object 256. As a result, applications can be authored in a time-efficient manner that reduces redundant declarations and thereby prevents errors resulting from integration of procedural and declarative logic.
In an exemplary embodiment with further reference to the system illustrated in
The slots of travel semantic object 256 are then at least partially populated with information from the natural language input. Rules and/or a semantic schema associated with travel semantic object 256 can then be utilized, if desired, to prompt the user for remaining unknown information such as herein an end date for travel. Using the information in the slots of travel semantic object 256, airline reservation module 250 can be instantiated to access knowledge base 204 to provide flight pricing, availability, etc. Additionally, hotel reservation module 252 and rental car reservation module 254 can be instantiated to offer potential hotel and rental car reservations based on travel semantic object 256 without further declarations of the travel semantic object 256 within the respective procedural coding modules. Thus, travel semantic object 256 can be accessed by a plurality of procedural programming modules with a single declaration.
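The flow just described can be sketched in Python under stated assumptions. The slot names (`destination`, `start_date`, `end_date`), the sample values, and the three module functions are hypothetical; the description above specifies only that an end date may remain unknown and that several procedural modules share one declaration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Travel:
    """Single declaration of the travel semantic object; slot names are assumed."""
    destination: Optional[str] = None
    start_date: Optional[str] = None
    end_date: Optional[str] = None


def missing_slots(obj):
    """Return the names of slots not yet populated, for which the user could be prompted."""
    return [f.name for f in fields(obj) if getattr(obj, f.name) is None]


# Three procedural modules that act on the same object without redeclaring it.
def airline_module(trip):
    return "flights to {} on {}".format(trip.destination, trip.start_date)


def hotel_module(trip):
    return "hotels in {} from {} to {}".format(
        trip.destination, trip.start_date, trip.end_date
    )


def rental_car_module(trip):
    return "cars in {} from {} to {}".format(
        trip.destination, trip.start_date, trip.end_date
    )


# Slots partially populated from a hypothetical natural language input.
trip = Travel(destination="Seattle", start_date="2004-03-01")
```

With these sample values, `missing_slots(trip)` reports that only the end date remains to be requested; once it is supplied, each module can operate on the same `trip` object.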
The exemplary embodiments provided above are highly simplified and are provided to illustrate operation of the present invention, which can be expanded to much more complex object hierarchies. When the natural language expression is not trivial, a tree of semantic objects can be used to sufficiently capture the meaning of an utterance in a manner that can be conveyed in an appropriate form to logic modules 202.
The present invention thus provides a powerful operating tool that allows semantic objects to be declared in a multitude of programming syntaxes, for example using a declarative coding language such as XML or using a procedural coding language such as C# or C++. This framework is thus well suited for a plurality of procedural programming modules to utilize a declarative module without need for multiple declarations of semantic objects and vice versa. Attributes, properties, methods and event handlers of the objects are utilized by the procedural modules to allow easier authoring of the procedural modules in a single location.
Although the present invention has been described with reference to particular embodiments, workers skilled in the art will recognize that changes may be made in form and detail without departing from the spirit and scope of the invention.
The present application is based on and claims the benefit of U.S. provisional patent application Ser. No. 60/538,306, filed Jan. 22, 2004, the content of which is hereby incorporated by reference in its entirety.
Number | Date | Country
---|---|---
60538306 | Jan 2004 | US