More than three million medical self-care books are sold each year, and health websites such as WebMD attract more than 10 million visitors each month. However, the information in books or on the Internet is not easily accessible. To find a specific symptom in a book, a reader has to match it against an index, which is sometimes not organized in a way the reader can use effectively. To find information on a health website, a user has to type in keywords, and keyword searching typically generates many irrelevant links that are not directly related to the user's symptoms. On WebMD, symptoms are organized by body parts: after a user chooses "knee", there is a long list of items to choose from, such as "leg injuries", "leg problems", "knee problems and injuries", and "toe, foot and ankle injuries". Users have to navigate for a long time to find a specific item related to their problem, and many give up. Frequently, users cannot find the information they seek.
Some attempts have been made, in various domains, to understand text input from users and thereby make searching feel more natural. While these attempts have produced some interesting results, natural language understanding is still imperfect. The following references, each of which is incorporated herein by reference, describe various historical and technical aspects of natural language, dialogue, and knowledge representation, and provide a perspective on the state of the art:
Allen, James F., 1995. Natural Language Understanding, Benjamin Cummings Publishing.
Baader, F. and Hollunder, B., 1991. "KRIS: Knowledge Representation and Inference System," SIGART Bulletin 2, 8-14.
Blaylock, N., James Allen, and George Ferguson, 2002. "Synchronization in an Asynchronous Agent-Based Architecture for Dialogue Systems," Proceedings of the 3rd SIGdial Workshop on Discourse and Dialog, Philadelphia.
Borgida, A., Ron Brachman, Deborah McGuinness, and Lori Halpern-Resnick, 1989. "CLASSIC: A Structural Data Model for Objects," Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data, pp. 59-67.
Colby, K. M., 1999. "Human-Computer Conversation in a Cognitive Therapy Program," in Yorick Wilks (ed.), Machine Conversations, Kluwer Academic Publishers.
Doyle, Jon and Ramesh Patil, 1991. "Two Theses of Knowledge Representation: Language Restrictions, Taxonomic Classification, and the Utility of Representation Services," Artificial Intelligence 48, pp. 261-297.
Ferguson, George and James F. Allen, 1998. "TRIPS: An Integrated Intelligent Problem-Solving Assistant," Proceedings of the Fifteenth National Conference on AI (AAAI-98), Madison, Wis., 26-30.
Goldmann, David R. and David A. Horowitz, 2002. Home Medical Adviser, DK Publishing, New York.
Hu, Junling and Michael P. Wellman, 1998. "Online Learning About Other Agents in Dynamic Multiagent Systems," Proceedings of the Second International Conference on Autonomous Agents.
Hu, Junling, Daniel Reeves, and Hock-Shan Wong, 2000. "Personalized Bidding Agents for Online Auctions," Proceedings of the Fifth International Conference on the Practical Application of Intelligent Agents and Multi-Agents.
Hwang, C. H. and Schubert, L. K., 1993. "Episodic Logic: A Comprehensive, Natural Representation for Language Understanding," Minds & Machines 3, 381-419.
Hwang, C. H. and Schubert, L. K., 1993. "Episodic Logic: A Situational Logic for Natural Language Processing," in P. Aczel, D. Israel, Y. Katagiri, and S. Peters (eds.), Situation Theory and its Applications 3 (STA-3), CSLI, 307-452.
Hwang, C. H. and Schubert, L. K., 1994. "Interpreting Tense, Aspect, and Time Adverbials: A Compositional, Unified Approach," in D. M. Gabbay and H. J. Ohlbach (eds.), Proceedings of the 1st International Conference on Temporal Logic, Bonn, Germany, July 11-14, Springer-Verlag, pp. 238-264.
Karp, Peter D., Suzanne M. Paley, and Ira Greenberg, 1994. "A Storage System for Scalable Knowledge Representation," Proceedings of the Third International Conference on Information and Knowledge Management (CIKM'94), Gaithersburg, Md., ACM Press, 97-104.
Krohn, Jacqueline and Frances A. Taylor, 1999. Finding the Right Treatment, Harley and Marks Publishers.
Lin, Dekang, 1993. "Principle-Based Parsing Without Overgeneration," Proceedings of ACL-93, pp. 112-120, Columbus, Ohio.
Lin, Dekang, 1994. "PRINCIPAR—An Efficient, Broad-Coverage, Principle-Based Parser," Proceedings of COLING-94, pp. 482-488, Kyoto, Japan.
Lin, Dekang, 1995. "A Dependency-Based Method for Evaluating Broad-Coverage Parsers," Proceedings of IJCAI-95.
Lin, Dekang, Shaojun Zhao, Lijuan Qin, and Ming Zhou, 2003. "Identifying Synonyms Among Distributionally Similar Words," Proceedings of IJCAI-03, pp. 1492-1493.
Montague, Richard, 1974. "The Proper Treatment of Quantification in Ordinary English," in R. Thomason (ed.), Formal Philosophy: Selected Papers of Richard Montague, Yale University Press, New Haven.
Schubert, L. K., 2000. "The Situations We Talk About," in J. Minker (ed.), Logic-Based Artificial Intelligence, Kluwer, Dordrecht, 407-439.
Schubert, L. K. and Hwang, C. H., 2000. "Episodic Logic Meets Little Red Riding Hood: A Comprehensive, Natural Representation for Language Understanding," in L. Iwanska and S. C. Shapiro (eds.), Natural Language Processing and Knowledge Representation: Language for Knowledge and Knowledge for Language, MIT/AAAI Press, Menlo Park, Calif., and Cambridge, Mass., 111-174.
Traum, David, Lenhart K. Schubert, Massimo Poesio, Nat Martin, Marc Light, Chung Hee Hwang, Peter Heeman, George Ferguson, and James F. Allen, 1996. "Knowledge Representation in the TRAINS-93 Conversation System," International Journal of Expert Systems 9(1), Special Issue on Knowledge Representation and Inference for Natural Language Processing, pp. 173-223.
Vickery, Donald M. and James F. Fries, 2000. Take Care of Yourself, Perseus Publishing.
A computer program capable of conducting natural language dialogue seems fairly reachable at first glance. After all, sentences are just text strings (for text-based conversation). With the large memory of today's computers, it is fairly easy to store a large number of sentence patterns and to retrieve them quickly. That is why ELIZA, the first conversational program, which appeared in the mid-1960s, was an attempt to store all possible ways that people can speak. This is also the approach of contemporary chatterbots, including ALICE, Ultra Hal Assistant, and Ella, and of commercial talking programs that act as customer service agents. All of these programs adopt the ELIZA approach, simply with more patterns in their programs. However, this ad hoc approach has problems. The complexity of human language, with its huge number of ways to say similar things, is a technological barrier that is unlikely to be overcome by simply adding more phrase patterns or sentence templates.
It would be advantageous to develop a conversation program that is based on real language understanding. Real understanding means understanding basic grammar well enough to parse sentence structure, understanding the meaning of words and phrases, and having an internal representation with which to reason about those meanings. It would further be advantageous to apply such a program to a domain, such as the self-help medical domain.
The present invention is illustrated by way of example, and not by way of limitation.
A technique for domain-based natural language dialogue includes a program that combines a broad-coverage parser with a general-purpose interpreter and a knowledge base to handle unrestricted English sentences in a domain, such as the medical self-help domain. The broad-coverage parser may have more than 40,000 words in its dictionary. The general-purpose interpreter may use logical forms to represent the semantic meaning of a sentence. The knowledge base may include a domain of modest size, but the interpretive and inference techniques may be domain independent and scalable.
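As a rough illustration of this combination, the following Python sketch shows one way a broad-coverage parser, a general-purpose interpreter, and a knowledge base might be chained. The class and method names are assumptions made for exposition, not elements of any described embodiment.

```python
# Minimal sketch of the parser/interpreter/knowledge-base pipeline.
# All names here are illustrative assumptions, not the described embodiment.

class DialogueEngine:
    def __init__(self, parser, interpreter, knowledge_base):
        self.parser = parser            # broad-coverage syntactic parser
        self.interpreter = interpreter  # converts parse trees to logical forms
        self.kb = knowledge_base        # domain knowledge and inference

    def respond(self, sentence: str) -> str:
        parse_tree = self.parser.parse(sentence)               # syntax
        logical_form = self.interpreter.interpret(parse_tree)  # semantics
        return self.kb.derive_response(logical_form)           # domain reasoning
```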
The technique may be used to build a large-scale dialogue system that is capable of natural language understanding. The system may pave the way for introducing natural language understanding into commercial systems, which may significantly improve the conversational quality of dialogue systems and therefore make many systems more widely accepted by customers. For example, the improvements over current customer service agents may move these agents from the side role they play now into a prominent role, leading to true cost savings for companies that deploy them. Similar improvements may allow training agents to play a larger role in employee training or course instruction. All of this may streamline the training process, improve productivity, and reduce human training costs.
The technique may also have a deep impact on research in natural language processing, encouraging future researchers to move away from toy domains with small-scale parsers or special-purpose interpreters. Instead, they may adopt a large-scale parser and a general-purpose interpreter, according to embodiments described herein.
The technique should also advance AI in general. A fully functional dialogue agent is one of the ultimate goals of artificial intelligence. The technique may provide the appropriate platform to implement AI technologies such as learning, reasoning, planning, and multiagent interaction (the interaction between the agent and the user).
The network may be any internal network, such as a LAN, WAN, or intranet, or a global information network, such as the Internet. The computing devices 106 communicate with the domain-based dialogue server 102 over the network 104. The computing devices 106 may be any type of computing device, including, but not limited to, general-purpose computers, workstations, docking stations, mainframe computers, wireless devices, personal digital assistants (PDAs), smartphones, or any other computing device that is adaptable to communicate with the domain-based dialogue server 102.
The memory 110 includes one or more executable modules, including a user interface (UI) module 116, a text-to-speech (TTS) module 118, a dialogue manager module 120, a parser module 122, an interpreter module 124, and a knowledge base module 126. These modules may include procedures, programs, functions, interpreted code, compiled code, computer language, databases, or any other type of executable code or stored data. An example of how the modules may be used together to carry out a dialogue with a user is described with reference to the flowchart below.
The flowchart continues at decision point 130 with determining whether the natural language input is text or voice. If the natural language input is not text, then at block 132 the natural language input is converted to text using, for example, speech-to-text conversion (the TTS module 118 performs the reverse conversion when responses are rendered as speech).
The flowchart continues at block 134 with parsing text into a representational grammar. In order for a computer to understand a human language such as English, one essential step is parsing. A parser takes an English sentence, analyzes the sentence's structure and decomposes it into a parse tree that includes segments such as noun phrases or verb phrases.
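By way of illustration, the parse tree for a sentence such as "I have pain in my stomach." might be represented as nested phrase structures. The tuple encoding below is an assumed format for exposition, not the parser's actual output:

```python
# Illustrative only: one way to represent the parse tree for
# "I have pain in my stomach." as nested (label, children) tuples.
parse_tree = (
    "C",                                   # clause
    [("N", ["I"]),                         # subject noun phrase
     ("V", ["have",                        # verb phrase
            ("N", ["pain"]),               # object noun phrase
            ("Prep", ["in",                # prepositional phrase
                      ("N", ["my", "stomach"])])])],
)
```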
The flowchart continues at block 136 with converting the parse tree into a semantic representation.
The flowchart continues at block 138 with determining meaning from the semantic representation. Determining meaning may require the use of a knowledge base that includes information about entailments of predicates (e.g., that "have a cut" entails "injured") and general world knowledge (e.g., that injuries generally involve bleeding). Moreover, the knowledge base should enable reasoning based on that information. These two types of knowledge may be referred to as declarative knowledge and procedural knowledge, respectively. The knowledge base may also include decision rules, such as IF-THEN logic.
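The following toy sketch illustrates the distinction between the two kinds of knowledge. The predicate names, rule format, and follow-up question are illustrative assumptions, not the system's actual knowledge base:

```python
# Declarative knowledge: entailments of predicates and world facts.
ENTAILMENTS = {"have a cut": "injured"}      # "have a cut" entails "injured"
WORLD_KNOWLEDGE = {"injured": ["bleeding"]}  # injuries generally involve bleeding

# Procedural knowledge: IF-THEN decision rules over the known facts.
RULES = [
    (lambda facts: "injured" in facts and "bleeding" in facts,
     "How heavy is the bleeding?"),
]

def infer(initial_facts):
    facts = set(initial_facts)
    # Apply entailments of predicates.
    facts |= {ENTAILMENTS[f] for f in list(facts) if f in ENTAILMENTS}
    # Expand with associated world knowledge.
    for fact in list(facts):
        facts.update(WORLD_KNOWLEDGE.get(fact, []))
    # Fire the first IF-THEN rule whose condition holds.
    for condition, question in RULES:
        if condition(facts):
            return question
    return "Can you tell me more about your symptoms?"

print(infer({"have a cut"}))  # -> How heavy is the bleeding?
```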
In a medical domain, the knowledge base may include knowledge, particularly an ontology, of the structure of the human body and of medical symptoms. This knowledge may come from medical ontologies created in the medical community. In addition, if the domain is particularly directed to a medical subcategory, such as self-help, the knowledge base may include another ontology related to, for example, self-care. Such an ontology is based on usage by ordinary people and differs somewhat from a formal medical ontology. A self-help ontology may provide a basic level of understanding of a potential illness based on symptoms people observe at home, whereas typical medical diagnostic systems may rely on collected data, such as blood samples, that would not commonly be available for self-help diagnosis. Eventually, the self-help ontology may be mapped to a formal medical ontology.
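A minimal sketch of such a mapping, with illustrative term pairs rather than a real ontology, might look like this:

```python
# Hypothetical sketch: lay self-help terms mapped onto formal medical
# ontology terms. These pairs are illustrative, not a real ontology.
SELF_HELP_TO_MEDICAL = {
    "stomach pain": "dyspepsia",
    "heartburn": "pyrosis",
    "runny nose": "rhinorrhea",
}

def to_medical_term(lay_term: str) -> str:
    # Fall back to the lay term when no formal mapping exists yet.
    return SELF_HELP_TO_MEDICAL.get(lay_term, lay_term)

print(to_medical_term("heartburn"))  # -> pyrosis
```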
If the system understands received data, then the system can update state to incorporate the new data. Accordingly, the flowchart continues at block 140 with editing state. State represents possibly relevant information that can be drawn upon by the system to respond effectively to natural language input from the user. The system may include a user profile with previously entered data, in addition to drawing upon new information from the user over the course of a conversation. An example of dialogue that makes use of state is described later.
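A minimal sketch of such state, assuming a simple profile-plus-symptom-list structure that the text does not itself prescribe, follows:

```python
# Minimal sketch of editable dialogue state backed by a user profile.
# The attribute names are assumptions for illustration.

class DialogueState:
    def __init__(self, profile=None):
        self.profile = dict(profile or {})  # previously entered user data
        self.symptoms = []                  # information gathered this session

    def edit(self, new_fact):
        """Incorporate newly understood input, ignoring exact duplicates."""
        if new_fact not in self.symptoms:
            self.symptoms.append(new_fact)

state = DialogueState(profile={"age": 42})
state.edit("stomach pain")
state.edit("eye pain")
```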
The flowchart continues at block 142 with deriving an appropriate response based on state. An appropriate response may depend upon the natural language input received last. For example, if the natural language input is “Did I mention that my eye hurts, too?” then the appropriate response may begin with a “Yes” or a “No”.
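Continuing the DialogueState sketch above, the "Did I mention..." example could be handled as follows; the mention test here is deliberately simplistic:

```python
# The reply begins with "Yes" or "No" depending on what state already records.
def answer_did_i_mention(state, symptom):
    if symptom in state.symptoms:
        return f"Yes, you mentioned {symptom} earlier."
    return f"No, you had not mentioned {symptom}; let's discuss it."

print(answer_did_i_mention(state, "eye pain"))
# -> Yes, you mentioned eye pain earlier.
```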
The flowchart ends at block 144 with providing the appropriate response. This may entail displaying the response by way of a UI, using text, voice, or both.
A dialogue manager, such as the dialogue manager provided by the dialogue manager module 120, may present a user interface that includes an animated image 502, a transcript 504, a display area 506A, a text box 508A, a Respond button 510, a Restart button 512, and an Exit button 514.
The animated image 502 may move its lips, have a facial expression, or follow a pointer with its eyes. In certain domains, in particular the self-help domain, it may be desirable to have a realistic animated image; however, the animated image 502 is optional. The transcript 504 is a running display of prompts or responses from the system and inputs from the user. The transcript 504 facilitates checking previous answers, printing the dialogue between the user and the system, or providing the dialogue to a third party, such as a physician or medical diagnostic system. The display area 506A displays the prompt or response from the system. Typically, the display area 506A may include a prompt for information from a user (e.g., a question), a summary or response (e.g., a statement or exclamation), or advice for the user (e.g., a statement or command). The text box 508A includes text input from the user. If the system includes speech-to-text capability, speech may be translated into text and written into the text box 508A; otherwise, the user may input the text directly. In either case, if the user clicks the Respond button 510, the system receives the input. Alternatively, the user may press the enter key on a keyboard to send the text to the system. If the user clicks the Restart button 512, the transcript 504 is deleted and the system restarts with an initial prompt, such as the one illustrated in the display area 506A. If the user clicks the Exit button 514, the dialogue ends. The system may or may not update in accordance with the dialogue, and a message may or may not be sent to the user or some third party following the end of the dialogue. The display following the end of the dialogue could be, for example, a home page of the company that is presenting the interface to the user.
[A series of figures, not reproduced here, walks through an example dialogue between the user and the system, showing how the display area, text box, and transcript are updated as the conversation progresses.]
The columns of Table 1 are as follows:
C: Clauses
N: Nouns and noun phrases
V: Verbs and verb phrases
Prep: Prepositions and prepositional phrases
obj: Object of a verb
subj: (Deep) subject of a verb
s: Surface subject
i: The main verb of a clause
A first step in preparing a parse tree for the example sentence is adding each word of the sentence to a parse table and assigning a label. For example, the sentence "I have pain in my stomach." could be represented in the table by assigning the word "I" the label 1, the word "have" the label 2, the word "pain" the label 3, and so forth. Additional nodes may be added to the list of words, such as the nodes E0 and E1. In an embodiment, E0 exists for all sentences and represents the root of the sentence. Other than the root node, however, these types of nodes are, for the most part, placeholders for values that may or may not be present in the sentence. For example, because a verb can have a subject and an object, "placeholder" nodes E1 and E2 can be designated for the verb. If the verb does not have a subject, then the node E1 may be left as an artifact of the parsing process, while the object of the verb may take the place of the placeholder node E2. Since E0 represents the root node, it is not simply a placeholder and, in an embodiment, is not replaced with a node that corresponds to an input word.
In Table 1, E0 represents the root of the parse tree. Since all sentences have the root node, the E0 entry could naturally have been added to Table 1 prior to adding the words of the sentence. Other nodes, such as E1, may not be known until the sentence has been at least partially analyzed or parsed.
As depicted in Table 1, the word "I" corresponds to node 1, "have" to node 2, and so forth. The entries in each of the columns of Table 1 contain grammatical data and data describing the relationship of a word to the rest of the sentence.
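An illustrative reconstruction of such a parse table in Python follows. It is an assumption for exposition, not the actual Table 1: the labels C, N, V, Prep, subj, obj, and i follow the column key above, while "root", "det", "mod", and "pcomp" are relation names invented here. Per the discussion above, the subject and object nodes have taken the place of the placeholders E1 and E2.

```python
# Assumed parse table for "I have pain in my stomach."
# Each entry records the word, its category, its grammatical relation,
# and the node it attaches to.
parse_table = {
    "E0": {"word": None,      "cat": "C",    "rel": "root",  "head": None},
    1:    {"word": "I",       "cat": "N",    "rel": "subj",  "head": 2},
    2:    {"word": "have",    "cat": "V",    "rel": "i",     "head": "E0"},
    3:    {"word": "pain",    "cat": "N",    "rel": "obj",   "head": 2},
    4:    {"word": "in",      "cat": "Prep", "rel": "mod",   "head": 3},
    5:    {"word": "my",      "cat": "N",    "rel": "det",   "head": 6},
    6:    {"word": "stomach", "cat": "N",    "rel": "pcomp", "head": 4},
}
```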
The flowchart continues at block 1004 with determining the tense of the sentence. In the example, the verb "have" indicates that the sentence is in the present tense.
The flowchart continues at block 1006 with determining predicates. There is no modal verb in the example sentence, and the sentence may be represented as a collection of predicates. In the example, the sentence yields predicates such as pain and in(stomach).
The flowchart continues at block 1008 with providing a semantic representation for one or more of the predicates. In an embodiment, the interpreter includes a database that contains multiple IF-THEN statements facilitating understanding of the sentence. For example, the database may include the statement: If In(x) && x in(body-parts) && pain → "x pain". This means that if there is pain in a body part x, the predicates can be converted to the semantic representation "x pain". Accordingly, the semantic representation for the symptom understood from the example sentence is "stomach pain".
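A hedged Python rendering of this IF-THEN statement, with an assumed encoding of predicates as strings and tuples, might be:

```python
# If the predicates include pain located in a known body part x,
# emit the semantic representation "x pain". The predicate encoding
# and the body-part list are assumptions for illustration.
BODY_PARTS = {"stomach", "head", "chest", "knee"}

def symptom_representation(predicates):
    # Example input: {"pain", ("in", "stomach")}
    if "pain" not in predicates:
        return None
    for p in predicates:
        if isinstance(p, tuple) and p[0] == "in" and p[1] in BODY_PARTS:
            return f"{p[1]} pain"
    return None

print(symptom_representation({"pain", ("in", "stomach")}))  # -> stomach pain
```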
The flowchart continues at block 1010 with mapping to a knowledge base. Continuing the example above, the semantic representation "stomach pain" is mapped onto a symptom name. For example, "stomach pain" may map to "heartburn" through a table that includes a list of semantic representations and associated symptom names. A symptom name may refer to a suspected diagnosis; for example, stomach pain could mean that a patient has heartburn. However, stomach pain could also map to more than one symptom name, such as "ulcers." In this case, the dialogue may first explore one potential diagnosis (e.g., heartburn) and, depending upon the success or failure of that potential diagnosis, explore another (e.g., ulcers). In an embodiment, a semantic representation maps to only one potential diagnosis, which may change over the course of a dialogue with a patient. In another embodiment, the semantic representation maps to more than one potential diagnosis, and the diagnoses are explored sequentially or simultaneously. Once a symptom name, such as "heartburn", has been determined, a flowchart table is consulted to help generate dialogue relevant to determining whether the suspected diagnosis is correct.
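The mapping table and the sequential exploration of candidate diagnoses might be sketched as follows; the table contents and question wording are illustrative assumptions:

```python
# Assumed mapping of semantic representations to candidate symptom names.
SYMPTOM_MAP = {
    "stomach pain": ["heartburn", "ulcers"],  # may map to several candidates
}

# Assumed flowchart table: candidate diagnosis -> follow-up question used
# to confirm or rule it out.
FLOWCHART = {
    "heartburn": "Does the pain get worse after eating or when lying down?",
    "ulcers": "Does the pain ease when you eat, then return a few hours later?",
}

def next_question(semantic_rep, ruled_out=()):
    # Explore candidate diagnoses one at a time, skipping any ruled out so far.
    for candidate in SYMPTOM_MAP.get(semantic_rep, []):
        if candidate not in ruled_out:
            return candidate, FLOWCHART[candidate]
    return None, "I need more information about your symptoms."

print(next_question("stomach pain"))                            # heartburn first
print(next_question("stomach pain", ruled_out=("heartburn",)))  # then ulcers
```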
Appendix A includes a list of symptoms, questions that are appropriate to further explore a diagnosis given the symptom, and actions to be taken when additional data is received from the user.
Appendix B includes a list of words and the symptoms with which they are associated.
Number | Date | Country
---|---|---
60601580 | Aug 2004 | US