This application claims the priority benefit of Taiwan application serial no. 93118735, filed on Jun. 28, 2004. All disclosure of the Taiwan application is incorporated herein by reference.
1. Field of the Invention
The present invention relates to a dialogue system and method, and more particularly to an integrated dialogue system and method using a bridge, or a bridge and a hyper-domain for domain integration.
2. Description of Related Art
As the demand in business services increases over the years, automatic dialogue systems such as portal sites, business telephone systems or business information search systems have been widely applied for providing information search or business transaction services to clients. Following are automatic dialogue systems descriptions of prior art.
In order to resolve the issue described above, other independent dialogue systems were introduced.
However, users nowadays require the integration of multiple-tier data. For example, when a user plans and prepares for a trip, the user might want to access information about, such as airline booking, hotel reservation, and the weather information at the destination. None of these prior art dialogue systems described above provides services for integration of information. In prior art dialogue systems, user had to repeat operation commands to obtain desired information. This repetition of commands is time-wasting and troublesome. Therefore, an integration dialogue system that can avoid drawbacks of repeating input commands is highly desired.
Accordingly, the present invention is directed to an integration dialogue system, which automatically recognizes the requirements of users and provides automatic dialogues and services.
The present invention is also directed to an integrated dialogue method for automatically recognizing the requirements of users and providing automatic dialogues and services.
The present invention discloses an integrated dialogue system. The system comprises a plurality of domains and a bridge. The bridge is coupled to each of the domains with bilateral communication respectively. After one of the domains, for example, a first domain, receives and recognizes input data, the first domain determines whether to process the input data by itself or to transmit the input data to a second domain via the bridge.
In an embodiment of the present invention, at least one of the domains comprises a domain database.
In an embodiment of the present invention, after recognizing the input data, the first domain further determines whether to process the input data by itself, or to generate a dialogue result by processing the input data in the first domain and transmit the dialogue result to the second domain, or to transmit the input data to the second domain without processing the input data.
In an embodiment of the present invention, the first domain obtains a local domain dialogue command and/or a dialogue parameter information, and generates a dialogue history information by recognizing the input data. If the first domain merely obtains the local domain dialogue command after recognizing the input data, the first domain will generate a dialogue result according to the local domain dialogue command and/or the dialogue history information. If the first domain obtains the dialogue parameter information and keywords in other domains after recognizing the input data, the first domain will transmit the input data and/or the dialogue parameter information and/or the dialogue history information to the second domain via the bridge. If the first domain obtains the local domain dialogue command, dialogue history information, and keywords in other domains after recognizing the input data, the first domain will transmit the input data to the second domain via the bridge according to a dialogue result received by the local domain dialogue command and/or the dialogue parameter information and/or the dialogue history information. If the first domain cannot obtain the local domain dialogue command and other-domain dialogue command, the first domain will send out an error signal.
In an embodiment of the present invention, the input data comprises a text input data or a voice input data.
In an embodiment of the present invention, each of the domains comprises a recognizer and a dialogue controller. The recognizer comprises a voice input to receive the voice input data, and/or a text input to receive the text input data, wherein the recognizer recognizes the voice input data or the text input data, and the recognizer is coupled to the bridge with bidirectional communications. The dialogue controller is coupled to the recognizer, wherein when the voice input data or the text input data is determined to be processed in the first domain, the dialogue controller receives information from the recognizer and processes the voice input data and/or the text input data to generate a dialogue result.
In an embodiment of the present invention, each of the domains further comprises a text-to-speech synthesizer, a voice output and a text output. The text-to-speech synthesizer is coupled to the dialogue controller for receiving and transforming the dialogue result into a voice dialogue result. The voice output is coupled to the text-to-speech synthesizer for sending out the voice dialogue result. The text output is coupled to an output for sending out the text dialogue result.
In an embodiment of the present invention, the recognizer comprises a voice recognition module, a grammar recognition module and a domain selector. The voice recognition module is coupled to the voice input for receiving the voice input data. The voice recognition module comprises a local domain lexicon corresponding to the domain with the recognizer, to determine a lexicon relationship between the voice input data and the domain with the recognizer and to output a recognized voice data. The grammar recognition module is coupled to the text input for receiving the text input data and to the voice recognition module for receiving the recognized voice data. The grammar recognition module comprises a local domain grammar database corresponding to the domain with the recognizer, to determine a grammar relationship between the text input data/recognized voice data and the domain with the grammar and to output a recognized data. The domain selector is coupled to the grammar recognition module, the dialogue controller and the bridge for generating a domain related to the recognized data according to the recognized data, the lexicon relationship and the grammar relationship.
In an embodiment of the present invention, the voice recognition module further comprises an explicit domain transfer lexicon database and an explicit domain transfer grammar database. The explicit domain transfer lexicon database serves to determine whether the voice input data is correlated to a first portion of data in the explicit domain transfer lexicon database. If yes, the voice input data is determined to be related to the domain corresponding to the first portion of data. The explicit domain transfer grammar database serves to determine whether the text input data or the recognized voice data is correlated to a second portion of data in the explicit domain transfer grammar database. If yes, the text input data or the recognized voice data is determined to be related to the domain corresponding to the second portion of data.
In an embodiment of the present invention, the voice recognition module further comprises at least one other-domain lexicon and at least one other-domain grammar database. The other-domain lexicon serves to determine a lexicon correlation between the voice input data and other domains. The other-domain grammar database serves to determine a grammar correlation between the text input data or the recognized voice data and other domains.
The present invention also discloses an integrated dialogue method for a bridge and a plurality of domains, wherein the bridge is coupled to each of the domains with bidirectional communications respectively. When a domain in the domains receives and recognizes an input data, this domain determines whether to process the input data or to transmit the input data to a second domain in the domains via the bridge.
In an embodiment of the present invention, after recognizing the input data, the method further determines whether to process the input data in the first domain, or to generate a dialogue result by processing the input data in the first domain and transmit the dialogue result to the second domain, or to transmit the input data to the second domain without processing the input data in the first domain.
In an embodiment of the present invention, the method further receives a local domain dialogue command and/or a dialogue parameter information, and generates dialogue history information by recognizing the input data. If only the local domain dialogue command is obtained after the input data is recognized, the first domain will generate a dialogue result according to the local domain dialogue command and/or the dialogue history information. If only the dialogue parameter information is obtained after the input data is recognized, the first domain will transmit the input data and/or the dialogue parameter information and/or the dialogue history information to the second domain via the bridge. If the local domain dialogue command and the dialogue history information are obtained together after the input data is recognized, the first domain transmits the input data to the second domain via the bridge according to a dialogue result received by the local domain dialogue command and/or the dialogue parameter information and/or the dialogue history information. If the first domain does not receive a dialogue command for the local domain and all other domains, the first domain will response an error signal.
The present invention further discloses an integrated dialogue system. The system comprises a hyper-domain, a plurality of domains and a bridge. The hyper-domain receives and recognizes an input data. The bridge is coupled to each of the domains with bidirectional communications respectively. After the hyper-domain recognizes the input data and determines that the input data is related to a first domain in the domains, the input data is transmitted to the first domain via the bridge. After the first domain processed the input data and generated a dialogue result, the dialogue result is transmitted back to the hyper-domain via the bridge.
In an embodiment of the present invention, after the dialogue result is received, the hyper-domain recognizes the input data and the dialogue result to be related to the second domain, and therefore transmits the input data and the dialogue result to the second domain via the bridge.
In an embodiment of the present invention, after receiving the dialogue result, the hyper-domain will output the dialogue result. The output is in a voice and/or a text form.
In an embodiment of the present invention, the hyper-domain comprises a hyper-domain database. Or at least one of the domains comprises a domain database.
In an embodiment of the present invention, the input data comprises a text input data or a voice input data.
In an embodiment of the present invention, the hyper-domain comprises a recognizer and a dialogue controller. The recognizer is coupled to the bridge with the bidirectional communication. The recognizer has a voice input to receive the voice input data, and/or a text input to receive the text input data. The recognizer recognizes whether the voice input data or the text input data relates to the first domain and transmits the input data to the first domain via the bridge and receives the dialogue result back from the first domain. The dialogue controller is coupled to the recognizer to receive and process the dialogue result.
In an embodiment of the present invention, the hyper-domain further comprises a text-to-speech synthesizer, a voice output and a text output. The text-to-speech synthesizer is coupled to the dialogue controller for receiving and transforming the text dialogue result into a voice dialogue result. The voice output is coupled to the text-to-speech synthesizer for sending out the voice dialogue result. The text output is coupled to an output for sending out the dialogue result.
In an embodiment of the present invention, the recognizer comprises a voice recognition module, a grammar recognition module and a domain selector. The voice recognition module is coupled to the voice input for receiving the voice input data and sending out a recognized voice data and a lexicon relationship. The grammar recognition module, coupled to the text input for receiving text input data and to the voice recognition module for receiving recognized voice data, generates recognized data and a grammar relationship. The domain selector is coupled to the grammar recognition module, the dialogue controller and the bridge for recognizing a domain related to the recognized data.
In an embodiment of the present invention, the voice recognition module further comprises an explicit domain transfer lexicon database and a plurality of other-domain lexicons. The explicit domain transfer lexicon database recognizes whether the voice input is correlated to a first portion of data in its database. If the recognition result is yes, this voice input data is determined to be related to the domain corresponding to the first portion of data. Each of the other-domain lexicons corresponds to each of the domains for recognizing the voice input data and gets a lexicon-relationship of each domain.
In an embodiment of the present invention, the grammar recognition module further comprises an explicit domain transfer grammar database and a plurality of other-domain grammar databases. When the text input data or the recognized voice data is correlated to a second portion of explicit domain transfer grammar database, the text input data or the recognized voice data is determined to be related to the domain corresponding to the second portion of data. Each of the other-domain grammar databases correspond to each of the domains for recognizing the text input data or the recognized voice data and grammar-relationship of the domains.
One or part or all of these and other features and advantages of the present invention will become readily apparent to those skilled in this art from the following description wherein there is shown and described one embodiment of this invention, simply by way of illustration of one of the modes best suited to carry out the invention. As it will be realized, the invention is capable of different embodiments, and its several details are capable of modifications in various, obvious aspects all without departing from the invention. Accordingly, the drawings and descriptions will be regarded as illustrative in nature and not as restrictive.
When any one of the domains 306a, 306b and 306c receives the input data, the domain recognizes the input data so as to determine whether to process the input data locally, or to process the input data to generate a dialogue result and transmit the dialogue result to a next domain, or to transmit the input data to a next domain without processing the input data.
For example, when the domain 306b in
As shown by operation 312 in
In this embodiment described above, the user can input another data, such as weather information after receiving the airline booking dialogue result. Or the user may input another data after receiving the hotel information dialogue result. The domain, which receives a further input information, combines the dialogue parameter information and the dialogue history information to generate a new dialogue command, for example, “Inquiry the weather information in New York City on July 4”. The dialogue parameter information and the dialogue history information are useful in determining if the following input data is related to the prior dialogue result, to determine which domain to process the following input data.
Assumed that the input data, “I want to book an airline ticket to New York City on July 4”, is entered into the airline booking domain. If only the local domain dialogue command “Book an airline ticket to New York City on July 4” is recognized and obtained, the domain will execute a dialogue to generate a dialogue result according to the local domain dialogue command.
Assumed that the input data, “I want to book an airline ticket to New York City on July 4”, is entered into the hotel domain. Then, after recognition, if only a dialogue parameter information, comprising the voice feature, the other-domain keyword “airline ticket”, and other-domain keyword, is recognized and obtained, the hotel domain transmits the input data and/or the dialogue parameter information and the dialogue history information to the second domain via the bridge 304.
In some embodiments of the present invention, if the input data “I want to book an airline ticket to New York City on July 4 and a hotel room over there” is entered into the domain related to the airline booking, thus the local domain dialogue command “Book an airline ticket to New York City on July 4” and the dialogue parameter information (e.g., related to hotel room) will be queried in one dialogue turn. Then, the domain will transmit the input data, the dialogue result from processing the local domain dialogue command, the dialogue parameter information and the dialogue history information to the second domain via the bridge 304. Once second domain completed this request, it will reply the dialogue results back via the bridge 304, and dialogue controller will combine all dialogue results, and report to user in one dialogue turn. If one domain sends data to other domain via the bridge, the sending domain will wait a timeout to get processed response from the specified domain. If the sending domain successfully got response from other domain before timeout, it will use received dialogue response to response user. Otherwise, the sending domain will report error message to user to notify needed domain is out of sync. Even that domain response after timeout, the sending domain will ignore it, but notify user that domain is alive again.
According to an embodiment of the present invention, if no local domain dialogue command and other-domain dialogue command is recognized and obtained, an error signal will be sent to the user.
According to an embodiment of the present invention, the user may enter the input data to the integrated dialogue system 302, for example, in a voice form or in a text form.
According to an embodiment of the present invention, each domain comprises a voice output, coupled to the control output 414 of the dialogue controller 410 via the text-to-speech synthesizer 406. The text-to-speech synthesizer 406 receives and transforms the dialogue results into a speech dialogue which is sent to the user in voice form via the voice output.
According to an embodiment of the present invention, the domain comprises a text output, coupled to the control output 414 of the dialogue controller 410. The text output sends out the dialogue results to the user in text.
According to an embodiment of the present invention, the voice recognition module 502 comprises a domain lexicon 512 related to the domain of the recognizer 402. The grammar recognition module 504 comprises a local domain grammar database 522 related to the domain of the recognizer 402. According to an embodiment of the present invention, the voice recognition module 502 comprises an explicit domain transfer lexicon database 514 and/or a plurality of other-domain lexicons 516a-516n. The grammar recognition module 504 comprises an explicit domain transfer grammar database 524 and/or a plurality of other-domain grammar databases 526a-526n. The explicit domain transfer lexicon database 514 comprises keywords for other domains, such as the weather domain comprising temperature or rain keywords.
Referring to
Referring to
The domain selector 506 is coupled to the grammar recognition module 504 for receiving recognized data. The domain selector 506 obtains the local domain dialogue command or the dialogue parameter information, such as the voice feature, the other-domain keyword, or the domain related to the other-domain keyword, and the dialogue history data based on the domain lexicon tags, the lexicon-relationship, the domain grammar tags and the grammar-relationship. Accordingly, if the domain 306b executes recognition, the local domain dialogue command “I want to book an airline ticket to New York City on July 4”; the other-domain keyword “hotel”; and the second domain 306c are recognized. The domain selector 506 is coupled to the dialogue controller 404 for sending out the local domain dialogue command to the dialogue controller 404. The domain selector 506 is coupled to the bridge 304 for sending out the input data, the search results, the dialogue parameter information and the dialogue history information to the bridge 304. If a domain got data from the bridge, i.e., speech waveform, feature, or text of recognized speech, and/or dialogue history, the received domain will use received data as same as domain input. e.g., recognition for input waveforms or NLU parses for input text of recognized speech. If the received data is recognized to process in received domain, it will use input data to process dialogue control and got dialogue response to get back to sender via bridge. If received domain recognized input data need to transmit to other domain, unless that domain is belong to senders that send this data, it will transmit data via the bridge to other domain to make that domain process the dialogue to get response. If such domain is belonging to senders, an error message is reported via the bridge.
The present invention also discloses an integrated dialogue method. The method is applied to an integrated dialogue system comprising a bridge and a plurality of domains. The bridge is coupled to each domain with a bilateral communication respectively. After a first domain in the domains receives and recognizes an input data, the first domain determines whether to process the input data or to transmit the input data to a second domain via the bridge.
In an embodiment of the present invention, after recognizing the input data, the method further determines whether to process the input data in the first domain, or to process the input data in the first domain to generate a dialogue result and transmit the dialogue result to the second domain, or to transmit the input data to the second domain without processing the input data in the first domain.
According to an embodiment of the present invention, the input data, and at least one of a local domain dialogue command and dialogue parameter information is recognized and obtained and a dialogue history information is generated. If only the local domain dialogue command is obtained, the first domain generates a dialogue result according to the local domain dialogue command and/or the dialogue history information. If only the dialogue parameter information is obtained, the first domain transmits the input data and/or the dialogue parameter information and/or the dialogue history information to the second domain via the bridge. If the local domain dialogue command and the dialogue history information are obtained together, the first domain transmits the input data to the second domain via the bridge according to a dialogue result received by the local domain dialogue command and/or the dialogue parameter information and/or the dialogue history information. If the first domain cannot receive the local domain dialogue command and an other-domain dialogue command, the first domain sends out an error signal.
According to an embodiment of the present invention, after generating the dialogue result by processing the input data, the dialogue result is a voice or text output to the user. The steps of the method are described with reference to
Accordingly, in the present invention, the domains can be set up separately. The bridge is then coupled to the domains for constituting the integrated dialogue system. Each of the domains of the present invention can be separately designed without affecting designs of other domains. Moreover, any new domain, if necessary, can be added to the integrated dialogue system. The integrated dialogue system integrates different domains by using the bridge for different applications. Accordingly, different applications are built on different domains; none of the same applications are going to be built on different domains. The structure of the system is, therefore, relatively simple, and the cost effective. Moreover, when any of the domains is failed, the dialogue can start from other domains, and other domains can still execute dialogues without affecting the operation of the whole integrated dialogue system. By using the bridge, all of the domains share information with each other. In addition, the dialogue parameter information and the dialogue history information reserve the prior command input from the user without repeating the same command. The domain lexicon tags and weights, and the domain grammar tags and weights are added to the recognized voice data and the recognized data for accelerating the precise recognition of the local domain dialogue command and the dialogue parameter information by using the domain selector.
Referring to
After receiving the first domain dialogue command, the first domain 612b makes a dialogue with the first domain database 614b to generate a first dialogue result, e.g., “An airline booking to New York City on July 4”, which is then transmitted to the hyper-domain 604.
After receiving the dialogue result, the hyper-domain 604 generates a second domain dialogue command and the second domain corresponding to the second domain dialogue command. For example, the dialogue result “An airline booking to New York City on July 4” and the input data “I want to book an airline ticket to New York City on July 4 and a hotel room” are processed so as to generate the second domain command “Book a hotel room at New York City on July 4”. The bridge 608 then transmits the second domain command to the second domain for dialogue.
In the integrated dialogue system, if the first domain command can not be generated by recognizing the input data, an error signal is output.
According to an embodiment of the present invention, a user enters the input data to the integrated dialogue system by entering voice input data or text input data.
The voice recognition module 802 comprises an explicit domain transfer lexicon database 814 and/or a plurality of other-domain lexicons 816a-816n. The grammar recognition module 804 comprises an explicit domain transfer grammar database 824 and/or a plurality of other-domain grammar databases 826a-826n. The explicit domain transfer lexicon database 814 comprises keywords for all domains.
Compared with the integrated dialogue system in
Accordingly, the present invention separately sets up the databases for the domains. A hyper-domain and a bridge are coupled to all domains so as to constitute an integrated dialogue system. Every domain can be separately designed without affecting other domains. Any new domain can be optionally added to the integrated dialogue system anytime. The integrated dialogue system integrates different domains by using the hyper-domain and the bridge for different applications. Different applications are built on different domains; none of the same applications are going to be built on the different domains. The dialogue controller collects the dialogue conditions and restricts the searching scope of the dialogue for multiple dialogues. The hyper-domain integrates information of the domains for different applications. The input data from the user can be more precisely recognized and transmitted to a proper domain.
The foregoing description of the embodiment of the present invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form or to exemplary embodiments disclosed. Accordingly, the foregoing description should be regarded as illustrative rather than restrictive. Obviously, many modifications and variations will be apparent to practitioners skilled in this art. The embodiments are chosen and described in order to best explain the principles of the invention and its best mode practical application, thereby to enable persons skilled in the art to understand the invention for various embodiments and with various modifications as are suited to the particular use or implementation contemplated. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents in which all terms are meant in their broadest reasonable sense unless otherwise indicated. It should be appreciated that variations may be made in the embodiments described by persons skilled in the art without departing from the scope of the present invention as defined by the following claims. Moreover, no element and component in the present disclosure is intended to be dedicated to the public regardless of whether the element or component is explicitly recited in the following claims.
Number | Date | Country | Kind |
---|---|---|---|
93118735 | Jun 2004 | TW | national |