The present invention relates to a dialog management system and a dialog management method for performing a dialog based on an input natural language to thereby execute a command matched to a user's intention.
In recent years, attention has been paid to methods in which a language spoken by a person is input by speech and an operation is executed using the recognition result. This technology, which is applied to speech interfaces in mobile phones and car navigation systems, basically works by associating an expected speech recognition result with an operation beforehand in the system, and executing the operation when the speech recognition result matches the expected one. According to this method, in comparison with conventional manual operation, an operation can be executed directly through an utterance, and thus the method serves effectively as a shortcut function. At the same time, the user is required to speak the wording that the system is waiting for in order to execute the operation, so that, as the functions addressed by the system increase, the expressions that must be kept in mind also increase. Furthermore, few users use the system after fully understanding its operation manual, so users generally do not know what wording to use for a given operation; as a result, there is a problem that, in practice, they cannot operate by speech any function other than those whose wording they remember.
In this respect, as conventional arts improved in that regard, there are disclosed methods by which a purpose can be accomplished even if the user does not remember a command for accomplishing it: the system interactively guides the user so that the purpose is eventually accomplished. As one such method, a dialog scenario is created beforehand in a tree structure, and a trace is made from the root of the tree through intermediate nodes (hereinafter, a transition occurring on the tree structure is expressed as a node being “activated”), so that the user accomplishes the purpose upon reaching a terminal node. Which route is traced in the tree structure of the dialog scenario is determined based on a keyword held at each node of the tree and on which keyword is included in the user's utterance for a transition destination of the currently-activated intention.
Furthermore, according to a technology described, for example, in Patent Document 1, a plurality of such scenarios is provided, and each scenario holds a plurality of keywords by which it is characterized, so that which scenario is to be selected for advancing the dialog is determined based on the user's initial utterance. Further, a method of changing the subject of conversation is disclosed that, when no content uttered by the user matches a transition destination in the tree structure of the currently-proceeding scenario, selects another scenario on the basis of the keywords given to the plurality of scenarios, and then advances the dialog from its root.
The conventional dialog management systems are configured as described above, and thus allow a new scenario to be selected when no transition is possible. However, when, for example, an expression in a tree-structured scenario created based on a function as designed in the system differs from the expression the user expects for that function, and a content uttered by the user during dialog using the selected scenario falls outside what the scenario expects, the system assumes that another scenario may apply and selects whichever other scenario is probable from the uttered content. If the uttered content is ambiguous, the scenario in progress is preferentially retained, so that there is a problem that, even if another scenario is more probable, no transition is made to it. Further, according to the conventional methods, the scenario itself cannot be actively changed; thus there is a problem that, when a tree-structured scenario created based on a function as designed in the system differs from the functional structure the user expects, or when the user misunderstands the function, the tree-structured scenario cannot be customized.
This invention has been made to solve the problems described above, and an object thereof is to provide a dialog management system that can perform an appropriate transition even for an unexpected input, to thereby execute an appropriate command.
A dialog management system according to the invention comprises: an intention estimation processor that, based on data provided by converting an input in a natural language into a morpheme string, estimates an intention of the input; an intention estimated-weight determination processor that, based on data in which intentions are arranged in a hierarchical structure and based on the intention thereamong being activated at a given object time, determines an intention estimated weight of the intention estimated by the intention estimation processor; a transition node determination processor that determines an intention to be newly activated through transition, after correcting an estimation result by the intention estimation processor according to the intention estimated weight determined by the intention estimated-weight determination processor; a dialog turn generator that generates a turn of dialog from one or plural intentions activated by the transition node determination processor; and a dialog manager that, when a new input in the natural language is provided due to the turn of dialog generated by the dialog turn generator, controls at least one process among processes performed by the intention estimation processor, the intention estimated-weight determination processor, the transition node determination processor and the dialog turn generator, followed by repeating that controlling, to thereby finally execute a setup command.
The dialog management system of the invention is configured to determine the intention estimated weight of the estimated intention, to thereby determine an intention to be newly activated through transition, after correcting the intention estimation result according to the intention estimated weight. Thus, even for an unexpected input, an appropriate transition is performed and thus an appropriate command can be executed.
Hereinafter, for illustrating the invention in more detail, embodiments for carrying out the invention will be described according to the accompanying drawings.
The dialog management system shown in
The speech input unit 1 is an input unit of the dialog management system that receives an input by speech. The dialog management unit 2 is a management unit that controls the units from the speech recognizer 4 to the speech synthesizer 14 so as to advance the dialog and thereby finally execute a command allocated to an intention. The speech output unit 3 is an output unit of the dialog management system that produces output by speech. The speech recognizer 4 is a processing unit that recognizes the speech input through the speech input unit 1 and converts it into text. The morphological analyzer 5 is a processing unit that divides the recognition result produced by the speech recognizer 4 into morphemes. The intention estimation model 6 is data of an intention estimation model for estimating an intention using the morphological analysis result produced by the morphological analyzer 5. The intention estimation processor 7 is a processing unit that takes the morphological analysis result from the morphological analyzer 5 as input and uses the intention estimation model 6 to output an intention estimation result. The intention estimation processor outputs, in the form of a list, sets each consisting of an intention and a score indicating the probability of that intention.
An intention is represented, for example, in the form “<main intention> [<slot name>=<slot value> . . . ]”. As specific examples, it may be represented as “Destination Point Setting [Facility=?]”, “Destination Point Setting [Facility=$Facility$ (=‘oo’ Ramen)]”, or the like [a specific POI (Point Of Interest) in Japanese is entered into ‘oo’]. Here, “Destination Point Setting [Facility=?]” denotes a state where the user wants to set a destination point but a specific facility name is not yet determined, and “Destination Point Setting [Facility=$Facility$ (=‘oo’ Ramen)]” denotes a state where the user wants to set the specific facility “‘oo’ Ramen” as the destination point.
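The intention form described above can be modeled, for instance, as a small data structure. This is an illustrative sketch only; the class and field names below are not taken from the specification:

```python
from dataclasses import dataclass, field

@dataclass
class Intention:
    """An intention: a main intention plus zero or more slot name/value pairs.
    A slot value of None corresponds to an unfilled slot, shown as "?"."""
    main: str
    slots: dict = field(default_factory=dict)

    def __str__(self):
        inner = ", ".join(f"{k}={v if v is not None else '?'}"
                          for k, v in self.slots.items())
        return f"{self.main}[{inner}]"

# "Destination Point Setting [Facility=?]": destination wanted, facility undetermined
abstract = Intention("Destination Point Setting", {"Facility": None})
# The same intention with its Facility slot filled by a concrete POI
concrete = Intention("Destination Point Setting", {"Facility": "'oo' Ramen"})
```

Keeping the unfilled slot explicit, rather than omitting it, makes the hierarchical relation between the two states (abstract above, slot-filled below) easy to compute later.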
Here, as the intention estimating method of the intention estimation processor 7, a method such as the maximum entropy method, for example, may be utilized. Specifically, for the utterance “Want to set a destination point”, the independent words “destination point” and “set” (hereinafter each referred to as a feature) are extracted from its morphological analysis result and paired with the correct intention “Destination Point Setting [Facility=?]”; a large number of such feature-intention pairs are likewise collected; and from these pairs it is statistically estimated which intention is probable, and to what extent, for the features in an input list. In the following, the description assumes that intention estimation is performed utilizing the maximum entropy method.
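As a rough sketch of the kind of statistical estimation described, the following toy linear model with a softmax stands in for a trained maximum entropy model; the feature weights are invented for illustration, as a real model would learn them from collected feature-intention pairs:

```python
import math

# Invented weights: feature -> {intention: weight}, standing in for what a
# maximum entropy model would learn from (features, correct intention) pairs.
WEIGHTS = {
    "destination point": {"Destination Point Setting [Facility=?]": 2.0},
    "set":               {"Destination Point Setting [Facility=?]": 1.0,
                          "Route Selection [Type=?]": 0.2},
    "route":             {"Route Selection [Type=?]": 2.5},
}

def estimate(features, intentions):
    """Return (intention, score) pairs sorted by score, scores summing to 1."""
    logits = {i: sum(WEIGHTS.get(f, {}).get(i, 0.0) for f in features)
              for i in intentions}
    z = sum(math.exp(v) for v in logits.values())
    scored = [(i, math.exp(v) / z) for i, v in logits.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)

result = estimate(["destination point", "set"],
                  ["Destination Point Setting [Facility=?]",
                   "Route Selection [Type=?]"])
```

The output has the same shape as the intention estimation result described above: a list of intentions, each with a probability-like score.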
The intention hierarchical graphic data 8 is data in which intentions are represented hierarchically. For example, of the two intentions “Destination Point Setting [Facility=?]” and “[Facility=$Facility$ (=‘oo’ Ramen)]”, the more abstract intention “Destination Point Setting [Facility=?]” is placed at the hierarchically upper level, and “[Facility=$Facility$ (=‘oo’ Ramen)]”, in which the specific slot is filled, is placed thereunder. Further, information about which estimated intention is currently activated by the dialog management unit 2 is also held therein.
The intention estimated-weight determination processor 9 is a processing unit that determines, from the intention hierarchy information in the intention hierarchical graphic data 8 and the information about the activated intention, a weight to be applied to the score of each intention estimated by the intention estimation processor 7. The transition node determination processor 10 is a processing unit that re-evaluates the list of intentions and scores output by the intention estimation processor 7, using the weights determined by the intention estimated-weight determination processor 9, to thereby select the intention (or, in some cases, plural intentions) to be activated next.
The dialog scenario data 11 is data of a dialog scenario in which is written information about what is to be executed for the one or plural intentions selected by the transition node determination processor 10. Meanwhile, the dialog history data 12 is data of a dialog history in which the state of each dialog is stored. The dialog history data 12 holds information for changing an operation according to the immediately preceding state, and for returning to the state just before a confirmatory dialog when the user denies the confirmation or the like. The dialog turn generator 13 is a processing unit that takes as input the one or plural intentions selected by the transition node determination processor 10 and utilizes the dialog scenario data 11 and the dialog history data 12 to generate a scenario for generating a system response, determining an operation to be executed, waiting for the next input from the user, and the like. The speech synthesizer 14 is a processing unit that takes a system response generated by the dialog turn generator 13 as input and generates synthesized speech.
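The division of labor among the units described above can be sketched as one processing cycle. All function names here are illustrative assumptions; the specification defines only the units' roles, not an API:

```python
def dialog_cycle(speech, ctx):
    """One dialog turn through the units, each stage supplied as a callable
    in ctx.  'active' carries the currently activated intention nodes
    between turns, as the intention hierarchical graphic data does."""
    text = ctx["recognize"](speech)          # speech recognizer 4
    morphemes = ctx["analyze"](text)         # morphological analyzer 5
    ranked = ctx["estimate"](morphemes)      # intention estimation processor 7
    weights = ctx["weigh"](ctx["active"])    # intention estimated-weight processor 9
    active = ctx["choose"](ranked, weights)  # transition node determination processor 10
    ctx["active"] = active                   # record newly activated node(s)
    return ctx["make_turn"](active)          # dialog turn generator 13
```

The dialog management unit 2 would call such a cycle repeatedly, feeding each new user input back in, until a turn containing a command execution is produced.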
Next, operations of the dialog management system of Embodiment 1 will be described, assuming that an input (given as one or plural keywords or as a sentence) is a speech in a natural language. Further, since the invention is not concerned with speech misrecognition, the description hereinafter assumes that the user's utterance is recognized correctly. In Embodiment 1, it is assumed that dialog is started by use of a speech start button that is not explicitly shown here. Further, before dialog is started, every intention node in the intention hierarchical graph is in a non-activated state.
When the user pushes the speech start button, dialog is allowed to start, and the system outputs a system response prompting the start of dialog, followed by a beep. For example, when the button is pushed, the system response 31 of “Please talk after beep” is given, and then, with the sounding of a beep, the speech recognizer 4 is placed in a recognizable state. When processing moves to Step ST11, if the user speaks the utterance 32 of “Want to make change of route”, the speech is input through the speech input unit 1 and converted into text by the speech recognizer 4. Here, the speech is assumed to be correctly recognized. After completion of the speech recognition, processing moves to Step ST12, and “Want to make change of route” is transferred to the morphological analyzer 5. The morphological analyzer 5 performs morphological analysis on the recognition result, producing [“route”/noun, “of”/postpositional particle, “change”/noun (to be connected to the verb “suru” in Japanese), “make”/verb, and “want to”/auxiliary verb in Japanese].
Subsequently, processing moves to Step ST13, where the morphological analysis result is transferred to the intention estimation processor 7 and intention estimation is performed using the intention estimation model 6. In the intention estimation processor 7, the features used for intention estimation are extracted from the morphological analysis result. Firstly, in Step ST13, the features “Route” and “Change” are extracted in the form of a list from the morphological analysis result for the recognition result of the utterance 32, and intention estimation is performed on these features by the intention estimation processor 7. The result of the intention estimation is given as the intention estimation result 52, in which the intention “Route Selection [Type=?]” has a score of 0.972 (in practice, scores are also allocated to the other intentions).
When the intention estimation result is provided, processing moves to Step ST14, where the list of sets of intentions estimated by the intention estimation processor 7 and their scores is transferred to the transition node determination processor 10 and the scores are corrected; processing then moves to Step ST15, where the transition node to be activated is determined. For the correction of the scores, a formula of the form of, for example, the score correction formula 51 is used. In the formula, i represents an intention, and Si represents the score of intention i. The function I(Si) is defined as a function that returns 1.0 when the intention i falls within the intention-preferentially-estimated region placed at a hierarchically lower level of an activated intention, and returns α (0≤α≤1) when it is outside that region. Note that in Embodiment 1, α=0.01. That is, if an intention cannot be transitioned to from an activated intention, its score is lowered, and the corrected scores are normalized so that their sum becomes 1. In the situation just after the utterance “Want to make change of route”, no node in the intention hierarchical graph is activated. Thus every score is multiplied by 0.01 and divided by the sum of all intention scores likewise multiplied by 0.01, so that the corrected score ends up equal to the original score.
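The score correction just described can be written directly in code. The `in_preferred_region` predicate is an illustrative stand-in for the hierarchy check against the activated node:

```python
ALPHA = 0.01  # weight alpha for intentions outside the preferred region

def correct_scores(scores, in_preferred_region):
    """scores: {intention: raw score}.  in_preferred_region: predicate telling
    whether an intention sits hierarchically below a currently activated node.
    Applies I(Si) to each score and renormalizes so the sum is 1."""
    weighted = {i: (1.0 if in_preferred_region(i) else ALPHA) * s
                for i, s in scores.items()}
    total = sum(weighted.values())
    return {i: w / total for i, w in weighted.items()}

# With no node activated, every intention gets weight ALPHA, which cancels
# in the normalization: the corrected scores equal the originals.
raw = {"Route Selection [Type=?]": 0.972, "Other": 0.028}
unchanged = correct_scores(raw, lambda i: False)
```

When some node is activated, intentions below it keep weight 1.0 while the rest are scaled by α, so the preferred region dominates after renormalization.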
Then, in Step ST15, the set of intentions to be activated is determined by the transition node determination processor 10. Examples of the intention-node determination method operated by the transition node determination processor 10 include the following:
(a) If the maximum score is 0.6 or more, only the one node with the maximum score is activated;
(b) If the maximum score is less than 0.6, the plural nodes with a score of 0.1 or more are activated; and
(c) If the maximum score is less than 0.1, no node is activated, on the assumption that the intention could not be understood.
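The three rules above can be sketched as a single selection function; the thresholds are as stated, while the function name is illustrative:

```python
def select_nodes(scored):
    """scored: list of (intention, score) pairs, in any order.
    Rule (a): one node if the maximum score is >= 0.6.
    Rule (b): all nodes scoring >= 0.1 if the maximum is in [0.1, 0.6).
    Rule (c): nothing if the maximum is < 0.1 (intention not understood)."""
    if not scored:
        return []
    best_intention, best_score = max(scored, key=lambda p: p[1])
    if best_score >= 0.6:                        # rule (a)
        return [best_intention]
    if best_score >= 0.1:                        # rule (b)
        return [i for i, s in scored if s >= 0.1]
    return []                                    # rule (c)
```

Rule (b) is what later allows plural intention nodes (for example, a destination-point node and a registration-point node) to be activated at once and disambiguated by a follow-up question.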
In the case of Embodiment 1, when the utterance “Want to make change of route” is made, the maximum score is 0.972, so that only the intention “Route Selection [Type=?]” is activated by the transition node determination processor 10.
When the intention node 28 is activated by the transition node determination processor 10, processing moves to Step ST16, so that a processing list for the next turn is generated by the dialog turn generator 13 on the basis of the contents written in the dialog scenario data 11. Specifically, this follows the process flow shown in
One dialog turn is completed at the time the speech-input waiting state is provided, and then, processing is continued by the dialog management unit 2. Thereafter, the flow in
In light of the fact that there is an activated intention node that has been transitioned to but no link from the transition source, the dialog turn generator 13 generates a dialog turn. Because the shift is to a node with no transition link, the turn is generated with a confirmation. Firstly, when the dialog scenario is selected, the pre-execution prompt “Will search $Genre$ near the current place” is selected, and “$Genre$” is replaced with “Ramen restaurant” from the information “$Genre$ (=Ramen restaurant)” in the intention estimation result, generating “Will search ramen restaurant near the current place”. Further, a confirmatory response is appended, so that “Will search ramen restaurant near the current place. Is that OK?” is determined as the system response. Then, since no command is defined, it is assumed that the dialog continues, and a user-input waiting state is entered.
Here, if the user responds with the user's speech 36 of “Yes”, the confirmatory special intention “Confirmation [Value=YES]” is generated by the speech recognizer 4, the morphological analyzer 5 and the intention estimation processor 7. In the process by the transition node determination processor 10, the effective special intention 82 of “Confirmation [Value=YES]” is selected, so that the transition to the intention node 25 is confirmed (shown by the transition link 42). Note that, if the user gives a negative response such as “No”, the special intention “Confirmation [Value=NO]” is estimated with a high score by the intention estimation processor 7. Since the special intention 83 of “Confirmation [Value=NO]” is effective for the process by the transition node determination processor 10, based on the dialog history data 12 shown in
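The handling of the two confirmatory special intentions can be sketched as follows. This is a minimal sketch assuming the dialog history is a simple stack of states; the actual dialog history data 12 holds richer per-turn information:

```python
def handle_confirmation(value, history):
    """history: stack of dialog states, most recent last.  On YES the
    tentative transition is kept; on NO the system returns to the state
    just before the confirmatory dialog, as held in the dialog history."""
    if value == "YES":
        return history[-1]      # confirmed: stay on the tentative state
    history.pop()               # denied: discard the tentative state...
    return history[-1]          # ...and resume from the previous one

history = ["genre selected", "search confirmed (tentative)"]
resumed = handle_confirmation("NO", history)
```

This is exactly the use of the dialog history described earlier: information for returning to the state just before a confirmatory dialog when the user denies it.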
Then, after the state of the intention node 25 is confirmed, the dialog turn generator 13, using the dialog scenario 67, replaces “$Genre$” in the post-execution prompt “$Genre$ near the current place was searched” with “Ramen restaurant” to generate the system response “Ramen restaurant near the current place was searched”. Then, since there is a DB search condition in the dialog scenario 67, the DB search “SearchDB (Current place, Ramen restaurant)” is added to the dialog scenario and executed, and upon receiving the execution result, “Please select from the list” is added as a system response to the dialog turn, and processing moves to the next one (in
The dialog management unit 2 outputs by speech the system response 37 of “Ramen restaurant near the current place was searched. Please select from the list” according to the received dialog turn, displays the list of ramen restaurants retrieved from the DB, and is then placed in a state of waiting for the user's speech. When the user speaks the utterance 38 of “Stop by ‘oo’ Ramen” and it is correctly speech-recognized, morphologically analyzed and intention-estimated, the intention “Route-point Setting [Facility=$Facility$]” is obtained. Since this intention is at a level lower than the intention node 25, a transition to the intention node 26 is executed.
As a result, the dialog scenario 63 for the intention node 26 of “Route-point Setting [Facility=$Facility$]” is selected, and the command “Add (Route point, ‘oo’ Ramen)” is added to the dialog turn. Subsequently, the system response 39 of “‘oo’ Ramen was set to the route point” is added to the dialog turn (in
Lastly, the dialog management unit 2 executes the received dialog turn sequentially. Namely, it adds the route point and then outputs “‘oo’ Ramen was set as route point” using synthesized speech. Since a command execution is included in the dialog turn, after the dialog terminates, the dialog management unit 2 returns to the initial utterance-start waiting state.
As described above, according to the dialog management system of Embodiment 1, it comprises: an intention estimation processor that, based on data provided by converting an input in a natural language into a morpheme string, estimates an intention of the input; an intention estimated-weight determination processor that, based on data in which intentions are arranged in a hierarchical structure and based on the intention thereamong being activated at a given object time, determines an intention estimated weight of the intention estimated by the intention estimation processor; a transition node determination processor that determines an intention to be newly activated through transition, after correcting an estimation result by the intention estimation processor according to the intention estimated weight determined by the intention estimated-weight determination processor; a dialog turn generator that generates a turn of dialog from one or plural intentions activated by the transition node determination processor; and a dialog management unit that, when a new input in the natural language is provided due to the turn of dialog generated by the dialog turn generator, controls at least one process among processes performed by the intention estimation processor, the intention estimated-weight determination processor, the transition node determination processor and the dialog turn generator, followed by repeating that controlling, to thereby finally execute a setup command. Thus, even for an unexpected input, an appropriate transition is performed and thus processing matched to the user's request can be carried out.
Further, according to the dialog management method of Embodiment 1, it uses a dialog management system that estimates an intention of an input in a natural language to perform dialog and, as a result, to execute a setup command, and comprises: an intention estimation step of estimating the intention of the input, based on data provided by converting the input in the natural language into a morpheme string; an intention estimated-weight determination step of determining, based on data in which intentions are arranged in a hierarchical structure and based on the intention thereamong being activated at a given object time, an intention estimated weight of the intention estimated in the intention estimation step; a transition node determination step of determining an intention to be newly activated through transition, after correcting an estimation result in the intention estimation step according to the intention estimated weight determined in the intention estimated-weight determination step; a dialog turn generation step of generating a turn of dialog from one or plural intentions activated in the transition node determination step; and a dialog control step of controlling, when a new input in the natural language is provided due to the turn of dialog generated in the dialog turn generation step, at least one step among the intention estimation step, the intention estimated-weight determination step, the transition node determination step and the dialog turn generation step, followed by repeating that controlling, to thereby finally execute a setup command. Thus, even for an unexpected input, an appropriate transition is performed and thus processing matched to the user's request can be carried out.
The command history data 15 is data in which each command executed so far is stored together with its execution time. Further, the history-considered dialog turn generator 16 is a processing unit that generates a dialog turn using the command history data 15, in addition to having the functions of the dialog turn generator 13 of Embodiment 1, which uses the dialog scenario data 11 and the dialog history data 12.
Next, operations of the dialog management system of Embodiment 2 will be described. The operations in Embodiment 2 are basically the same as those in Embodiment 1, the difference being that the operation of the dialog turn generator 13 is replaced with that of the history-considered dialog turn generator 16, which additionally operates with the command history data 15. Namely, the difference from Embodiment 1 resides in that, when a possibly misunderstood intention is finally selected as an intention with a command definition, the scenario to be carried out is not generated directly; instead, a dialog turn for making confirmation is generated.
The dialog in Embodiment 2 shows a case where a user who does not fully understand the application has added a registration point while intending to set a destination point, and thereafter becomes aware of that fact and sets the place again as the destination point. The entire flow of the dialog is similar to that in Embodiment 1 and thus follows the flow in
In the following, description will be made according to the contents of the dialog in
Because the activated nodes are the intention nodes 27 and 86, the dialog scenario 68 is selected, and “‘ox’ Station is set as destination point or registration point?” is added as a system response to the scenario (in
Firstly, in Step ST31, it is determined whether the number of intentions just before command execution is 0 or 1. Here, there are two intentions just before command execution, “Registration Point Setting [Facility=$Facility$ (=‘ox’ Station)]” and “Destination Point Setting [Facility=$Facility$ (=‘ox’ Station)]”, so the flow moves to Step ST34. In Step ST34, “Registration Point Setting [Facility=$Facility$ (=‘ox’ Station)]” and “Destination Point Setting [Facility=$Facility$ (=‘ox’ Station)]” are determined as the selectable intentions. Then, in Step ST36, the command execution history 131 is added to the command execution history list. Furthermore, in Step ST37, the selectable intentions would be registered in the possibly misunderstood command list 15b if, among them, an intention other than the executed one were thereafter executed within a specified time period; however, at the time the command execution history 131 is registered, the command execution history 132 is not yet present, so the flow terminates with nothing to do.
Then, after a while, because route guidance toward “‘ox’ Station”, which the user believes to have been set, is not initiated, the user becomes aware that what he/she wanted to do has not succeeded. Thus, a dialog is newly started. Here, if the user utters “Want to go to ‘ox’ Station” as indicated by the user's utterance 106, the intention estimation result 124 is obtained, resulting in the setting of the destination point. Then, processing moves to Step ST31 and, since there is no intention just before, further moves to Step ST32; in Step ST32, since the intention just before is itself absent, processing moves to Step ST33, and further to Step ST36, so that the command execution history 132 is registered.
After the command execution history is registered, in Step ST37, if, among the selectable intentions with ambiguity, an intention other than the selected one is thereafter selected within a specified time period (for example, 10 minutes), processing moves to Step ST38, where, on the assumption that this is possibly due to the user's misunderstanding, the intentions are registered in the possibly misunderstood command list 15b. Judging from the command execution histories 131 and 132, there is a possibility that a destination point setting was misunderstood as a registration point setting, so the command misunderstanding possibility 133 is added, with the number of confirmations and the number of correct-intention executions each set to 1.
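The bookkeeping in Steps ST36 to ST38 can be sketched as follows. This is a toy version: the record layout and function name are assumptions, while the 10-minute window and the initial counter values of 1 come from the text:

```python
WINDOW = 10 * 60  # seconds: "a specified time period (for example, 10 minutes)"

def register_execution(executed, alternatives, when, history, misunderstood):
    """history: list of (executed intention, selectable alternatives, time)
    records (Step ST36).  If an alternative of an earlier ambiguous execution
    is now executed within WINDOW, record a possible misunderstanding
    (Steps ST37-ST38), initializing both counters to 1."""
    for prev, prev_alts, prev_time in history:
        if executed in prev_alts and executed != prev and when - prev_time <= WINDOW:
            misunderstood[prev] = {"confused_with": executed,
                                   "confirmations": 1,
                                   "correct_executions": 1}
    history.append((executed, alternatives, when))

history, misunderstood = [], {}
# First execution: user picks "Registration Point Setting" among two candidates
register_execution("Registration Point Setting",
                   ["Registration Point Setting", "Destination Point Setting"],
                   0, history, misunderstood)
# Five minutes later the other candidate is executed: flag a possible mix-up
register_execution("Destination Point Setting", [], 300, history, misunderstood)
```

After the second call, the entry corresponds to the command misunderstanding possibility 133: the registration point setting may have been a mistaken stand-in for a destination point setting.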
Assume that, at a later date, the user makes the same misunderstanding when setting a destination point. When, for example, the user speaks the user's utterance 110 of “‘ΔΔ’ Center” [a specific POI (Point Of Interest) in Japanese is entered into ‘ΔΔ’], the intention is understood similarly to the initial speech, so that the system response 111 of “‘ΔΔ’ Center is set as destination point or registration point?” is generated, and the system waits for the user's utterance. If the user erroneously answers as before with the user's utterance 112 of “Registration point”, the intention estimation result becomes “Registration Point Setting [Facility=$Facility$ (=‘ΔΔ’ Center)]”. Thus, in the history-considered dialog turn generator 16, processing moves to Step ST41, and because the data “Registration Point Setting [Facility=$Facility$]” is present in the possibly misunderstood command list 15b, processing moves to Step ST42. In Step ST42, the system response 113 prompting confirmation, “Will set ‘ΔΔ’ Center as registration point, not as destination point. Is that OK?”, is generated. Then, processing moves to Step ST43 and, after adding 1 to the number of confirmations, terminates. Meanwhile, in Step ST41, if the intention planned for execution is not present in the possibly misunderstood command list 15b, processing moves to Step ST44, where the intention planned for execution is executed.
After outputting the system response 113, the dialog management unit 2 waits for the user's utterance, and when the user's response 114 of “Oh, mistake. Set as destination point” is made, “Destination Point Setting [Facility=$Facility$ (=‘ΔΔ’ Center)]” is selected and executed.
Thereafter, as the user comes to understand the difference between “Registration point” and “Destination point”, destination points will be set without using the wording “Registration point”, so that the number of correct-intention executions increases without the number of confirmations increasing. Namely, there will no longer be a case where, among the possibly misunderstood intentions present in the possibly misunderstood command list 15b, an intention that has not been executed is executed within the specified time period.
By deleting the corresponding data from the possibly misunderstood command list and stopping the confirmation once the ratio of the number of correct-intention executions to the number of confirmations exceeds, for example, 2, it is possible to advance the dialog smoothly.
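This stop condition can be expressed as a small pruning check. The threshold of 2 comes from the text; the record layout is an illustrative assumption:

```python
def prune_confirmations(misunderstood, threshold=2.0):
    """misunderstood: {intention: {"confirmations": int,
                                   "correct_executions": int}}.
    Drop an entry once correct executions per confirmation exceed the
    threshold, so a user who has learned the command is no longer asked."""
    for intention in list(misunderstood):
        rec = misunderstood[intention]
        if rec["correct_executions"] / rec["confirmations"] > threshold:
            del misunderstood[intention]
    return misunderstood

entries = {"Registration Point Setting": {"confirmations": 2, "correct_executions": 5},
           "Other": {"confirmations": 2, "correct_executions": 3}}
prune_confirmations(entries)
```

Here the first entry (ratio 2.5) is deleted and confirmation for it stops, while the second (ratio 1.5) is retained and still triggers a confirmatory dialog turn.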
As described above, according to the dialog management system of Embodiment 2, it comprises: instead of the dialog turn generator, a history-considered dialog turn generator that generates a turn of dialog from one or plural intentions activated by the transition node determination processor, and that records each command having been executed as a result by the dialog, to thereby generate a turn of dialog using a list in which selectable intentions in a history of executed commands are registered when among the intentions, the intention other than the intention having been subjected to execution is thereafter subjected to execution within a specified time period. Thus, even if there is a possibility of misunderstanding on a command by the user, an appropriate transition can be performed, to thereby execute an appropriate command.
Further, according to the dialog management system of Embodiment 2, when, among the selectable intentions in the history of executed commands, an intention other than the executed one is subsequently executed within a specified time period, the history-considered dialog turn generator generates a turn of dialog for confirmation; after generation of said turn of dialog, when, among the selectable intentions present in the list, no intention other than the executed one is executed within a predetermined time period, and this condition is repeated a set number of times, the history-considered dialog turn generator deletes the list and stops generating the confirmation turn. Thus, when the user does not understand the proper command, an appropriate measure can be taken; and once the user has understood the proper command, needless confirmations can be avoided.
Next, operations of the dialog management system of Embodiment 3 will be described.
The initial dialog in Embodiment 3 includes the dialog contents in
Let's assume that the dialog in
Here, since the additional transition link 201 is present, the transition-intention calculation is made on the assumption that the transition link 42 exists, so that the intention estimation results 194, 195 are obtained. The transition node determination processor 10 activates only the intention node 25 as a transition node. Since the dialog turn generator 13 likewise proceeds on the assumption that the transition link 42 exists, it adds the system response 175 to the scenario without asking the user for confirmation, and then passes processing to the dialog management unit 2. The dialog management unit 2 advances the dialog, outputs the system response 175, and then, based on the user's utterance 176, makes a transition to the intention node 26 with "Route Point Setting [Facility=$Facility$ (='x□' Kalbi)]" [a specific POI (Point of Interest) in Japanese is entered into 'x□']. As a result, the dialog scenario 63 is selected and, because a command is present for it, the command is executed and processing terminates; however, because the transition link 42 was used during the dialog, 1 is added to the number of transitions of the additional transition link 201.
When the number of transitions of the additional transition link 201 is updated, according to the flow in
Let's further assume that, in another time, the other subsequent dialog in
When the data of the additional transition link is added, according to the flow in
When the transition destination is thus replaced, this results in that the intention transition destination of the additional transition link 203 is changed to the intention node 211 in
As described above, the dialog management system of Embodiment 3 includes a transition controller that, when the intention determined by the transition node determination processor involves a transition to an unexpected intention outside the links defined by the hierarchical intentions, adds information of a link from the corresponding transition source to the corresponding transition destination; the transition node determination processor then treats the link added by the transition controller in the same way as a normal link in determining the intention. Thus, an appropriate transition can be performed even for an unexpected input, and an appropriate command executed.
Further, according to the dialog management system of Embodiment 3, when there is a plurality of transitions to unexpected intentions and those unexpected intentions have a common intention as a parent node, the transition controller replaces the transitions to the unexpected intentions with a transition to the parent node.
Thus, a desired command can be executed with less dialog.
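The parent-node replacement can be sketched as follows. All names here (AdditionalTransitionLink, TransitionController, parent_of) are illustrative, not the patent's actual data structures; the sketch only shows how several additional links from one source, whose destinations share a common parent in the intention hierarchy, are merged into a single link to that parent.

```python
class AdditionalTransitionLink:
    """One recorded transition to an unexpected intention (cf. link 201/203)."""

    def __init__(self, source, destination):
        self.source = source
        self.destination = destination
        self.transition_count = 0  # incremented each time the link is traversed


class TransitionController:
    def __init__(self, parent_of):
        # parent_of: mapping from an intention node to its parent node in the
        # intention hierarchy (assumed to be available from the graph data 8).
        self.parent_of = parent_of
        self.links = []

    def add_link(self, source, destination):
        # Record an unexpected transition as an additional link, then check
        # whether the links from this source can be merged.
        self.links.append(AdditionalTransitionLink(source, destination))
        self._merge_common_parent(source)

    def _merge_common_parent(self, source):
        outgoing = [l for l in self.links if l.source == source]
        if len(outgoing) < 2:
            return
        parents = {self.parent_of.get(l.destination) for l in outgoing}
        if len(parents) == 1 and None not in parents:
            # All destinations share one parent: replace the individual
            # links with a single link to the common parent node.
            parent = parents.pop()
            self.links = [l for l in self.links if l.source != source]
            self.links.append(AdditionalTransitionLink(source, parent))
```

The merged link is treated by the transition node determination processor like any normal link, so a single transition to the parent covers all the previously observed unexpected intentions, which is what reduces the amount of dialog.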
Note that although Embodiments 1 to 3 have been described using the Japanese language, the invention can be applied to a variety of languages such as English, German, and Chinese, by changing, for each language, the method of extracting the features used for intention estimation.
Further, in the case of a language whose words are delimited by a specific symbol (a space, etc.), when its linguistic structure is difficult to analyze, it is also allowable to extract $Facility$, $Residence$, and the like from the output natural language text by pattern matching or a similar method, and then execute the intention estimation processing directly.
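As a minimal sketch of this pattern-matching pre-processing, the following uses a regular expression to replace a matched facility name with the slot symbol $Facility$ before intention estimation. The pattern and the slot table are purely illustrative assumptions, not patterns defined by the patent.

```python
import re

# Illustrative slot patterns; a real system would hold one table per language.
SLOT_PATTERNS = {
    "$Facility$": re.compile(r"\b(\w+ (?:Center|Station|Park))\b"),
}


def extract_slots(text):
    """Replace matched surface strings with slot symbols, returning the
    normalized text and the extracted slot values."""
    slots = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            slots[slot] = match.group(1)
            # Substitute the slot symbol so intention estimation sees
            # e.g. "Set $Facility$ as the destination".
            text = pattern.sub(slot, text, count=1)
    return text, slots
```

The normalized text, with concrete names abstracted into slot symbols, can then be passed directly to the intention estimation processing without a morphological analysis step.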
Furthermore, although Embodiments 1 to 3 have been described assuming speech input, a similar effect can be expected in the case of text input using an input means such as a keyboard, without using speech recognition as the input method.
Furthermore, in Embodiments 1 to 3, intention estimation has been performed by processing the text of the speech recognition result with the morphological analyzer; however, in the case where the result from the speech recognition engine itself includes a morphological analysis result, intention estimation can be performed directly using that information.
Furthermore, although Embodiments 1 to 3 have described the intention estimation method using an example that assumes a learning model based on the maximum entropy method, the intention estimation method is not limited thereto.
It should be noted that any combination of the respective embodiments, and any modification or omission of elements in the embodiments, may be made in the present invention without departing from the scope of the invention.
As described above, the dialog management system and the dialog management method according to the invention relate to a configuration in which a plurality of dialog scenarios, each constituted in a tree structure, is prepared beforehand and a transition is performed from a given tree-structured scenario to another on the basis of dialog with the user; they are suited for use as a speech interface in a mobile phone or a car-navigation system.
1: speech input unit, 2: dialog management unit, 3: speech output unit, 4: speech recognizer, 5: morphological analyzer, 6: intention estimation model, 7: intention estimation processor, 8: intention hierarchical graphic data, 9: intention estimated-weight determination processor, 10: transition node determination processor, 11: dialog scenario data, 12: dialog history data, 13: dialog turn generator, 14: speech synthesizer, 15: command history data, 16: history-considered dialog turn generator, 17: additional transition-link data, 18: transition link controller.
Number | Date | Country | Kind |
---|---|---|---|
2013-242944 | Nov 2013 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2014/070768 | 8/6/2014 | WO | 00 |