This application claims the benefit of Japanese Patent Application No. 2004-185249 filed Jun. 23, 2004, in the Japanese Patent Office, the disclosure of which is hereby incorporated by reference.
1. Field of the Invention
The present invention generally relates to information input methods and apparatuses, and more particularly to an information input method and an information input apparatus which input both certain information and uncertain information. In the case of the certain information, the input contents are certain (or definite) and may be uniquely determined; in the case of the uncertain information, the input contents are uncertain (or indefinite) and may not be uniquely determined. The uncertain information may be treated as probability information.
2. Description of the Related Art
A call center system accepts calls from users at a call center. The calls from the users include inquiries, claims, orders and the like related to products or items. An operator of the call center manually inputs information using a keyboard, mouse and the like. In addition, it is conceivable to subject speeches of the user and the operator to a speech recognition, so as to input a speech recognition result to the call center system.
Contents of the information input from the keyboard, mouse and the like are certain (or definite) and may be uniquely determined. Such information will be referred to as “certain information” in this application. On the other hand, in the case of the speech recognition, the speech recognition result may be in error, or only a portion of the speech may be recognized by the speech recognition. For this reason, contents of the information input based on the speech recognition result are uncertain (or indefinite) and may not be uniquely determined. Such information will be referred to as “probability information” in this application. The probability information is of course not limited to the information based on the speech recognition result, and may include any uncertain information, such as information based on an image recognition result and information based on a character recognition result (or optical character reader (OCR) recognition result).
Japanese Laid-Open Patent Application No. 10-322450 proposes subjecting a user's speech to a speech recognition and displaying a speech recognition result, so that an operator may read back (or repeat) the user's speech. The operator's speech that is made by reading back the user's speech is also subjected to a speech recognition. Of the speech recognition result of the user's speech and the speech recognition result of the operator's speech, the speech recognition result with the higher recognition rate is selectively output as a final speech recognition result, and is used as an input to a system.
Japanese Laid-Open Patent Application No. 2003-316374 proposes including, in annotation data, specified speaker data that is obtained by subjecting the speech of a specified speaker at a receiving end to a speech recognition, unspecified speaker data that is obtained by subjecting the speech of an unspecified speaker at a sending end to a speech recognition, and keyboard data that is input by the specified speaker simultaneously with the call. Further, the specified speaker repeats the speech of the unspecified speaker, so as to facilitate the speech recognition.
However, the certain information input from the keyboard, mouse and the like, and the probability information obtained through the speech recognition and the like have the following problems.
It takes time to input the certain information from the keyboard, mouse and the like. The keyboard input takes time because all words and the like must be input without error, and also because it requires the operator's concentration. In a case where the operator of the call center makes the keyboard input while speaking with the user, the operator may not be able to concentrate on both the keyboard input and the conversation. If the operator cannot concentrate on the keyboard input, an erroneous keyboard input is easily made. If the operator cannot concentrate on the conversation, an erroneous keyboard input may be made based on an erroneous understanding of the conversation contents. Moreover, if the operator decides to concentrate on the conversation and make the keyboard input later, the operator may forget to make the necessary keyboard input afterwards.
On the other hand, the probability information is uncertain or indefinite, because it is obtained through the speech recognition and the like, which inevitably include recognition errors. The speech recognition basically selects, from candidate words that are registered in advance, the candidate word which most closely resembles the sound of the spoken word, and outputs the selected candidate word as the speech recognition result. For this reason, a large number of candidate words need to be registered, and the speech recognition is difficult in that there is a possibility of not selecting the correct candidate word. The speech recognition rate (or the degree of speech recognition certainty) has improved over the years, but it is still impossible to carry out the speech recognition without a recognition error. These problems of the speech recognition similarly occur in the image recognition and the character (or OCR) recognition.
Therefore, in the case of the call center system, for example, it takes time if the certain information is manually input by the operator from the keyboard, mouse and the like. The speech recognition selects only the candidate word having the highest recognition rate (or the degree of speech recognition certainty), and the selected candidate word is used as the probability information. However, since the recognition rate of the speech recognition is not 100%, the candidate word having the highest recognition rate is not necessarily the correct word, and the accuracy of the probability information may be low.
In addition, in the case of the speech recognition, if the number of registered candidate words increases, the recognition rate correspondingly decreases. Hence, in the case of the call center system, the decrease in the recognition rate results in the increase in the uncertainty of the probability information.
Accordingly, it is a general object of the present invention to provide a novel and useful information input method and apparatus, in which the problems described above are suppressed.
Another and more specific object of the present invention is to provide an information input method and an information input apparatus, which can quickly input information with a high accuracy.
Still another object of the present invention is to provide an information input method for inputting certain information and probability information having uncertainty, comprising displaying a plurality of candidates with respect to the probability information that is input; and selecting and fixing one of the plurality of displayed candidates in response to the certain information that is input. According to the information input method of the present invention, it is possible to quickly input information with a high accuracy.
A further object of the present invention is to provide an information input apparatus comprising a certain information input unit configured to input certain information; a probability information input unit configured to input probability information having uncertainty, and to obtain a plurality of candidates with respect to the probability information; a candidate display unit configured to display the plurality of candidates; and a selecting and fixing unit configured to select and fix one of the plurality of displayed candidates in response to the certain information input by the certain information input unit. According to the information input apparatus of the present invention, it is possible to quickly input information with a high accuracy.
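Purely as an illustrative sketch, and not as part of the claimed apparatus, the cooperation of these units may be expressed in Python as follows. The names Candidate, display_candidates and select_and_fix, and the use of a numeric certainty between 0.0 and 1.0, are assumptions made only for explanation.

    from dataclasses import dataclass

    @dataclass
    class Candidate:
        word: str         # a candidate word obtained for the probability information
        certainty: float  # degree of recognition certainty, assumed to range from 0.0 to 1.0

    def display_candidates(candidates):
        # candidate display unit: list the candidates in the order of the highest certainty
        for rank, c in enumerate(sorted(candidates, key=lambda c: c.certainty, reverse=True), 1):
            print(f"{rank}: {c.word} ({c.certainty:.2f})")

    def select_and_fix(candidates, certain_information):
        # selecting and fixing unit: the certain information fixes one of the displayed candidates
        for c in candidates:
            if c.word == certain_information:
                return c.word
        return certain_information  # the certain information may also be input directly

    # example usage (hypothetical values)
    candidates = [Candidate("lap-top personal computer", 0.8),
                  Candidate("desk-top personal computer", 0.3)]
    display_candidates(candidates)
    fixed = select_and_fix(candidates, "lap-top personal computer")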
Other objects and further features of the present invention will be apparent from the following detailed description when read in conjunction with the accompanying drawings.
The information input apparatus shown in
The line control unit 11 receives audio signals from telephone sets 19 of users via a public line 18, and sends an audio signal output from a microphone within the input device 15 to the telephone sets 19 via the public line 18. The microphone within the input device 15 picks up the operator's speech. In addition, the line control unit 11 controls the connection and the disconnection of the lines.
The processing unit 12 may be formed by a CPU, MPU or the like. The processing unit 12 executes software programs of various processes stored in the memory device 13, including a speech recognition process. The database 14 includes various databases (DBs) for use by an information input process. The input device 15 includes the microphone, a keyboard, a mouse, and an analog-to-digital converter (ADC) for converting the operator's speech picked up by the microphone into a digital audio signal. The output device 16 includes a display device which functions as a display means, a printer and the like.
A mouse input process (or means) 22 reads input information from the mouse of the input device 15 that is operated by the operator, and supplies the read input information to the screen input process (or means) 24. The screen input process (or means) 24 supplies the input information from the keyboard or mouse to an input content analyzing process (or means) 26, as certain information, in order to reflect the input information in a display on the display device of the output device 16.
A microphone input process (or means) 28 inputs the digital audio signal output from the microphone of the input device 15, which picks up the operator's speech, and supplies the digital audio signal to a speech recognition process (or means) 30. The speech recognition process (or means) 30 uses document structure candidates and candidate words that are registered in advance in a speech recognition candidate database 32 within the database 14, and carries out a speech recognition with respect to the digital audio signal received from the microphone input process (or means) 28. A plurality of candidate words and certainties are obtained as a speech recognition result, and the speech recognition process (or means) 30 supplies the speech recognition result to the input content analyzing process (or means) 26, as probability information. The speech recognition does not recognize the entire document, but carries out a word spot recognition which recognizes, within the document, only the candidate words that are registered in advance.
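The word spot recognition may be summarized, purely as an illustrative sketch, by the following fragment. The format of the recognizer output (a list of word and certainty pairs produced by an underlying recognizer) and the function name word_spot are assumptions made for explanation only.

    def word_spot(recognizer_output, registered_candidate_words):
        # keep only the candidate words registered in advance, together with
        # the certainties reported for them by the underlying recognizer
        return [(word, certainty)
                for word, certainty in recognizer_output
                if word in registered_candidate_words]

    # e.g. word_spot([("hello", 0.9), ("A120", 0.7)], {"A120", "A200"}) -> [("A120", 0.7)]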
The input content analyzing process (or means) 26 notifies the certain information received from the screen input process (or means) 24 to the speech recognition process (or means) 30. Of the probability information received from the speech recognition process (or means) 30, the input content analyzing process (or means) 26 groups candidate words having the same contents into a single item. The input content analyzing process (or means) 26 generates a display request for displaying the candidate words in an order of the highest certainty for each item, and also generates a display request for displaying the certain information. The input content analyzing process (or means) 26 supplies the generated display requests to a response control process (or means) 36. The response control process (or means) 36 determines display contents using a response log holding process (or means) 38, and a product information database 40 and a response information database 42 within the database 14, and supplies the determined display contents to an output content generating process (or means) 44.
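The grouping and ordering carried out by the input content analyzing process (or means) 26 may be illustrated by the following sketch. The tuple format (item, word, certainty) and the function name group_and_order are assumptions used only for explanation.

    from collections import defaultdict

    def group_and_order(probability_information):
        # group candidate words by item, merge candidate words having the same
        # contents by keeping the highest certainty, and order each item's
        # candidate words by descending certainty for display
        grouped = defaultdict(dict)
        for item, word, certainty in probability_information:
            grouped[item][word] = max(certainty, grouped[item].get(word, 0.0))
        return {item: sorted(words.items(), key=lambda pair: pair[1], reverse=True)
                for item, words in grouped.items()}

    # e.g. group_and_order([("model name", "A120", 0.7), ("model name", "A120", 0.4)])
    #      -> {"model name": [("A120", 0.7)]}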
The output content generating process (or means) 44 generates screen layout data for displaying a screen in accordance with the display contents, and character data of characters, numerals, symbols and the like, and outputs a screen output request to an image output process (or means) 46. The image output process (or means) 46 generates image data of a display screen based on the screen output request. The image data is supplied to the display device of the output device 16 via a display output process (or means) 48, and is displayed on the display device.
Next, a description will be given of a probability information display process that is carried out when the conversation shown in
The input content analyzing process (or means) 26 generates a display request for displaying and determining the probability information received from the speech recognition process (or means) 30, and supplies the display request to the response control process (or means) 36, in a step S13. The response control process (or means) 36 determines the display contents using the response log holding process (or means) 38 within the memory device 13 and the product information database 40 and the response information database 42 within the database 14, and supplies the display contents to the output content generating process (or means) 44, in a step S14.
The output content generating process (or means) 44 generates the screen layout data and the character data according to the display contents, and supplies a screen output request to the screen output process (or means) 46, in a step S15. The screen output process (or means) 46 generates the image data of the display screen based on the screen output request, and displays the image data on the screen of the display device, so as to urge the operator to input the certain information.
As a result, a display shown in
In
The input content analyzing process (or means) 26 generates a display request for displaying the selected candidate words as the certain information in the fixed display regions 50, 51 and 52, and supplies the display request to the response control process (or means) 36, in a step S22. The input content analyzing process (or means) 26 stops the display of the candidate word tables 55, 56 and 57 with respect to the item for which the candidate word is selected.
The response control process (or means) 36 generates the screen layout data and the character data according to the display contents, and outputs a screen output request to the screen output process (or means) 46, so as to display the image data on the screen of the display device, in a step S23.
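A minimal sketch of the selection and fixing of steps S21 to S23 is given below. The dictionaries fixed_display_regions and candidate_word_tables are hypothetical stand-ins for the fixed display regions 50 to 52 and the candidate word tables 55 to 57, and are not the claimed implementation.

    def on_candidate_selected(item, selected_word, fixed_display_regions, candidate_word_tables):
        # the cursor position read from the keyboard or mouse is certain information
        # that selects and fixes the candidate word of one item
        fixed_display_regions[item] = selected_word   # display in the fixed display region
        candidate_word_tables.pop(item, None)         # stop displaying the candidate word table
        return fixed_display_regions, candidate_word_tables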
In
The input content analyzing process (or means) 26 generates a display request for displaying the selected candidate word, as the certain information, in the fixed display region 50, and supplies the display request to the response control process (or means) 36 and notifies this to the output content generating process (or means) 44, in a step S32. The output content generating process (or means) 44 generates the screen layout data and the character data according to the display contents, and supplies a screen output request to the screen output process (or means) 46. Hence, a display having “lap-top personal computer” input in the fixed display region 50 is displayed on the screen of the display device, as shown in
The input content analyzing process (or means) 26 notifies the certain information to the speech recognition process (or means) 30, in a step S33. The speech recognition process (or means) 30 extracts only the candidate words corresponding to the certain information from the candidate words that are registered in advance in the speech recognition candidate database 32, in a step S34.
Next, when the operator makes an input by speech, the microphone input process (or means) 28 inputs the audio signal of the operator's speech and supplies the audio signal to the speech recognition process (or means) 30, in a step S35. The speech recognition process (or means) 30 carries out the speech recognition with respect to the audio signal using the document structure candidates that are registered in advance in the speech recognition candidate database 32 and the extracted candidate words, in a step S36.
A plurality of candidate words and certainties are obtained as the speech recognition result. The plurality of candidate words and certainties are supplied to the input content analyzing process (or means) 26, as probability information, and displayed on the display device, similarly as in the case of the process shown in
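The candidate limiting of steps S33 and S34 may be sketched as follows. The layout of the speech recognition candidate database 32, modeled here as a mapping from a fixed value of certain information to the candidate words usable with it, is an assumption made for illustration only.

    def limit_candidates(candidate_database, certain_information):
        # first candidate limiting: keep only the candidate words that are
        # registered in association with the fixed certain information
        return candidate_database.get(certain_information, [])

    # e.g. candidate_database = {"lap-top personal computer": ["A120", "A200", "B500"]}
    #      limit_candidates(candidate_database, "lap-top personal computer") -> ["A120", "A200", "B500"]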
According to the probability information fixing process shown in
In
The input content analyzing process (or means) 26 notifies the certain information of the input item instruction to the speech recognition process (or means) 30, in a step S42. The speech recognition process (or means) 30 extracts candidate words corresponding to the certain information of the input item instruction, from the candidate words that are registered in advance in the speech recognition candidate database 32, in a step S43.
Next, when the operator makes an input by speech, the microphone input process (or means) 28 inputs the audio signal of the operator's speech and supplies the audio signal to the speech recognition process (or means) 30, in a step S44. The speech recognition process (or means) 30 carries out the speech recognition with respect to the audio signal using the document structure candidates that are registered in advance in the speech recognition candidate database 32 and the extracted candidate words, in a step S45.
A plurality of candidate words and certainties are obtained as the speech recognition result. The plurality of candidate words and certainties are supplied to the input content analyzing process (or means) 26, as probability information, and displayed on the display device, similarly as in the case of the process shown in
In this case, candidate words of the items, namely, the product category, the model name and the coping content shown in
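Steps S42 to S45 may be sketched as a single pipeline in which the input item instruction limits the candidate words before the speech recognition is carried out. The recognizer argument (assumed to return word and certainty pairs) and the database layout keyed by item name are assumptions made only for illustration.

    def recognize_for_item(audio_signal, candidate_database, input_item, recognizer):
        # second candidate limiting (step S43): extract the candidate words of the designated item
        item_words = set(candidate_database.get(input_item, []))
        # speech recognition (steps S44 and S45): word spotting against the extracted candidates
        return [(word, certainty)
                for word, certainty in recognizer(audio_signal)
                if word in item_words]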
In
When the operator makes an input operation from the keyboard or mouse to move the cursor to one of the phrases in the conversation examples 62, the keyboard input process (or means) 20 or the mouse input process (or means) 22 reads the cursor position as the input information of the category instruction, and the screen input process (or means) 24 supplies the input information to the input content analyzing process (or means) 26, as certain information, in a step S52.
The input content analyzing process (or means) 26 generates a display request for displaying the certain information of the category instruction in the fixed display region 52, and supplies the display request to the response control process (or means) 36, in a step S53. Thereafter, the display is made on the display device, similarly as in the case of the process shown in
In
The input content analyzing process (or means) 26 notifies the certain information to the speech recognition process (or means) 30, in a step S63. The speech recognition process (or means) 30 extracts only the candidate words corresponding to the certain information of the one-character instruction, from the candidate words that are registered in advance in the speech recognition candidate database 32, in a step S64.
Next, when the operator makes an input by speech, the microphone input process (or means) 28 inputs the audio signal of the operator's speech and supplies the audio signal to the speech recognition process (or means) 30, in a step S65. The speech recognition process (or means) 30 carries out a speech recognition with respect to the audio signal using the document structure candidates that are registered in advance in the speech recognition candidate database 32 and the extracted candidate words, in a step S66.
A plurality of candidate words and certainties are obtained as the speech recognition result. The plurality of candidate words and certainties are supplied to the input content analyzing process (or means) 26, as probability information, and displayed on the display device, similarly as in the case of the process shown in
In this case, the candidate words for “lap-top personal computer” are registered in the candidate word table of the model name in the speech recognition candidate database 32 as shown in
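The limiting by the one-character instruction of steps S62 to S64 may be sketched as follows; the function name and the example values are hypothetical.

    def limit_candidates_by_first_character(item_candidate_words, first_character):
        # third candidate limiting: keep only the candidate words that begin
        # with the character designated by the operator
        return [word for word in item_candidate_words if word.startswith(first_character)]

    # e.g. limit_candidates_by_first_character(["A120", "A200", "B500"], "A") -> ["A120", "A200"]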
In
When the operator makes an input operation from the keyboard or mouse to move the cursor to one of the categories 67 and 68 in the process flow 66, the keyboard input process (or means) 20 or the mouse input process (or means) 22 reads the cursor position as the input information of the category instruction, and the screen input process (or means) 24 supplies the input information to the input content analyzing process (or means) 26, as the certain information, in a step S72.
The input content analyzing process (or means) 26 generates a display request for displaying the certain information of the category instruction in the fixed display region 52, and supplies the display request to the response control process (or means) 36, in a step S73. Thereafter, the display is made on the display device, similarly as in the case of the process shown in
In
The speech recognition process (or means) 30 carries out a speech recognition with respect to the audio signal using the document structure candidates and the candidate words that are registered in advance in the speech recognition candidate database 32, obtains a plurality of candidate words and certainties as the speech recognition result, and supplies the plurality of candidate words and certainties to the input content analyzing process (or means) 26, as probability information, in a step S82.
The input content analyzing process (or means) 26 generates a display request for displaying and fixing the probability information received from the speech recognition process (or means) 30, and supplies the display request to the response control process (or means) 36, in a step S83.
The response control process (or means) 36 determines the display contents of the candidate word table 55 using the response log holding process (or means) 38 within the memory device 13 and the product information database 40 and the response information database 42 within the database 14, and supplies the display contents to the output content generating process (or means) 44, in a step S84. In this particular case, the response control process (or means) 36 extracts, from the response log holding process (or means) 38, the response log with respect to the candidate word “lap-top personal computer” having the largest certainty with respect to the speech input, obtains the display contents of the candidate word table 57 by rearranging the responses (that is, the coping contents) depending on the frequency of use, and supplies the display contents to the output content generating process (or means) 44.
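The rearrangement of the coping contents in step S84 may be sketched as follows. The response log is modeled, purely as an assumption, as a list of (candidate word, coping content) pairs taken from past responses; the function name is hypothetical.

    from collections import Counter

    def order_coping_contents(response_log, candidate_word):
        # count how often each coping content was used for the candidate word,
        # and display the most frequently used coping contents first
        counts = Counter(coping for word, coping in response_log if word == candidate_word)
        return [coping for coping, _ in counts.most_common()]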
The output content generating process (or means) 44 generates the screen layout data and the character data depending on the display contents, and supplies a screen output request to the screen output process (or means) 46, in a step S85. The screen output process (or means) 46 generates image data of the display screen based on the screen output request, and displays the image data on the screen of the display device, so as to urge the operator to input the certain information.
When the operator makes an input by speaking “lap-top personal computer” and “A120”, for example, the microphone input process (or means) 28 inputs the audio signal of the operator's speech and supplies the audio signal to the speech recognition process (or means) 30, in a step S91.
The speech recognition process (or means) 30 carries out a speech recognition with respect to the audio signal using the document structure candidates and the candidate words that are registered in advance in the speech recognition candidate database 32, obtains a plurality of candidate words and certainties as the speech recognition result, and supplies the plurality of candidate words and certainties to the input content analyzing process (or means) 26, as probability information, in a step S92.
The input content analyzing process (or means) 26 generates a display request for displaying and fixing the probability information received from the speech recognition process (or means) 30, and supplies the display request to the response control process (or means) 36, in a step S93.
The response control process (or means) 36 determines the display contents of the candidate word table 55 using the response log holding process (or means) 38 within the memory device 13 and the product information database 40 and the response information database 42 within the database 14, and supplies the display contents to the output content generating process (or means) 44, in a step S94. In this particular case, the response control process (or means) 36 extracts, from the response log holding process (or means) 38, the response log with respect to the speech inputs “lap-top personal computer” and “A120”, and extracts a simultaneous use probability that indicates a probability of “lap-top personal computer” and “A120” being used simultaneously. The response control process (or means) 36 changes (or modifies) the certainties of the candidate words “lap-top personal computer” and “A120” depending on the simultaneous use probability, obtains the display contents of the candidate word tables 55 and 56, and supplies the display contents to the output content generating process (or means) 44.
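The adjustment of the certainties in step S94 may be sketched as follows. The multiplicative weighting shown here is only one conceivable way of reflecting the simultaneous use probability and is an assumption, not the claimed method.

    def adjust_certainties(certainties, word_pair, simultaneous_use_probability):
        # raise the certainties of candidate words that are often used together,
        # in proportion to their simultaneous use probability
        adjusted = dict(certainties)
        for word in word_pair:
            if word in adjusted:
                adjusted[word] = min(1.0, adjusted[word] * (1.0 + simultaneous_use_probability))
        return adjusted

    # e.g. adjust_certainties({"lap-top personal computer": 0.6, "A120": 0.5},
    #                         ("lap-top personal computer", "A120"), 0.8)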
The output content generating process (or means) 44 generates the screen layout data and the character data according to the display contents, and supplies a screen output request to the screen output process (or means) 46, in a step S95. The screen output process (or means) 46 generates the image data of the display screen based on the screen output request, and displays the image data on the screen of the display device, so as to urge the operator to input the certain information.
In the embodiment described above, the present invention is applied to speech recognition. However, the probability information may be obtained through processes other than speech recognition, such as image recognition. In this case, the microphone input process (or means) 28 may be changed to an image input process (or means), the speech recognition process (or means) 30 may be changed to an image recognition process (or means), and the speech recognition candidate database 32 may be changed to an image recognition candidate database.
The keyboard input process (or means) 20, the mouse input process (or means) 22 and the screen input process (or means) 24 may form a certain information input process (or means). The microphone input process (or means) 28, the speech recognition process (or means) 30 and the speech recognition candidate database 32 may form a probability information input process (or means). The input content analyzing process (or means) 26 may form a selecting and fixing process (or means). The step S34 may form a first candidate limiting process (or means). The step S41 may form an input item selecting process (or means), and the step S43 may form a second candidate limiting process (or means). In addition, the step S62 may form a partial selecting process (or means), and the step S64 may form a third candidate limiting process (or means).
Further, the present invention is not limited to these embodiments, but various variations and modifications may be made without departing from the scope of the present invention.
Number: 2004-185249 | Date: Jun. 23, 2004 | Country: JP | Kind: national