1. Field of the Invention
The present invention relates to a speech recognition system and, more particularly, to a speech recognition system that not only displays the speech recording status, the speech processing status, and the complete speech recognition status graphically as waveforms, but also connects each of the waveform and textual displays with a command menu. Each command menu contains at least one command that allows users to correct speech recognition errors or adjust the speech recognition system. The invention is suitable for electronic devices with a graphical interface, such as desktop computers, notebook computers, home multimedia-center systems, television sets, DVD players, audio or video systems, mobile phones, and personal digital assistants.
2. Description of the Prior Art
The development of speech recognition techniques makes it more convenient for users to operate electronic devices. Conventionally, users operate electronic devices such as desktop computers, notebook computers, home multimedia-center systems, television sets, DVD players, audio or video systems, mobile phones, and personal digital assistants by hand. For example, when using a computer, users must enter commands by hand with a keyboard, a mouse, or another accessory control device. A touch screen may simplify the input procedure, but it is still not ideal, because users must still press on the screen with their fingers and the display area of the screen is limited. While these problems are merely inconvenient for general users, they may make it impossible for handicapped users, users with neuromuscular disorders, or blind users to operate such devices. Speech recognition technology is one promising solution to these problems.
In the application of speech recognition technology, users input their speech into a speech recognition system through an audio input device such as a microphone, and the input speech can be converted into corresponding words or, according to the speech recognition results, further converted into corresponding operation commands.
Before the speech recognition system can recognize any speech, users must input their speech through an audio input device. Many factors during the recording and recognition processes influence the final speech recognition result, such as the quality of the audio input device, the recording environment, and the distance between the user and the audio input device. It is therefore necessary for users to monitor the quality of recording and recognition while these processes are under way. The prior art uses different icons, or changes in the shape of a single icon, to represent the recording status or the speech recognition status. However, such icons still fail to indicate whether the recording or recognition succeeded and how good its quality was.
In addition, the prior art provides functions for adjusting the speech recognition system according to the recognition results, but these functions do not operate on the individual word units of the results, and in particular not on the words that were recognized incorrectly. As a result, they cannot precisely tune the system to the specific characteristics of each user, and such systems are difficult to make more suitable for individual users. For example, users may have their own accents. If feedback and adjustment cannot be applied directly to specific words or terms, it is difficult to adapt the speech recognition system closely to each user, and its performance will drop significantly for speakers with accents.
To solve the problems mentioned above and enable speech recognition to be adopted more widely, the inventor, after extensive research, developed the present invention, which provides a speech recognition system that is convenient to use and that can be adjusted through feedback and adjustments made by users on specific word units of the recognition results, making the system more suitable for each user.
An object of the present invention is to provide a speech recognition system with a waveform display that represents a recording status, a speech processing status, or a complete speech recognition status, allowing users to monitor the quality of the speech recording, the speed of speech processing, and the confidence levels of the speech recognition results.
Another object of the present invention is to provide a speech recognition system with a correction and adjustment scheme by which users can correct the speech recognition errors or adjust the speech recognition system.
To achieve the above objects, the present invention provides a speech recognition system comprising at least a speech recognition engine and a display device having a signal status interface and a textual interface. The signal status interface shows a recording status, an ongoing speech processing status, or a complete speech recognition status as a waveform display, wherein the same waveforms also represent the speech signals from the speakers. The textual interface shows the speech recognition result, which includes at least one word unit. Each word unit of the speech recognition result corresponds to a waveform unit in the signal status interface. More importantly, each word unit and each waveform unit is connected with its own command menu, which includes at least one command for users to correct speech recognition errors or to adjust the speech recognition system.
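The correspondence between word units, waveform units, and their separate command menus can be pictured as a small data structure. The following Python sketch is illustrative only: it assumes each waveform unit is a half-open range of sample indices, and the names `WaveformUnit`, `WordUnit`, `build_word_units`, and `word_at_sample`, as well as the menu entries used below, are hypothetical rather than part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class WaveformUnit:
    """Segment of the recorded signal, as sample indices [start, end)."""
    start: int
    end: int
    menu: List[str] = field(default_factory=list)  # commands for this waveform unit


@dataclass
class WordUnit:
    """One word of the recognition result, linked to its waveform unit."""
    text: str
    waveform: WaveformUnit
    menu: List[str] = field(default_factory=list)  # commands for this word unit


def build_word_units(words: List[Tuple[str, int, int]],
                     word_menu: List[str],
                     waveform_menu: List[str]) -> List[WordUnit]:
    """Pair each (text, start, end) triple with its own copies of both menus."""
    units: List[WordUnit] = []
    for text, start, end in words:
        if not 0 <= start < end:
            raise ValueError(f"invalid segment for {text!r}: [{start}, {end})")
        wave = WaveformUnit(start, end, list(waveform_menu))
        units.append(WordUnit(text, wave, list(word_menu)))
    return units


def word_at_sample(units: List[WordUnit], sample: int) -> Optional[WordUnit]:
    """Map a position selected on the waveform back to its word unit."""
    for unit in units:
        if unit.waveform.start <= sample < unit.waveform.end:
            return unit
    return None
```

Selecting a point on the signal status interface can then be resolved to the aligned word unit, for example `word_at_sample(units, 5000)`, so that either menu can be opened for the same word.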
Preferably, the waveforms are presented in different colors for representing the recording status, the ongoing speech processing status, and the complete speech recognition status respectively.
After the speech signals are completely recognized, the word units of each speech recognition result are presented on the textual interface. Preferably, each waveform unit on the signal status interface and its corresponding word unit on the textual interface are aligned with each other and presented in the same color, which indicates the recognition confidence level of that word unit. There are three categories of recognition confidence level: good quality, mediocre quality, and bad quality. Results of bad quality should be brought to the user's attention and will probably need correction. Each category is presented in a different color.
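The three confidence categories can be sketched as a simple thresholding step. This Python example assumes the recognition engine reports a per-word confidence score between 0 and 1; the thresholds (0.8 and 0.5) and the color names are hypothetical placeholders, since the description does not specify them.

```python
from enum import Enum
from typing import List, Tuple


class Quality(Enum):
    """Confidence categories; the values are placeholder display colors."""
    GOOD = "green"
    MEDIOCRE = "yellow"
    BAD = "red"


# Hypothetical thresholds on an assumed confidence score in [0.0, 1.0].
GOOD_THRESHOLD = 0.8
MEDIOCRE_THRESHOLD = 0.5


def classify(confidence: float) -> Quality:
    """Place one word's confidence score into one of the three categories."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0.0, 1.0]")
    if confidence >= GOOD_THRESHOLD:
        return Quality.GOOD
    if confidence >= MEDIOCRE_THRESHOLD:
        return Quality.MEDIOCRE
    return Quality.BAD


def color_words(scored_words: List[Tuple[str, float]]) -> List[Tuple[str, str]]:
    """Return (word, color) pairs; the same color would be used for the
    word unit on the textual interface and its aligned waveform unit."""
    return [(word, classify(score).value) for word, score in scored_words]
```

Keeping the category separate from the color lets a system change its palette without changing how confidence is judged.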
The textual interface is connected with a command menu that includes at least one command for users to correct recognition errors or adjust the speech recognition system.
The waveform unit is connected with a command menu that includes at least one command for users to listen to the recorded speech sound, to re-record the sound, to correct recognition errors, or to adjust the speech recognition system.
The following detailed description, given by way of examples and not intended to limit the invention solely to the embodiments described herein, will be understood best in conjunction with the accompanying drawings.
As mentioned above, the signal status interface 30 according to the present invention shows the recording status and the speech recognition status by means of a waveform 32. The speech recognition status includes an ongoing speech processing status and a complete speech recognition status. The recording status, the ongoing speech processing status, and the complete speech recognition status are presented in different colors to visually represent the progress of the speech recognition process. While a speaker inputs speech, the waveform of the current input is displayed in the color of the recording status. After the speech recognition process starts, the waveform is gradually redrawn in the color of the ongoing speech processing status as processing proceeds. When the whole recorded speech has been processed, its waveform is drawn in the colors of the complete speech recognition status: some word units are drawn in the color of high-confidence recognition quality, some in the color of mediocre quality, and some in the color of bad quality. Users can therefore see at a glance the current status of the system, the quality of the speech recognition, and the processing speed.
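The color progression described above can be sketched as a function that splits the recorded samples into colored segments. In this hypothetical Python sketch, `num_recorded` and `num_processed` are assumed sample counters maintained by the recorder and the recognition engine, and the colors are placeholders; once recognition is complete, the caller supplies per-word segments already colored by confidence category.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start sample, end sample, color)

# Hypothetical colors; the description only requires that they differ.
RECORDING_COLOR = "gray"
PROCESSING_COLOR = "blue"


def waveform_segments(num_recorded: int, num_processed: int,
                      word_colors: Optional[List[Segment]] = None) -> List[Segment]:
    """Return the colored segments to draw for waveform 32."""
    if not 0 <= num_processed <= num_recorded:
        raise ValueError("need 0 <= num_processed <= num_recorded")
    if word_colors is not None:
        # Complete status: every sample has been processed, and each word
        # unit is drawn in the color of its confidence category.
        if num_processed != num_recorded:
            raise ValueError("recognition is not complete yet")
        return list(word_colors)
    segments: List[Segment] = []
    if num_processed > 0:
        # Ongoing processing status: samples already consumed by the engine.
        segments.append((0, num_processed, PROCESSING_COLOR))
    if num_recorded > num_processed:
        # Recording status: captured samples not yet processed.
        segments.append((num_processed, num_recorded, RECORDING_COLOR))
    return segments


def progress(num_recorded: int, num_processed: int) -> float:
    """Fraction of the recording processed so far (a processing-speed cue)."""
    return num_processed / num_recorded if num_recorded else 0.0
```

Redrawing these segments each time the counters change produces the gradual color replacement described in the text.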
When the input speech signals are still being recognized, the recognized and un-recognized parts of a waveform 32 shown on the signal status interface 30 are presented in different colors. As shown in
When the speech signals input by users are completely recognized, the best candidate words 420 of the speech recognition result 42 corresponding to the speech signals are shown on the textual interface 40. As shown in
When the speech recognition system is used in speech understanding applications, the words of the speech understanding result are also shown on the textual interface 40, while the waveform display on the signal status interface 30 remains unchanged. The words of the speech recognition result can either be shown on the textual interface 40 directly, or be hidden by default and shown only after users choose to display them.
Also referring to
Moreover, each waveform unit is connected with a command menu that has at least one command for users to check the input speech, re-record the speech, correct speech recognition errors, or adjust the speech recognition system. As shown in
For example, if a user finds that the waveform 32 has an unusual shape, the user can select the command “Play” 52 to listen to the recorded speech signal and judge whether any noise interfered with the recording. Users can also find out why words in the speech recognition result are incorrect, for example because of a pronunciation problem. If the recorded speech is clean but the pronunciation deviates from typical cases, the user can select “Record” to re-record the sound, or select “Train” to adjust the system and improve its accuracy on that specific word for that user. Until the speech recognition system can correctly recognize the user's speech, the user can also select the command “Writing” or “Keyboard” to switch from speech input mode to handwriting mode or keyboard input mode, correct the errors, and complete the input task.
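The waveform-unit command menu in this example can be sketched as a dispatch table from command names to handlers. In the Python sketch below, the class `WaveformMenu` and its handlers are hypothetical: the handlers only log what a real system would do (play back the segment, re-record it, or train on it), because the description does not specify the underlying playback, recording, or training interfaces.

```python
from typing import Callable, Dict, List


class WaveformMenu:
    """Hypothetical command menu attached to one waveform unit."""

    def __init__(self, unit_index: int) -> None:
        self.unit_index = unit_index
        self.input_mode = "speech"    # switched by "Writing" or "Keyboard"
        self.actions: List[str] = []  # log standing in for real side effects
        self._commands: Dict[str, Callable[[], None]] = {
            "Play": self._play,
            "Record": self._record,
            "Train": self._train,
            "Writing": lambda: self._switch_mode("handwriting"),
            "Keyboard": lambda: self._switch_mode("keyboard"),
        }

    def commands(self) -> List[str]:
        """Command names in the order they would appear in the menu."""
        return list(self._commands)

    def select(self, name: str) -> None:
        """Run the handler for the command the user picked."""
        try:
            handler = self._commands[name]
        except KeyError:
            raise ValueError(f"unknown command: {name}") from None
        handler()

    def _play(self) -> None:
        # Placeholder for playing back this unit's recorded samples.
        self.actions.append(f"play segment {self.unit_index}")

    def _record(self) -> None:
        # Placeholder for re-recording this unit's speech.
        self.actions.append(f"re-record segment {self.unit_index}")

    def _train(self) -> None:
        # Placeholder for adapting the recognizer to this user's pronunciation.
        self.actions.append(f"train on segment {self.unit_index}")

    def _switch_mode(self, mode: str) -> None:
        # Leave speech input for handwriting or keyboard correction.
        self.input_mode = mode
        self.actions.append(f"switch to {mode} input")
```

A dispatch table keeps the menu contents and their behavior in one place, so adding or removing a command does not touch the selection logic.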
Each word 420 in the speech recognition result is connected with a command menu that has at least one command for users to correct speech recognition errors or to adjust the speech recognition system. As shown in
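Because the textual interface shows the best candidate words 420, one plausible way to implement a word-unit correction command is to keep the recognizer's ranked candidates with each word. The Python sketch below assumes such an N-best list is available; `CandidateWord`, `alternatives`, and `correct` are hypothetical names, and the logged (recognized, corrected) pairs merely illustrate feedback that could later be used to adjust the system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CandidateWord:
    """A word unit holding recognizer candidates ranked best-first (assumed)."""
    candidates: List[str]
    shown: str = ""

    def __post_init__(self) -> None:
        if not self.candidates:
            raise ValueError("at least one candidate is required")
        self.shown = self.candidates[0]  # the best candidate is displayed


def alternatives(word: CandidateWord) -> List[str]:
    """Candidates a correction menu could offer instead of the shown word."""
    return [c for c in word.candidates if c != word.shown]


def correct(word: CandidateWord, replacement: str,
            feedback: List[Tuple[str, str]]) -> None:
    """Replace the displayed word and log the (recognized, corrected) pair."""
    if replacement != word.shown:
        feedback.append((word.shown, replacement))
        word.shown = replacement
```

The replacement need not come from the candidate list, so the same function also covers a word typed in after switching to keyboard or handwriting input.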
Furthermore, words of the speech recognition result may be incorrect because of pronunciation deviations by the user. As shown in
Thereby, the present invention has the following advantages:
Accordingly, as disclosed in the above description and the attached drawings, the present invention provides a speech recognition system with which users can easily monitor whether the recording is proceeding successfully, the quality of the input speech signals, the speech processing status, and the confidence levels of the speech recognition results. Users can also conveniently correct recognition errors and adjust the speech recognition system to improve its accuracy. The invention is novel and industrially applicable.
It should be understood that persons skilled in the art may make various modifications and variations based on the disclosure of the present invention, and such modifications and variations shall be deemed not to depart from the spirit of the present invention.