The present invention relates to a speech input apparatus and method, and in particular to a speech input apparatus and method for speech template selection.
With the rapid improvement of speech recognition technology, speech recognition systems have been widely applied in the fields of household appliances, communication, multimedia, and information products. However, one issue that developers often encounter with speech recognition systems is that users frequently do not know what to say to the microphone. This is particularly true of products that allow a high degree of freedom for speech input, where the users are rather at a loss. The consequence is that the users cannot experience the benefit that speech input brings.
Three different schemes for speech input are adopted in apparatuses equipped with speech recognition, and they are commonly categorized as follows:
1. Input with a single speech template: in this case, the input speech is constrained by a single template according to the limitation of the apparatus, which sometimes makes it insufficient for precisely expressing a target object.
2. Input with diverse speech templates: in this case, users have to read the instructions to understand which templates the apparatus accepts. Once the users forget the applicable templates, they must review the manual to remind themselves. Besides, if natural language is adopted as the input style, the users are freed from the constraint of templates, but the accuracy of speech recognition decreases because of the complexity of natural languages.
3. Provision of dialogue or some dialogue-like mechanisms: in this case the users are guided by instructions given via the system interface. An interaction is established between the system and the users so as to proceed through the whole speech input procedure step by step. However, such procedures are always time-consuming and tedious for the users; especially when errors occur frequently during operation, the users may lose their patience.
It is apparent that the mentioned schemes have inevitable drawbacks, which prevent the users from experiencing the advantages brought by such user-friendly interfaces when operating an apparatus with speech recognition. On the contrary, the users would rather use an input apparatus with a keyboard than a voice-commanded apparatus. Consequently, the popularization of the voice-commanded apparatus has reached a ceiling.
To overcome the mentioned drawbacks of the prior art, a novel method and apparatus of speech template selection for speech recognition are provided.
According to the first aspect of the present invention, a speech input apparatus having a speech input from a user is provided. The speech input apparatus includes a speech template unit providing and switching a plurality of speech templates, an I/O interface communicating with the user for the selection of a desired speech template, a speech recognition unit recognizing the speech to provide a result, a database unit storing a content database, and a search unit searching the database unit for specific data in response to the result.
Preferably, the I/O interface is a monitor.
Preferably, the I/O interface is a loudspeaker.
Preferably, the I/O interface contains browsing buttons.
Preferably, the speech recognition unit further includes an input device inputting the speech, an extracting device extracting feature coefficients from the speech, a set of constraint models each of which includes a lexicon model and a language model for providing a first recognition reference, an acoustic model providing a second recognition reference, and a speech recognition engine recognizing the speech according to the feature coefficients, the first recognition reference and the second recognition reference.
Preferably, when a specific speech template is selected by the user, the lexicon model and language model corresponding to the specific speech template are activated by the speech template unit for the speech recognition engine.
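The switching and activation behavior described above can be sketched as follows. This is only an illustration under stated assumptions: the class names, the wrap-around browsing, and the example template names are hypothetical and are not taken from the disclosed embodiment, and a plain string stands in for a real language model.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class ConstraintModel:
    """Lexicon model and language model tied to one speech template (illustrative)."""
    lexicon: FrozenSet[str]
    language_model: str  # placeholder for a real language model


class SpeechTemplateUnit:
    """Provides the templates, lets the user browse them, and activates one."""

    def __init__(self, templates: Dict[str, ConstraintModel]):
        self._templates = templates
        self._names = list(templates)   # browsing order shown on the I/O interface
        self._cursor = 0                # template currently presented to the user
        self.active: Optional[ConstraintModel] = None

    def current(self) -> str:
        return self._names[self._cursor]

    def switch(self, step: int = 1) -> str:
        """Move to the next (or previous) template, wrapping around the list."""
        self._cursor = (self._cursor + step) % len(self._names)
        return self.current()

    def select(self) -> ConstraintModel:
        """Activate the constraint model of the template under the cursor."""
        self.active = self._templates[self.current()]
        return self.active


# Hypothetical templates for a music player.
unit = SpeechTemplateUnit({
    "artist": ConstraintModel(frozenset({"ella"}), "artist-grammar"),
    "title": ConstraintModel(frozenset({"blue", "moon"}), "title-grammar"),
})
unit.switch()   # the browsing button moves from "artist" to "title"
unit.select()   # only the "title" lexicon and language model are now active
```

Keeping the active constraint model as a single attribute mirrors the idea that the recognition engine only ever sees the models of the selected template.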
According to a second aspect of the present invention, a speech input method is provided. The method includes steps of (a) providing a plurality of speech templates, (b) switching the plurality of speech templates, (c) selecting one of the plurality of speech templates as a selected speech template, (d) activating the lexicon model and language model corresponding to the selected speech template, (e) inputting speech, (f) recognizing the speech according to the constraint model as well as the acoustic model, and generating a result, (g) providing the result to a search unit, and (h) searching for specific data in a database unit in response to the result.
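Steps (d) through (h) can be sketched end to end as follows. This is a toy illustration, not the disclosed recognizer: the database content and function names are hypothetical, the utterance is represented as an already-transcribed noisy string, and string similarity from the standard `difflib` module stands in for scoring with the acoustic model.

```python
import difflib

# Hypothetical content database: each record has one field per speech template.
DATABASE = [
    {"title": "blue moon", "artist": "ella fitzgerald"},
    {"title": "blue in green", "artist": "miles davis"},
]


def activate(template):
    """Step (d): the active vocabulary is the set of phrases valid for this template."""
    return [record[template] for record in DATABASE]


def recognize(utterance, vocabulary):
    """Step (f), stubbed: pick the closest phrase within the active vocabulary."""
    matches = difflib.get_close_matches(utterance, vocabulary, n=1, cutoff=0.0)
    return matches[0] if matches else None


def search(template, result):
    """Step (h): return the records whose field equals the recognized phrase."""
    return [record for record in DATABASE if record[template] == result]


def speech_input(template, utterance):
    vocabulary = activate(template)             # steps (c)-(d)
    result = recognize(utterance, vocabulary)   # steps (e)-(f)
    return search(template, result)             # steps (g)-(h)


# A slightly misheard artist name still resolves, because only artist
# names compete with each other once the "artist" template is selected.
hits = speech_input("artist", "miles davies")
```

Because the candidate set is limited to the phrases of the selected template, a title such as "blue moon" can never be confused with an artist name in this sketch.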
Preferably, the step (f) includes steps of (f1) extracting feature coefficients from the speech, and (f2) recognizing the speech according to the feature coefficients, the constraint model, and the acoustic model.
Preferably, the step (f1) includes steps of (f11) pre-processing the speech, and (f12) extracting feature coefficients from the speech.
Preferably, the speech consists of signals, and the step (f11) further includes steps of amplifying, normalizing, pre-emphasizing, and Hamming-window filtering the speech.
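The pre-processing of step (f11) can be sketched as below. The frame length, the pre-emphasis coefficient 0.97, peak normalization as the gain step, and the use of non-overlapping frames are assumptions chosen for brevity; the text does not specify them. The input is assumed to be a non-empty list of samples.

```python
import math


def preprocess(samples, frame_len=8, pre_emphasis=0.97):
    """Normalize, pre-emphasize, and Hamming-window a list of samples.

    frame_len and pre_emphasis are illustrative values, not taken from the text.
    Returns a list of windowed, non-overlapping frames.
    """
    # Amplify/normalize: scale so that the peak magnitude becomes 1.0.
    peak = max(abs(s) for s in samples) or 1.0
    x = [s / peak for s in samples]

    # Pre-emphasis: y[n] = x[n] - a * x[n-1] boosts the high frequencies.
    y = [x[0]] + [x[n] - pre_emphasis * x[n - 1] for n in range(1, len(x))]

    # Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)).
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]

    frames = []
    for start in range(0, len(y) - frame_len + 1, frame_len):
        frame = y[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```

In practice frames usually overlap, so that no part of the signal is attenuated by the window edges in every frame; that refinement is omitted here.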
Preferably, the step (f12) further includes steps of performing a Fast Fourier Transform on the speech and calculating the Mel-Frequency Cepstrum Coefficients of the speech.
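A minimal sketch of step (f12) for a single windowed frame follows. It assumes a frame length that is a power of two, the common mel mapping 2595·log10(1 + f/700), triangular filters, and a DCT-II of the log filter-bank energies; the sample rate, filter count, and coefficient count are illustrative values that the text does not specify.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Fast Fourier Transform; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(frame, sample_rate=8000, n_filters=6, n_coeffs=4):
    """MFCCs of one windowed frame (all parameter values are illustrative)."""
    n = len(frame)
    spectrum = fft(frame)[: n // 2 + 1]           # keep non-negative frequencies
    power = [abs(c) ** 2 / n for c in spectrum]

    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = sample_rate / n
    energies = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        e = 0.0
        for k, p in enumerate(power):
            f = k * bin_hz
            if lo < f <= mid:
                e += p * (f - lo) / (mid - lo)    # rising slope of the triangle
            elif mid < f < hi:
                e += p * (hi - f) / (hi - mid)    # falling slope of the triangle
        energies.append(math.log(e + 1e-10))      # small floor avoids log(0)

    # DCT-II of the log filter-bank energies gives the cepstral coefficients.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(energies))
            for c in range(n_coeffs)]


# Example: a 1 kHz tone sampled at 8 kHz, 64 samples long.
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
coefficients = mfcc(tone)
```

The resulting coefficient vector is what the text calls the feature coefficients passed to the speech recognition engine.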
According to a third aspect of the present invention, a method for dynamically updating the lexicon model and language model for a speech input apparatus is provided. The speech input apparatus includes a database unit and a constraint-generation unit. The provided method can be applied when the content in the database unit is changed: (a) related information in the database unit is loaded into the constraint-generation unit, (b) the constraint-generation unit converts the information into the necessary lexicon model and language model for speech recognition, (c) the constraint-generation unit also refreshes the indices to the content in the database unit, and (d) the generated lexicon model and language model are stored in the constraint unit.
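Steps (a) through (d) can be sketched as a function that regenerates everything from the current database content. The record layout, the field names, and the choice of a word set plus bigram counts as stand-ins for the lexicon model and language model are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def generate_constraints(records, fields):
    """Build, per field, a lexicon, bigram counts, and an index into the records.

    records: list of dicts holding the database content (step a).
    fields:  record keys that correspond to speech templates.
    Both shapes are hypothetical, chosen only for this sketch.
    """
    constraints = {}
    for field in fields:
        lexicon, bigrams, index = set(), Counter(), defaultdict(list)
        for record_id, record in enumerate(records):
            phrase = record[field].lower()
            words = phrase.split()
            lexicon.update(words)                     # step (b): lexicon model
            # Boundary markers let the counts capture phrase starts and ends.
            padded = ["<s>"] + words + ["</s>"]
            bigrams.update(zip(padded, padded[1:]))   # step (b): language model
            index[phrase].append(record_id)           # step (c): refreshed index
        # Step (d): the caller stores this result as the new constraint models.
        constraints[field] = {
            "lexicon": lexicon,
            "bigrams": bigrams,
            "index": dict(index),
        }
    return constraints


# Hypothetical database content after an update.
records = [
    {"title": "Blue Moon", "artist": "Ella"},
    {"title": "Blue Sky", "artist": "Ella"},
]
models = generate_constraints(records, ["title", "artist"])
```

Rebuilding from scratch on every change keeps the sketch simple; an incremental update would be a natural refinement for large databases.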
The foregoing and other features and advantages of the present invention will be more clearly understood through the following descriptions with reference to the drawings:
The present invention will now be described more specifically with reference to the following embodiments. It is to be noted that the following descriptions of preferred embodiments of this invention are presented herein for the purpose of illustration and description only; they are not intended to be exhaustive or to be limited to the precise form disclosed.
Please refer to
In application, the I/O interface 102 preferably includes a loudspeaker, a display, and browsing buttons. The speech recognition unit 103 further includes an input device 1031, an extracting device 1032, a constraint-model unit 1033 that contains a lexicon model and a language model for each speech template, an acoustic model 1034, and a speech recognition engine 1035. The speech is input via the input device 1031, and its feature coefficients are extracted from the speech by the extracting device 1032. Then the input speech is recognized by the speech recognition engine 1035. In this case, the recognition is performed according to the extracted feature coefficients, the activated lexicon model and language model in the constraint-model unit 1033, and the acoustic model 1034, so that a recognition result is produced correspondingly and passed to the search unit 105. Once the user selects a specific template, the corresponding lexicon model and language model are activated by the speech template unit 101 for the recognition performed by the speech recognition engine 1035.
Please refer to
Please refer to
Please refer to
Preferably, in application, an updating command can be added to the selection menu of the speech input apparatus, so that the users can select it therefrom to activate the constraint-generation unit. The above procedures are performed via the constraint-generation unit so as to update the targets. Besides, such procedures can also be performed on a PC rather than on the speech input apparatus itself.
Based on the above, the present invention provides a novel speech input apparatus and method. With the speech input apparatus, the users do not have to keep the input speech templates in mind, and the drawback that users do not know what to say to the microphone is overcome. Furthermore, in cooperation with the voice-commanded device, the users can fully experience the benefits provided by the speech input apparatus without memorizing the commands and speech templates. Besides, the speech input apparatus and method of the present invention achieve increased accuracy and success in speech recognition because the recognition scope is limited by the selected speech template. Hence, the present invention is not only novel and progressive, but also useful.
While the invention has been described in terms of what are presently considered to be the most practical and preferred embodiments, it is to be understood that the invention need not be limited to the disclosed embodiment. On the contrary, it is intended to cover various modifications and similar arrangements included within the spirit and scope of the appended claims, which are to be accorded the broadest interpretation so as to encompass all such modifications and similar structures. Accordingly, the invention is not limited by the disclosure, but instead its scope is to be determined entirely by reference to the following claims.
Number | Date | Country | Kind |
---|---|---|---|
93141877 | Dec 2004 | TW | national |