1. Field of the Invention
The present invention relates to an indicating method for a speech recognition system, and more particularly to an indicating method that lets users understand the input status immediately and adjust the volume so that voice commands are carried out successfully, guided by acoustic and graphical interfaces together with the recorded waveform, thereby improving the speech recognition rate and avoiding abnormal or poor sound acquisition.
2. Description of the Prior Art
In the current IT age, with the Internet reaching across boundaries, multimedia audio and video (AV) signals can be transmitted and downloaded as network packets for digital AV signal transmission. These AV signals can be downloaded from legitimate websites and stored in multimedia storage/playing devices, including portable disc players, MP3 (or MP4, MP5) players, iPod players, PCs and notebook PCs, and can be transmitted and played by connecting these devices to sound devices such as microphones, loudspeakers, sound boxes or earphones.
However, users must press or touch buttons, knobs or other human-machine interfaces (HMI) on the surface of common multimedia storage/playing devices with their fingers before these devices can play or select songs or switch to other items. Only in this way can users switch items or select play patterns, which makes these devices inconvenient and difficult to use. Moreover, because multimedia storage/playing devices are designed to occupy less space in line with miniaturization requirements, the buttons and HMI must also be made smaller. As a result, users are liable to touch the wrong button and make mistakes in entry or selection, which reduces operating convenience and accuracy.
To eliminate the disadvantages mentioned above, those in the industry have developed a speech recognition device that is connected to a multimedia storage/playing device storing voice files. The recognition module built into the speech recognition device recognizes and analyzes the speech signals (microphone sound) inputted by external users and then starts the multimedia storage/playing device to play the voice files. The speech recognition device can also control and operate the selection, adjustment and switching of the contents to be played, based on the externally inputted speech signals. However, when the speech recognition device identifies and analyzes the externally inputted speech signals, the signals often cannot be entered successfully because of abnormalities in the user's microphone (failure, damage, unsuccessful connection, or a volume set too high or too low) or because of improper use (speaking too far from or too close to the microphone). In other cases, a noisy environment degrades sound acquisition to varying degrees, resulting in a low speech recognition rate or distortion, and these problems remain unsolved. This not only makes the device inconvenient and inefficient to use, but also reduces users' willingness to use it or causes them discomfort. Over time, these problems may lead to economic losses that are difficult to estimate and that run counter to considerations of economic benefit.
Thus, what firms in this industry urgently need to research and improve is how to solve the reduced added value and increased cost that arise when users enter speech signals into the speech recognition device to control the selection, adjustment and switching of the contents played by the multimedia storage/playing device. These problems stem from the inconvenient use, operating complexity and difficulty caused by a low speech recognition rate or distortion, which in turn result from microphone abnormalities and poor sound acquisition.
In view of the aforesaid deficiencies and disadvantages, the inventor, after collecting relevant materials, seeking assessments and reviews from various parties, drawing on many years of experience in this industry and carrying out continuous trials and corrections, has finally invented the indicating method for a speech recognition system.
The primary objective of the present invention is to let users enter voice commands into a voice input unit, which converts the commands into speech signals. The speech signals are acquired and stored by a recording unit, converted by a microprocessor into a volume indicating oscillogram, and displayed by a display module. During this process, it is also decided whether the speech signals comply with the speech recognition conditions. Based on the volume indicating oscillogram, an indicating module can then mark diagrams, letters or colors, or give spoken indications that are played through a sound amplifying unit. Through spoken indications, explanations in graphs or letters and other interactive guidance, together with the volume indicating oscillogram, users can understand the voice input status and adjust the volume so that voice commands are carried out successfully, while avoiding a low speech recognition rate or distortion caused by microphone malfunction or poor sound acquisition. In this way, the device can be used simply, easily and quickly, which improves its overall function and effect.
To achieve the objectives and functions stated above as well as the technology and framework adopted in the present invention, an example of the preferred embodiment of the present invention is given to describe its features and functions in detail with reference to the accompanying drawings for the purpose of full understanding.
Refer to
The multimedia electronic product 1 may be an iPod player (digital multimedia player), MP3 player, PC, notebook PC or other electronic product with the multimedia storage/playing function, and is equipped with a storage module 11 for storing audio or video signals inside. Besides, the multimedia electronic product 1 has a transmission interface 12 and an HMI 13 that can execute embedded programs and edit and store signals.
Inside the speech recognition device 2, there is a microprocessor 21 that can edit internal programs and system units of various kinds and can communicate and process input signals. The microprocessor 21 is connected with a connecting interface 22 and a plug interface 23, both of which can be linked with the transmission interface 12 of the multimedia electronic product 1, and the plug interface 23 is further linked with an external voice input unit 3 (e.g. a microphone or ear microphone). A recording unit 24 can acquire and store the speech signals from the voice input unit 3. An indicating module 25 can read the speech signals stored in the recording unit 24 for volume indication and is connected with a sound amplifying unit 26 for outward sound amplification (for example, a loudspeaker, sound box or earphone). A recognition module 27 can read the speech signals stored in the recording unit 24 for recognition and analysis. In addition, the microprocessor 21 is connected with a display module 28 (such as an LCD or panel) that can display the volume indications produced by the indicating module 25.
In implementing the present invention, the storage module 11 in the multimedia electronic product 1 is used to store and record multiple speech signals (e.g. songs, music or recordings) in advance, and the multimedia electronic product 1 is linked with the connecting interface 22 of the speech recognition device 2 via the transmission interface 12. The multimedia electronic product 1 is started by volume indication and by speech signals that have been recognized, through the connecting interface 22 and the transmission interface 12. That is to say, the speech recognition device 2 relies on the recording unit 24 to acquire and store the speech signals (users' voices) inputted from the external voice input unit 3, then uses the microprocessor 21 to convert the speech signals into a volume indicating oscillogram, and finally displays it on the display module 28. At the same time, the microprocessor 21 decides whether the speech signals satisfy the speech recognition conditions, reads the speech signals stored in the recording unit 24 by using the indicating module 25, and provides volume indication through the sound amplifying unit 26 and the display module 28. Alternatively, if the recognition module 27 is used to read the speech signals stored in the recording unit 24 for speech recognition and analysis, the microprocessor 21 reads the speech signals stored in advance in the storage module 11 of the multimedia electronic product 1, selects, switches or edits them, and plays them externally through the sound amplifying unit 26.
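The flow described above (record, convert to a volume indication, decide whether the recognition conditions are met) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the specification gives no numeric recognition conditions, so the thresholds `MIN_RMS` and `MAX_PEAK`, the frame size, the function names and the text bar that stands in for the display module 28 are all hypothetical. Samples are assumed to be floating-point values normalized to the range -1.0 to 1.0.

```python
import math
from typing import List

# Hypothetical thresholds; the specification does not state numeric conditions.
MIN_RMS = 0.05   # below this overall level the input is treated as too quiet
MAX_PEAK = 0.98  # at or above this peak the input is treated as clipped (too loud)


def volume_envelope(samples: List[float], frame_size: int = 4) -> List[float]:
    """Return the RMS level of each consecutive frame of normalized samples."""
    levels = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        levels.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return levels


def check_recognition_condition(samples: List[float]) -> str:
    """Classify a recording as 'no_input', 'too_loud', 'too_quiet' or 'ok'."""
    if not samples:
        return "no_input"
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return "no_input"   # e.g. microphone unplugged or muted
    if peak >= MAX_PEAK:
        return "too_loud"
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms < MIN_RMS:
        return "too_quiet"
    return "ok"


def render_bar(level: float, width: int = 20) -> str:
    """Draw one envelope level as a fixed-width text bar (stand-in for a display)."""
    filled = min(width, round(level * width))
    return "#" * filled + "." * (width - filled)
```

In such a sketch, a result other than `"ok"` would be the point at which an indicating module prompts the user to adjust the volume or check the microphone, and only an `"ok"` recording would be passed on for recognition.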
In addition, for voice input, indication and recognition in the present invention, the operation steps include:
Additionally, the indicating module 25 is used for volume indication of the speech signals inputted via the voice input unit 3 of the present invention, wherein the operation steps comprise:
As shown clearly in the above-mentioned steps, the speech recognition device 2 of the present invention is connected through the plug interface 23 to the voice input unit 3 (microphone or ear microphone). When users' voices are inputted as voice-control speech signals through the voice input unit 3, these signals are acquired and stored by the recording unit 24, converted by the microprocessor 21 into a volume indicating oscillogram, and then displayed by the display module 28. At the same time, the microprocessor 21 decides whether these signals satisfy the speech recognition conditions (for example, the environment at the time of voice input and the voice input status). If so, the recognition module 27 is used to read the speech signals stored in the recording unit 24 for the purpose of speech recognition and analysis (as shown in
Continue to refer to
In addition, the recognition module 27 included in the present invention will produce a constructive concept script after analyzing the speech signals inputted, and compare it with other constructive concept scripts in the storage module 11 of the multimedia electronic product 1. The steps of operation include:
The multimedia electronic product 1 as stated above can store and record multiple speech signals in its internal storage module 11 in advance through the transmission interface 12, and can edit or classify these speech signals by operating internal programs and systems through the HMI 13 (songs can be classified according to title, singer, volume, and Chinese, Taiwanese or foreign language, etc.). After the user's voice is inputted as speech signals containing selection items (selection of songs or recordings, name of singer, song title, name of volume, switching of songs, etc.) through the voice input unit 3 (microphone or ear microphone) and stored via the recording unit 24, these signals are recognized and analyzed by the recognition module 27 to search for the items that satisfy the related conditions, and the sound amplifying unit 26 is then started to play them. Alternatively, the microprocessor 21 is used to switch and select songs, adjust the volume or make other selections, thus quickly carrying out voice command operations on the speech signals stored in the storage module 11 of the multimedia electronic product 1. In such circumstances, users do not need to press or touch buttons with their fingers to switch or select items, which avoids accidental touches and wrong choices and improves the convenience and accuracy of operation. In addition, the transmission interface 12 of the multimedia electronic product 1, and the connecting interface 22 and plug interface 23 of the speech recognition device 2, may be USB (Universal Serial Bus), SATA (Serial Advanced Technology Attachment) or eSATA (External Serial Advanced Technology Attachment) interfaces used to transmit speech signals.
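The search for stored items that satisfy a spoken selection can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: it assumes the recognition step has already produced the command as text, and the `Track` record, its fields and the `find_tracks` function are hypothetical names introduced here, not part of the disclosed device.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Track:
    """Hypothetical catalog entry for one item stored in the storage module."""
    title: str
    singer: str
    language: str


def find_tracks(catalog: List[Track], recognized_text: str) -> List[Track]:
    """Return the tracks whose title or singer appears in the recognized command."""
    text = recognized_text.casefold()
    return [
        track for track in catalog
        if track.title.casefold() in text or track.singer.casefold() in text
    ]
```

For example, with a catalog holding a track titled "Blue Moon", the recognized command "play blue moon" would select that track, while a command naming neither a stored title nor a stored singer would select nothing.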
All steps and methods that achieve the effects indicated above should be covered by the patent claims of the present invention, and all other equivalent changes and modifications made without departing from the spirit of the art disclosed in the present invention should be included in the appended claims of the present invention.
To sum up the above descriptions, the indicating method for a speech recognition system disclosed in the present invention, when applied, achieves its stated functions and utility. The present invention therefore has practical applicability and satisfies the conditions for patentability of a utility model. As this patent application is filed pursuant to applicable laws, early approval of the present invention will be highly appreciated so as to protect the benefits and rights of the inventor, who has worked hard on this invention. For any question, please do not hesitate to inform the inventor by mail, and the inventor will do his best to cooperate.