This application claims the priority benefit of China application serial no. 201810275904.3, filed on Mar. 30, 2018. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
The present disclosure relates to a voice application system and a method thereof.
Currently, a user communicates with a device such as a computer or a mobile phone through an input interface such as a mouse, a keyboard, a touch screen or gestures, and the input mode is fixed and cannot be flexibly defined by the user. In addition, those input methods require use of the body (e.g., hands or feet), so they are not applicable to disabled users who have difficulty using their hands or feet to make input. Therefore, a natural input mode, such as face recognition, fingerprint recognition or voice, is needed for communicating with the device and making input.
The disclosure provides a voice application system and a method thereof, which allow the user to define his/her own voice to correspond to different applications with high flexibility.
The disclosure provides a voice application system. The system includes an input device, a database and a processor. The processor is electrically connected to the input device and the database. The processor executes a voice program. The input device receives a first voice signal. The voice program analyzes the first voice signal to obtain a first voice feature corresponding to the first voice signal. The voice program stores a corresponding relationship of the first voice feature and a first function selected by the user into the database, and the voice program performs a voice recognition operation according to the corresponding relationship in the database.
According to an embodiment of the disclosure, before the voice program analyzes the first voice signal to obtain the first voice feature corresponding to the first voice signal, the voice program performs a pre-processing operation on the first voice signal.
According to an embodiment of the disclosure, the system further includes an output apparatus. After the voice program analyzes the first voice signal to obtain the first voice feature corresponding to the first voice signal, the output apparatus outputs a first recognition result corresponding to the first voice feature. When the input device receives first confirmation information representing that the first recognition result is identical to the first voice signal, the input device receives first selection information used for selecting the first function. The voice program performs an operation of storing the corresponding relationship of the first voice feature and the first function selected by the user into the database according to the first selection information.
According to an embodiment of the disclosure, when the voice program performs the voice recognition operation according to the corresponding relationship in the database, the input device receives a second voice signal. The voice program analyzes the second voice signal to obtain a second voice feature corresponding to the second voice signal. The voice program determines whether the second voice feature is consistent with the first voice feature in the database. When the voice program determines that the second voice feature is consistent with the first voice feature in the database, the output apparatus outputs prompt information to ask the user whether the first function is to be performed. When the input device receives second confirmation information used for performing the first function according to the prompt information, the voice program performs the first function.
According to an embodiment of the disclosure, the system further includes an output apparatus. The input device receives a third voice signal used for instructing to close the voice program. The voice program analyses the third voice signal to obtain a third voice feature corresponding to the third voice signal. The output apparatus outputs a third recognition result corresponding to the third voice feature. When the input device receives third confirmation information representing that the third recognition result is identical to the third voice signal, the input device receives second selection information used for closing the voice program, and the voice program closes the voice program according to the second selection information.
According to an embodiment of the disclosure, the system further includes an output apparatus. The input device receives a fourth voice signal. The voice program analyzes the fourth voice signal to obtain a fourth voice feature corresponding to the fourth voice signal. The output apparatus outputs a fourth recognition result corresponding to the fourth voice feature. When the input device receives fourth confirmation information representing that the fourth recognition result is identical to the fourth voice signal, the input device receives third selection information used for deleting the corresponding relationship of the first voice feature and the first function. The voice program deletes the corresponding relationship of the first voice feature and the first function in the database according to the third selection information.
The disclosure provides a voice application method. The method includes the following steps: executing a voice program; receiving a first voice signal; analyzing the first voice signal through the voice program to obtain a first voice feature corresponding to the first voice signal; storing a corresponding relationship of the first voice feature and a first function selected by the user into the database through the voice program; and performing voice recognition operation according to the corresponding relationship in the database through the voice program.
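The storing and recognition steps above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the disclosure does not specify how a voice feature is represented or how consistency is determined, so the feature vector, the cosine-similarity comparison, the threshold value and the names `VoiceCommandStore`, `store` and `match` are all hypothetical.

```python
import math
from typing import Callable, Dict, Optional, Sequence, Tuple

# Assumption: a voice feature is a fixed-length vector of floats.
Feature = Tuple[float, ...]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class VoiceCommandStore:
    """Hypothetical stand-in for the database of feature-to-function relationships."""

    def __init__(self, threshold: float = 0.95) -> None:
        # Assumption: two features are "consistent" when their similarity
        # reaches this threshold; the value 0.95 is arbitrary.
        self.threshold = threshold
        self._entries: Dict[Feature, Callable[[], str]] = {}

    def store(self, feature: Sequence[float], function: Callable[[], str]) -> None:
        """Store the corresponding relationship of a voice feature and a function."""
        self._entries[tuple(feature)] = function

    def match(self, feature: Sequence[float]) -> Optional[Callable[[], str]]:
        """Return the function whose stored feature best matches, or None."""
        best: Optional[Callable[[], str]] = None
        best_score = self.threshold
        for stored, function in self._entries.items():
            score = cosine_similarity(stored, feature)
            if score >= best_score:
                best, best_score = function, score
        return best
```

In this sketch, a later voice signal whose feature is close enough to a stored feature returns the function the user selected for it, and any other feature returns `None`.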
According to an embodiment of the disclosure, before the step of analyzing the first voice signal through the voice program to obtain the first voice feature corresponding to the first voice signal, the method further includes performing a pre-processing operation to the first voice signal through the voice program.
According to an embodiment of the disclosure, after the voice program analyzes the first voice signal to obtain the first voice feature corresponding to the first voice signal, the method further includes the following steps: outputting a first recognition result corresponding to the first voice feature; and when first confirmation information representing that the first recognition result is identical to the first voice signal is received, receiving first selection information used for selecting the first function, and storing the corresponding relationship of the first voice feature and the first function selected by the user into the database according to the first selection information through the voice program.
According to an embodiment of the disclosure, the step of performing the voice recognition operation according to the corresponding relationship in the database through the voice program includes the following steps: receiving a second voice signal; analyzing the second voice signal through the voice program to obtain a second voice feature corresponding to the second voice signal; determining whether the second voice feature is consistent with the first voice feature in the database through the voice program; when the voice program determines that the second voice feature is consistent with the first voice feature in the database, outputting prompt information to inquire the user whether the first function is to be performed; and when second confirmation information used for performing the first function is received according to the prompt information, performing the first function through the voice program.
According to an embodiment of the disclosure, the method further includes the following steps: receiving a third voice signal for instructing to close the voice program; analyzing the third voice signal through the voice program to obtain a third voice feature corresponding to the third voice signal; outputting a third recognition result corresponding to the third voice feature; and when third confirmation information representing that the third recognition result is identical to the third voice signal is received, receiving second selection information used for closing the voice program, closing the voice program according to the second selection information through the voice program.
According to an embodiment of the disclosure, the method further includes the following steps: receiving a fourth voice signal; analyzing the fourth voice signal through the voice program to obtain a fourth voice feature corresponding to the fourth voice signal; outputting a fourth recognition result corresponding to the fourth voice feature; when fourth confirmation information representing that the fourth recognition result is identical to the fourth voice signal is received, receiving third selection information used for deleting the corresponding relationship of the first voice feature and the first function, deleting the corresponding relationship of the first voice feature and the first function in the database according to the third selection information through the voice program.
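The deletion steps above involve two separate user decisions: confirming the recognition result, then selecting deletion. A minimal sketch is given below, assuming the database is a plain dictionary keyed by a string form of the voice feature; the function name `delete_command` and its parameters are hypothetical.

```python
from typing import Dict


def delete_command(
    database: Dict[str, str],
    feature: str,
    result_confirmed: bool,
    delete_selected: bool,
) -> bool:
    """Delete the relationship for `feature` only after both confirmations.

    Returns True if a relationship was actually removed.
    """
    # Fourth confirmation information: the recognition result must be
    # confirmed as identical to the voice signal.
    if not result_confirmed:
        return False
    # Third selection information: the user must select deletion.
    if not delete_selected:
        return False
    # Remove the corresponding relationship if it exists.
    return database.pop(feature, None) is not None
```

With this shape, a relationship survives unless the user both confirms the recognition result and selects deletion.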
Based on the above, the disclosure provides a voice application system and a method thereof, which allow the user to define his/her voice to correspond to different applications with high flexibility. The method of using voice input to define an application includes four parts: adding a user-defined voice, using it, closing the voice program, and deleting a user-defined voice. The process flow of each of the four parts is clearly defined. For those who have difficulty using conventional input methods such as a keyboard, a mouse or touch, the disclosure provides a better method for carrying out communication with the device.
Referring to
The processor 10 may be a central processing unit (CPU) or a programmable general purpose or special purpose microprocessor, a digital signal processor (DSP), a programmable controller, an application specific integrated circuit (ASIC) or other similar element or a combination of the above.
The input device 12 may be a microphone, a keyboard, a mouse or a touch screen or other element capable of receiving user's input or a combination of the above.
The output apparatus 14 may be a screen, a speaker or other element capable of outputting information to the user or a combination of the above.
The database 16 may be a fixed or a movable random access memory (RAM) of any forms, a read-only memory (ROM), a flash memory or a similar element or a combination of the above.
In the embodiment, a plurality of program segments are stored in the database 16 of the voice application system 1000. After being installed, the program segments are executed by the processor 10. For example, the database 16 includes a plurality of modules that respectively perform various operations applied to the voice application system 1000, wherein each of the modules consists of one or more program segments; this should not be construed as a limitation to the disclosure. Each of the operations of the voice application system 1000 may also be realized in the form of other hardware.
Referring to
Thereafter, in step S211, the user may confirm whether the first recognition result output by the output apparatus 14 is identical to the first voice signal (i.e., the user's sound). If not, the flow returns to step S203. If yes, in step S213, the input device 12 may receive first confirmation information input by the user representing that the first recognition result is identical to the first voice signal, and the user then uses the input device 12 to make input such that the input device 12 receives first selection information for selecting the first function. Here, the first function is assumed to be a function of “activating camera”. Thereafter, in step S215, the voice program may store a corresponding relationship of the first voice feature (e.g., voice feature of “activating camera”) and the first function (e.g., function of “activating camera”) selected by the user into the database 16 according to the first selection information input by the user.
Thereafter, the voice program can perform the voice recognition operation according to the corresponding relationship of the voice feature and the function selected by the user in the database 16.
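The confirmation loop of steps S203, S211, S213 and S215 can be sketched as below. This is a hedged illustration, not the disclosed implementation: the callback names (`analyze`, `confirm`, `select_function`), the use of a string as the voice feature and recognition result, and the dictionary used as the database are all assumptions.

```python
from typing import Callable, Dict, Iterable


def add_voice_command(
    signals: Iterable[str],
    analyze: Callable[[str], str],
    confirm: Callable[[str], bool],
    select_function: Callable[[], str],
    database: Dict[str, str],
) -> str:
    """Register a user-defined voice command and return the stored feature."""
    for signal in signals:  # S203: receive a voice signal
        # Assumption: the feature doubles as the recognition result shown to the user.
        feature = analyze(signal)
        if not confirm(feature):  # S211: user rejects the recognition result
            continue  # return to S203 and wait for the next signal
        function_name = select_function()  # S213: user selects the function
        database[feature] = function_name  # S215: store the relationship
        return feature
    raise RuntimeError("no voice signal was confirmed by the user")
```

For example, if the first attempt is misrecognized and rejected, the loop simply takes the next signal; only a confirmed recognition result leads to a stored relationship.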
Referring to
When the voice program determines that the second voice feature is not consistent with any voice feature stored in the database 16, the flow may return to step S303. When the voice program determines that the second voice feature is consistent with a voice feature (e.g., the first voice feature) stored in the database 16, in step S311, the output apparatus 14 outputs prompt information to ask the user whether the first function (e.g., function of “activating camera”) corresponding to the first voice feature is to be performed. When the input device 12 receives second confirmation information used for performing the first function, the voice program may perform the first function in step S313.
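The matching and prompting flow of steps S303, S311 and S313 can be sketched as follows. The names and data shapes are hypothetical: features are plain strings, consistency is exact equality, and the behavior when the user declines the prompt (returning `None`) is an assumption, since the text above does not describe that branch.

```python
from typing import Callable, Dict, Iterable, Optional


def recognize_and_run(
    signals: Iterable[str],
    analyze: Callable[[str], str],
    database: Dict[str, str],
    functions: Dict[str, Callable[[], str]],
    confirm_run: Callable[[str], bool],
) -> Optional[str]:
    """Run the function mapped to the first matching voice signal, if confirmed."""
    for signal in signals:  # S303: receive a voice signal
        feature = analyze(signal)
        function_name = database.get(feature)
        if function_name is None:
            continue  # not consistent with any stored feature: back to S303
        # S311: prompt the user whether the matched function should be performed.
        if confirm_run(function_name):
            return functions[function_name]()  # S313: perform the function
        return None  # assumption: a declined prompt ends the flow
    return None
```

Keeping the prompt as a separate callback mirrors the extra confirmation step in the flow, so a false match never triggers a function without the user's consent.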
Additionally, the user may further use voice recognition to close the activated voice program.
Referring to
Thereafter, in step S407, the user may confirm whether the third recognition result output by the output apparatus 14 is identical to the third voice signal (i.e., user's sound). If not, in step S408, the voice recognition operation shown in
Moreover, the user may further use voice recognition to delete the corresponding relationship of the voice feature stored and the function selected by the user in the database 16.
Referring to
Thereafter, in step S511, the user can confirm whether the fourth recognition result output by the output apparatus 14 is identical to the fourth voice signal (i.e., user's sound). If not, the step S503 may be resumed and performed. If yes, in step S513, the input device 12 may receive fourth confirmation information input by the user for representing that the fourth recognition result is identical to the fourth voice signal. Thereafter, in step S515, the user may confirm whether to delete the corresponding relationship of the first voice feature and the first function in the database 16. If not, the process flow shown in
Referring to
In summary, the disclosure provides a voice application system and a method thereof, which allow the user to define his/her voice to correspond to different applications with high flexibility. The method of using voice input to define an application includes four parts: adding a user-defined voice, using it, closing the voice program, and deleting a user-defined voice. The process flow of each of the four parts is clearly defined. For those who have difficulty using conventional input methods such as a keyboard, a mouse or touch, the disclosure provides a better method for carrying out communication with the device.
Number | Date | Country | Kind |
---|---|---|---|
201810275904.3 | Mar 2018 | CN | national |