This application claims the priority benefit of PCT/CN2013/077225 filed on Jun. 14, 2013 and Chinese Application No. 201210385273.3 filed on Oct. 12, 2012. The contents of these applications are hereby incorporated by reference in their entirety.
The present document relates to the field of intelligent voice, and more particularly, to a self-adaptive intelligent voice device and method.
With the development of mobile communication technology and handset manufacturing technology, more and more consumers favor smart phones due to their high performance, wide variety of supported services and declining cost.
With the improvement of smart phone hardware and increasingly powerful operating systems, more and more intelligent applications, including intelligent voice services, can be realized. Compared with traditional manual human-machine interaction, more and more users prefer intelligent voice due to its more natural and convenient interaction mode, and a series of intelligent voice applications such as Siri have emerged on smart phone platforms such as Apple's iOS and Android.
The existing intelligent voice applications mainly consist of three function modules: a voice recognizing module, a recognition result processing module and a voice broadcasting module.
The voice recognizing module is used for extracting parameters characterizing a human voice, converting the lexical content of the human voice into a machine language file, such as a binary code file, according to those voice parameters, and sending the machine language file to the recognition result processing module. The parameters characterizing the human voice mainly comprise the formants (frequency, bandwidth, amplitude) and the pitch frequency.
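The pitch frequency mentioned above can, for instance, be estimated from a voiced frame by autocorrelation. The following is a minimal illustrative sketch, not the embodiment's actual extractor; the function name, search range and method are assumptions (practical recognizers use more robust estimators such as cepstral or YIN-style methods):

```python
import numpy as np

def estimate_pitch(signal, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the pitch (fundamental) frequency of a voiced frame
    by locating the strongest autocorrelation peak within a plausible
    lag range. Illustrative sketch only."""
    signal = signal - np.mean(signal)                 # remove DC offset
    corr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lag_min = int(sample_rate / fmax)                 # shortest period considered
    lag_max = int(sample_rate / fmin)                 # longest period considered
    lag = lag_min + np.argmax(corr[lag_min:lag_max])  # best-matching period
    return sample_rate / lag
```

For a pure 200 Hz tone sampled at 8 kHz, the strongest in-range autocorrelation peak falls at a lag of 40 samples, giving an estimate of 200 Hz.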
The recognition result processing module is used for performing an appropriate operation based on the machine language file and sending the operation result to the voice broadcasting module. For example, if the lexical content represented by the received machine language file is "Where am I", the recognition result processing module acquires the user's current location from a positioning module and sends the location information to the voice broadcasting module. The voice broadcasting module then converts the operation result into a broadcast voice and plays it to the user.
In the related art, the broadcast voice parameters are either selected by the user from options provided to users, or fixed in the voice broadcasting module before the device leaves the factory. In the former case, because users differ, different users may need to reset the values of the broadcast voice parameters during use, which makes operation complex and cumbersome; in the latter case, because every user hears the same broadcast voice, the user experience is monotonous.
The purpose of the embodiments of the present document is to provide a self-adaptive intelligent voice device and method to solve the technical problem of how to make a broadcast voice match a user's voice more closely.
To solve the abovementioned technical problem, the embodiment of the present document provides the following technical solution:
Optionally, the broadcast voice parameter generating module is further configured to: acquire the voice parameters from the voice recognizing module after receiving a specific trigger signal or when the device is powered on.
Optionally, the default policy comprises a corresponding relationship between the voice parameters and the broadcast voice parameters.
Optionally, the broadcast voice parameter generating module is configured to generate the broadcast voice parameters in accordance with the voice parameters and the default policy in the following manner:
A self-adaptive intelligent voice method, wherein the method comprises:
Optionally, the default policy comprises a corresponding relationship between the voice parameters and the broadcast voice parameters.
Optionally, the step of generating the broadcast voice parameters according to the voice parameters as well as the default policy comprises:
The abovementioned technical solution establishes, through the default policy, a relationship between the broadcast voice parameters and the voice parameters input by the user, thus avoiding the inadequacy that results when the broadcast voice parameters use fixed data that ignore the user's voice characteristics; in addition, generating the broadcast voice parameters in the abovementioned technical solution requires no human intervention, thus providing users with convenient use.
To make the objectives, technical solutions and advantages of the present document more apparent, the embodiments of the present document will be described in detail hereinafter in conjunction with the accompanying drawings. It should be noted that, in the case of no conflict, the embodiments and the features in the embodiments of the present application may be combined with each other arbitrarily.
The broadcast voice parameter generating module 104 is configured to: acquire extracted voice parameters from the voice recognizing module 101, and generate broadcast voice parameters according to the extracted voice parameters and a default policy, and input the broadcast voice parameters to the voice broadcasting module 103.
The default policy provides a corresponding relationship between input parameters and output parameters, wherein the input parameters are the extracted voice parameters and the output parameters are the broadcast voice parameters; the corresponding relationship may be a simple value correspondence or a complex arithmetic operation.
After acquiring the values of the extracted voice parameters, the broadcast voice parameter generating module 104 determines, through the default policy, the values of the broadcast voice parameters corresponding to those values, thereby obtaining the broadcast voice parameters.
The default policy can, for example, be as follows: when the input extracted voice parameters characterize a male voice, the output broadcast voice parameters characterize a female voice.
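Such a simple value correspondence could be sketched as below. The pitch threshold, parameter names and output values are illustrative assumptions, not values taken from the embodiment:

```python
def default_policy(extracted):
    """One possible default policy: a simple value mapping from
    extracted voice parameters to broadcast voice parameters.
    A male input voice (low pitch) selects a female broadcast voice."""
    pitch = extracted["pitch_hz"]
    if pitch < 165.0:  # assumed cutoff between typical male/female pitch ranges
        return {"voice": "female", "pitch_hz": 220.0}
    return {"voice": "male", "pitch_hz": 120.0}

print(default_policy({"pitch_hz": 110.0}))  # male input -> female broadcast voice
```

A "complex arithmetic operation," by contrast, might compute the broadcast pitch as a continuous function of the input pitch rather than picking from two fixed presets.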
The broadcast voice parameter generating module 104 may acquire the extracted voice parameters from the voice recognizing module 101 after receiving a specific trigger signal (e.g., receiving an enabling self-adaptive intelligent voice instruction signal from a user) or when the device is powered on.
By setting the broadcast voice parameter generating module 104 in the intelligent voice device, the abovementioned embodiment makes the voice parameters used when broadcasting take the voice parameters input by the user into account, thus self-adaptively changing the broadcast voice according to the user's voice characteristics. Compared with the existing technologies, this not only reduces the complexity of different users frequently configuring the voice broadcast, but also improves the flexibility and vitality of the voice broadcast, thereby greatly improving the comfort of the human-machine interaction experience.
In S201, voice parameters are extracted from a voice through voice recognition.
In S202, broadcast voice parameters are generated according to the extracted voice parameters and a default policy.
In this step, the broadcast voice parameters may be generated according to the extracted voice parameters and the default policy after receiving a specific trigger signal (e.g., receiving an enabling self-adaptive intelligent voice instruction signal from a user) or when powered on.
The default policy comprises a corresponding relationship between the voice parameters and the broadcast voice parameters, wherein the input parameters are the extracted voice parameters and the output parameters are the broadcast voice parameters; the corresponding relationship may be a simple value correspondence or a complex arithmetic operation.
After the values of the extracted voice parameters are acquired, the values of the broadcast voice parameters corresponding to them are determined through the default policy, and the broadcast voice parameters are thereby obtained.
The default policy can, for example, be as follows: when the input extracted voice parameters characterize a male voice, the output broadcast voice parameters characterize a female voice.
In S203, the broadcast voice parameters are used to generate a broadcast voice.
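The three steps above can be sketched end to end as follows. This is a hedged illustration of the control flow only; all function and field names are assumptions, and a real device would run actual pitch/formant analysis in S201 and a text-to-speech engine in S203:

```python
def extract_voice_parameters(audio_frame):
    # S201: a real device would run pitch/formant analysis here;
    # this stub assumes the frame has already been analyzed.
    return {"pitch_hz": audio_frame["pitch_hz"]}

def generate_broadcast_parameters(params):
    # S202: apply the default policy (here: male voice -> female broadcast).
    if params["pitch_hz"] < 165.0:  # assumed male/female pitch cutoff
        return {"voice": "female"}
    return {"voice": "male"}

def broadcast(text, broadcast_params):
    # S203: a TTS engine would synthesize speech here; we just format a string.
    return f"[{broadcast_params['voice']} voice] {text}"

params = extract_voice_parameters({"pitch_hz": 110.0})
print(broadcast("You are at Fifth Avenue.", generate_broadcast_parameters(params)))
# prints "[female voice] You are at Fifth Avenue."
```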
Those ordinarily skilled in the art can understand that all or some of the steps of the abovementioned method may be completed by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium such as a read-only memory, magnetic disk or optical disk. Alternatively, all or some of the steps of the abovementioned embodiments may also be implemented by using one or more integrated circuits. Accordingly, each module/unit in the abovementioned embodiments may be realized in the form of hardware or in the form of software function modules. The present document is not limited to any specific form of hardware and software combination.
It should be noted that the present document may have a variety of other embodiments, and without departing from the spirit and essence of the present document, a person skilled in the art can make various corresponding changes and modifications according to the present document, and these corresponding changes and modifications should belong to the protection scope of the appended claims of the present document.
The abovementioned technical solution establishes, through the default policy, a relationship between the broadcast voice parameters and the voice parameters input by the user, thus avoiding the inadequacy that results when the broadcast voice parameters use fixed data that ignore the user's voice characteristics; in addition, generating the broadcast voice parameters in the abovementioned technical solution requires no human intervention, thus providing users with convenient use.
Number | Date | Country | Kind |
---|---|---|---|
201210385273.3 | Oct 2012 | CN | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2013/077225 | 6/14/2013 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2013/182085 | 12/12/2013 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
5889223 | Matsumoto | Mar 1999 | A |
6498834 | Sera | Dec 2002 | B1 |
6813606 | Ueyama | Nov 2004 | B2 |
6879954 | Nguyen et al. | Apr 2005 | B2 |
20030014246 | Choi | Jan 2003 | A1 |
20050086055 | Sakai | Apr 2005 | A1 |
20060080105 | Lee | Apr 2006 | A1 |
20060184370 | Kwak et al. | Aug 2006 | A1 |
20070168189 | Tamura et al. | Jul 2007 | A1 |
20110238495 | Kang | Sep 2011 | A1 |
20120095767 | Hirose et al. | Apr 2012 | A1 |
20120295572 | Park | Nov 2012 | A1 |
20130005295 | Park | Jan 2013 | A1 |
20130019013 | Rice | Jan 2013 | A1 |
20130019018 | Rice | Jan 2013 | A1 |
20130019282 | Rice | Jan 2013 | A1 |
Number | Date | Country |
---|---|---|
1811911 | Aug 2006 | CN |
102004624 | Apr 2011 | CN |
102237082 | Nov 2011 | CN |
2004-198456 | Jul 2004 | JP |
2008018653 | Feb 2008 | WO |
Entry |
---|
Supplementary European Search Report for EP Application No. 13799812.6 mailed Oct. 30, 2015 (8 pages). |
Rentzos et al., “Voice Conversion Through Transformation of Spectral and Intonation Features,” Acoustics, Speech, and Signal Processing, 2004 Proceedings, IEEE International Conference on Montreal, Quebec, Canada, pp. 21-24. |
Stylianou, "Voice Transformation: A Survey," Acoustics, Speech and Signal Processing, 2009, IEEE International Conference, pp. 3585-3588. |
PCT International Search Report for PCT Application No. PCT/CN2013/077225 mailed Sep. 19, 2013 (6 pages). |
Number | Date | Country | |
---|---|---|---|
20150262579 A1 | Sep 2015 | US |