This application claims the priority benefit of Taiwan application serial No. 109125475, filed on Jul. 28, 2020. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
The invention relates to a voice recognition method and an electronic device using the same.
The development of voice recognition models usually requires a large number of people to provide voice samples to establish a voice database, which incurs substantial labor cost, and a trained voice model still requires the user to register before subsequent voice recognition can be performed. In addition, the verification accuracy measured on the experimental data of a sound model often differs greatly from the accuracy on actual verification data. The actual accuracy is affected by the user's voice status, voice volume, and environmental noise, which is a bottleneck in the accuracy of voice recognition.
According to a first aspect, a voice recognition method is provided. The voice recognition method includes: collecting a plurality of voice signals; extracting voiceprint features of each of the voice signals; performing a data process on the voiceprint features to convert the voiceprint features into an N-dimensional matrix, where N is an integer greater than or equal to 2; performing a feature normalization process on the N-dimensional matrix to obtain a plurality of voiceprint data; classifying the voiceprint data to generate a clustering result; and finding a centroid of each cluster according to the clustering result and registering the voiceprint data adjacent to each of the centroids.
According to a second aspect, an electronic device is also provided. The electronic device includes a sound receiver and a processor. The sound receiver is used to collect a plurality of voice signals. The processor is electrically connected to the sound receiver. The processor is configured for extracting voiceprint features of each of the voice signals; performing a data process on the voiceprint features to convert the voiceprint features into an N-dimensional matrix, where N is an integer greater than or equal to 2; performing a feature normalization process on the N-dimensional matrix to obtain a plurality of voiceprint data; classifying the voiceprint data to generate a clustering result; and finding a centroid of each cluster according to the clustering result and registering the voiceprint data adjacent to each centroid.
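The steps enumerated above can be illustrated end to end. The following is a minimal sketch only: the specification does not name any library or parameter values, so the use of scikit-learn, the feature dimensions, and all variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
# Hypothetical voiceprint features extracted from 30 collected voice signals
# (two well-separated speakers, 64-dimensional features each).
features = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(15, 64)),
    rng.normal(loc=5.0, scale=1.0, size=(15, 64)),
])

# 1) Data process: convert the features into an N-dimensional matrix (N = 2 here).
reduced = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(features)
# 2) Feature normalization process.
voiceprint_data = StandardScaler().fit_transform(reduced)
# 3) Classify the voiceprint data to generate a clustering result.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(voiceprint_data)
# 4) Find each cluster's centroid and register the datum nearest to it.
for c in np.unique(labels):
    members = voiceprint_data[labels == c]
    centroid = members.mean(axis=0)
    registered = members[np.argmin(np.linalg.norm(members - centroid, axis=1))]
    print(c, registered)
```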
In summary, the user's voiceprint data is registered and accurately recognized from a small amount of voice signals, and the voiceprint data is classified and recognized using voice signals from the user's actual environment. Therefore, the need for users to provide voices for registration and the inconsistency between the experimental data and the actual verification data are both resolved, and the accuracy of recognition is thereby improved.
In an embodiment, the software architecture in the processor 14 is shown in
As shown in step S14, the voiceprint module 141 transmits the obtained voiceprint features to the dimensionality reduction module 142. The dimensionality reduction module 142 performs a data process on the voiceprint features to arrange and convert the voiceprint features of the user, received from the same microphone 12, into an N-dimensional matrix, where N is an integer greater than or equal to 2. In an embodiment, the dimensionality reduction module 142 uses a t-distributed stochastic neighbor embedding (t-SNE) method to perform a dimensionality reduction process to obtain the corresponding N-dimensional matrix. In an embodiment, the N-dimensional matrix is a two-dimensional matrix or a matrix with more than two dimensions.
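The t-SNE dimensionality reduction of step S14 can be sketched as follows. This is illustrative only: scikit-learn's TSNE, the feature dimensions, and the perplexity value are assumptions, not details from the specification.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical voiceprint features: 40 utterances, 128-dimensional each.
rng = np.random.default_rng(0)
voiceprint_features = rng.normal(size=(40, 128))

# t-SNE reduces each feature vector to N dimensions (here N = 2),
# yielding the N-dimensional matrix described for step S14.
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
reduced = tsne.fit_transform(voiceprint_features)
print(reduced.shape)  # (40, 2)
```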
As shown in step S16, the normalization module 143 performs a feature normalization process on the N-dimensional matrix to scale the voiceprint features proportionally into a specific interval, obtaining a plurality of voiceprint data. In an embodiment, the normalization module 143 performs the feature normalization process using methods such as a standardization method, a mean removal method, and a variance scaling method.
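The normalization methods named for step S16 can be sketched as follows, assuming scikit-learn's preprocessing utilities; the input matrix here is a hypothetical toy example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Hypothetical 2-D matrix produced by the dimensionality-reduction step.
reduced = np.array([[2.0, 10.0], [4.0, 20.0], [6.0, 30.0]])

# Standardization (mean removal and variance scaling): each feature
# column is rescaled to zero mean and unit variance.
standardized = StandardScaler().fit_transform(reduced)

# Alternatively, scale features proportionally into a specific interval,
# e.g. [0, 1], as the feature normalization process describes.
scaled = MinMaxScaler(feature_range=(0.0, 1.0)).fit_transform(reduced)
print(standardized.mean(axis=0))          # ~[0, 0]
print(scaled.min(axis=0), scaled.max(axis=0))  # [0, 0] [1, 1]
```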
The voiceprint data processed by the normalization module 143 is transmitted to the classification algorithm module 144. As shown in step S18, the classification algorithm module 144 classifies the voiceprint data, dynamically adjusting a classification threshold value according to the voiceprint features, and generates a clustering result. The clustering result includes a plurality of clusters. The step of classifying all the voiceprint data in step S18 is further shown in step S181 to step S184 of
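The specification does not name the classification algorithm used in step S18. One common way to realize threshold-driven clustering, sketched here purely as an assumption, is agglomerative clustering with a distance threshold, so that the number of clusters follows the data rather than being fixed in advance.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical normalized voiceprint data: two loose groups in 2-D.
rng = np.random.default_rng(1)
data = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.05, size=(10, 2)),
    rng.normal(loc=(1.0, 1.0), scale=0.05, size=(10, 2)),
])

# A distance threshold (instead of a fixed cluster count) lets the
# resulting number of clusters adapt to the spread of the features.
clustering = AgglomerativeClustering(n_clusters=None, distance_threshold=0.5)
labels = clustering.fit_predict(data)
print(clustering.n_clusters_)  # 2 on this well-separated toy data
```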
The schematic diagram of a slope curve in
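The referenced figure is not reproduced here, but a slope curve of this kind is commonly obtained by plotting a dispersion measure against the candidate cluster count and locating the point where the slope flattens. The following sketch assumes a k-means inertia curve; the measure and library choice are illustrative, not taken from the specification.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical normalized voiceprint data with two true clusters.
rng = np.random.default_rng(2)
data = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.05, size=(10, 2)),
    rng.normal(loc=(1.0, 1.0), scale=0.05, size=(10, 2)),
])

# Within-cluster dispersion for k = 1..5; the slope between consecutive
# points flattens sharply once k passes the true cluster count.
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_
            for k in range(1, 6)]
slopes = np.diff(inertias)
# The steepest drop on this toy data occurs going from 1 to 2 clusters.
elbow_k = int(np.argmin(slopes)) + 2
print(elbow_k)  # 2
```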
Please refer to
In an embodiment, as shown in
In sum, a voice recognition method is provided that reduces the complexity of the user registration step and learns the user's voiceprint characteristics from a small amount of voice signals. In the embodiments, a small amount of voice signals is used to register the user's voiceprint data and recognize it accurately, which avoids the large number of voice signals that needs to be collected in traditional methods. Furthermore, the inconsistency between the experimental data and the actual verification data is resolved. Since the voice in actual use differs from recorded voice in volume, character, and environmental noise, the voice in the user's actual environment is classified and recognized in the embodiments disclosed herein, so as to eliminate the need for users to provide voices for registration and the inconsistency between the experimental data and the actual verification data.
Although the present invention has been described in considerable detail with reference to certain preferred embodiments thereof, the disclosure is not intended to limit the scope of the invention. Persons having ordinary skill in the art may make various modifications and changes without departing from the scope. Therefore, the scope of the appended claims should not be limited to the description of the preferred embodiments described above.
Number | Date | Country | Kind |
---|---|---|---|
109125475 | Jul 2020 | TW | national |
Number | Name | Date | Kind |
---|---|---|---|
11189263 | Ma | Nov 2021 | B2 |
20160358599 | Wang | Dec 2016 | A1 |
20180144742 | Ye | May 2018 | A1 |
20190341055 | Krupka | Nov 2019 | A1 |
20200294509 | Cai | Sep 2020 | A1 |
20210326421 | Khoury | Oct 2021 | A1 |
20210390959 | Jain | Dec 2021 | A1 |
Number | Date | Country |
---|---|---|
102222500 | Oct 2011 | CN |
105989849 | Oct 2016 | CN |
108091323 | May 2018 | CN |
108763420 | Nov 2018 | CN |
109637547 | Apr 2019 | CN |
109785825 | May 2019 | CN |
109960799 | Jul 2019 | CN |
111009262 | Apr 2020 | CN |
3483761 | Oct 2017 | EP |
Entry |
---|
Lu et al., “SCAN: Learning Speaker Identity from Noisy Sensor Data,” 2017 16th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), 2017, pp. 67-78. (Year: 2017). |
Lerato et al., "Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering," PLOS ONE 10(10): e0141756, 2015, doi:10.1371/journal.pone.0141756. (Year: 2015). |
English Machine Translation of CN 108091323-A (Year: 2018). |
Halkidi et al., "On Clustering Validation Techniques," Journal of Intelligent Information Systems, vol. 17, 2001, doi:10.1023/A:1012801612483. (Year: 2001). |
Moreno, “Data Normalization with Pandas and Scikit-Learn,” Towardsdatascience.com, Jul. 20, 2020 (available at “https://towardsdatascience.com/data-normalization-with-pandas-and-scikit-learn-7c1cc6ed6475,” last accessed May 8, 2023) (Year: 2020). |
Number | Date | Country | |
---|---|---|---|
20220036902 A1 | Feb 2022 | US |