Claims
- 1. An automatic speaker verification system comprising:
a receiver, the receiver obtaining enrollment speech over an enrollment channel;
a means, connected to the receiver, for developing an estimate of the enrollment channel;
a first storage device, connected to the receiver, for storing the enrollment channel estimate;
a means for extracting predetermined features of the enrollment speech;
a means, operably connected to the extracting means, for segmenting the predetermined features of the enrollment speech, wherein the features are segmented into a plurality of subwords; and
at least one classifier, connected to the segmenting means, wherein the classifier models the plurality of subwords and outputs one or more classifier scores.
- 2. The automatic speaker verification system of claim 1, further comprising:
an analog to digital converter, connected to the receiver, for providing the obtained enrollment speech in a digital format.
- 3. The automatic speaker verification system of claim 1, wherein at least one classifier is a neural tree network classifier.
- 4. The automatic speaker verification system of claim 1, wherein at least one classifier is a Gaussian mixture model classifier.
- 5. The automatic speaker verification system of claim 1, wherein the classifiers comprise:
at least one Gaussian mixture model classifier, the Gaussian mixture model classifier resulting in a first classifier score; and
at least one neural tree network classifier, the neural tree network classifier resulting in a second classifier score.
- 6. The automatic speaker verification system of claim 1, further comprising a means, connected to the classifier, for fusing the classifier scores, wherein the fusing means weighs the scores from the classifier models with a fusion constant and combines the weighted scores resulting in a final score for the combined system.
- 7. The automatic speaker verification system of claim 6, wherein the weighted scores are variable and are dynamically adapted.
- 8. The automatic speaker verification system of claim 1, wherein the segmenting means generates subwords using automatic blind speech segmentation.
- 9. The automatic speaker verification system of claim 1, wherein the estimating means comprises a means for creating a filter representing characteristics of the enrollment channel.
- 10. The automatic speaker verification system of claim 1, further comprising a second storage device, connected to the classifier, for storing the one or more classifier scores.
- 11. An automatic speaker verification method, comprising the steps of:
obtaining enrollment speech over an enrollment channel;
storing an estimate of the enrollment channel;
extracting predetermined features of the enrollment speech;
segmenting the enrollment speech, wherein the enrollment speech is segmented into a plurality of subwords; and
modeling the plurality of subwords using one or more classifier models, resulting in an output of one or more classifier scores.
- 12. The automatic speaker verification method of claim 11, further comprising the steps of:
digitizing the obtained enrollment speech; and preprocessing the digitized enrollment speech.
- 13. The automatic speaker verification method of claim 11, wherein the step of modeling comprises the step of scoring at least one neural tree network classifier.
- 14. The automatic speaker verification method of claim 11, wherein the step of modeling further comprises the steps of:
scoring at least one Gaussian mixture model classifier, the Gaussian mixture model classifier resulting in a first classifier score;
scoring at least one neural tree network classifier, the neural tree network classifier resulting in a second classifier score; and
fusing the first and second classifier scores.
- 15. The automatic speaker verification method of claim 11, further comprising the steps of:
weighing the scores from the classifier models with a fusion constant; and combining the weighted scores resulting in a final score for the combined system.
- 16. The automatic speaker verification method of claim 15, wherein the fusion constant is variable and is dynamically adapted.
- 17. The automatic speaker verification method of claim 11, wherein the step of segmenting comprises generating subwords using automatic blind speech segmentation.
- 18. The automatic speaker verification method of claim 11, wherein the step of storing an estimate of the enrollment channel comprises the step of creating a filter representing characteristics of the enrollment channel.
- 19. An automatic speaker verification method, comprising the steps of:
obtaining enrollment speech over an enrollment channel;
storing an estimate of the enrollment channel, the estimate being a filter representing characteristics of the enrollment channel;
receiving test speech over a testing channel;
inverse filtering the test speech to create filtered test speech;
recalling the estimate of the enrollment channel;
filtering the filtered test speech through the recalled estimate of the enrollment channel to create enrollment filtered test speech; and
determining whether the enrollment filtered test speech comes from the same person as the enrollment speech.
- 20. The automatic speaker verification method of claim 19, wherein the step of storing an estimate of the enrollment channel comprises the step of creating a filter representing characteristics of the enrollment channel.
- 21. The automatic speaker verification method of claim 19, wherein the step of inverse filtering the test speech comprises the step of creating a filter representing inverse characteristics of the testing channel.
- 22. An automatic speaker verification method, comprising the steps of:
obtaining enrollment speech over an enrollment channel;
inverse filtering the enrollment speech to create inverse filtered enrollment speech;
receiving test speech over a testing channel;
inverse filtering the test speech to create inverse filtered test speech; and
determining whether the inverse filtered test speech comes from the same person as the inverse filtered enrollment speech.
- 23. The automatic speaker verification method of claim 22, wherein the step of inverse filtering the enrollment speech comprises the step of creating a filter representing inverse characteristics of the enrollment channel.
- 24. The automatic speaker verification method of claim 22, wherein the step of inverse filtering the test speech comprises the step of creating a filter representing inverse characteristics of the testing channel.
- 25. An automatic speaker verification method, including the steps of:
obtaining two or more samples of enrollment speech;
processing each sample of enrollment speech to form corresponding utterances;
obtaining test speech;
identifying one or more key words/key phrases in the test speech, including the steps of:
  selecting a reference utterance from one of the utterances;
  warping the remaining samples of the enrollment speech to the reference utterance;
  averaging one or more of the warped utterances to generate a reference template;
  calculating a dynamic time warp distortion for the reference template and test speech; and
  choosing a portion of the test utterance which has the least dynamic time warp distortion; and
comparing the identified key words/key phrases to the enrollment speech to determine whether the test speech and enrollment speech are from the same person.
- 26. The automatic speaker verification method of claim 25, wherein the step of selecting a reference utterance comprises the step of: choosing the utterance with minimum duration.
- 27. The automatic speaker verification method of claim 25, wherein the step of selecting a reference utterance comprises the step of: choosing an utterance with median duration.
- 28. The automatic speaker verification method of claim 25, wherein the step of selecting a reference utterance comprises the step of: choosing an utterance with a duration closest to the average duration.
- 29. The automatic speaker verification method of claim 25, wherein the step of selecting a reference utterance comprises the step of: choosing an utterance with minimum combined distortion with respect to the other utterances.
- 30. An automatic speaker verification method, wherein the results of prior verifications are stored, including the steps of:
obtaining test speech from a user seeking authorization or identification;
generating subwords of the test speech;
scoring the subwords against subwords of a known individual using a plurality of modeling classifiers;
storing the result of each modeling classifier as a classifier score;
fusing the classifier scores using a fusion constant and a weighing function to generate a final score; and
comparing the final score to a threshold value to determine whether the test speech and enrollment speech are from the known individual.
- 31. The automatic speaker verification method of claim 30, further comprising the steps of:
determining that fusion adaptation inclusion criteria are met; and changing the fusion constant to provide more weight to the classifier score which more accurately corresponds to the threshold value.
- 32. The automatic speaker verification method of claim 30, further comprising the steps of:
determining that model adaptation inclusion criteria are met, including that one or more verifications have been successful; and
training the model classifiers with previously stored enrollment speech and with speech corresponding to the successful verifications, including the steps of:
generating a new threshold value; and storing the new threshold value.
- 33. The automatic speaker verification method of claim 30, further comprising the steps of:
determining that threshold adaptation inclusion criteria are met;
analyzing the stored final scores;
calculating a new threshold value in response to the analysis; and
storing the new threshold value.
- 34. An automatic speaker verification method, comprising the steps of:
obtaining test speech from a user over a test channel;
processing the test speech to remove the effects of the test channel; and
comparing the processed test speech with speech data from a known user, including the steps of:
  extracting features of the test speech;
  generating subwords based on the extracted features;
  scoring the subwords using one or more model classifiers;
  fusing the results of the model classifiers to obtain a final score; and
  verifying the user if the final score is equal to or greater than a threshold value.
- 35. The automatic speaker verification method of claim 34, wherein the speech data from the known user is obtained over an enrollment channel, wherein the step of processing further comprises the step of filtering the test speech through a filter having characteristics of the enrollment channel, and wherein the step of generating subwords further comprises the step of spotting one or more key words/key phrases in the processed test speech.
- 36. The automatic speaker verification method of claim 34, further comprising the steps of:
training the model classifiers using antispeaker data from nonusers and one or more enrollment speech samples from the user; and
changing the model classifiers and threshold value, including the steps of:
  determining that the user has been verified;
  retraining the model classifiers, including the step of using test speech corresponding to the verified final score as an enrollment sample; and
  calculating a new threshold value based on the retrained model classifiers.
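The sketches that follow are editorial illustrations of techniques recited in the claims above; none is drawn from the patent's specification, and every function name, parameter, and modeling choice is an assumption made only to keep the examples concrete. First, a minimal sketch of the score fusion of claims 5 through 7 and 15, assuming a simple linear combination of the Gaussian mixture model and neural tree network scores.

```python
# Illustrative sketch of linear classifier-score fusion (claims 5-7, 15).
# The linear form and the fusion constant alpha are assumptions; the claims
# do not restrict fusion to this particular combination.

def fuse_scores(gmm_score: float, ntn_score: float, alpha: float = 0.5) -> float:
    """Weigh the GMM and NTN scores with a fusion constant and combine them."""
    return alpha * gmm_score + (1.0 - alpha) * ntn_score

def adapt_fusion_constant(alpha: float, gmm_correct: bool, ntn_correct: bool,
                          step: float = 0.05) -> float:
    """Naive sketch of claim 7's dynamic adaptation: shift weight toward the
    classifier whose score agreed with the accept/reject outcome."""
    if gmm_correct and not ntn_correct:
        alpha = min(1.0, alpha + step)
    elif ntn_correct and not gmm_correct:
        alpha = max(0.0, alpha - step)
    return alpha
```

For example, `fuse_scores(0.81, 0.64, alpha=0.6)` would weight the Gaussian mixture model score more heavily than the neural tree network score.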
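Next, a sketch of the enrollment flow of claims 11, 12, and 17, assuming MFCC-style feature matrices, uniform splitting as a stand-in for automatic blind speech segmentation, and scikit-learn's GaussianMixture as a stand-in subword classifier.

```python
# Sketch of enrollment (claims 11-12, 17): extract features, segment them
# into subwords, and fit one classifier per subword. Uniform segmentation
# and the scikit-learn GMM are stand-ins, not the patent's methods.
import numpy as np
from sklearn.mixture import GaussianMixture  # assumed stand-in classifier

def segment_features(features: np.ndarray, num_subwords: int) -> list[np.ndarray]:
    """Split a (frames x dims) feature matrix into subword segments.
    Uniform splitting is a placeholder for blind segmentation."""
    return np.array_split(features, num_subwords)

def enroll(features: np.ndarray, num_subwords: int = 8) -> list[GaussianMixture]:
    """Train one small GMM per subword segment of the enrollment utterance."""
    models = []
    for segment in segment_features(features, num_subwords):
        gmm = GaussianMixture(n_components=2, covariance_type="diag")
        gmm.fit(segment)
        models.append(gmm)
    return models
```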
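A sketch of the channel handling in claims 19 through 21, assuming the channel estimate is a long-term average magnitude spectrum computed from framed speech; the claims require only "a filter representing characteristics of the channel," so this particular estimator is an assumption.

```python
# Sketch of channel compensation (claims 19-21): estimate a channel as the
# long-term average magnitude spectrum, remove the test channel by inverse
# filtering, then impose the stored enrollment-channel estimate.
import numpy as np

def estimate_channel(frames: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Average magnitude spectrum over all frames (frames x frame_len)."""
    return np.maximum(np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0), eps)

def refilter_to_enrollment(test_frames: np.ndarray,
                           enrollment_channel: np.ndarray) -> np.ndarray:
    """Inverse-filter the test channel, then apply the enrollment channel."""
    spectra = np.fft.rfft(test_frames, axis=1)
    test_channel = estimate_channel(test_frames)
    equalized = spectra / test_channel * enrollment_channel
    return np.fft.irfft(equalized, n=test_frames.shape[1], axis=1)
```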
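A sketch of the key word/key phrase spotting of claims 25 and 26, assuming a frame-level Euclidean cost and a simple fixed-length sliding search; the warping and averaging of the enrollment utterances into the reference template are omitted for brevity.

```python
# Sketch of DTW-based key-word spotting (claims 25-26): slide a reference
# template over the test utterance and keep the window with the least
# dynamic-time-warp distortion.
import numpy as np

def dtw_distortion(a: np.ndarray, b: np.ndarray) -> float:
    """Classic dynamic-time-warp distortion between two feature sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)

def spot_keyword(template: np.ndarray, test: np.ndarray) -> tuple[int, float]:
    """Return the start frame and distortion of the best-matching window."""
    window = len(template)
    best_start, best_dist = 0, np.inf
    for start in range(0, len(test) - window + 1):
        d = dtw_distortion(template, test[start:start + window])
        if d < best_dist:
            best_start, best_dist = start, d
    return best_start, best_dist
```

The reference utterance of claim 26 would simply be `min(utterances, key=len)`; claims 27 through 29 substitute the median-duration, average-duration, or minimum-combined-distortion utterance.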
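A sketch of the threshold adaptation of claim 33, assuming the inclusion criterion is simply a minimum number of stored final scores and that the new threshold is set a margin below their mean; the actual analysis and criteria are left open by the claim.

```python
# Sketch of threshold adaptation (claim 33): once inclusion criteria are met,
# recompute the decision threshold from stored final scores of prior
# verifications. The mean-minus-margin rule is an assumed placeholder.
import statistics

def adapt_threshold(stored_final_scores: list[float],
                    margin: float = 1.0,
                    min_scores: int = 5) -> float | None:
    """Return a new threshold, or None if the inclusion criteria are unmet."""
    if len(stored_final_scores) < min_scores:
        return None
    mean = statistics.mean(stored_final_scores)
    stdev = statistics.pstdev(stored_final_scores)
    return mean - margin * stdev
```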
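Finally, a sketch of the verification-time decision of claims 30 and 34, reusing the per-subword models from the enrollment sketch and the linear fusion from the fusion sketch; the equal split of test features across subword models mirrors the uniform-segmentation assumption above and is not the patent's segmentation method, and `ntn_scorer` is an assumed callable standing in for neural tree network scoring.

```python
# Sketch of the verification decision (claims 30, 34): score subwords with
# the enrolled models, fuse the classifier scores, and compare to a threshold.
import numpy as np

def verify(test_features: np.ndarray, gmm_models, ntn_scorer,
           threshold: float, alpha: float = 0.5) -> bool:
    """Accept the user if the fused classifier score meets the threshold."""
    segments = np.array_split(test_features, len(gmm_models))
    gmm_score = float(np.mean([m.score(seg) for m, seg in zip(gmm_models, segments)]))
    ntn_score = float(ntn_scorer(test_features))  # assumed NTN scoring callable
    final_score = alpha * gmm_score + (1.0 - alpha) * ntn_score
    return final_score >= threshold
```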
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority from Provisional Application 60/031,639, filed Nov. 22, 1996, entitled Voice Print System.
Provisional Applications (1)
| Number   | Date     | Country |
|----------|----------|---------|
| 60031639 | Nov 1996 | US      |
Continuations (1)
|        | Number   | Date     | Country |
|--------|----------|----------|---------|
| Parent | 08976280 | Nov 1997 | US      |
| Child  | 10042832 | Jan 2002 | US      |