Claims
- 1. A speaker identification system, comprising:
an indexer configured to:
generate a plurality of speaker models,
receive a plurality of audio segments, and
identify speakers corresponding to the audio segments based on the speaker models, the indexer being unable to correctly identify at least one of the speakers, as an unidentified speaker, corresponding to the audio segments; and
a server configured to:
receive, from a user, the name of the unidentified speaker, and
provide the name of the unidentified speaker to the indexer for identification of the unidentified speaker in subsequent audio segments.
- 2. The system of claim 1, wherein the indexer is further configured to:
generate a new speaker model for the unidentified speaker based on an audio segment corresponding to the unidentified speaker.
- 3. The system of claim 1, wherein the indexer is further configured to:
generate labels for the audio segments, the labels being based on names of speakers that can be identified and gender of speakers that cannot be identified.
- 4. The system of claim 3, wherein when receiving the name of an unidentified speaker, the server is configured to:
present a document to the user, the document including a transcription of a plurality of the audio segments and the labels for the plurality of the audio segments, and receive, from the user, the name of one of the speakers that cannot be identified.
- 5. The system of claim 4, wherein when presenting a document, the server is further configured to:
provide audio data corresponding to at least one of the plurality of the audio segments to the user.
- 6. The system of claim 1, wherein the server is further configured to:
locate one or more additional audio segments from the unidentified speaker, and present the one or more additional audio segments to the user for confirmation that the one or more additional audio segments were produced by the unidentified speaker.
- 7. The system of claim 6, wherein when presenting the one or more additional audio segments, the server is configured to continue to present audio segments to the user for confirmation until at least four minutes of audio data is obtained.
- 8. The system of claim 6, wherein the unidentified speaker corresponds to at least one of the audio segments; and
wherein when locating one or more additional audio segments, the server is configured to:
find one or more additional audio segments similar to the at least one of the audio segments.
- 9. The system of claim 1, wherein the indexer is configured to:
fit audio data from the unidentified speaker to a speaker independent Gaussian mixture model using an expectation and maximization process, and generate a new speaker model for the unidentified speaker using a maximum a posteriori adaptation process.
- 10. The system of claim 1, wherein the unidentified speaker is a misidentified speaker of one of the audio segments; and
wherein when receiving the name, the server is configured to:
receive, from the user, a correct name of a speaker of the one of the audio segments.
- 11. The system of claim 10, wherein the indexer is further configured to:
identify one of the speaker models, as an identified speaker model, that corresponds to the one of the audio segments, and update a label associated with the identified speaker model to include the correct name of the speaker of the one of the audio segments.
- 12. The system of claim 10, wherein the server is further configured to:
locate one or more additional audio segments similar to the one of the audio segments, and present the one or more additional audio segments to the user for confirmation that the one or more additional audio segments were produced by the speaker of the one of the audio segments.
- 13. A speaker identification system, comprising:
means for generating a plurality of speaker models;
means for receiving a plurality of audio segments;
means for identifying speakers corresponding to the audio segments based on the speaker models, at least one of the audio segments being associated with an unidentified or misidentified speaker;
means for labeling the audio segments with names of the speakers that can be identified;
means for presenting a plurality of the audio segments, including the at least one of the audio segments, with the labels to a user;
means for receiving, from the user, the name of the unidentified or misidentified speaker; and
means for identifying the unidentified or misidentified speaker by name in future audio segments.
- 14. A method for providing speaker identification training, comprising:
generating a plurality of speaker models;
receiving a plurality of audio segments;
identifying speakers corresponding to the audio segments based on the speaker models, at least one of the audio segments being associated with an unidentified or misidentified speaker;
presenting a plurality of the audio segments, including the at least one of the audio segments, to a user;
receiving, from the user, the name of the unidentified or misidentified speaker; and
identifying the unidentified or misidentified speaker by name for future audio segments.
- 15. The method of claim 14, wherein the unidentified or misidentified speaker is an unidentified speaker; and
wherein the method further comprises:
generating a new speaker model for the unidentified speaker based on the at least one of the audio segments.
- 16. The method of claim 14, further comprising:
generating labels for the audio segments, the labels being based on names of speakers that can be identified and gender of speakers that cannot be identified.
- 17. The method of claim 16, wherein the presenting a plurality of the audio segments includes:
providing a document to the user, the document including a transcription of the plurality of the audio segments and the labels for the plurality of the audio segments.
- 18. The method of claim 17, wherein the providing a document includes:
providing audio data corresponding to one or more of the plurality of the audio segments to the user.
- 19. The method of claim 14, further comprising:
locating one or more additional audio segments from the unidentified or misidentified speaker, and presenting the one or more additional audio segments to the user for confirmation that the one or more additional audio segments were produced by the unidentified or misidentified speaker.
- 20. The method of claim 19, wherein the presenting the one or more additional audio segments includes:
presenting audio segments to the user for confirmation until at least four minutes of audio data is obtained.
- 21. The method of claim 19, wherein the locating one or more additional audio segments includes:
finding one or more additional audio segments similar to the at least one of the audio segments.
- 22. The method of claim 14, wherein the unidentified or misidentified speaker is an unidentified speaker; and
wherein the method further comprises:
fitting audio data from the unidentified speaker to a speaker independent Gaussian mixture model using an expectation and maximization process; and generating a new speaker model for the unidentified speaker using a maximum a posteriori adaptation process.
- 23. The method of claim 14, wherein the unidentified or misidentified speaker is a misidentified speaker; and
wherein the receiving the name includes:
receiving, from the user, a correct name of a speaker of the at least one of the audio segments.
- 24. The method of claim 23, further comprising:
identifying one of the speaker models, as an identified speaker model, that corresponds to the at least one of the audio segments; and updating a label associated with the identified speaker model to include the correct name of the speaker of the at least one of the audio segments.
- 25. A computer-readable medium that stores instructions executable by one or more processors for speaker identification training by a speaker identification system, comprising:
instructions for generating a plurality of speaker models based on training data;
instructions for presenting, to a user, audio segments for which no speakers can be identified from the speaker models;
instructions for obtaining, from the user, a name of a speaker for at least one of the audio segments;
instructions for generating a new speaker model for the speaker based on the at least one of the audio segments; and
instructions for associating the name of the speaker with the new speaker model.
- 26. A speaker identification system, comprising:
an indexer configured to:
receive a plurality of speech segments, each of the speech segments being associated with a corresponding speaker,
create a plurality of documents by transcribing the speech segments,
identify names of the speakers corresponding to the speech segments, the indexer being unable to correctly identify names of at least one of the speakers corresponding to the speech segments, the speakers for which the indexer can correctly identify names being identified speakers and the speakers for which the indexer cannot correctly identify names being unidentified speakers;
a database configured to store the documents; and
a server configured to:
retrieve one or more of the documents from the database,
present the one or more of the documents to a user,
receive, from the user, a name for one of the unidentified speakers, and
provide the name for the one of the unidentified speakers to the indexer for subsequent identification of speech segments from the one of the unidentified speakers.
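Claims 6-8 and 19-21 recite locating additional audio segments similar to the one the user just named and presenting them for confirmation until at least four minutes of audio data is obtained. The sketch below is one illustrative way such a confirmation loop could work; the Segment structure, the cosine-similarity measure over fixed-length segment representations, and the confirm() callback are assumptions of this sketch, not details taken from the claims.

```python
# Illustrative confirmation loop (claims 6-8, 19-21): rank remaining segments by
# similarity to the user-labeled seed segment, ask the user to confirm each
# candidate, and stop once at least four minutes of confirmed audio is collected.
# Segment, cosine similarity, and confirm() are assumptions for this sketch.
from dataclasses import dataclass
from typing import Callable, List
import numpy as np

FOUR_MINUTES = 240.0  # seconds


@dataclass
class Segment:
    embedding: np.ndarray  # assumed fixed-length representation of the segment's audio
    duration: float        # seconds


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-10))


def collect_training_audio(seed: Segment, candidates: List[Segment],
                           confirm: Callable[[Segment], bool]) -> List[Segment]:
    """Present similar segments for user confirmation until enough audio is gathered."""
    confirmed = [seed]
    total = seed.duration
    # Most similar candidates first.
    ranked = sorted(candidates, key=lambda s: cosine(seed.embedding, s.embedding),
                    reverse=True)
    for segment in ranked:
        if total >= FOUR_MINUTES:
            break
        if confirm(segment):  # e.g. play the audio and ask the user to confirm the speaker
            confirmed.append(segment)
            total += segment.duration
    return confirmed
```

The confirmed segments would then serve as the training audio for the new or corrected speaker model.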
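Claims 9 and 22 recite fitting audio data from the unidentified speaker to a speaker-independent Gaussian mixture model using an expectation and maximization process and then generating a new speaker model using a maximum a posteriori adaptation process. The following is a minimal sketch of that GMM-UBM style flow, assuming scikit-learn for the EM step and classic mean-only, relevance-factor MAP adaptation; feature extraction, the relevance factor, and all names below are illustrative, not taken from the application.

```python
# Minimal sketch: (1) EM-fit a speaker-independent GMM (background model) on pooled
# training features, (2) derive a new speaker model by MAP-adapting the component
# means toward one speaker's audio. Assumes numpy feature matrices (frames x dims).
import numpy as np
from sklearn.mixture import GaussianMixture


def train_background_model(features: np.ndarray, n_components: int = 64) -> GaussianMixture:
    """EM-fit a speaker-independent GMM on pooled training features."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          max_iter=100, random_state=0)
    ubm.fit(features)
    return ubm


def map_adapt_means(ubm: GaussianMixture, speaker_features: np.ndarray,
                    relevance: float = 16.0) -> np.ndarray:
    """Return MAP-adapted component means for one speaker's audio segments.

    Mean-only relevance-factor adaptation: components that account for more of
    the speaker's frames move further away from the background means.
    """
    gamma = ubm.predict_proba(speaker_features)      # responsibilities (frames x components)
    n_k = gamma.sum(axis=0) + 1e-10                  # occupancy per component
    first_order = gamma.T @ speaker_features         # (components x features)
    e_k = first_order / n_k[:, None]                 # per-component data mean
    alpha = (n_k / (n_k + relevance))[:, None]       # adaptation coefficient
    return alpha * e_k + (1.0 - alpha) * ubm.means_  # blend toward the background means
```

The adapted means, together with the background model's weights and covariances, would constitute the new speaker model, which is then associated with the user-supplied name as in claim 25.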
RELATED APPLICATION
[0001] This application claims priority under 35 U.S.C. § 119 based on U.S. Provisional Application No. 60/419,214, filed Oct. 17, 2002, the disclosure of which is incorporated herein by reference.
[0002] This application is related to U.S. patent application, Ser. No. 10/______ (Docket No. 02-4042), entitled “Continuous Learning for Speech Recognition Systems,” filed concurrently herewith, and U.S. patent application, Ser. No. 10/610,533 (Docket No. 02-4046), entitled “Systems and Methods for Improving Recognition Results via User-Augmentation of a Database,” filed Jul. 2, 2003, the disclosures of which are incorporated herein by reference.
Provisional Applications (1)

| Number   | Date     | Country |
| -------- | -------- | ------- |
| 60419214 | Oct 2002 | US      |