Claims
- 1. A speech transcription tool comprising:
an audio classification component configured to receive an audio stream containing speech data and segment the audio stream into speech and non-speech audio segments based on locations of the speech data within the audio stream;
control logic configured to play back the speech segments of the audio stream and to skip playback of the non-speech segments; and
an input device configured to receive user transcription text relating to a transcription of the speech segments played by the control logic.
- 2. The speech transcription tool of claim 1, further comprising:
a graphical user interface that includes
a waveform section configured to display graphical representations of the audio stream, and
a transcription section configured to display the user transcription text.
- 3. The speech transcription tool of claim 2, wherein the waveform section includes an indication of a current one of the speech segments being played back.
- 4. The speech transcription tool of claim 1, wherein the audio classification component additionally segments the audio stream based on whether the audio stream is a wideband or a narrowband audio stream.
- 5. The speech transcription tool of claim 1, wherein the control logic advances to playing back a next speech segment of the speech segments based on a user command to skip to the next speech segment.
- 6. The speech transcription tool of claim 1, wherein the input device additionally receives user input commands that define formatting of the transcription text.
- 7. The speech transcription tool of claim 6, wherein the user input commands include predefined keystrokes that signify that a next word in the transcription text is a proper name.
- 8. The speech transcription tool of claim 1, wherein the control logic annotates the transcription text with meta-data that includes time codes referring to the audio stream.
- 9. A method comprising:
receiving an audio stream containing speech data;
determining where the speech data is located in the audio stream;
playing select portions of the audio stream to a user, the select portions of the audio stream being based on the location of the speech data;
receiving text corresponding to the select portions of the audio stream; and
outputting the text.
- 10. The method of claim 9, wherein the playing select portions of the audio stream to the user includes:
playing portions of the audio stream that contain the speech data.
- 11. The method of claim 10, wherein the playing select portions of the audio stream to the user includes:
skipping playing of portions of the audio stream that do not contain the speech data.
- 12. The method of claim 9, wherein outputting the text includes annotating the text with meta-data that includes time codes referring to the audio stream.
- 13. The method of claim 9, further including:
playing a next portion of the audio stream based on a user command.
- 14. The method of claim 9, wherein the select portions of the audio stream are based on a location of the speech data and on whether the audio stream is a wideband or narrowband audio stream.
- 15. The method of claim 9, further comprising:
receiving user commands indicating that a next word of the text is a proper name.
- 16. The method of claim 15, wherein outputting the text includes annotating the text with meta-data that indicates words that are proper names.
- 17. A method comprising:
analyzing a data stream based on acoustic characteristics of the data stream to generate acoustic classification information for the data stream;
playing portions of the data stream that meet predetermined criteria based on the acoustic classification information; and
receiving transcription information relating to the played portions of the data stream.
- 18. The method of claim 17, further comprising:
skipping to a next portion of the data stream based on a user command.
- 19. The method of claim 17, wherein the acoustic classification information delineates between speech and non-speech portions of the data stream.
- 20. The method of claim 19, wherein the predetermined criteria based on the acoustic classification information includes whether a portion of the data stream is speech data.
- 21. The method of claim 20, wherein the playing portions of the data stream includes:
skipping playing portions of the data stream that are non-speech portions.
- 22. The method of claim 17, further comprising:
receiving user commands indicating that a next word of the transcription information is a proper name.
- 23. The method of claim 22, further comprising:
outputting the transcription information with meta-data that indicates words that are proper names.
- 24. A computing device for transcribing an audio file that includes speech, the computing device comprising:
an audio output device;
a processor; and
a computer memory coupled to the processor and containing programming instructions that when executed by the processor cause the processor to:
automatically segment the audio file into speech and non-speech segments based on acoustic characteristics of the audio file;
play a current one of the speech segments through the audio output device;
receive transcription information for the speech segments played through the audio output device; and
skip the non-speech segments when locating a next current one of the speech segments for playing through the audio output device.
- 25. The computing device of claim 24, further comprising:
a user input device for entering the transcription information.
- 26. The computing device of claim 24, wherein the user input device is further used to enter user commands that indicate that a next word of the transcription information is a proper name.
- 27. A device comprising:
means for analyzing a data stream based on acoustic characteristics of the data stream to generate acoustic classification information for the data stream;
means for playing portions of the data stream that correspond to speech data based on the acoustic classification information; and
means for receiving transcription information corresponding to the played portions of the data stream.
- 28. A computer-readable medium containing program instructions for execution by a processor, the program instructions comprising:
instructions for analyzing a data stream based on acoustic characteristics of the data stream to generate acoustic classification information for the data stream;
instructions for playing portions of the data stream that correspond to classification information that indicates that the portion of the data stream contains speech data; and
instructions for receiving transcriptions of the played portions of the data stream.
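By way of illustration only, the segmentation, selective playback, and time-code annotation recited in the claims above may be sketched as follows. The sketch is not the claimed implementation: the claims do not specify how speech is detected, so a simple per-frame energy threshold is assumed here, and every name (`Segment`, `classify_frames`, `segment_stream`, `transcribe`, and the `play` and `get_text` callbacks) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Segment:
    start: float     # seconds from the start of the stream
    end: float       # seconds from the start of the stream
    is_speech: bool


def classify_frames(samples: Sequence[float], frame_len: int,
                    threshold: float) -> List[bool]:
    """Label each full frame as speech when its mean energy exceeds the threshold.

    The energy threshold is an assumption standing in for the acoustic classifier.
    """
    labels = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        labels.append(energy > threshold)
    return labels


def segment_stream(labels: Sequence[bool], frame_seconds: float) -> List[Segment]:
    """Merge runs of identically labelled frames into speech and non-speech segments."""
    segments: List[Segment] = []
    for idx, is_speech in enumerate(labels):
        start = idx * frame_seconds
        end = start + frame_seconds
        if segments and segments[-1].is_speech == is_speech:
            segments[-1].end = end  # extend the current run
        else:
            segments.append(Segment(start, end, is_speech))
    return segments


def transcribe(segments: Sequence[Segment],
               play: Callable[[Segment], None],
               get_text: Callable[[Segment], str]) -> List[Dict[str, object]]:
    """Play only the speech segments and attach time codes to the entered text."""
    output = []
    for seg in segments:
        if not seg.is_speech:
            continue  # non-speech segments are skipped rather than played
        play(seg)
        output.append({"start": seg.start, "end": seg.end, "text": get_text(seg)})
    return output
```

In use, `play` would drive an audio output device and `get_text` would collect the text typed by the user; both are left as callbacks so that the control flow can be exercised without audio hardware.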
RELATED APPLICATIONS
[0001] This application claims priority under 35 U.S.C. §119 based on U.S. Provisional Application Nos. 60/394,064 and 60/394,082 filed Jul. 3, 2002 and Provisional Application No. 60/419,214 filed Oct. 17, 2002, the disclosures of which are incorporated herein by reference.
GOVERNMENT CONTRACT
[0002] The U.S. Government may have a paid-up license in this invention and the right in limited circumstances to require the patent owner to license others on reasonable terms as provided for by the terms of Contract No. 1999-S018900-0 (Federal Broadcast Information Service (FBIS)).
Provisional Applications (3)
| Number | Date | Country |
| --- | --- | --- |
| 60394064 | Jul 2002 | US |
| 60394082 | Jul 2002 | US |
| 60419214 | Oct 2002 | US |