Speech to Text Conversion in a Videoconference

Information

  • Patent Application
  • Publication Number
    20070188599
  • Date Filed
    January 19, 2007
  • Date Published
    August 16, 2007
Abstract
Various embodiments of a method for automatically converting audio speech in a videoconference into text information are described. According to one embodiment of the method, a videoconferencing device at a first endpoint in the videoconference may receive a stream of video information and audio information from a videoconferencing device at a second endpoint in the videoconference. The audio information includes speech of a participant at the second endpoint. The videoconferencing device at the first endpoint may automatically convert the speech into text information.
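The receive-and-convert flow in the abstract can be illustrated with a short Python sketch. It is not the applicant's implementation, and it rests on several assumptions: `MediaPacket`, `SpeechRecognizer`, `FakeRecognizer`, and `Endpoint` are hypothetical names, and `FakeRecognizer` simply decodes the audio payload as UTF-8 so the example runs without a real speech engine. The loop converts each packet's speech as it arrives, which mirrors the "dynamic" conversion of claim 2, and it keeps the resulting text in memory as recited in claim 1.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Protocol


@dataclass
class MediaPacket:
    """One unit of a received conference stream (hypothetical format)."""
    video: bytes
    audio: bytes


class SpeechRecognizer(Protocol):
    """Hypothetical interface standing in for a real speech-to-text engine."""

    def transcribe(self, audio: bytes) -> str:
        ...


class FakeRecognizer:
    """Demo stub only: pretends the audio payload is already UTF-8 text."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


@dataclass
class Endpoint:
    """First endpoint: receives the stream and keeps recognized text in memory."""

    recognizer: SpeechRecognizer
    text_memory: List[str] = field(default_factory=list)

    def receive(self, stream: Iterable[MediaPacket]) -> None:
        for packet in stream:
            # Convert each packet's speech as soon as it arrives (dynamic conversion).
            text = self.recognizer.transcribe(packet.audio)
            if text:  # skip silent packets that produce no text
                self.text_memory.append(text)
            # packet.video would be passed to the display path here (omitted).


endpoint = Endpoint(FakeRecognizer())
endpoint.receive([
    MediaPacket(video=b"", audio=b"Good morning"),
    MediaPacket(video=b"", audio=b""),
    MediaPacket(video=b"", audio=b"Shall we start?"),
])
print(endpoint.text_memory)  # ['Good morning', 'Shall we start?']
```

Keeping the recognizer behind an interface means the stub can later be swapped for an actual speech engine without changing the receive loop.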
Description

BRIEF DESCRIPTION OF THE DRAWINGS

A better understanding of the present invention may be obtained when the following detailed description is considered in conjunction with the following drawings, in which:

FIG. 1 is a diagram illustrating an embodiment of a videoconference;

FIG. 2 illustrates an embodiment of a videoconferencing system including a videoconferencing device;

FIG. 3 is a flowchart diagram illustrating an embodiment of a method for displaying and/or sharing participant information for participants in a videoconference;

FIG. 4 illustrates an embodiment of a display in which an image of a participant is displayed together with participant information for the participant;

FIG. 5 illustrates an embodiment in which a videoconferencing device at a remote endpoint sends both video information and participant information to a videoconferencing device at a local endpoint;

FIG. 6 illustrates an embodiment in which a local videoconferencing device at a local endpoint receives video information from a remote videoconferencing device at a remote endpoint and receives participant information from a database;

FIG. 7 illustrates an embodiment of a display in which a callout box is displayed proximally to each participant on a display screen, where each callout box displays the name of the respective participant;

FIG. 8 illustrates an embodiment of a display in which multiple portions of participant information are displayed simultaneously with images of different participants;

FIG. 9 is a flowchart diagram illustrating an embodiment of a method for pre-storing participant information in a database;

FIG. 10 is a flowchart diagram illustrating an embodiment of a method for looking up the previously stored participant information for participants in a videoconference;

FIG. 11 illustrates an embodiment in which a videoconference participant carries a badge or card that stores the participant's identity information;

FIGS. 12-14 illustrate several exemplary implementations of a database in which participant information for participants may be stored;

FIG. 15 is a flowchart diagram illustrating an embodiment of a method for correlating the participant information for various participants with the images of the participants displayed on the display screen;

FIG. 16 is a flowchart diagram illustrating an embodiment of a method for verifying the identity of a potential participant in a videoconference using facial recognition;

FIGS. 17 and 18 illustrate an embodiment of a computer system for performing a facial recognition algorithm;

FIG. 19 is a flowchart diagram illustrating an embodiment of a method for automatically converting audio speech of a participant in a videoconference into text information;

FIG. 20 illustrates components in an exemplary videoconferencing device according to an embodiment; and

FIGS. 21A-21D illustrate exemplary hardware components for a videoconferencing device, according to an embodiment.


Claims
  • 1. A method for processing audio information in a videoconference, comprising: a first videoconferencing device in the videoconference receiving video information and audio information from a second videoconferencing device in the videoconference, wherein the audio information includes speech of a videoconference participant; and at least one of the first videoconferencing device or the second videoconferencing device: automatically converting the speech into text information; and storing the text information in a memory.
  • 2. The method of claim 1, wherein said automatically converting the speech into text information comprises dynamically converting the speech into text information as the audio information is received by the first or second videoconferencing device.
  • 3. The method of claim 1, further comprising: creating one or more files; and storing the text information in the one or more files, wherein the one or more files provide a transcript of the speech of at least one participant in the videoconference.
  • 4. The method of claim 1, further comprising displaying the text information on a display screen at the first or second videoconferencing device.
  • 5. The method of claim 4, wherein said displaying the text information on the display screen comprises displaying the text information substantially simultaneously with the video information.
  • 6. The method of claim 1, further comprising: performing voice recognition to identify a participant based on speech from the participant; and associating the text information with the participant in response to said identifying the participant based on the speech of the participant.
  • 7. The method of claim 6, wherein said associating the text information with the participant comprises including a name of the participant in the text information.
  • 8. The method of claim 6, wherein said associating the text information with the participant comprises displaying the text information proximally to an image of the participant.
  • 9. The method of claim 1, wherein the speech of the participant is in a first language; wherein said converting the speech into text information comprises converting the speech into first text information in the first language; wherein the method further comprises automatically translating the first text information into second text information, wherein the second text information is in a second language.
  • 10. The method of claim 9, further comprising displaying the second text information in the second language on a display screen.
  • 11. A videoconferencing device, comprising: an input port operable to receive video information and audio information, wherein the audio information includes speech of a participant in a videoconference; a memory; and one or more computational elements operable to: automatically convert the speech into text information; and store the text information in the memory.
  • 12. The videoconferencing device of claim 11, wherein the one or more computational elements include at least one processor.
  • 13. The videoconferencing device of claim 11, wherein said automatically converting the speech into text information comprises dynamically converting the speech into text information as the audio information is received.
  • 14. The videoconferencing device of claim 11, wherein the one or more computational elements are further operable to store the text information as a transcript of the speech of the participant in the videoconference.
  • 15. The videoconferencing device of claim 11, further comprising: an output port; wherein the one or more computational elements are further operable to: create a composite image including the video information and the text information; and send the composite image to a display device via the output port.
  • 16. The videoconferencing device of claim 11, wherein the speech of the participant is in a first language; wherein said converting the speech into text information comprises converting the speech into first text information in the first language; and wherein the one or more computational elements are further operable to automatically translate the first text information into second text information, wherein the second text information is in a second language.
  • 17. The videoconferencing device of claim 16, further comprising: an output port; and wherein the one or more computational elements are further operable to send the second text information in the second language for display on a display device.
  • 18. A method for processing audio information in a videoconference, the method comprising: a videoconferencing device in the videoconference receiving a stream of video information and audio information, wherein the audio information includes speech of a participant at the first endpoint; the videoconferencing device at the first endpoint automatically converting the speech into text information; and the videoconferencing device at the first endpoint sending the video information, audio information, and text information to a videoconferencing device at a second endpoint in the videoconference.
  • 19. The method of claim 18, wherein the speech of the participant is in a first language; wherein said converting the speech into text information comprises converting the speech into first text information in the first language; and wherein the method further comprises automatically translating the first text information into second text information, wherein the second text information is in a second language.
  • 20. The method of claim 19, further comprising storing the text information as a transcript of the speech of the participant in the videoconference.
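The dependent claims add a transcript file (claim 3), speaker attribution through voice recognition (claims 6 and 7), and translation into a second language (claims 9, 10, and 19). The following Python sketch shows one way these pieces could fit together; it is an illustration under stated assumptions, not the claimed method. Voice recognition is replaced by a lookup of hypothetical precomputed voice signatures (`identify_speaker`), translation is replaced by a hypothetical phrasebook dictionary (`translate`), and the one-line-per-utterance transcript format is an arbitrary choice.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Tuple


@dataclass
class Utterance:
    speaker: str     # participant name from (stubbed) voice recognition, claims 6-7
    text: str        # first text information, in the spoken language
    translated: str  # second text information, in a second language, claim 9


def identify_speaker(voice_signature: str, enrolled: Dict[str, str]) -> str:
    # Hypothetical stand-in for voice recognition: look up a precomputed signature.
    return enrolled.get(voice_signature, "Unknown participant")


def translate(text: str, phrasebook: Dict[str, str]) -> str:
    # Hypothetical stand-in for machine translation; falls back to the source text.
    return phrasebook.get(text, text)


def build_transcript(
    segments: List[Tuple[str, str]],
    enrolled: Dict[str, str],
    phrasebook: Dict[str, str],
) -> List[Utterance]:
    """Attribute each (voice_signature, recognized_text) segment and translate it."""
    return [
        Utterance(identify_speaker(sig, enrolled), text, translate(text, phrasebook))
        for sig, text in segments
    ]


def write_transcript(utterances: List[Utterance], path: Path) -> None:
    # One line per utterance, including the speaker's name (claim 7).
    rows = [f"{u.speaker}: {u.text} | {u.translated}" for u in utterances]
    path.write_text("\n".join(rows) + "\n", encoding="utf-8")


enrolled = {"sig-a": "Alice", "sig-b": "Bob"}
phrasebook = {"Good morning": "Bonjour", "Thank you": "Merci"}
segments = [("sig-a", "Good morning"), ("sig-b", "Thank you"), ("sig-x", "Hello")]

utterances = build_transcript(segments, enrolled, phrasebook)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "transcript.txt"
    write_transcript(utterances, path)
    lines = path.read_text(encoding="utf-8").splitlines()

for line in lines:
    print(line)
# Alice: Good morning | Bonjour
# Bob: Thank you | Merci
# Unknown participant: Hello | Hello
```

Both stubs take plain dictionaries so that real voice-recognition and translation services could replace them without changing `build_transcript` or the transcript format.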
Provisional Applications (1)
Number Date Country
60761867 Jan 2006 US