1. Field of the Invention
The present general inventive concept relates to a system and method to use a telephone, such as a voice over Internet Protocol (VoIP) phone, and more particularly, to a system that is configured to provide speech to text capabilities.
2. Description of the Related Art
The use and development of communications have grown nearly exponentially in recent years. The growth has been fueled by larger networks with more reliable protocols and better communications hardware available to service providers and consumers. Users have similarly grown to expect better communications with rapid access to information related to their communications. These heightened expectations are driven by the desire of users for new technology that provides increased efficiency and effectiveness.
While telephone users now expect clear audio signals so that they can hear and understand the party with whom they are communicating, breakdowns in communication still occur. The breakdowns may result from a poor connection, poor communication skills, limits of telephone technology such as a user's inability to view the speaker during a telephone conversation, and the like.
For instance, one or more parties on a telephone or conference call may have a speech impediment, may have a poor grasp of the others' language, or may not speak the others' language at all. Further, one or both of the calling parties may be in an environment that has excessive background noise that interferes with the ability to communicate satisfactorily.
The limits of phone technology are also problematic. For instance, if there are multiple participants during a conference call, a breakdown in communication may result from one or more participants' inability to distinguish one participant from another. This issue is especially problematic given how commonplace conference calls are in today's workplace.
Technology to address breakdowns in communication has not significantly improved with changing technology. Equipping a user with an increased amount of information so that the user may better understand another party would enhance the user's ability to communicate with the other party.
To overcome communications problems during telephone calls, the principles of the present invention provide for converting speech to text during a telephone call and displaying the text for a party on the telephone call. The speech-to-text conversion may generate text in the same language as the speech or in a different language. By converting and displaying the text, one or more parties on the telephone call may more easily understand other parties on the call and have a record of the conversation.
An embodiment of a system for providing speech transcription to a user during a telephone call may include a receiver configured to receive a telecommunications signal forming a telephone call. The telecommunications signal communicates speech data representative of words spoken by a telephone call participant. A processing unit may be in communication with the receiver and be configured to transcribe the speech data representative of words into text. A display unit may be in communication with the processing unit and be configured to display the text for a user during the telephone call.
An embodiment of a process for providing speech transcription to a user during a telephone call may include receiving a telecommunications signal forming a telephone call. The telecommunications signal communicates speech data representative of words. The speech data representative of words may be transcribed into text, and displayed for a user during the telephone call.
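The receive, transcribe, and display steps described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the class name `TranscriptionPipeline`, the callback signatures, and the stand-in recognizer `fake_recognizer` are all hypothetical, and a real system would substitute an actual speech recognizer and display unit.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class TranscriptionPipeline:
    """Receive speech data, transcribe it, and display the text during a call."""

    # Hypothetical recognizer: maps one chunk of speech data to text.
    transcribe: Callable[[bytes], str]
    # Hypothetical display callback, e.g. one that writes to a display unit.
    display: Callable[[str], None]
    # Running record of everything displayed so far.
    transcript: List[str] = field(default_factory=list)

    def on_speech_data(self, chunk: bytes) -> None:
        """Transcribe one received chunk and display it if it produced text."""
        text = self.transcribe(chunk)
        if text:
            self.transcript.append(text)
            self.display(text)

    def run(self, signal: Iterable[bytes]) -> List[str]:
        """Process every chunk of the incoming signal in arrival order."""
        for chunk in signal:
            self.on_speech_data(chunk)
        return self.transcript


def fake_recognizer(chunk: bytes) -> str:
    # Stand-in for a real recognizer: the "speech data" is already UTF-8 text.
    return chunk.decode("utf-8").strip()


shown: List[str] = []
pipeline = TranscriptionPipeline(transcribe=fake_recognizer, display=shown.append)
pipeline.run([b"hello ", b"", b"can you hear me"])
```

Keeping the recognizer and the display as injected callbacks mirrors the separation between the processing unit and the display unit, so either could be replaced without changing the flow.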
These and/or other aspects and utilities of the present general inventive concept will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
The personal computer 102 may be in communication with a network 106 to communicate with other telephones 108a-108n (collectively 108) using data packets 110 or other communications protocols, as understood in the art. In one embodiment, the network 106 is the Internet. In addition, the network 106 may include other telecommunications networks, such as mobile communications networks and the public switched telephone network (PSTN).
In one embodiment, the personal computer 102 may be configured to transcribe speech during a call and display text representative of the speech on the personal computer 102. The application may provide a graphical user interface (GUI) 112 that includes a transcription region 114 and control region 116. The control region 116 may include one or more control elements 118a-118n that enable the user, for example, to selectably turn the transcription feature on and off, select a language from which the transcription is being performed, and select a preestablished accent. As shown in the transcription region 114, a telephone conversation is being transcribed. The transcription may be performed in substantially real time, enabling the user to view the transcription during the conversation and store the transcribed conversation for later use.
Because the personal computer 102 (or other communications device) is capable of recording the telephone call, the user may be provided with recorder controls that enable the user to replay the recorded telephone call during the telephone call. By enabling a user to replay the telephone call during the telephone call, a user who is unable to understand the person with whom he or she is speaking due to a bad connection, accent of the other person, or otherwise, may simply rewind and play the portion of the conversation that he or she did not hear properly, thereby not having to ask the other person to restate what he or she said.
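One way to support replay during a call is to keep the most recent audio in a bounded buffer. The following is a minimal sketch under stated assumptions: the class `CallRecorder`, its method names, and the fixed frame duration are hypothetical and are not taken from the disclosure.

```python
from collections import deque
from typing import Deque, List


class CallRecorder:
    """Keeps the most recent audio frames of a call so they can be replayed."""

    def __init__(self, max_seconds: float, frame_seconds: float = 0.02) -> None:
        # Assumption: every frame covers the same fixed duration.
        self.frame_seconds = frame_seconds
        max_frames = max(1, round(max_seconds / frame_seconds))
        # A bounded deque silently discards the oldest frame once it is full.
        self._frames: Deque[bytes] = deque(maxlen=max_frames)

    def record(self, frame: bytes) -> None:
        """Store one frame of call audio as it arrives."""
        self._frames.append(frame)

    def rewind(self, seconds: float) -> List[bytes]:
        """Return the frames covering the last `seconds` of the call, oldest first."""
        wanted = max(0, round(seconds / self.frame_seconds))
        count = min(len(self._frames), wanted)
        if count == 0:
            return []
        return list(self._frames)[-count:]


recorder = CallRecorder(max_seconds=1.0, frame_seconds=0.25)
for i in range(6):
    recorder.record(f"f{i}".encode())
last_half_second = recorder.rewind(0.5)  # the two most recent frames
```

Because the buffer is bounded, memory use stays constant on long calls; the trade-off is that audio older than `max_seconds` can no longer be replayed.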
In the embodiment shown in
In one embodiment, the server 126 may be configured as a conference call system that enables two or more callers to perform a conference call by dialing into a telephone number that then connects the callers into a conference call to which each caller may listen. The server 126 may enable one or more of the callers on the conference call to selectively turn on a transcription service that transcribes in a substantially real-time manner and communicates the transcription to the user(s) during the conference call. Each of the callers who receives the transcription may utilize the transcription to better follow along with the conference call and save the conference call transcription for later review. In one embodiment, the server 126 may be configured to identify each user through his or her speech “signature” and allow each user to identify or associate a name with each caller. So, for example, if three callers on the conference call are speaking, the server 126 may be configured to enable one or more of the callers to enter the names of each of the callers, and the server 126 may automatically identify and associate or tag the name of each of the callers with text transcribed from each of the respective callers.
A train conversion module 306 may be configured to enable a user to train the convert speech to text module 302 to improve accuracy of the transcriptions. The train conversion module 306 may be utilized to train the module 302 by one or more users. For example, if multiple people use a single telephone or are on a conference call, then each user may train the system with his or her voice. In addition, the train conversion module 306 may be used by another user at a different location who calls the user. The train conversion module 306 may be trained by requesting a user to speak specific words or phrases so that the system is more easily able to identify specific words spoken by the user, as understood in the art.
A speaker type selector module 308 may provide for preestablished types of speakers who fall into a certain category. For example, the speaker type selector module 308 may enable a user to identify speakers as Southern, Northeastern, Midwestern, or ones from different countries. For example, if a user is from India and speaks English with a certain accent, the system may be preprogrammed or pre-trained such that the accent is accommodated for a party who speaks English with an Indian accent and the system is better able to transcribe his or her speech. In addition, the speaker type selector module 308 may enable a user to specify demographics of one or more users. The demographics may include gender, age, race, country of origin, or any other demographic that may enable the convert speech to text module 302 to better transcribe each party's speech.
A conference call speaker identifier module 310 may be configured to automatically identify which speaker is being transcribed, thereby identifying text being spoken by each speaker. In one embodiment, the conference call speaker identifier module 310 may be configured to recognize a speech pattern, such as a formant pattern of a speaker, where a formant is generally defined by three dominant tones in a speaker's voice. Thereafter, each time the convert speech to text module 302 is utilized to convert speech of a user into text, the text may be displayed in association with indicia, such as “Speaker One.” An associate name with speaker module 312 may be configured to enable a user to enter a name that the conference call speaker identifier module 310 or other module may utilize to display a name (e.g., “Peter:”), rather than any other indicia (e.g., “Speaker One”).
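The speaker identification and name association described above can be sketched as a nearest-match lookup over three formant frequencies. This is an illustrative assumption rather than the disclosed method: the class `SpeakerIdentifier`, the Euclidean distance measure, the 150 Hz threshold, and the numeric labels ("Speaker 1" in place of "Speaker One") are all hypothetical choices, and the formant values are assumed to have been measured elsewhere.

```python
import math
from typing import Dict, Optional, Tuple

# Three dominant tones of a voice, in Hz (assumed to be measured upstream).
Formants = Tuple[float, float, float]


class SpeakerIdentifier:
    """Matches a formant pattern to a known speaker, or enrolls a new one."""

    def __init__(self, max_distance_hz: float = 150.0) -> None:
        # Hypothetical threshold: patterns farther apart than this are
        # treated as different speakers.
        self.max_distance_hz = max_distance_hz
        self._profiles: Dict[str, Formants] = {}
        self._names: Dict[str, str] = {}

    def identify(self, formants: Formants) -> str:
        """Return the label of the closest known speaker, enrolling if none is close."""
        best_label: Optional[str] = None
        best_dist = math.inf
        for label, profile in self._profiles.items():
            dist = math.dist(formants, profile)
            if dist < best_dist:
                best_label, best_dist = label, dist
        if best_label is None or best_dist > self.max_distance_hz:
            best_label = f"Speaker {len(self._profiles) + 1}"
            self._profiles[best_label] = formants
        return best_label

    def associate_name(self, label: str, name: str) -> None:
        """Let a user replace a generic label with an entered name."""
        self._names[label] = name

    def tag(self, formants: Formants, text: str) -> str:
        """Prefix transcribed text with the speaker's name or generic label."""
        label = self.identify(formants)
        return f"{self._names.get(label, label)}: {text}"


identifier = SpeakerIdentifier()
first = identifier.tag((700.0, 1200.0, 2500.0), "Hello")
second = identifier.tag((300.0, 2300.0, 3000.0), "Hi")
identifier.associate_name("Speaker 1", "Peter")
third = identifier.tag((710.0, 1190.0, 2520.0), "Can you hear me?")
```

Names are stored separately from the voice profiles, so a name entered partway through a call applies to all later text from that speaker without re-enrolling the voice.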
A display GUI module 314 may be configured to display a graphical user interface (GUI) on a computing system or telephone, as shown in
A store transcription module 316 may be configured to store text transcribed from speech during a telephone call, as understood in the art. The stored transcription may be printed or otherwise utilized by a user thereafter.
A host conference call module 318 may be configured to enable multiple users to call into a conference call, as understood in the art. One or more conference call participants may utilize the transcription and translation capabilities provided by the modules 300 during the conference call.
Although a few embodiments of the present general inventive concept have been illustrated and described, it will be appreciated by those skilled in the art that changes may be made in these exemplary embodiments without departing from the principles of the general inventive concept, the scope of which is defined in the appended claims and their equivalents.