In recent years, the use of mobile computing applications for human-computer interaction, such as mobile communications, has become ubiquitous. Mobile computing typically refers to human-computer interaction in which the computer is expected to be transported during normal usage. Mobile computing applications allow communication between agents or devices in real-time (or near real-time) by way of a single modality or mode of operation. For example, mobile computing applications typically provide for communications by way of an audio (or voice) mode or by way of a text messaging mode.
Some mobile computing applications include speech synthesis, which is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer or a text-to-speech (TTS) system. Similarly, a speech-to-text (STT) system processes speech and converts the speech to text output. TTS systems and STT systems can be implemented in software or hardware. Typical TTS systems convert normal language text into speech or render symbolic linguistic representations like phonetic transcriptions into speech. The synthesized speech can be generated by, for example, concatenating pieces of pre-recorded speech that are stored in a database. The synthesized (or generated) speech can then be provided to a user via a “synthetic” voice output.
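By way of a non-limiting illustration, the following Python sketch shows the concatenative approach described above: pre-recorded speech units are retrieved from a database and joined into an output waveform. The unit labels, placeholder audio, and sample rate are hypothetical.

```python
# Sketch of concatenative TTS: look up pre-recorded units in a database and
# join them into one waveform. A production system would also smooth the
# joins and select among candidate units.
import numpy as np

SAMPLE_RATE = 16_000  # hypothetical sample rate (Hz)

# Hypothetical unit database: diphone label -> pre-recorded samples.
unit_db = {
    "h-e": np.zeros(800, dtype=np.int16),  # placeholder audio
    "e-l": np.zeros(700, dtype=np.int16),
    "l-o": np.zeros(900, dtype=np.int16),
}

def synthesize(diphones):
    """Concatenate the stored units for the requested diphone sequence."""
    return np.concatenate([unit_db[d] for d in diphones])

waveform = synthesize(["h-e", "e-l", "l-o"])
print(f"{len(waveform) / SAMPLE_RATE:.3f} s of synthesized audio")
```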
Current systems do not provide end-to-end solutions for verified and authenticated multi-mode communication between agents or devices. For example, a first agent using a text-based messaging mode cannot effectively communicate with a second agent using a voice-based messaging mode. Further, current systems do not synthesize and/or otherwise generate customized speech according to a particular individual or source device. Accordingly, a need exists for a system that overcomes the above problems, as well as one that provides additional benefits.
Overall, the examples herein of some prior or related systems and their associated limitations are intended to be illustrative and not exclusive. Other limitations of existing or prior systems will become apparent to those of skill in the art upon reading the following.
Examples of end-to-end solutions that can facilitate customized and verified multi-mode communication between agents or devices utilizing mobile computing applications are illustrated in the figures. The examples and figures are illustrative rather than limiting.
A system is described for providing customized and verified multi-mode communication between agents or devices. The communication may be any type of communication that is generated by a source agent and presented to or otherwise made available to a recipient agent, such as a voice call, a video call, an instant messaging session, etc. The agents can concurrently utilize different modes of communication, such as voice-based (or audio) communication, video-based communication, text-based communication, etc. For example, a source agent can communicate with a recipient agent via a text-based communication mode while the recipient agent concurrently communicates with the source agent via a voice-based communication mode.
In order to provide the appropriate communication to a recipient, the multi-mode communication is transformed into the appropriate mode for the recipient agent. For example, a communication may need to be transformed from text to audio or from audio to text. Speech synthesis may be used to perform the text-to-audio transformation. The speech synthesis systems disclosed herein provide a natural and intelligible generated voice, and can be adjusted to meet particular requirements such as the rate or style of speech utterances.
Additionally, in some embodiments, the systems synthesize and/or otherwise generate or provide customized speech according to a particular individual or user of a source device. The customized speech may be generated to reflect particular styles. For example, speech communication can be thought of as comprising two channels—the words themselves, and the style in which the words are spoken. Each of these two channels carries information. A single, fixed style of delivery, independent of the message, does not convey substantial information attributed to the context of the message. For example, the urgency and/or level of importance of a communication or message is not typically conveyed when generating a voice utilizing a single, fixed style. However, as disclosed herein, any number of speaking/declaration styles can be used to convey the communication. In addition, the system may generate various paralinguistic events such as a confident voice, sighs, pauses, etc., which further enrich and personalize the generated communication.
The generated (or synthesized) speech may provide a level of confidence to a recipient agent as to the identity of the particular individual or user of the source device. The systems and methods disclosed herein may incorporate steganography within the communications or data streams in order to provide additional security in some embodiments. For example, hidden messages can be incorporated into the multi-mode communications or data streams. The hidden messages can be used by recipient agents and/or other systems to confirm the authenticity and/or the identity of a source agent. Stealthy channels for Voice over Internet Protocol (VoIP) communications can be created by exploiting free or unused fields of communication protocols such as Internet Protocol (IP), User Datagram Protocol (UDP), Transmission Control Protocol (TCP), Real-Time Transport Control Protocol (RTCP), etc. These protocols utilize header fields that may be free and/or padding fields that may be unused. In some embodiments, it is possible to alternatively or additionally use encryption algorithms to secure the communications.
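By way of a non-limiting illustration, the following Python sketch hides an authentication tag in bytes that a packet format treats as padding, in the spirit of the stealthy channels described above. The packet layout is hypothetical and is not a real IP, UDP, TCP, or RTCP header.

```python
# Sketch of a stealthy channel: an authentication tag rides in a 4-byte
# field that a naive observer would treat as unused padding.
import struct

HEADER = struct.Struct("!HHI")  # src port, dst port, 4 "padding" bytes

def build_packet(src, dst, payload, hidden_tag):
    # Embed the hidden tag where padding would normally go.
    return HEADER.pack(src, dst, hidden_tag) + payload

def extract_tag(packet):
    _, _, tag = HEADER.unpack(packet[:HEADER.size])
    return tag

pkt = build_packet(5060, 5061, b"voice frame bytes", hidden_tag=0x00C0FFEE)
assert extract_tag(pkt) == 0x00C0FFEE
```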
Various examples or implementations of the invention will now be described. The following description provides specific details for a thorough understanding and enabling description of these examples. One skilled in the relevant art will understand, however, that the invention may be practiced without many of these details. Likewise, one skilled in the relevant art will also understand that the invention may include many other obvious features not described in detail herein. Additionally, some well-known structures or functions may not be shown or described in detail below, so as to avoid unnecessarily obscuring the relevant description.
The terminology used below is to be interpreted in its broadest reasonable manner, even though it is being used in conjunction with a detailed description of certain specific examples of the invention. Indeed, certain terms may even be emphasized below; however, any terminology intended to be interpreted in any restricted manner will be overtly and specifically defined as such in this Detailed Description section.
The environment 100 facilitates customized communication between the source agent and the recipient agent via different modes of operation. For example, a source agent can communicate with a recipient agent via a text-based communication mode while the recipient agent concurrently communicates with the source agent via a voice-based communication mode. As shown in
The example environment 100 provides for authentication of the agents and/or transformation of the communications between the agents. Elements of the authentication and/or transformation, or the authentication and/or transformation in their entirety, may be performed by the source agent 10, the recipient agent 20, and/or the host system 50. Accordingly, in some embodiments, the host system 50 is configured to communicate with the source agent 10 and/or the recipient agent 20 in order to perform the authentication and/or the transformation. Representative examples of the various signaling diagrams associated with the authentication and/or transformation at the source agent, the host system, and/or a recipient agent are discussed in greater detail with reference to
In one embodiment, prior to the source agent 10 exchanging communications with the recipient agent 20, the host system 50 interacts with a source agent 10 to facilitate voice training. The voice training may include one or more templates (or training sequences) and/or various learning or neural network systems. The voice training templates may be stored within one or more of the multiple servers 40 and data repositories 30. However, in some embodiments, it is appreciated that a voice training software package may also be downloaded to the source agent 10. In this case, the voice training software package is executed from the source agent 10.
In one embodiment, training sequences may be provided to the source agent 10 so that the host system can learn or otherwise ascertain various aspects of the source user's voice and language usage patterns such as, for example, the various tones and paralinguistic events that a source user typically utilizes in verbal communications with others. The tones and/or paralinguistic events, among other aspects of the source user's voice, can be stored by the host system for later use in the custom voice synthesis. Learning and storing the various aspects of the source user's voice for generating a synthesized version of the user's voice is discussed in greater detail with reference to
The source agent 10 and the recipient agent 20 can be any systems and/or devices, and/or any combination of devices/systems, that are able to establish a connection, including wired, wireless, or cellular connections, with another agent or device, a server, and/or other systems such as the host system 50. The source agent 10 and the recipient agent 20 will typically include a display and/or other output functionalities to present information (i.e., communications) and data exchanged between the agents 10 and 20 and/or the host system 50.
The source agent 10 and the recipient agent 20 can include mobile, handheld, or portable devices or non-portable devices and can be any of, but not limited to, plain old telephone systems (POTS phones), global positioning system (GPS) devices, Voice over Internet Protocol (VoIP) phones, in-vehicle computers, vehicle tracking systems, server desktops, computer clusters, gaming systems, or portable devices including notebook computers, laptop computers, handheld computers, palmtop computers, mobile phones, cell phones, smart phones, PDAs, Blackberry devices, Treo devices, handheld tablets (e.g., an iPad, a Galaxy, a Xoom tablet, etc.), tablet PCs, thin-client devices, handheld consoles, handheld gaming devices or consoles, iPhones, and/or any other portable, mobile, handheld devices, etc., running on any platform or any operating system (e.g., Mac-based OS (OS X, iOS, etc.), Windows-based OS (Windows Mobile, Windows 7, etc.), Android, Blackberry OS, Embedded Linux platforms, Palm OS, or Symbian platform).
As shown, the source agent 10 and the recipient agent 20 include respective user interfaces 14 and 24. The user interfaces 14 and 24 may be used by source device user 12 and recipient device user 22 to interact with the source agent 10 and the recipient agent 20, respectively. The source agent 10 and the recipient agent 20 can also include various input mechanisms (not shown). For example, the input mechanisms can include touch screen keypad (including single touch, multi-touch, gesture sensing in 2D or 3D, etc.), a physical keypad, a mouse, a pointer, a track pad, motion detector (e.g., including 1-axis, 2-axis, 3-axis accelerometer, etc.), a light sensor, capacitance sensor, resistance sensor, temperature sensor, proximity sensor, a piezoelectric device, device orientation detector (e.g., electronic compass, tilt sensor, rotation sensor, gyroscope, accelerometer), and/or combinations or variations of the above.
The telecommunication network 60 may be any type of cellular, IP-based, or converged telecommunications network, including but not limited to Global System for Mobile Communications (GSM), Time Division Multiple Access (TDMA), Code Division Multiple Access (CDMA), Orthogonal Frequency Division Multiplexing (OFDM), General Packet Radio Service (GPRS), Enhanced Data GSM Environment (EDGE), Advanced Mobile Phone System (AMPS), Worldwide Interoperability for Microwave Access (WiMAX), Universal Mobile Telecommunications System (UMTS), Evolution-Data Optimized (EVDO), Long Term Evolution (LTE), Ultra Mobile Broadband (UMB), Voice over Internet Protocol (VoIP), Unlicensed Mobile Access (UMA), etc.
The network 60 can be any collection of distinct networks operating wholly or partially in conjunction to provide connectivity to the host system 50 and the agents (e.g., source agent 10 and recipient agent 20). In one embodiment, communications to and from the agents and/or to and from the host system 50 can be achieved via an open network, such as the Internet, or a private network, such as an intranet and/or an extranet.
The agents and host system 50 can be coupled to the network 60 (e.g., Internet) via a dial-up connection, a digital subscriber loop (DSL, ADSL), cable modem, wireless connections, and/or other types of connection. Thus, the agents can communicate with remote servers (e.g., the host system 50, etc.) that provide access to user interfaces of the World Wide Web via a web browser, for example.
The databases 30 can be implemented via object-oriented technology and/or via text files, and can be managed by a distributed database management system, an object-oriented database management system (OODBMS) (e.g., ConceptBase, FastDB Main Memory Database Management System, JDOInstruments, ObjectDB, etc.), an object-relational database management system (ORDBMS) (e.g., Informix, OpenLink Virtuoso, VMDS, etc.), a file system, and/or any other convenient or known database management package. As shown, the databases 30 are coupled to (or otherwise included within) the host system 50. However, it is appreciated that in some embodiments, the databases 30 may alternatively or additionally be directly coupled to the network 60 and/or distributed across multiple systems.
In some embodiments, a set of data or a profile related to an agent may be stored in the databases 30. The profile may include associated transformation files for customized transformation of communications from one communications mode to another as described herein and/or specific language usage patterns for particular users. An example of the type of information that may be stored in databases 30 is illustrated and discussed in greater detail with reference to
The host system 200, although illustrated as comprised of distributed components (physically distributed and/or functionally distributed), could be implemented as a collective element. In some embodiments, some or all of the modules, and/or the functions represented by each of the modules can be combined in any convenient or known manner. Moreover, the functions represented by the modules can be implemented individually or in any combination thereof, partially or wholly, in hardware, software, or a combination of hardware and software. It is appreciated that some or all of the host system 200 modules described with respect to
In the example of
In the example of
In the example of
One embodiment of the host server 200 includes the verification module 210. The verification module 210 can be any combination of software agents and/or hardware components able to manage and authenticate users (e.g., agents) and/or communications, and/or register users associated with the host server 200. In this example, the verification module 210 includes a registration engine 212, an encryption/decryption engine 214, and an authentication engine 216. Additional or fewer engines are possible.
The registration engine 212 is configured to register new agents (and/or agent device users) that attempt to interact with the host server 200 and/or to create new accounts with the host server 200. During registration, the user can provide login credentials, etc. In some embodiments, the registration engine 212 also verifies agents attempting to log in to the host system by associating a user's username and password with an existing user account. Unregistered agents can be directed to register with the host system 200.
The encryption/decryption engine 214 is configured to encrypt and/or decrypt communications or aid agents in the encryption/decryption process. Encryption can provide an additional level of security. For example, a communication from a source agent that is registered with the host system may be received and encrypted by the encryption/decryption engine 214. Alternatively or additionally, the host system 200 may aid a source device in the encryption process at the source device. In either case, the encrypted communication is subsequently transferred to a recipient agent where it is decrypted. In some cases, the encryption/decryption engine 214 may aid the recipient device in decrypting the communication.
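By way of a non-limiting illustration, the encrypt-then-deliver flow might resemble the following Python sketch, which uses the third-party cryptography package. The engine described above is not a published API, so the surrounding flow is an assumption.

```python
# Sketch of symmetric encryption/decryption of a communication using the
# "cryptography" package (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # would be shared securely with the recipient
engine = Fernet(key)

ciphertext = engine.encrypt(b"Honey, I'll be home late")  # source/host side
plaintext = engine.decrypt(ciphertext)                    # recipient side
assert plaintext == b"Honey, I'll be home late"
```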
The authentication engine 216 authenticates communications between agents to ensure that the communications are verified and/or otherwise secure. The authentication may be used in lieu of or in addition to encryption. The authentication engine 216 may make use of stealthy channels (e.g., steganography) by utilizing and/or exploiting free or unused fields of communication protocols such as, for example, Internet Protocol (IP), UDP, TCP, Real-Time Transport Control Protocol (RTCP), etc. Other protocols are also possible. By way of example, a globally unique identifier (GUID) or other type of digital signature or certificate may be appended somewhere in the file (e.g., within the free or unused fields or elsewhere in the file). The digital certificate could be a fingerprint identifying the source or the recipient agent, or could be a random or pseudo-random sequence.
Alternatively or additionally, identification codes may be used for specific recipient agents that can be changed or modified in accordance with an algorithm. The identification codes may represent a unique signature or fingerprint. As discussed, the use of steganography can include incorporation of hidden messages in data streams (or communications) between agents. Accordingly, the agents may subsequently confirm or reject the identity of the other party in real-time (or near-real time). This provides additional security for agents.
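By way of a non-limiting illustration, an identification code that changes per message in accordance with an algorithm could be realized as an HMAC over a message counter, keyed by a secret shared between the two agents, as in the following Python sketch (names hypothetical).

```python
# Sketch of a rolling identification code: each message carries an HMAC of
# a monotonically increasing counter, so stale or forged codes are rejected.
import hashlib
import hmac

SHARED_SECRET = b"agent-pair-secret"  # hypothetical pre-shared key

def identification_code(counter):
    mac = hmac.new(SHARED_SECRET, counter.to_bytes(8, "big"), hashlib.sha256)
    return mac.digest()[:8]  # truncated tag fits in small header fields

def verify(code, counter):
    return hmac.compare_digest(code, identification_code(counter))

assert verify(identification_code(42), 42)      # genuine code accepted
assert not verify(identification_code(42), 43)  # replayed code rejected
```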
The use of security and/or the type of security utilized by the agents and/or the host system can be based on the specific communication scenario. For example, some communications may be marked by an agent or a host system as containing sensitive information. These communications may utilize the highest levels of security including authentication and/or the encryption described herein. Alternatively, other communications may be relatively unimportant and thus not include encryption or authentication.
The Secure Sockets Layer (SSL) and/or voice tunneling protocols may be utilized for communications between agents and/or between an agent and the host system. SSL (and its successor, Transport Layer Security (TLS)) is the standard security technology for establishing an encrypted link between a web server and a browser. This link ensures that all data passed between the web server and browsers remains private and intact. SSL is an industry standard and is used by millions of websites to protect their online transactions with their customers.
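By way of a non-limiting illustration, an encrypted link of the kind described above can be established with Python's standard ssl module (TLS being the modern successor to SSL); the host name below is hypothetical.

```python
# Sketch of opening a TLS-protected connection to a host system.
import socket
import ssl

context = ssl.create_default_context()  # verifies the server's certificate

with socket.create_connection(("host.example.com", 443)) as sock:
    with context.wrap_socket(sock, server_hostname="host.example.com") as tls:
        print("negotiated:", tls.version())  # e.g., 'TLSv1.3'
        tls.sendall(b"GET / HTTP/1.0\r\nHost: host.example.com\r\n\r\n")
```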
One embodiment of the host server 200 includes the transformation module 220. The transformation module 220 can be any combination of software agents and/or hardware components able to transform and/or otherwise convert communications from one mode to another mode. In this example, the transformation module 220 includes a voice-to-text engine 222 and a text-to-voice engine 224 (e.g., voice synthesis engine). Additional or fewer engines are possible.
The voice-to-text engine 222 is configured to convert voice-based communications to text-based communications. For example, the voice-to-text engine 222 may access voice-to-text transformation files in the transformation files database 245. The voice-to-text engine 222 may process the content of the voice communication to identify an appropriate tone for the generated text-based communication. The tone and/or other aspects of the voice-based communication, or paralinguistic events included in the voice-based communication, may be used to determine special characters, shorthand, emoticons, etc., that might be included in a generated text-based communication.
The appropriate tone for a generated text-based communication may be based on the tone of the voice communication and/or other aspects or paralinguistic events embedded in the voice communication. Alternatively or additionally, the appropriate tone may be based on the recipient user (or recipient agent). For example, if the recipient user is determined or otherwise identified to be a “friend” of the source user, then a friendly tone may be selected for the text message, and the voice-to-text engine 222 may be more inclined to use shorthand, slang, emoticons, etc. Alternatively, if the recipient user is determined to be the source user's boss, then a more formal tone may be selected, and the voice-to-text engine 222 may be less inclined to use shorthand, slang, emoticons, etc. The voice-to-text engine 222 may provide automatic voice recognition and transformation of the voice-based communication into one or more text files. The text file(s) can be, for example, one or more SMS messages intended for a recipient agent, or may be used to generate SMS messages for a recipient agent.
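By way of a non-limiting illustration, the recipient-based tone selection described above could be sketched as a simple rule table, as in the following Python example; the relationship labels and style attributes are hypothetical.

```python
# Sketch of selecting a tone (and slang/emoticon usage) for generated text
# based on the recipient's relationship to the source user.
RELATIONSHIP_STYLES = {
    "friend": {"tone": "friendly", "slang": True,  "emoticons": True},
    "boss":   {"tone": "formal",   "slang": False, "emoticons": False},
}
DEFAULT_STYLE = {"tone": "neutral", "slang": False, "emoticons": False}

def render_text(transcript, relationship):
    style = RELATIONSHIP_STYLES.get(relationship, DEFAULT_STYLE)
    text = transcript
    if style["slang"]:
        text = text.replace("see you later", "cya l8r")
    if style["emoticons"]:
        text += " :-)"
    return text

print(render_text("see you later", "friend"))  # "cya l8r :-)"
print(render_text("see you later", "boss"))    # "see you later"
```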
The text-to-voice engine 224 (voice synthesizer) is configured to convert text-based communications to voice-based communications. In particular, the text-to-voice engine 224 is configured to synthesize, replicate, and/or otherwise generate mechanical or customized speech according to a particular individual or agent user. For example, the text-to-voice engine 224 may access text-to-voice transformation files associated with a particular user in the transformation files database 245 in order to synthesize and/or otherwise generate the customized voice communications according to a particular individual or user of a source device.
The customized speech may be generated to reflect particular styles. For example, speech communication can be thought of as comprising two channels—the words themselves, and the style in which the words are spoken. Each of these two channels carries information. A single, fixed style of delivery, independent of the message, does not convey substantial information attributed to the context of the message. For example, the urgency and/or level of importance of a communication or message is not typically conveyed when generating a voice utilizing a single, fixed style. However, as disclosed herein, any number of speaking/declaration styles can be used to convey the communication. In addition, the system may generate various paralinguistic events such as a confident voice, sighs, pauses, etc., which further enrich and personalize the generated communication.
One embodiment of the host server 200 includes the voice training module 230. The voice training module 230 can be any combination of software agents and/or hardware components able to learn, track, train, and/or otherwise capture an agent user's voice including learning, tracking, and/or otherwise capturing segments of a user's voice, tendencies during communications, traits, and/or other voice characteristics of an agent user. In this example, the training module 230 includes a voice template engine 232 and a training engine 234. Additional or fewer engines are possible.
The voice template engine 232 is configured to provide various voice templates or training sequences to a source agent and/or a recipient agent in order to facilitate voice training for use with the generation of customized speech for particular individuals or agent users.
The training engine 234 facilitates training on and capture of a particular agent user's voice. For example, the training engine 234 may utilize a training period to tune the acoustical files to match the voice of a particular speaker. This process may be facilitated through the use of the training sequences provided by the voice template engine 232. In one example, the training engine 234 learns, tracks, trains on, and/or otherwise captures voice segments that are received from the user's voice and/or synthesized based on the particular user's voice. The voice segments may be received in response to the training sequences and/or during tracking of other conversations. The resulting transformation files are subsequently stored in the database 245 and later used by the voice-to-text engine 222 and the text-to-voice engine 224 to generate text-based communications and to synthesize the user's voice for voice-based communications, respectively.
One embodiment of the host server 200 includes a mode detection module 240. The mode detection module 240 can be any combination of software agents and/or hardware components able to automatically detect the communication mode of a source device and/or a recipient device. For example, the mode detection module 240 may be in communication with a source agent and/or a recipient agent that keep the mode detection module 240 apprised of their respective current preferred modes of communication. Alternatively or additionally, the mode detection module 240 may automatically detect the mode of communication based on conditions and/or preferences previously set up by the user, historical observations, locations of the user, etc.
One embodiment of the host server 200 includes a recipient identification module 250. The recipient identification module 250 can be any combination of software agents and/or hardware components able to identify the recipient of a communication. The recipient identification module 250 identifies a recipient based on information provided in the communication such as, for example, the recipient email address, telephone number, etc.
One embodiment of the host server 200 includes a web server module 260. The web server module 260 can be any combination of software agents and/or hardware components able to interact with the agents that have logged in or otherwise accessed or interacted with the host server 200. The web server module 260 provides access to agents via a web interface, and may interact with a browser (e.g., as a browser plugin).
Referring first to
In the example of
As shown, the software 316 includes a transformation module 317, a verification module 318, and downloaded transformation files 319. The transformation module 317 may, individually or in combination with the one or more transformation server(s) 352, transform the text-based communication from the text-based source mode to the voice-based recipient mode. The functionality of the transformation module 317 may be similar to the functionality of transformation module 220 of
The network environment 300 includes a cellular network 361, a private network 362, one or more gateways 363, an IP network 364, a host system 350, and one or more access points 326. The cellular network 361, private network 362, and IP network 364 can be any telephony core networks as generally defined by the 3GPP and 3GPP2 standards organizations, based on IETF Internet protocols. Together, the networks 361-364 may comprise the network 60 of
As shown in the example of
According to various aspects of this invention, the source agent 310 and the recipient agent 320 can be configured to operate in a variety of communication modes. For example, if the source device user 312 is in a meeting, then the user can set the source agent 310 to operate in a text-only communication mode. Similarly, the recipient agent 320 can be configured to operate in a voice-only communication mode (e.g., a “hands free” mode) if the recipient device user 322 is unable to communicate via text-based communications. In some cases, the mode of communication is set manually at the source agent 310 and/or the recipient agent 320. In other cases, the mode of communication may be detected by the agent or the host system (e.g., the mode detection server(s) 358). The detection may be based on any number of factors, including environmental factors. For example, if the recipient agent device 320 detects that the recipient device user is moving (e.g., driving, etc.), then the recipient agent device 320 may automatically set the recipient communication mode to a voice-only communication mode. Alternatively, the mode of communication of an agent may be fixed based on the type of agent. For example, an in-vehicle agent or a VoIP phone may be a voice-only agent in some embodiments.
In another example, an agent can detect the mode of communication based on environmental factors, configuration settings, historical data, and/or other factors. For example, an in-vehicle computer agent can be configured to be operated in a (fixed) voice-only communication mode when in motion and either a text-based communication mode or a voice-based communication mode when stationary (or when the engine is off).
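By way of a non-limiting illustration, the in-vehicle detection logic just described might resemble the following Python sketch; the speed threshold and sensor inputs are hypothetical.

```python
# Sketch of automatic mode detection from environmental factors: when the
# vehicle is moving, force a voice-only mode; otherwise honor the user's
# configured preference.
def detect_mode(speed_mps, engine_on, user_preference=None):
    if engine_on and speed_mps > 0.5:  # in motion: fixed voice-only mode
        return "voice-only"
    return user_preference or "text-or-voice"

assert detect_mode(13.4, engine_on=True) == "voice-only"
assert detect_mode(0.0, engine_on=False, user_preference="text-only") == "text-only"
```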
In the example of
The source device user 312 may be, for example, a husband sending a text-based communication to his wife, the recipient device user 322. In this example, the recipient device user 322 is operating a recipient agent 320 in a voice-based communication mode. In particular, in the example of
In one example of operation, the transformation from the text-based communication mode to the voice-based communication mode is performed at the source agent device by the source agent device. In this example, the source agent is configured to transmit the voice communication (or voice file) for delivery to the recipient agent 320. An example illustrating representative messaging used by a source agent to transform communications at the source agent with transform files downloaded from the host system is shown and discussed in greater detail with reference to
In another example of operation, the transformation from the text-based communication mode to the voice-based communication mode is performed in the cloud by the host system 350. In this example, the host system is configured to transmit the voice communication (or voice file) back to the source agent 310 for delivery to the recipient agent 320. Alternatively, the host system may transmit the voice communication for delivery to the recipient agent 320 directly (e.g., without sending the voice communication back to the source agent 310). Examples illustrating representative messaging used by a host system to transform communications in the cloud are shown and discussed in greater detail with reference to
The transformation server(s) 352 and/or the transformation module 317 may, individually or in combination, transform the text-based communication to a voice-based communication using the transformation files 345 and/or the source transformation files 319. The transformation from the text-based communication to the voice-based communication may include selecting an appropriate tone based on the recipient. For example, the recipient user may be the wife of the source user and thus a “loving” tone for the synthesized voice communication is selected. Selection of a tone during the transformation process is discussed in greater detail with reference to
Various other parameters may be selected such as style, urgency, rate, paralinguistic events, etc. For example, the “ . . . ” as illustrated in text message 315 may be converted to a long pause by the transformation server(s) 352 and/or the transformation module 317. Additionally, an emoticon character in the text message 315 may be used by the transformation server(s) 352 and/or the transformation module 317 to select a style (e.g., “good mood”). The various other parameters that may be utilized in the transformation of the text-based communication to a voice-based communication are discussed in greater detail with reference to
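By way of a non-limiting illustration, the mapping from textual markers to synthesis parameters could be sketched as follows in Python; the marker set and parameter names are hypothetical.

```python
# Sketch of planning prosody from markers found in a text message: an
# ellipsis becomes a long pause, an emoticon selects a "good mood" style.
MARKER_PARAMS = {
    "...": {"event": "pause", "duration_ms": 800},
    ":-)": {"style": "good_mood"},
    "!!":  {"style": "urgent", "rate": 1.15},
}

def plan_prosody(text):
    return [params for marker, params in MARKER_PARAMS.items() if marker in text]

print(plan_prosody("Honey... I'll be home late :-)"))
# [{'event': 'pause', 'duration_ms': 800}, {'style': 'good_mood'}]
```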
Referring now to
In the example of
In one example of operation, the recipient agent user 422 inputs instructions into a text interface 210. The instructions are fed into or otherwise provided to an automated dispatcher system 365. The instructions or text-based communications are subsequently transferred to the host system 350, where they are transformed into a voice-based communication. The voice that is mechanically generated or synthesized may be the voice of the recipient agent user 422. Receiving the voice of the recipient agent user 422 at the source agent 410 may provide a level of comfort to the source agent user 412. Additionally, any instructions can be provided to the source agent user 412 via voice communications, allowing the source agent user 412 to maintain attention on the act of driving the truck.
System Functions and Data Structures
The example of
In a receiving stage, at step 510, the host system receives a communication from a source agent such as, for example, source agent 10 of
In an authentication stage, at step 512, the host system authenticates the source agent. The host system may authenticate the source agent to confirm that the communication is legitimate. As discussed above, the host system may authenticate the source agent using one or more free or unused fields of the communication protocol. For example, the communication protocol may be IP, UDP, TCP, RTCP, etc.
In an identification stage, at step 514, the host system identifies the recipient agent. The recipient agent may be identified based on any number of characteristics of the communication. For example, the recipient agent may be identified based on information contained within the communication such as, for example, the recipient email address, the recipient telephone number, etc. In a determination stage, at step 516, the host system determines the appropriate reception mode (i.e., the recipient reception mode) for the recipient agent.
In a determination stage, at step 518, the host system automatically determines if the communication is in the recipient reception mode. If so, then the host system delivers, transfers, transmits or otherwise provides the communication to the recipient agent (step 522). However, if the host system determines that the communication is not in the recipient reception mode, then in a transformation stage, at step 520, the host system transforms the communication from a source transmit mode to the recipient reception mode before providing the communication.
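By way of a non-limiting illustration, steps 510-522 could be composed as in the following Python sketch; every function is a hypothetical stand-in for the corresponding host system stage.

```python
# Sketch of the receive/authenticate/identify/determine/transform/deliver
# flow of steps 510-522. All helpers are simplified stand-ins.
def authenticate(source):
    return source in {"source-agent-10"}            # step 512 (stand-in check)

def identify_recipient(comm):
    return comm["to"]                               # step 514: address carried in the communication

def reception_mode(recipient):
    return {"recipient-agent-20": "voice"}.get(recipient, "text")  # step 516

def transform(comm, target_mode):
    return {**comm, "mode": target_mode}            # step 520: TTS/STT would run here

def deliver(comm, recipient):
    print(f"delivering {comm['mode']} communication to {recipient}")  # step 522

def handle_communication(comm):
    if not authenticate(comm["source"]):            # step 512
        raise PermissionError("unverified source agent")
    recipient = identify_recipient(comm)            # step 514
    mode = reception_mode(recipient)                # step 516
    if comm["mode"] != mode:                        # step 518
        comm = transform(comm, mode)                # step 520
    deliver(comm, recipient)                        # step 522

handle_communication({"source": "source-agent-10", "to": "recipient-agent-20",
                      "mode": "text", "body": "Honey, I'll be home late"})
```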
The example of
In a selection stage, at step 610, the host system selects a tone for the communication. A tone for the communication may be selected based on the recipient agent and/or the source agent. For example, if the source agent user is attempting to communicate with a friend then a “friendly” tone may be selected. Conversely, if the source agent user is attempting to communicate with his or her boss, then a “formal” tone may be selected. Selection of the various tones for a communication is discussed in greater detail with respect to table 700 of
In a selection stage, at step 612, the host system selects one or more styles associated with the communication. The one or more styles may be selected based on the content of the message and/or other indicators included within the communication. For example, styles may be selected based on specific keywords or phrases that are identified in a text-based communication or special characters such as emoticons. The styles that may be selected can be used to generate the unique voice for the recipient. The styles may include, but are not limited to, a question style, good/bad news style, neutral or no style, a style for showing contrastive emphasis, a style for conveying importance or urgency, etc. As discussed above, the various tones and/or styles can be developed using training sequences and/or built by the neural network system over time. Selection of the various styles for a communication is discussed in greater detail with respect to table 700 of
In an identification stage, at step 614, the host system identifies one or more appropriate paralinguistic events to utilize in connection with the communication. For example, a long sigh may be selected or otherwise identified in connection with a text-based communication including the phrase “ . . . ”. Identification of the various paralinguistic events for use with a communication is discussed in greater detail with respect to table 700 of
In an access stage, at step 616, the host system accesses source agent voice segments. In a selection stage, at step 618, the host system selects the appropriate agent voice segments in order to synthesize or otherwise generate the unique voice. The selection of the voice segments may be based on the tone and/or the one or more selected styles.
Lastly, in a generation stage, at step 620, the host system generates a source agent user's voice. The source agent user's voice may be generated using the various selected voice segments and/or the one or more paralinguistic events. For example, as shown in
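By way of a non-limiting illustration, steps 616-620 could be sketched as below in Python: stored voice segments are selected by tone and style, and paralinguistic events are spliced in. The segment keys and the byte-string audio stand-ins are hypothetical.

```python
# Sketch of generating a source user's voice from stored segments keyed by
# (tone, style), with a paralinguistic event spliced into the utterance.
SEGMENTS = {
    ("loving", "good_mood"): [b"<seg:honey>", b"<seg:home-late>"],
}
PARALINGUISTIC = {"sigh": b"<sigh>", "pause": b"<800ms-silence>"}

def generate_voice(tone, style, events):
    units = list(SEGMENTS[(tone, style)])        # steps 616-618: select segments
    for event in events:
        units.insert(1, PARALINGUISTIC[event])   # splice event mid-utterance
    return b"".join(units)                       # step 620: assembled utterance

print(generate_voice("loving", "good_mood", ["pause"]))
```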
In the example of
In the example of
The following signaling diagrams are discussed with respect to transforming a text-based communication into a voice-based communication. However, the systems and methods described herein are not limited to transforming text-based communications into voice-based communications. It is appreciated that various other modes of communication may be transformed. The diagrams of
As described herein, the host system may first train the source agent's individual voice. The host system generates and stores the transformation files based on the voice training. Alternatively, certain default files or files generated by the system may be provided. The source agent subsequently transfers a communication to the host system. For example, the communication may be a text-based communication.
In the example of
The recipient agent provides the voice-based communication to the recipient user. The recipient agent may subsequently receive a voice-based response from the recipient user. The voice-based response can be sent to the source agent where it is forwarded to the host system for transformation from the voice-based communication mode to a text-based communication mode. The transformed response communication is then sent to the source agent where it is provided to the source agent via the text-based communication mode.
Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to.” Additionally, the words “herein,” “above,” “below,” and words of similar import, when used in this application, refer to this application as a whole and not to any particular portions of this application. Where the context permits, words in the above Detailed Description using the singular or plural number may also include the plural or singular number respectively. The word “or,” in reference to a list of two or more items, covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of the items in the list.
The above Detailed Description of examples of the invention is not intended to be exhaustive or to limit the invention to the precise form disclosed above. While specific examples for the invention are described above for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize. For example, while processes or blocks are presented in a given order, alternative implementations may perform routines having steps, or employ systems having blocks, in a different order, and some processes or blocks may be deleted, moved, added, subdivided, combined, and/or modified to provide alternative or subcombinations. Each of these processes or blocks may be implemented in a variety of different ways. Also, while processes or blocks are at times shown as being performed in series, these processes or blocks may instead be performed or implemented in parallel, or may be performed at different times.
The teachings of the invention provided herein can be applied to other systems, not necessarily the system described above. The elements and acts of the various examples described above can be combined to provide further implementations of the invention.
These and other changes can be made to the invention in light of the above Detailed Description. While the above description describes certain examples of the invention, and describes the best mode contemplated, no matter how detailed the above appears in text, the invention can be practiced in many ways. Details of the system may vary considerably in its specific implementation, while still being encompassed by the invention disclosed herein. As noted above, particular terminology used when describing certain features or aspects of the invention should not be taken to imply that the terminology is being redefined herein to be restricted to any specific characteristics, features, or aspects of the invention with which that terminology is associated. In general, the terms used in the following claims should not be construed to limit the invention to the specific examples disclosed in the specification, unless the above Detailed Description section explicitly defines such terms. Accordingly, the actual scope of the invention encompasses not only the disclosed examples, but also all equivalent ways of practicing or implementing the invention under the claims.
While certain aspects of the invention are presented below in certain claim forms, the applicant contemplates the various aspects of the invention in any number of claim forms. For example, while only one aspect of the invention is recited as a means-plus-function claim under 35 U.S.C. § 112, sixth paragraph, other aspects may likewise be embodied as a means-plus-function claim, or in other forms, such as being embodied in a computer-readable medium. (Any claims intended to be treated under 35 U.S.C. § 112, ¶ 6 will begin with the words “means for”, but use of the term “for” in any other context is not intended to invoke treatment under 35 U.S.C. § 112, ¶ 6.) Accordingly, the applicant reserves the right to add additional claims after filing the application to pursue such additional claim forms for other aspects of the invention.