This application claims priority to People's Republic of China Patent Application No. 200810187735.4 entitled INSTANT COMMUNICATION METHOD, INSTANT COMMUNICATION SERVER, SPEECH SERVER AND SYSTEM THEREOF, filed Dec. 31, 2008 which is incorporated herein by reference for all purposes.
The present application relates to electronic communication, and in particular to message based communication.
The number of companies participating in Internet based e-commerce has been steadily increasing. There is growing competition among e-commerce websites. An important measure of website performance is its communication capabilities. Successful websites usually are capable of keeping the visitors engaged and providing easy communication with the visitors. Presently, many websites are configured with Web-based Instant Messaging (IM) software that allows users to communicate with the websites' support personnel. IM software allows people to identify online users and exchange information in real time.
Many website owners, such as small businesses, find it difficult to provide IM support since they are often unable to keep a dedicated website receptionist at the computer constantly. When customers make inquiries through the website, if the website owners cannot provide a timely response, business opportunities may be lost.
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.
Instant Messaging (IM) based communication with voice capability is described. As used herein, IM refers to a form of communication in which users communicate substantially in real-time. In some embodiments, text data is inputted by a first user (e.g., a website visitor) using special instant message software. The text data is transformed into speech data, and sent to a preconfigured phone number to be played to a second user (e.g., a website owner). After listening to the speech data, the second user may respond directly by voice, and the voice data is received by the server and sent to the first user's terminal to be played. Therefore, the website owner can communicate with website visitors anytime and anywhere without relying on text based messaging, thereby improving customer service.
Using the visitor terminal, the website visitor inputs text data via an IM application 711 (e.g., a web browser based IM application (referred to as WebIM)) that is built in a webpage 71 of the website. IM application 711 sends the text data to a speech server 721 on server 72. The speech server 721 includes Text To Speech (TTS) software for transforming the text data into speech data. A variety of TTS software/engines may be used, for example ReadPlease®, ProVerbe Speech unit, TextAloud®, etc. On the speech server 721, a phone number associated with the website (e.g., the phone number of the website owner) is pre-configured and stored.
Once text data is received and transformed into speech, the speech server sends a communication request to the telecommunication server 73. In some embodiments, the communication request includes the preconfigured phone number and the speech data. The speech data may be formatted using MP3 or another audio format. In some embodiments, telecommunication server 73 supports Session Initiation Protocol (SIP) and/or Voice-Over-IP (VoIP), and has the capabilities to connect to the Public Switched Telephone Network (PSTN) and put calls through to a regular landline phone or mobile phone. In some embodiments, the telecommunication server by Yuantel of Beijing, China is used. When the call is put through, the speech data is played on a user terminal 708 associated with the phone number (such as a telephone) and the website owner is prompted to leave a voice message reply. The speech server 721 receives a speech reply from the website owner made via user terminal 708 and forwards the reply to IM application 711. IM application 711 displays a speech file storing the speech reply, and a player associated with the IM application (such as an audio player plug-in) allows the visitor to play the speech file. The visitor may enter additional text data via the IM application and the process is repeated.
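By way of illustration only, the communication request described above may be sketched as a small data structure carrying the preconfigured phone number and the speech data. The class name `CommunicationRequest`, the JSON wire format, the base64 encoding of the audio, and the sample phone number are all assumptions made for this sketch; the description does not specify how the request is encoded.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class CommunicationRequest:
    """Hypothetical request sent from the speech server to the telecommunication server."""
    phone_number: str          # preconfigured number associated with the website
    speech_data: bytes         # synthesized audio, e.g. MP3 bytes
    audio_format: str = "mp3"  # assumed default; other audio formats are possible

    def to_json(self) -> str:
        # Audio bytes are base64-encoded so they can travel inside a JSON string.
        return json.dumps({
            "phone_number": self.phone_number,
            "audio_format": self.audio_format,
            "speech_data": base64.b64encode(self.speech_data).decode("ascii"),
        })

    @classmethod
    def from_json(cls, payload: str) -> "CommunicationRequest":
        # Reverse of to_json: decode the fields and restore the raw audio bytes.
        fields = json.loads(payload)
        return cls(
            phone_number=fields["phone_number"],
            speech_data=base64.b64decode(fields["speech_data"]),
            audio_format=fields["audio_format"],
        )
```

In such a sketch, the speech server would call `to_json()` before sending and the telecommunication server would call `from_json()` on receipt. A SIP-based deployment would more likely reference the audio by location than embed it in the request.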
At 108, a second set of speech data sent from the second user terminal in response to the first set of speech data is received. The speech server is configured to receive the second set of speech data in this example. At 110, the second set of speech data is sent to the first user terminal to be played. In various embodiments, the second set of speech data may be sent directly from the speech server to the first user terminal, or sent to the instant communication server then forwarded to be displayed in connection with the WebIM application executing in the first user terminal.
The process may be repeated as the first user continues to enter additional text data.
In some embodiments, software code for implementing WebIM functions is added to webpage HTML code of the website to facilitate data transmission between website visitors and a server serving the website. An IM client is installed at a server of the website owner, with which the website owner can exchange text messages with website visitors. Additional code (such as JavaScript code) is added to the webpage code to allow text data to be sent to the speech server, speech data to be sent by the speech server, and the received speech data to be sent to a player of a website visitor's terminal. The player may be an independent application or a plug-in that is a part of the WebIM application executing on the first user terminal.
In step 104, the speech server transforms the text data into first speech data. In this embodiment, the text data is transformed for example using a speech synthesis technique TTS (Text To Speech). Other speech synthesis technology can also be used to transform text data into speech data. TTS, also known as speech transformation technology, is developed to transform text information into audible sound information, and via computer speech synthesis, any text can be transformed into highly natural speech. The technology is well known in the art, and further description is omitted.
In step 106, sending the first speech data to the second user terminal corresponding to the preconfigured phone number includes:
The speech server sends the first speech data to a telecommunication server. On receipt of the first speech data, the telecommunication server connects to the PSTN, dials the preconfigured phone number, and sends the first speech data when the call is put through. That is, after transforming the text data into the first speech data, the speech server sends a communication request to the telecommunication server; the telecommunication server searches for a corresponding phone number based on the content of the request, and sends the first speech data to the second user terminal corresponding to the phone number when the call is put through. After hearing a sound indication, the second user terminal may respond and return responding speech data to the speech server via the telecommunication server.
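The exchange between the speech server and the telecommunication server described above may be sketched as follows. This is a minimal simulation under stated assumptions: `StubTelecomServer`, `fake_tts`, and `relay_text` are hypothetical names, no real call is placed, and the "audio" is merely tagged text standing in for the output of a real TTS engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class StubTelecomServer:
    """Stand-in for the telecommunication server; it never touches the PSTN."""
    # Maps a phone number to a function that "answers" the call and returns
    # reply audio. A number missing from the map models a call not put through.
    answerers: Dict[str, Callable[[bytes], bytes]]
    dialed: List[str] = field(default_factory=list)

    def place_call(self, phone_number: str, speech_data: bytes) -> Optional[bytes]:
        self.dialed.append(phone_number)
        answer = self.answerers.get(phone_number)
        if answer is None:
            return None              # call was not put through
        # Play the first speech data, then collect the reply left after the
        # sound indication.
        return answer(speech_data)


def fake_tts(text: str) -> bytes:
    """Placeholder for a real TTS engine: tags the text instead of synthesizing audio."""
    return b"AUDIO:" + text.encode("utf-8")


def relay_text(text: str, phone_number: str, telecom: StubTelecomServer) -> Optional[bytes]:
    """Transform text into first speech data, call the number, return second speech data."""
    first_speech = fake_tts(text)
    return telecom.place_call(phone_number, first_speech)
```

For instance, registering an answerer for one number and calling `relay_text("hello", that_number, telecom)` returns whatever reply bytes the answerer produces, while an unregistered number yields `None`.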
In this example, during the session, the speech server is configured to obtain all speech data, and the telecommunication server is configured to collect the second set of speech data returned by the second user terminal. Specifically, the speech server collects a voice response made after the sound indication by the second user terminal, and sends the second speech data to the Web-based instant communication software on the website. The Web-based instant communication software displays a speech file, and the user can use player software to hear the message replied from the second user terminal.
In some embodiments, the preconfigured phone number includes multiple phone numbers, and various intelligent response schemes can be configured. For example, different phone numbers for salespeople from different regions are preconfigured on the server. Based on the Internet Protocol (IP) address of the first user terminal from which the text data is sent, the location of the website visitor may be determined. Thus, the transformed voice message is sent to the phone number that corresponds to or is in close proximity to the region from which the text message originates. As another example, different phone numbers may be configured for different time periods such that a landline office telephone is configured to receive the messages during working hours, and a mobile phone is called after hours. These two examples are described in greater detail below.
If the preconfigured phone number includes multiple phone numbers, i.e., one website owner is associated with several phone numbers (for example, a phone number of Beijing, a phone number of Shanghai and a phone number of Shenzhen), the step of initiating a call with the preconfigured phone number includes: determining an IP address of the first user terminal, determining a region to which the first user terminal belongs, selecting a phone number of an appropriate region based on the determination, and initiating a call with the selected phone number.
For example, a Beijing company markets and sells a device on a website, and designates contacts A and B as local distributors in Shanghai and Shenzhen, respectively. The Beijing company may provide a main contact number of the company and contact numbers of the local distributors, with IP addresses of their regions, in a database of the speech server. The speech server determines an IP address of a website visitor, and if it determines that the visitor is from Beijing, a preconfigured phone number of the Beijing company is dialed; if it determines that the visitor is from Shanghai, a phone number configured with the Shanghai distributor is dialed; and if it determines that the visitor is from Shenzhen, a phone number configured with the Shenzhen distributor is dialed.
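One possible way to realize the region-based selection in the Beijing, Shanghai and Shenzhen example is sketched below. The table contents are hypothetical: the networks are reserved documentation ranges rather than real regional allocations, the phone numbers are invented, and falling back to the main contact number for unmatched addresses is an assumption not stated in the description.

```python
import ipaddress
from typing import List, Tuple

# Hypothetical database rows: (network, region, phone number).
REGION_TABLE: List[Tuple[str, str, str]] = [
    ("192.0.2.0/24", "Beijing", "010-5550-0100"),
    ("198.51.100.0/24", "Shanghai", "021-5550-0101"),
    ("203.0.113.0/24", "Shenzhen", "0755-5550-0102"),
]

# Assumed fallback: the company's main contact number.
DEFAULT_NUMBER = "010-5550-0100"


def select_number_by_ip(visitor_ip: str) -> str:
    """Return the phone number configured for the region containing visitor_ip."""
    address = ipaddress.ip_address(visitor_ip)
    for network, _region, number in REGION_TABLE:
        if address in ipaddress.ip_network(network):
            return number
    return DEFAULT_NUMBER  # no configured region matched
```

A production system would more likely consult a geolocation database than a hand-maintained list of networks; the linear scan here is only meant to make the lookup step concrete.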
Similarly, if the preconfigured phone number includes phone numbers of multiple time periods, the step of the speech server initiating a call with the preconfigured phone number includes: determining a time when the first user terminal sends the text data; selecting a phone number of a corresponding time period based on the determined time; and initiating a call with the selected phone number.
For example, the website owner may set phone numbers to be answered in different time periods: a fixed phone number of the website owner may be called during working hours, and a mobile phone number of the website owner may be called outside working hours.
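The time-based selection may be sketched in a similar manner. The working hours of 9:00 to 18:00, the two phone numbers, and the function name `select_number_by_time` are assumptions chosen for illustration; weekends and holidays are not considered in this sketch.

```python
from datetime import datetime, time

WORK_START = time(9, 0)    # assumed start of working hours
WORK_END = time(18, 0)     # assumed end of working hours (exclusive)

OFFICE_NUMBER = "010-5550-0100"   # hypothetical fixed phone number
MOBILE_NUMBER = "139-5550-0101"   # hypothetical mobile phone number


def select_number_by_time(sent_at: datetime) -> str:
    """Pick the fixed phone during working hours and the mobile phone otherwise."""
    if WORK_START <= sent_at.time() < WORK_END:
        return OFFICE_NUMBER
    return MOBILE_NUMBER
```

Here `sent_at` is the time at which the first user terminal sent the text data, expressed in the website owner's local time.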
At 305, an appropriate phone number is selected from a plurality of preconfigured phone numbers. The originating region of the text data, time of contact, and many other appropriate criteria can be used. At 306, the first set of speech data is sent to a second user terminal that is associated with the selected phone number.
At 308, a second set of speech data sent from the second user terminal in response to the first set of speech data is received. The speech server is configured to receive the second set of speech data in this example. At 310, the second set of speech data is sent to the first user terminal to be played. In various embodiments, the second set of speech data may be sent directly from the speech server to the first user terminal, or sent to the instant communication server then forwarded to the WebIM application executing in the first user terminal. In some embodiments, a data file icon or the like may be displayed in connection with an audio player program, and the reply speech data is played at the first user's option. In some embodiments, the data is directly played.
The process may be repeated as the first user continues to enter additional text data.
As can be seen, in this embodiment WebIM software can send text data inputted by the user to the speech server. The speech server transforms the text data into speech data, dials the preconfigured phone number, and plays the speech data when the phone number is put through. Then the person answering the call leaves a message after a sound indication. The message is transformed by the speech server into a speech file and sent to the WebIM to be played by a computer. The website owner may communicate with the visitor via a mobile phone or a fixed phone anytime and anywhere, which may improve the reception of Internet marketing, reduce prerequisite requirements for e-commerce, and connect the Internet and the telecommunication network.
For example, a typical small company sets up a website. The owner travels a lot and is thus unable to answer inquiries from the web in a timely manner. The website cannot keep its visitors engaged and potential business opportunities are often lost.
Using a system described above, the owner may bind his mobile phone number with the website, and receive visitors' inquiries by phone. For example, a visitor A visits the website and asks via WebIM: “What's the price for a rolling door?” In a traditional WebIM mode, no one would answer the visitor. But now the owner will receive a call and hear the speech “Visitor A says: What's the price for a rolling door, question mark, please leave a message for visitor A after the sound indication, beep”. The owner may respond with “Hello, the cost varies depending on size and quantity. Please leave your phone number, I'll call you back.” Then visitor A sees that the owner has sent a speech file on the WebIM, and can play back the message. The visitor may reply with his phone number. Then the owner receives another call and hears the phone number. Finally, the owner may call visitor A directly upon obtaining the phone number. A potential business opportunity is therefore captured.
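The spoken script heard by the owner in the example above may be assembled from the visitor's name and message before being passed to the TTS engine. The following is a minimal sketch that reproduces the wording of the example; the function name `compose_call_script` and the punctuation table are assumptions, and only a trailing punctuation mark is spelled out.

```python
# Spoken names for punctuation; only the marks listed here are handled.
SPOKEN_PUNCTUATION = {"?": "question mark", "!": "exclamation mark"}


def compose_call_script(visitor_name: str, message: str) -> str:
    """Build the text that is synthesized and played to the website owner."""
    text = message.strip()
    # Spell out a trailing punctuation mark so the listener hears it.
    if text and text[-1] in SPOKEN_PUNCTUATION:
        text = f"{text[:-1]}, {SPOKEN_PUNCTUATION[text[-1]]}"
    # Capitalize only the first letter of the name at the start of the script.
    speaker = visitor_name[:1].upper() + visitor_name[1:]
    return (
        f"{speaker} says: {text}, "
        f"please leave a message for {visitor_name} after the sound indication, beep"
    )
```

Calling `compose_call_script("visitor A", "What's the price for a rolling door?")` yields the script quoted in the example.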
Embodiments of the invention not only transform text data into speech data, but also allow website visitors and website owners to communicate freely across the otherwise separate Internet and telecommunication network.
By adding code in a webpage source file of the website, e.g. JavaScript code, the web page of the website not only has standard WebIM functions, but also has the function to transmit text data of the WebIM to a speech server and the function to receive speech files sent from the speech server.
For example, after a visitor inputs text data in the WebIM and sends it, the code transmits the text data to the speech server. A program on the speech server integrated with speech synthesis technology such as TTS may transform the text data into speech data. Conversely, when the speech server receives speech data from a phone, it sends the speech data directly to the WebIM.
The text data reception unit 51 sends text data subsequently sent by the first user terminal to the transformation unit 52, when receiving the text data subsequently sent via Web-based instant communication software by the first user terminal; and functions of the first speech data sending unit 53, the second speech data reception unit 54 and the second speech data sending unit 55 are performed subsequently.
The first speech data sending unit 53 includes: an initiation unit and a data sending unit. The initiation unit is adapted to initiate a call with the preconfigured phone number; the data sending unit is adapted to send the first speech data to a telecommunication server, and instruct the telecommunication server to play the first speech data when the call is put through; the second speech data reception unit is adapted to receive the second speech data returned from the second user terminal and collected by the telecommunication server.
The speech server further includes: a pre-configuration unit, adapted to pre-bind a website with a phone number of the second user terminal, a plurality of phone numbers of the second user terminal of different regions, or phone numbers of the second user terminal of different time periods.
According to some embodiments, if the pre-configuration unit pre-binds the website with a plurality of phone numbers of the second user terminal of different regions, the second speech data sending unit further includes: a determination unit and a number sending unit. The determination unit is adapted to determine an IP address of the first user terminal, determine a region to which the first user terminal belongs, and select a phone number of a corresponding region from the plurality of phone numbers in the pre-configuration unit; the number sending unit is adapted to send the selected phone number to the initiation unit.
According to another preferred embodiment, if the pre-configuration unit pre-binds the website with phone numbers of the second user terminal of different time periods, the second speech data sending unit further includes: a determination unit, a selection unit and a number sending unit. The determination unit is adapted to determine a time when the first user terminal sends the text data; the selection unit is adapted to select a phone number of a corresponding time period in the pre-configuration unit based on the determined time; the number sending unit is adapted to send the selected number to the initiation unit.
It would be understood by those skilled in the art from the above descriptions of the embodiments that components of the above systems are described in units with different functions. The units described above can be implemented as software components executing on one or more general purpose processors, as hardware such as programmable logic devices and/or Application Specific Integrated Circuits designed to perform certain functions, or a combination thereof. In some embodiments, the units can be embodied in the form of software products which can be stored in a nonvolatile storage medium (such as optical disk, flash storage device, mobile hard disk, etc.), including a number of instructions for making a computer device (such as personal computers, servers, network equipment, etc.) implement the methods described in the embodiments of the present invention. The units may be implemented on a single device or distributed across multiple devices. The functions of the units may be merged into one another or further split into multiple sub-units. The contribution of the technical solution of the invention can be represented via software products, which may be stored in a storage medium such as ROM/RAM, magnetic disk and optical discs, and which include several instructions to enable a computer device (e.g. personal computer, server, or network device) to perform a method described in an embodiment or part of an embodiment of the invention.
Preferred embodiments of the invention are described above. It should be noted that various modifications and alterations may be made by those skilled in the art without departing from the scope of the invention, which therefore should be included in the scope of the invention.
Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.
Number | Date | Country | Kind |
---|---|---|---|
200810187735.4 | Dec 2008 | CN | national |