The present invention relates generally to a method of transmitting voice data in a communication network and a method of receiving voice data in a communication network. Further, the invention relates to a device for receiving voice data in a communication network, and to a device for transmitting voice data in a communication network. In particular, the invention relates to communication devices, such as telephones, cellular phones, walkie-talkies, computers and the like.
At present, a range of techniques exists which enable the communication of two or more persons over a communication network. Examples of communication devices used with these techniques include cellular phones, fixed-line phones, voice over IP telephones which may be implemented on a personal computer, walkie-talkies and other communication devices. It is also possible to establish connections between different kinds of devices, such as initiating a call from a soft phone to a cellular phone. When two persons are communicating over a communication network, they generally desire to know the identity of the person they are communicating with. At present, a system called calling line identification presentation (CLIP) is available, which is for example implemented in integrated services digital network (ISDN) communication devices. Such a device is capable of displaying, on a display, the telephone number of the telephone from which it is called. The identification number or telephone number is transmitted on a separate signaling channel using a session control protocol (for example SS7). Using this method, only the identification number is transmitted, and no further information on the person initiating the call is available to the person receiving the call, or vice versa.
Similarly, in cellular phones the CLI of a caller may be transmitted using a call setup request (for example in GSM/3G networks). From the displayed CLI alone, however, the user of the called device is generally not able to identify the person calling.
A further frequently used application of communication networks is the possibility of holding a telephone conference. As a plurality of persons may participate in a telephone conference, it is particularly useful for a participant to know the identities of the other participants. In particular, it is desirable to know the identity of the person presently speaking. This is particularly true as it is often very difficult for a participant to identify the other persons simply by recognizing their voices. A conventional method for identifying a person currently speaking in a telephone conference uses voice recognition, wherein a voice sample of the person is compared with the current speech. Yet such a method requires a rather complex and computationally expensive system, and further requires the participants to provide voice samples.
A similar problem arises in ad-hoc voice sessions, wherein a plurality of persons may communicate over a single communication channel, and accordingly, it is difficult for a participant of the session to identify a person presently speaking. In particular, when a new participant enters such a session, he or she cannot easily be identified. At present, no easy-to-implement methods exist for identifying a current speaker when using one of the above-mentioned communication devices. Accordingly, a need exists for providing a possibility of identifying a speaker or subscriber when communicating over a communication network. In particular, there is a need for an easy-to-implement and cost-effective method for enabling speaker identification. It is further desirable to provide the appropriate equipment for implementing such a method.
The present invention provides a method, a device and a processor readable medium for transmitting voice data in a communication network.
According to a first aspect of the invention, a method of transmitting voice data in a communication network comprises retrieving identification information identifying a subscriber of a transmitting device. A data packet comprising voice data and said identification information is created. By means of said transmitting device, the data packet is transmitted to a receiving device. The data packet is such that by receiving the data packet, the receiving device is enabled to provide the identification information to a user of the receiving device. As voice data and identification information are transmitted within a data packet, both voice data and identification information are available to the receiving device.
According to an embodiment of the invention, the identification information may comprise a name, an e-mail address, a cellular phone number, a fixed-line phone number, a mobile subscriber integrated services digital network number (MSISDN), a caller line identification (CLI), a voice over internet protocol (VoIP) user identification or the like, or any combination thereof. There are several possibilities of including the identification information in a data packet. Examples are attaching the identification information in the form of an identification frame to the end of the data packet, or including the identification information in a data section of a voice frame comprising the voice data. The identification information comprised in the data packet may for example have a length of between 8 and 64 bytes. It may also have a length of between 16 and 32 bytes.
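As a purely illustrative sketch (in Python), the following shows one hypothetical way of attaching an identification frame to the end of a data packet and of splitting it off again. The concrete layout, a fixed 32-byte trailer consisting of one length byte followed by UTF-8 text and zero padding, is an assumption made for this example only; the embodiment does not prescribe any particular field layout.

```python
# Hypothetical identification-frame layout: a fixed 32-byte trailer made of
# one length byte, the UTF-8 encoded identification text and zero padding.
ID_FRAME_LEN = 32  # within the 16..32 byte range mentioned above

def append_identification(voice_packet: bytes, identification: str) -> bytes:
    """Attach an identification frame to the end of the data packet."""
    text = identification.encode("utf-8")[:ID_FRAME_LEN - 1]
    frame = bytes([len(text)]) + text
    frame += b"\x00" * (ID_FRAME_LEN - len(frame))  # zero-pad to fixed size
    return voice_packet + frame

def split_identification(data_packet: bytes) -> tuple[bytes, str]:
    """Return (voice payload, identification text) from a received packet."""
    voice, frame = data_packet[:-ID_FRAME_LEN], data_packet[-ID_FRAME_LEN:]
    return voice, frame[1:1 + frame[0]].decode("utf-8")

# Example: one placeholder voice frame plus the subscriber's name and number
packet = append_identification(b"\x00" * 160, "Alice <+49-170-1234567>")
voice, who = split_identification(packet)
print(who)  # -> Alice <+49-170-1234567>
```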
It should be understood that the data packet may be a media packet in general and may as such comprise data other than voice data. It may for example be a video data packet comprising video data, which may include said voice data. In other embodiments, the packet may mainly comprise voice data.
According to a further embodiment, the data packet is a real-time transport protocol (RTP) data packet. Such a data packet may be transmitted by using a transport protocol such as the user datagram protocol (UDP). As an example, the voice data may be comprised in a data packet in the form of at least one voice frame. Such a voice frame may be encoded by using an adaptive multirate (AMR) or adaptive multirate wideband (AMR-WB) format. Yet other formats may also be used, such as pulse code modulation (PCM). The method is generally applicable to all kinds of codecs, including lossless audio codecs, such as PCM, A-law/mu-law (G.711), compressed PCM formats and the like, and lossy audio codecs, such as AMR, iLBC, MP3, AAC, and the like. The identification information may, for example, replace some audio information (e.g. in the high-frequency spectrum), so that some audio quality may be traded for the identification information.
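The following minimal sketch, under the same assumed trailer layout as above, builds an RTP data packet (the 12-byte fixed header of RFC 3550) carrying one AMR-encoded voice frame followed by the identification frame, and transmits it over UDP. The dynamic payload type 96, the placeholder AMR frame bytes and the example addresses are assumptions for illustration only, not part of the claimed method.

```python
# Sketch: RTP packet = 12-byte fixed header + AMR voice frame + ID trailer,
# transmitted as a UDP datagram. Header fields follow RFC 3550; everything
# about the payload layout is an assumption made for this illustration.
import socket
import struct

ID_FRAME_LEN = 32  # assumed fixed identification trailer size

def build_packet(amr_frame: bytes, identification: str,
                 seq: int, timestamp: int, ssrc: int) -> bytes:
    # RTP fixed header: V=2, no padding/extension/CSRC, payload type 96 (dynamic)
    header = struct.pack("!BBHII", 0x80, 96, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc)
    text = identification.encode("utf-8")[:ID_FRAME_LEN - 1]
    id_frame = (bytes([len(text)]) + text).ljust(ID_FRAME_LEN, b"\x00")
    return header + amr_frame + id_frame

amr_frame = b"\x3c" + b"\x00" * 31  # placeholder for one encoded speech frame
packet = build_packet(amr_frame, "Alice <alice@example.com>",
                      seq=1, timestamp=160, ssrc=0x1234ABCD)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # UDP transport
sock.sendto(packet, ("192.0.2.10", 5004))                # receiving device
```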
According to a further embodiment of the invention, a plurality of data packets may be transmitted by the transmitting device. The identification information may then be comprised in each of said data packets. Accordingly, the identification information is always available to the receiving device when receiving data packets comprising voice data. A data packet may for example be part of a media stream transmitted over a connection established for communications between at least the transmitting device and the receiving device. A connection may also be established for communications between more than these two devices. Examples of such a connection include a telephone connection, a telephone conference, a video conference, an ad-hoc voice session, voice over internet protocol (VoIP) telephony, or a two-way radio connection or the like. Communication devices connected over such a connection may both receive and transmit media streams comprising data packets. According to another embodiment, the method may further comprise the following steps at the receiving device: receiving the data packet, accessing the identification information comprised in said data packet and providing the identification information to a user of the receiving device. It is thus possible to inform the user of the receiving device of the identity of the subscriber as soon as the data packet with the voice data is received.
According to another aspect of the invention, a method of receiving voice data in a communication network comprises a step of receiving a data packet comprising voice data and identification information by a receiving device. The identification information comprised in the data packet is then accessed. The identification information is provided to a user of the receiving device, wherein the identification information identifies a subscriber of a transmitting device transmitting said data packet.
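A minimal receiving-side sketch is given below, again under the packet layout assumed in the earlier sketches: a UDP datagram is received, the 12-byte RTP fixed header is stripped, the voice payload is separated from the trailing identification frame, and the identification is provided to the user. The console output merely stands in for the output unit of a real receiving device.

```python
# Sketch of the receiving method: receive a data packet, access the
# identification information and provide it to the user. The 12-byte RTP
# header and the 32-byte identification trailer are the assumptions made
# in the transmitting-side sketches above.
import socket

RTP_HEADER_LEN = 12
ID_FRAME_LEN = 32

def parse_packet(datagram: bytes) -> tuple[bytes, str]:
    """Return (encoded voice data, identification text) from one data packet."""
    payload = datagram[RTP_HEADER_LEN:]
    voice, id_frame = payload[:-ID_FRAME_LEN], payload[-ID_FRAME_LEN:]
    return voice, id_frame[1:1 + id_frame[0]].decode("utf-8")

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 5004))

while True:
    datagram, _ = sock.recvfrom(2048)
    voice, identification = parse_packet(datagram)
    print(f"speaking now: {identification}")  # provide the identification
    # ... decode `voice` and pass it on to the audio output path ...
```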
According to an embodiment, the identification information is displayed to the user of the receiving device. The identification information may be displayed while voice samples derived from said voice data are given out to the user in audible form. While giving out the voice, the receiving device may for example show who is speaking. As the voice data and the identification information are received in the same data packet, the identification information is generally available in the receiving device when the voice is given out, even in a case where no prior exchange of information has occurred.
In another embodiment where text is transcribed from received voice data, the identification information may be displayed while the text is given out to the user, e.g. in readable form.
According to another embodiment of the invention, the method further comprises the steps of receiving a plurality of data packets comprising voice data and identification information as part of a media stream exchanged between the transmitting device and the receiving device, assembling the received voice data into a voice signal and giving out the voice signal over a loudspeaker. Not every data packet has to comprise both voice data and identification information. The transmitting device and the receiving device may both transmit and receive media streams. Data packets comprising voice data and identification information may also be received from plural transmitting devices. The identification information may then be provided to the user while the voice data received with the identification information is given out to the user. Even when communicating with plural persons, the user is thus provided with information on who is currently speaking.
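As an illustration of how a receiving device might assemble voice data from a plurality of such data packets, the sketch below reorders frames per sender (keyed by the RTP SSRC, an assumption) by sequence number, concatenates the decoded frames into a voice signal and returns the identification that arrived with them. decode_frame() is a placeholder for the actual speech decoder, and the per-sender buffers are a deliberately simplified stand-in for a real jitter buffer.

```python
# Sketch: assemble received voice data into a voice signal per sender, so the
# identification can be provided while that sender's voice is given out.
from collections import defaultdict

def decode_frame(frame: bytes) -> bytes:
    """Placeholder: a real device would run the AMR/PCM decoder here."""
    return frame

class StreamAssembler:
    def __init__(self):
        # sender (SSRC) -> {sequence number: (identification, encoded frame)}
        self.buffers = defaultdict(dict)

    def push(self, ssrc: int, seq: int, identification: str, frame: bytes):
        self.buffers[ssrc][seq] = (identification, frame)

    def assemble(self, ssrc: int) -> tuple[str, bytes]:
        """Return (identification, contiguous voice signal) for one sender."""
        entries = self.buffers.pop(ssrc, {})
        if not entries:
            return "", b""
        identification = next(iter(entries.values()))[0]
        ordered = [frame for _, (_, frame) in sorted(entries.items())]
        return identification, b"".join(decode_frame(f) for f in ordered)

assembler = StreamAssembler()
assembler.push(ssrc=0x11, seq=2, identification="Bob", frame=b"\x02" * 160)
assembler.push(ssrc=0x11, seq=1, identification="Bob", frame=b"\x01" * 160)
who, signal = assembler.assemble(0x11)
print(who, len(signal))  # display "Bob" while the 320-byte signal is given out
```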
According to another embodiment, the voice data and the identification information may be stored at the receiving device. Thus, a conversation log may be created, in which it is possible to identify the person from whom a communication originates.
According to yet another embodiment of the invention, the data packet is directly received from a transmitting device transmitting said data packet during an ad-hoc voice session. An ad-hoc voice session may comprise plural transmitting devices, and the identification of the users of these devices may thus be enabled. By receiving the identification information, the receiving device may further be enabled to initiate the establishment of a new connection with the transmitting device. Just as an example, the user of the receiving device may be enabled to establish a private connection to a transmitting device whose subscriber's identification information was displayed on the receiving device.
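The following hedged sketch illustrates how the received identification information could be used to initiate such a new connection, assuming the identification text contains a dialable identifier such as a SIP URI or an MSISDN; place_call() is a hypothetical hook into the device's call-control stack, not an existing API.

```python
# Sketch: derive a call target from received identification information and
# trigger the establishment of a private connection. The regular expression
# and place_call() are illustrative assumptions only.
import re

def extract_call_target(identification: str) -> str | None:
    """Pull a SIP URI or telephone number out of the identification text."""
    match = re.search(r"(sip:[^\s>]+|\+?\d[\d\-]{5,})", identification)
    return match.group(1) if match else None

def place_call(target: str) -> None:  # hypothetical call-control hook
    print(f"establishing private connection to {target}")

target = extract_call_target("Alice <sip:alice@example.com>")
if target:
    place_call(target)  # e.g. triggered by a soft key on the receiving device
```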
According to another aspect of the invention, a device for transmitting voice data in a communication network is provided, the device comprising a memory in which identification information identifying a subscriber of said device is stored and a processing unit for creating a data packet comprising voice data and identification information. The device further comprises a transmitting unit for transmitting said data packet to a receiving device. The data packet is such that by receiving said data packet, the receiving device is enabled to provide the identification information to a user of the receiving device.
According to an embodiment, the device further comprises a microphone for recording a voice signal. The processing unit may generate the voice data from at least a part of said voice signal. The processing unit may for example be designed so as to create the data packets in a real-time transport protocol format. The data packet may comprise at least one adaptive multirate (AMR) or adaptive multirate wideband (AMR-WB) encoded voice frame and said identification information. A data packet may also comprise plural voice frames, and the identification information may be part of one or more of these voice frames, or included in another part of the packet. Other embodiments may use other types of transport protocols and other types of encoding.
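The structural units named above can be sketched as follows, again reusing the packet layout assumed in the earlier examples: the subscriber's identification is held in memory, a processing unit packs one encoded voice frame together with the identification frame into an RTP packet, and a transmitting unit sends the packet over UDP. record_20ms() and encode_frame() are placeholders for the microphone path and the AMR encoder, and all addresses and constants are illustrative assumptions.

```python
# Structural sketch of a transmitting device: memory, processing unit,
# transmitting unit. Packet layout and constants are assumptions.
import itertools
import socket
import struct

ID_FRAME_LEN = 32  # assumed fixed-size identification trailer

class TransmittingDevice:
    def __init__(self, identification: str, peer=("192.0.2.10", 5004)):
        self.identification = identification       # held in memory
        self.peer = peer
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.seq = itertools.count(1)
        self.timestamp = 0
        self.ssrc = 0x1234ABCD

    def record_20ms(self) -> bytes:
        """Placeholder: 160 samples of 8 kHz microphone audio (20 ms)."""
        return b"\x00" * 160

    def encode_frame(self, samples: bytes) -> bytes:
        """Placeholder for the AMR/AMR-WB speech encoder."""
        return samples[:32]

    def create_data_packet(self, voice_frame: bytes) -> bytes:
        """Processing unit: RTP header + voice frame + identification frame."""
        header = struct.pack("!BBHII", 0x80, 96, next(self.seq) & 0xFFFF,
                             self.timestamp & 0xFFFFFFFF, self.ssrc)
        self.timestamp += 160  # advance by one frame of samples
        text = self.identification.encode("utf-8")[:ID_FRAME_LEN - 1]
        id_frame = (bytes([len(text)]) + text).ljust(ID_FRAME_LEN, b"\x00")
        return header + voice_frame + id_frame

    def transmit_once(self):
        """Transmitting unit: send one data packet to the receiving device."""
        packet = self.create_data_packet(self.encode_frame(self.record_20ms()))
        self.sock.sendto(packet, self.peer)

TransmittingDevice("Alice <+49-170-1234567>").transmit_once()
```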
According to an embodiment, the device may be implemented as a cellular phone, a fixed-line phone, an internet protocol (IP) telephone, a soft phone, a teleconference system, a video teleconference (VTC) system, a video telephone or a telecommunication system.
According to a further aspect of the invention, a device for receiving voice data in a communication network is provided. The device comprises a receiving unit receiving a data packet, said data packet comprising voice data and identification information. The identification information identifies a subscriber of a transmitting device transmitting said data packet. The device further comprises a processing unit accessing the identification information comprised in said received data packet and an output unit for providing the identification information to a user of said device. By receiving the data packet with the voice data, the device is thus enabled to provide a user with the identification information.
According to an embodiment, the processing unit is designed so as to compose a voice signal of said voice data. The device may further comprise a voice output unit for giving out the voice signal. The voice signal is given out while the identification information received with the voice data is provided to the user. The voice output unit may have the form of a loudspeaker, earphones, a headset or the like.
According to another embodiment, the receiving unit is designed so as to receive data packets from plural transmitting devices. The identification information received from a transmitting device may then be provided to the user while the voice data received from the same transmitting device is given out to the user. If for example used in a telephone conference, wherein voice data is received from plural participating devices, the device may then display the identification information of the person presently speaking.
According to a further embodiment, the device may further comprise a memory for storing the voice data and the identification information. The device is thus enabled to generate a conversation log.
According to another embodiment, the device for receiving voice data in a communication network may be implemented in the form of a cellular phone, a fixed-line phone, an internet protocol (IP) telephone, a soft phone, a teleconference system, a video teleconference (VTC) system, a video telephone, or a telecommunication system.
According to a further aspect of the invention, a device for transmitting voice data in a communication network and a device for receiving voice data in a communication network may be implemented within one device.
In accordance with another embodiment of the invention, a processor readable medium having computer-executable instructions for receiving or transmitting voice data in a communication network is provided. The computer-executable instructions, when executed by a processor unit of a corresponding device, may perform any of the above-described methods.
Features of the above embodiments and aspects of the invention may be combined to form new embodiments.
The foregoing and other features and advantages of the invention will become further apparent from the following detailed description of illustrative embodiments when read in conjunction with the drawings.
Embodiments of the present invention are illustrated by the accompanying figures, wherein:
In the figures, like reference symbols indicate like elements.
The term data packet as used in the disclosure is to be understood in its most general meaning. It may for example be a block of data, possibly comprising a header and/or trailer, yet it may also be a sub-unit of a data stream, such as a data frame, which may itself comprise a header and/or trailer. A packet may for example be an asynchronous transfer mode (ATM) cell, an IP packet, a datagram of a user datagram protocol (UDP) service, a voice frame of an adaptive multirate (AMR) encoded voice signal, a real-time transport protocol (RTP) packet and the like. It may also be a data frame of a digital telecommunications system, a frame of a time division multiplexing (TDM) system or a frame in a pulse code modulation (PCM) system.
The embodiment of
The communication device 100 comprises a processing unit in the form of a microprocessor 101 interfacing several components of the communication device by means of input/output unit 102. The exchange of control signals or data between the components may be achieved by a bus system (not shown). The microprocessor 101 can control the operation of the device 100 according to programs stored in memory 103. Microprocessor 101 may be implemented as a single microprocessor or as multiple microprocessors, in the form of a general purpose or special purpose microprocessor, or of one or more digital signal processors. Memory 103 may comprise all forms of memory, such as random access memory (RAM), read-only memory (ROM), non-volatile memory such as EPROM or EEPROM, flash memory or a hard drive. Some of these types of memory may be removable from the communication device 100, e.g. flash memory cards, while others may be integrated for example with microprocessor 101. Memory 103 stores identification information identifying the subscriber using device 100.
The communication device 100 comprises a transceiver 104. The transceiver 104 comprises a transmitting unit 105 and a receiving unit 106 for communication over a communication network. The transmitting unit and the receiving unit may be integrated within one or more integrated circuits. In the embodiment of
Communication device 100 further comprises a user interface 108. The user interface 108 comprises a keypad 109 by which a user may enter information and operate the communication device 100. Keypad 109 comprises alphanumeric keys, as well as control elements such as turn/push buttons, rockers, joystick-like control elements and the like. Keypad 109 may for example be used to enter a telephone number or user ID for establishing a connection via transceiver 104. Display 110 interfaces input/output unit 102 and is used to provide information to a user of communication device 100. Displayed information may comprise a function menu, service information, contact information, or characters entered by keypad 109, and it may in particular comprise information received by transceiver 104, such as identification information received with a data packet. In some embodiments, display 110 is implemented as a simple single-line LCD display, yet in other embodiments, display 110 may be a full-size LCD screen.
User interface 108 further comprises microphone 111 and loudspeaker 112. For voice communication over a communication network, such as an integrated services digital network (ISDN), or a GSM network, microphone 111 records the voice of a user of communication device 100, whereas loudspeaker 112 is an output unit reproducing a received voice signal. Audio processing unit 113 converts a received digital voice signal to an analog voice signal and interfaces loudspeaker 112 for giving the voice signal out. Audio processing unit 113 further interfaces microphone 111 for digitizing a recorded analog voice signal and for providing the digitized voice signal via input/output unit 102 to microprocessor 101 for further processing.
Communication device 100 may be implemented as a telephone to be connected to a digital telephone network. Yet it may also be implemented as a cellular telephone, a stand-alone IP phone, a soft phone, or a digital two-way radio. Communication device 100 may also comprise further components such as a video camera for enabling video telephony. Communication device 100 may connect to a plurality of communication networks, such as a telephone network, an integrated services digital network, a GSM or UMTS network, a local area network (LAN), a wireless local area network (WLAN), or the internet. Further implementations of device 100 comprise a personal data assistant, a wireless hand-held device or a walkie-talkie.
After the communication connection is established, the voice of the user of the communication device 100 is recorded by microphone 111 in step 202. The recorded voice is digitized using audio processing unit 113 and temporarily stored in memory 103. In step 203, identification information of the subscriber using the communication device 100 is retrieved. The identification information may be the name of the subscriber, an identification number of the subscriber, such as the telephone number or CLI, or MSISDN number, yet it may also comprise other information, such as an e-mail address or a voice over internet protocol (VoIP) user identification. The identification information is stored in memory 103, for example in a non-volatile memory, and is accessed by microprocessor 101.
In step 204, data packets are created. For creating the data packets, microprocessor 101 accesses the digitized voice signal stored temporarily in memory 103. Microprocessor 101 encodes the voice signal using an audio codec, such as adaptive multirate (AMR) or adaptive multirate wideband (AMR-WB). The voice signal may for example be sampled at 8000 Hz, and may be encoded into 20 ms speech frames each comprising 160 samples. As said before, other codecs may be used, e.g. PCM, A-law/mu-law, iLBC, MP3, AAC, compressed PCM and the like, which may use different frame sizes and sampling rates. One or more AMR speech frames may then be included into an RTP packet. Furthermore, the retrieved identification information is included into the RTP packet. Possible implementations for generating an RTP packet as described are shown in
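A small worked example of the framing arithmetic mentioned above: at a sampling rate of 8000 Hz, a 20 ms speech frame contains 8000 × 0.020 = 160 samples. The sketch below simply chunks a digitized voice signal into such frames before they would be handed to the speech encoder; the input signal is placeholder data.

```python
# Framing arithmetic: 8000 Hz * 20 ms = 160 samples per speech frame.
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000  # = 160

def chunk_into_frames(samples: list[int]) -> list[list[int]]:
    """Split a digitized voice signal into complete 20 ms frames."""
    full = len(samples) // SAMPLES_PER_FRAME * SAMPLES_PER_FRAME
    return [samples[i:i + SAMPLES_PER_FRAME]
            for i in range(0, full, SAMPLES_PER_FRAME)]

one_second = [0] * SAMPLE_RATE_HZ       # 1 s of silence as placeholder input
frames = chunk_into_frames(one_second)
print(SAMPLES_PER_FRAME, len(frames))   # -> 160 50 (i.e. 50 frames per second)
```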
Another implementation is shown in
Referring back to
In a next step 307, data packets are received from a second communication device. As in step 303, the identification information and voice data comprised in the data packets are extracted (step 308). The voice signal is assembled in step 309. The voice signal is given out in step 310, while the identification information of the second participant is displayed in step 311. As an example, the display of the receiving device may display the names of both the first and the second participants, and may highlight the name of the participant currently speaking. Even without being able to distinguish their voices, the user of the receiving device is now enabled to identify the speaking person. Furthermore, he or she is directly provided with the names of the other participants. This is particularly useful for telephone conferences in which the participants do not already know each other. Furthermore, it is possible to transmit further identification information, such as an e-mail address or identification numbers. The identification information may be provided in a form in which it can be directly used, e.g. entered into an address book or the like.
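As an illustrative sketch of the display behaviour described in steps 310 and 311, the following keeps a roster of participant names learned from the identification information in received data packets and highlights the participant whose voice data is currently being given out. Console output stands in for the display of a real device.

```python
# Sketch: show all known participants and highlight the current speaker.
class ConferenceRoster:
    def __init__(self):
        self.participants: set[str] = set()

    def packet_received(self, identification: str):
        """Learn a participant from a received packet and mark them as speaking."""
        self.participants.add(identification)
        self.render(current=identification)

    def render(self, current: str):
        line = "  ".join(f"[{name}]" if name == current else name
                         for name in sorted(self.participants))
        print(line)  # e.g.  [Alice]  Bob

roster = ConferenceRoster()
roster.packet_received("Alice")  # -> [Alice]
roster.packet_received("Bob")    # -> Alice  [Bob]
```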
In a further step 312, the voice data and the identification information are stored. As the identification information is received together with the voice data, the voice data can be stored in association with the correct identification information, i.e. the name of the participant, and accordingly, a conversation log can be generated. It is thus possible to directly identify the originator of the particular contribution. The voice data and the identification information may for example be stored in a non-volatile memory comprised in memory 103.
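A minimal sketch of such a conversation log is given below: each contribution is stored together with the identification information that arrived in the same data packets, so that the originator of every contribution can be identified later. In a real device the entries would be written to a non-volatile memory rather than kept in an in-memory list.

```python
# Sketch of a conversation log storing voice data with its identification.
import time

class ConversationLog:
    def __init__(self):
        self.entries = []  # (timestamp, identification, voice data)

    def store(self, identification: str, voice_data: bytes):
        self.entries.append((time.time(), identification, voice_data))

    def summary(self):
        for ts, who, voice in self.entries:
            stamp = time.strftime("%H:%M:%S", time.localtime(ts))
            print(f"{stamp} {who}: {len(voice)} bytes of voice data")

log = ConversationLog()
log.store("Alice <+49-170-1234567>", b"\x00" * 320)
log.summary()  # -> e.g. 14:05:12 Alice <+49-170-1234567>: 320 bytes of voice data
```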
It should be clear that not all steps in
Some situations may arise where the device transmitting the data packets is operating according to the method of
As can be seen from the above description, the proposed method has several advantages over prior methods of exchanging voice data in a communication network. Even without any prior exchange of information, participants of a communication session implementing the above-described methods are enabled to identify the other participants. On the terminals of the participants, real-time updates of the person currently speaking can be provided. The methods are easy to implement, and require neither extensive processing of the speech signals nor the exchange of additional control protocol messages.
Furthermore, the method of the invention may be applied to a push to talk over cellular application, which can be implemented on cellular phone networks such as GSM, GPRS, EDGE, CDMA or UMTS. A push-to-talk application may again use an RTP protocol and AMR encoding for the transmission of voice frames. The connection may again be set up using the SIP protocol, the connection being packet switched. By using push to talk, an active talk group can be reached by the push of a button. As a plurality of participants may communicate within a push-to-talk session, it is advantageous to implement the method of the present invention, as the person receiving the push-to-talk message can immediately identify the person currently speaking.
In the embodiment of
Devices 419 and 420 may also be implemented as cellular phones establishing a direct connection. Cellular phones may be equipped with a direct talk feature and may be able to provide a “walkie-talkie” service. When using such a walkie-talkie service, media streams are again exchanged on a common broadcast channel. The media streams carry packets or frames comprising voice data and identification information. The device receiving such a data packet or data frame is again enabled to display the identification information to its user. In such an ad-hoc voice session, it is therefore also possible to provide a participant with the identification information of the other participants. In particular, it is possible for a participant to see who is currently broadcasting, i.e. speaking, over the common broadcast channel.
A further embodiment of a method according to the invention is schematically illustrated in
While specific embodiments of the invention are disclosed herein, various changes and modifications can be made without departing from the spirit and the scope of the invention. The present embodiments are to be considered in all respects as illustrative and non-restrictive, and all changes coming within the meaning and equivalency range of the appended claims are intended to be embraced therein.