This application is the US national phase of international application PCT/EP2004/050253, filed 4 Mar. 2004, which designated the U.S., the entire content of which is hereby incorporated by reference.
The technology relates to reducing the latency in Push to Talk services and in particular in so-called Push to Talk Over Cellular services.
Push to Talk is the generic name for a range of services which enable users of mobile wireless handsets to communicate with one another almost instantaneously and at the push of a button, or at least at the push of a small number of buttons. An industry grouping is in the process of standardizing a Push to Talk service for introduction into present and future cellular networks including GSM with packet data services and 3G. The service is known as “Push to talk Over Cellular” (PoC).
PoC makes use of the IP Multimedia Subsystem (IMS) standardized by the 3rd Generation Partnership Project to facilitate the introduction of advanced data services into cellular networks, and in particular of real-time multimedia services. The IMS relies upon the Session Initiation Protocol (SIP) which has been defined by the Internet Engineering Task Force (IETF) for the setting up and control of multimedia IP-based sessions.
The time between the SIP INVITE message being sent and the IMS receiving an acceptance from the called party can be as much as 3 seconds due to fundamental properties of the network (e.g. paging, Temporary Block Flow (TBF) establishment, etc). In order to speed up the initial connection process, the initiating subscriber is therefore able to start talking upon receipt by his terminal of the SIP 202 Accepted message from the IMS (usually signaled to the initiating subscriber by the playing of a tone or “beep” on his terminal), even though the called party has not yet accepted the session. The initial talk burst may be buffered by a PoC server within the network until such time as it receives the SIP 200 OK message from the peer terminal. When that message is received, the talk burst is immediately sent to the peer terminal. Nonetheless, the delay perceived by the called party remains significant and it is desirable to reduce the delay still further.
The inventor has recognised that the initiating subscriber is unlikely to begin talking for a short while after the tone has been played due both to the reaction time of the subscriber and to his/her “thinking time”. In the example of
According to a first aspect there is provided a method of processing user speech data for transmission to a participant or participants in a push to talk session over a communications network, the method comprising:
The technology described herein is particularly applicable to removing an initial period of silence from the initial speech burst provided by the initiating party of the push to talk session. This has the effect of reducing the delay between the generation of the speech burst by the initiating subscriber and the playing of the speech burst to the or each other participant.
Preferably, said communication network is a cellular telephone network and the push to talk service is a Push to talk Over Cellular (PoC) service.
Embodiments of the invention may comprise a step of analyzing the speech data to identify an initial period of silence. This step may be carried out at the terminal of the initiating party, at a node within the communication network, or at a receiving terminal. Similarly, the step of removing the detected period of silence from the transmitted speech data may be carried out at the terminal of the initiating party, at a node within the communication network, or at a receiving terminal. The network node is preferably within the IP Multimedia Subsystem (IMS) in the case where the communication network is a cellular telephone network and the push to talk service is a PoC service.
In the case where the steps of detecting and removing are done at the initiating party's terminal, the step of detecting may comprise analyzing the speech data during or following recording of the data at the terminal.
Certain example embodiments may comprise monitoring the audio level and commencing recording of the speech only when that level exceeds some predefined threshold. This step may be carried out at the terminal of the initiating party or at a server node within the communication network. In other embodiments of the invention, an initial period expected to contain silence is predefined, and the start of the speech data is clipped to remove the predefined period. The predefined period may be fixed, or may be adaptive based upon talk/usage patterns of the user.
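By way of illustration only, an adaptive predefined period might be sketched as follows. The specification does not state how the adaptation is performed; the exponential moving average, the safety margin and all names used here (`AdaptiveClipPeriod`, `alpha`, `safety_margin_s`, `clip_start`) are assumptions made for the purpose of the example.

```python
class AdaptiveClipPeriod:
    """Tracks the expected initial silence (in seconds) for one user.

    Hypothetical helper: the averaging scheme is an assumption, not
    something taken from the specification.
    """

    def __init__(self, initial_s=0.8, alpha=0.2, safety_margin_s=0.1):
        self.expected_s = initial_s          # current estimate of the silence
        self.alpha = alpha                   # weight given to each new observation
        self.safety_margin_s = safety_margin_s

    def update(self, observed_silence_s):
        """Fold one observed initial-silence duration into the estimate."""
        self.expected_s = ((1.0 - self.alpha) * self.expected_s
                           + self.alpha * observed_silence_s)

    def clip_period_s(self):
        """Period to clip, kept slightly short of the estimate so that
        the first syllable is less likely to be cut off."""
        return max(0.0, self.expected_s - self.safety_margin_s)


def clip_start(samples, sample_rate_hz, period_s):
    """Remove the first period_s seconds from a sequence of samples."""
    n = int(round(period_s * sample_rate_hz))
    return samples[n:]
```

A fixed predefined period corresponds to never calling `update`, so that `clip_period_s()` always returns the same value.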
The step of removing an initial period of silence from the speech data may be carried out in real-time, as the speech data is received, or may be carried out by post-processing stored or buffered speech data.
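The real-time variant may be pictured as a gate that discards frames as they arrive until the audio level first exceeds the threshold, and passes everything thereafter. The following is a minimal sketch only, assuming that each frame is a sequence of decoded signed PCM samples and that peak amplitude is used as the level measure; the name `gate_leading_silence` is hypothetical.

```python
def gate_leading_silence(frames, threshold):
    """Yield frames starting from the first one whose peak level exceeds
    the threshold. Later quiet frames are passed through unchanged, so
    only the initial period of silence is removed."""
    started = False
    for frame in frames:
        if not started and frame and max(abs(s) for s in frame) > threshold:
            started = True
        if started:
            yield frame
```

Because the function is a generator, it can be applied to a live stream of frames without waiting for the end of the talk burst.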
According to a second aspect there is provided a server node for use in a communication network offering a push to talk service to subscribers, the node comprising:
Preferably, said server node is arranged to be located within an IP Multimedia Subsystem of a cellular telephone communications network, the node having an interface to one or more Session Initiation Protocol (SIP) servers including a Serving Call Session Control Function (S-CSCF) server.
According to a third aspect there is provided a mobile terminal for use in a communication network offering a push to talk service to subscribers, the terminal comprising:
Preferably, said mobile terminal is a wireless terminal and the communication network is a cellular telephone network offering a Push to talk Over Cellular service.
The mobile terminal may be a terminal used by said terminal user, or may be another terminal participating in the session.
The delays inherent in establishing Push to talk Over Cellular (PoC) sessions have been described above with reference to
In a first example embodiment, a Media Resource Function (MRF) of the PoC server begins receiving the initial speech burst, sent from the initiating subscriber's mobile terminal (UE#1) following initiation of the PoC session. This burst will include an initial period of silence or background noise which might for example last for 0.8 seconds, and will be transported from UE#1 to the PoC server in a number of Real-time Transport Protocol (RTP) frames. The PoC server buffers the received speech data and awaits receipt of a SIP 200 OK message from the other participant(s) in the session. This may take from a few milliseconds to several seconds. During this time, the PoC server analyses the buffered data to determine the length of the initial silent period, and clips the data to remove that period once identified. Following receipt of the 200 OK message(s), the PoC server begins transmitting the clipped speech from the front of the buffer.
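The buffering behaviour described above may be sketched, purely by way of example, as follows. The sketch assumes that the RTP payloads have already been decoded into frames of samples and that a silence test is supplied by the caller; the class name `TalkBurstBuffer` and its methods are hypothetical and do not correspond to any actual PoC server interface.

```python
from collections import deque


class TalkBurstBuffer:
    """Buffers an initial talk burst until the session is accepted.

    is_silent is a caller-supplied predicate taking one frame and
    returning True if the frame is regarded as silence (an assumption
    of this sketch).
    """

    def __init__(self, is_silent):
        self._frames = deque()
        self._is_silent = is_silent
        self._speech_seen = False   # True once the initial silence has ended
        self._accepted = False      # True once 200 OK has been received

    def _trim(self):
        # Clip silent frames from the front of the buffer until speech is found.
        while not self._speech_seen and self._frames:
            if self._is_silent(self._frames[0]):
                self._frames.popleft()
            else:
                self._speech_seen = True

    def _drain(self):
        out = list(self._frames)
        self._frames.clear()
        return out

    def on_frame(self, frame):
        """Handle one received frame; return the frames to forward now."""
        self._frames.append(frame)
        self._trim()
        return self._drain() if self._accepted else []

    def on_200_ok(self):
        """Handle acceptance by the peer; return the clipped, buffered speech."""
        self._accepted = True
        self._trim()
        return self._drain()
```

In this sketch the clipping is performed incrementally as frames arrive, so that the clipped speech is ready to be sent as soon as `on_200_ok` is called.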
The signaling associated with this procedure is illustrated in
The process of determining the presence and duration of an initial silent period may be conducted at the PoC server by analyzing the volume of the received speech signal. When the volume exceeds some predefined threshold, it is assumed that the speech has started and the silent period ended. Of course, more sophisticated algorithms may be used. For example, the speech signal may be analyzed for the presence of patterns distinctive of speech, thereby preventing the presence of background noise from giving a false indication of speech. An alternative approach is to assume that speech cannot begin for some fixed period after the tone has sounded, e.g. 0.8 seconds, and to remove that period from the start of the speech burst. The length of this period may be adapted dynamically, depending upon the behaviour of the initiating party, or perhaps on the statistically analyzed behaviour of a group of subscribers.
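A volume-based determination of the initial silent period, applied to buffered data, might for example take the following form. This is a sketch under stated assumptions: frames are sequences of decoded PCM samples, volume is measured as root-mean-square amplitude, and the requirement for several consecutive loud frames (`min_speech_frames`) is a crude, assumed guard against brief noise spikes rather than a true speech-pattern detector. All function names are hypothetical.

```python
import math


def frame_rms(frame):
    """Root-mean-square amplitude of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def leading_silence_frames(frames, threshold, min_speech_frames=1):
    """Return the number of frames in the initial silent period.

    Speech is taken to start at the first frame that begins a run of
    min_speech_frames consecutive frames whose RMS exceeds threshold.
    If no such run exists, every frame is counted as silence.
    """
    run = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) > threshold:
            run += 1
            if run >= min_speech_frames:
                return i - run + 1
        else:
            run = 0
    return len(frames)


def clip_leading_silence(frames, threshold, min_speech_frames=1):
    """Return the frames with the initial silent period removed."""
    start = leading_silence_frames(frames, threshold, min_speech_frames)
    return frames[start:]
```

Multiplying the value returned by `leading_silence_frames` by the frame duration gives the length of the silent period in seconds.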
The approach described above relies upon the speech analysis procedure and silent period removal being carried out within the IMS core. Providing sufficient processing capacity to achieve this is unlikely to be problematic. However, if sufficient processing capacity is available at the terminal of the initiating party, these steps may be carried out at that terminal. That is to say that, immediately following the sounding of the appropriate tone at that terminal, the terminal analyses the user's speech to determine the length of the initial silent period. In some cases, the tone may be sounded in advance of the “talk indication” message being received at the initiating party's terminal from the IMS core.
Analysis and modification of the initial speech burst may alternatively be carried out at the receiving terminal (or receiving terminals if there are more than two participants involved in the session). However, this requires that the data transfer speed over the interface between the receiving terminal and the IMS core is significantly faster than speech speed, with the received speech being “expanded” in time before playback. If this is the case, detecting and removing an initial silent period will still provide a significant reduction in the session latency, although not as great as that achieved with the other solutions described above.
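The dependence on transfer speed can be illustrated with a simple model, which is an assumption of this example and is not taken from the specification. Suppose the silent period lasts s seconds of playback time and the data arrives at k times speech speed. The silent data then takes s/k seconds to arrive, so that skipping it brings the start of the speech forward by roughly s(1 - 1/k) seconds. The function name `receiver_side_saving_s` is hypothetical.

```python
def receiver_side_saving_s(silence_s, speedup):
    """Approximate latency saved by removing the initial silence at the
    receiving terminal, under the simple model described above.

    silence_s: duration of the initial silent period, in seconds.
    speedup:   transfer speed as a multiple of speech speed (>= 1).
    """
    if speedup < 1.0:
        raise ValueError("speedup must be at least 1")
    return silence_s * (1.0 - 1.0 / speedup)
```

Under this model a transfer at exactly speech speed gives no saving, since the terminal must in any event wait for the speech to arrive, while a transfer at four times speech speed recovers three quarters of the silent period.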
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2004/050253 | 3/4/2004 | WO | 00 | 6/6/2007 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2005/096646 | 10/13/2005 | WO | A |
Number | Date | Country |
---|---|---|
20070281672 A1 | Dec 2007 | US |