This application relates to playback of recorded media in a push-to-talk communication environment.
In a push-to-talk communication environment, a plurality of users or speakers joins a common channel, for example a VTG (Virtual Talk Group) to communicate with one another. Typically, the communication channel is configured such that only one speaker is allowed to speak at a time. Thus, speech which is audible in such a channel generally comprises a plurality of media segments (e.g. portions of speech) from respective speakers which media segments are appended serially one media segment after another. The communication in such a push-to-talk environment is therefore generally ordered and is suitable for safety and security operations.
Speech of safety and security operations is usually recorded in order to facilitate forensic analysis of events. The same recording can be used by latecomers who join the operation or session (e.g. log onto the VTG) after it has started, in order to inform or notify the latecomers about what has previously transpired. Operations are usually managed by one or more “principals”. A principal is generally the highest ranking person present, or a specialist who is recognized for his understanding or authority; usually what he says carries the key actions or content. As a new user joins an operation, he or she typically wants to understand what had previously transpired in the event.
The user can invoke the replay mechanism and listen to the replay of all that had been said prior to his joining. If the new user is pressed for time, he may choose to listen only to the media segments (e.g. voice clips or speech portions) of the principals. This, however, has the disadvantage that he could miss a comment or question from one of the other speakers. The user may speed up the whole replay, but this may detract from his ability to focus on the principal's messages. Yet another option is to modify the replay speed continually, for instance slowing down the voice of the principal and speeding up the replay of the spoken statements of the other speakers. This may shorten the time required to listen to the recorded message but may not be practical when the new user needs to cater to unfolding events.
Embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
a shows a schematic representation of an example embodiment of the system of
b shows a schematic representation of an example embodiment of the system of
a shows, in high-level flow diagram form, an example of a method, in accordance with an example embodiment, for controlling playback of recorded media in a push-to-talk communication environment;
b and 5c show, in low-level flow diagram form, examples of a method, in accordance with an example embodiment, for controlling playback of recorded media in a push-to-talk communication environment; and
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of example embodiments.
Overview
In one embodiment a method is provided which comprises recording a push-to-talk communication session comprising media segments, each media segment being associated with an endpoint device from which the media segment originated. A playback request for playback of at least one media segment at an adjusted playback speed may be received and, in response to the playback request, a playback speed of the at least one media segment may be adjusted relative to another media segment. The recorded media segments including the media segment with the adjusted playback speed may then be provided at a requesting endpoint device.
Example Embodiments
The system 100 may include a telecommunications network 102 which may include the Internet or may be in the form of a dedicated push-to-talk communication network. It is to be appreciated that the telecommunications network 102 may be configured for handling any one or more push-to-talk compatible communication protocols such as unicast, multicast and the like.
The system 100 may further include a plurality of multimedia endpoint devices (e.g. endpoint devices). The term “multimedia endpoint device” includes any device having push-to-talk capabilities, e.g. a telephone, a land mobile radio (LMR), a PDA, a computer with a soft push-to-talk application, and the like. The endpoint devices are shown by way of example to be in the form of a mobile telephone 110, an IP (Internet Protocol) telephone 112, for example a VoIP (Voice over IP) telephone, and a computer with a soft push-to-talk application 114. The endpoint devices 110 to 114 may be operable to communicate with one another via a common channel, for example in a VTG. The endpoint devices 110 to 114 may be operable to transmit speech or any other media from speakers (e.g. users of the respective endpoint devices 110 to 114) in a VTG to be listened to or played back by other users of the VTG. It is to be appreciated that three example endpoint devices 110 to 114 are shown for ease of illustration only, and the system 100 may include any number of endpoint devices. Further, in example embodiments, the endpoint devices may also communicate data other than voice data.
The system 100 may further include a computer server 120 which may be configured for hosting or otherwise accommodating push-to-talk communication. The computer server 120 may thus be in the form of an IPICS server (IP Interoperability and Collaboration System) available from Cisco Systems Inc. For example, the computer server 120 may be operable to host one or more VTGs which are accessible by the endpoint devices 110 to 114 for push-to-talk communication with one another. It is to be borne in mind that although this example embodiment is described by way of example with reference to an IPICS server, it is applicable to any push-to-talk communication server or system.
Referring now to
The computer system 100 may thus include a memory module 206, for example a hard disk drive or the like, on which the media (represented schematically by reference numeral 208) e.g. speech or other media received from the endpoint devices 110 to 114 is recorded or recordable for later playback. The media 208 which is recorded on the memory module 206 may be in the form of a single continuous audio clip or stream comprising individual media segments from the various speakers, the media segments being sequentially appended or added one after another to form the single audio clip or recording. The association module 202 may be operable to append or annotate data indicative of the speaker or originator (e.g. an identifier of the endpoint device 110 to 114 from which the speech originated) of each media segment to the recorded audio clip 208, thereby associating the media segments with the respective speakers.
The computer system 100 further includes an adjustment module 204 which is operable to adjust playback speed of the media 208, specifically media segments 208, in accordance with priority criteria assigned to the speaker associated with that media segment. Differently stated, the adjustment module 204 may be operable to determine from which speaker or endpoint device 110 to 114 a media segment 208 originated and automatically adjust the playback speed of each media segment 208 in accordance with priority criteria assigned to the respective speakers.
It is to be understood that the computer system 100 in accordance with an example embodiment may be embodied wholly by the computer server 120, partially by the computer server 120 and partially by one or more endpoint devices 110 to 114, or wholly by one or more of the endpoint devices 110 to 114. Thus, the functional modules 202 and 204 may be distributed among remote devices or systems.
a shows a system 250 of example detail of the system 100 shown in
The computer server 120 may additionally include a calculation module 254 which is operable to calculate or estimate a playing time for the media 208 at a combination of various playing speeds. The calculation module 254 may be operable to calculate a normal playing time (e.g., playback at the same speed at which the media was originally recorded), for example, a playing time of the entire media 208 played at normal (1×) speed. The calculation module 254 may further be operable to calculate a playing time for the media 208 if the entire media 208 is played back at an accelerated speed, for example double (2×) or quad (4×) speed (or any other speed). Further, in accordance with an example embodiment, the calculation module 254 may be operable to calculate a playing time of the media 208 when component segments of the media 208 are played back at various speeds. For instance, the calculation module 254 may be operable to calculate or estimate a playing time of the media 208 if the media segments of a first person (or the speech originating from a first endpoint device) are played back at normal speed, the media segments of a second person are played back at double speed, and the media segments of a third person are played back at quad speed. Thus, broadly, in an example embodiment, in response to a playback request, a playback speed of the at least one media segment may be adjusted relative to another media segment.
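By way of illustration only, the estimate described above may be sketched as follows. The sketch assumes that each media segment is known by its source identifier and by its duration at normal speed; the names `Segment`, `estimate_playing_time` and the identifiers "first", "second" and "third" are hypothetical and do not form part of the described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    source: str        # identifier of the originating speaker or endpoint device
    duration_s: float  # duration of the segment at normal (1x) speed, in seconds


def estimate_playing_time(segments, speed_by_source, default_speed=1.0):
    """Estimate the total playing time when each segment plays at its source's speed."""
    total = 0.0
    for seg in segments:
        speed = speed_by_source.get(seg.source, default_speed)
        if speed <= 0:
            raise ValueError("playback speed must be positive")
        # A segment played at N times normal speed takes 1/N of its recorded duration.
        total += seg.duration_s / speed
    return total


# Hypothetical recording: three speakers, four segments in recorded order.
segments = [
    Segment("first", 60.0),
    Segment("second", 60.0),
    Segment("third", 60.0),
    Segment("first", 30.0),
]

normal_time = estimate_playing_time(segments, {})
mixed_time = estimate_playing_time(
    segments, {"first": 1.0, "second": 2.0, "third": 4.0}
)
```

With these assumed durations, the whole recording takes 210 seconds at normal speed and 135 seconds when the second and third speakers are played back at double and quad speed respectively.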
The computer server 120 may also comprise a communication interface 256, for example in the form of a network communication device (a network card, a wireless access point, or the like). The communication interface 256 may be operable both to receive incoming communications (therefore acting as a receiving arrangement) and to transmit outgoing communications (therefore acting as a transmission or sending arrangement). The communication interface 256 may be operable to connect the computer server 120 to the telecommunications network 102.
In an example embodiment, the computer server 120 may include a priority or priority criteria stored on the memory module 206, the priority criteria being schematically represented by reference 258. The priority criteria 258 may include an identifier of a user or speaker, or alternatively may include an identifier of an endpoint device 110 to 114 (e.g., when the endpoint device is a priority endpoint device). Further, the priority criteria 258 may include a priority or rank associated with each speaker, for example a high priority, a normal priority, a low priority, or a very low priority. In an example embodiment, the priority may be associated with the role or position of the speaker, rather than the speaker himself. Thus, a highway officer may have the highest priority regardless of the identity of the officer. Instead, or in addition, the priority criteria 258 may include a playback speed associated with each speaker or with each role, for example normal (1×) if the speaker is important, fast (1.5×) if the speaker is average, faster (2×) if the speaker is unimportant, and if the speaker is totally irrelevant, his speech portions may be skipped altogether (analogous to an infinite playback speed).
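One possible representation of such priority criteria is sketched below. The pairing of the four ranks with the four example speeds is an assumption made for illustration, as are the names `SPEED_BY_PRIORITY`, `ROLE_PRIORITY`, `playback_speed` and `played_duration`; a skipped speaker is modelled as an infinite playback speed, following the analogy drawn above.

```python
import math

# Assumed pairing of ranks with the example speeds; math.inf marks "skip altogether".
SPEED_BY_PRIORITY = {
    "high": 1.0,
    "normal": 1.5,
    "low": 2.0,
    "very_low": math.inf,
}

# Hypothetical role-based criteria: the priority follows the role, not the person.
ROLE_PRIORITY = {
    "highway_officer": "high",
    "observer": "very_low",
}


def playback_speed(role, role_priority, default_priority="normal"):
    """Look up the playback speed for a role, falling back to a default rank."""
    return SPEED_BY_PRIORITY[role_priority.get(role, default_priority)]


def played_duration(duration_s, speed):
    """Return how long a segment takes to play; a skipped segment takes no time."""
    if math.isinf(speed):
        return 0.0
    return duration_s / speed
```

Keeping the rank-to-speed table separate from the role-to-rank table allows an administrator to change the speeds without reassigning ranks.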
In an example embodiment, the priority criteria 258 may be pre-assigned by a supervisor or network administrator based on importance of the speakers. For example, if one speaker is the CEO of the company, he may be assigned a high priority, a project manager may be assigned a normal priority, while other employees may be assigned a low or very low priority. In one embodiment, the relative importance of the speakers may be stored in a directory (e.g. on memory module 206) and retrieved by the calculation module 254 in real time.
The endpoint devices 110 to 114 are shown by way of example to be part of a VTG schematically indicated by reference numeral 260. The endpoint devices 110 to 114 are thus able to communicate with one another in the VTG 260 in a push-to-talk communication environment.
In an example embodiment, the endpoint devices 110 to 114 may communicate with one another using RTP (Real-time Transport Protocol) which is appropriate for delivering audio and/or video data (or any other low latency data) across a network. The telecommunications network 102 may thus be an RTP compatible network. In such a case, endpoint devices 110 to 114 may also communicate utilizing RTCP (Real-time Transport Control Protocol) which contains control information about the data (e.g. audio) transmitted via RTP. Thus, by examining RTCP packets, e.g. the packet headers, which relate to the push-to-talk communication between endpoint devices 110 to 114, it may be possible to determine from which endpoint device 110 to 114 a particular media segment originated. Therefore, the association module 202 may be operable to examine or interrogate the RTCP packets thereby to determine a source of each media segment and thereafter to annotate or mark the media segments contained within the media 208 with data indicative of the endpoint device 110 to 114 or the speaker from which the media segment originated.
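The description does not state which RTCP field is examined. As one hedged possibility, the sketch below reads the synchronization source (SSRC) identifier of the sender from an RTCP sender report or receiver report, whose first eight bytes carry the version, packet type, length and sender SSRC. The function name `rtcp_sender_ssrc` and the table `SSRC_TO_ENDPOINT`, which maps SSRC values to endpoint devices, are hypothetical.

```python
import struct

RTCP_SENDER_REPORT = 200
RTCP_RECEIVER_REPORT = 201


def rtcp_sender_ssrc(packet: bytes):
    """Return the sender SSRC of an RTCP SR/RR packet, or None if it cannot be read."""
    if len(packet) < 8:
        return None
    # Network byte order: flags byte, packet type, length in 32-bit words minus one, SSRC.
    first, packet_type, _length, ssrc = struct.unpack("!BBHI", packet[:8])
    version = first >> 6
    if version != 2:
        return None
    if packet_type not in (RTCP_SENDER_REPORT, RTCP_RECEIVER_REPORT):
        return None
    return ssrc


# Hypothetical table, assumed to be populated when endpoint devices join the VTG.
SSRC_TO_ENDPOINT = {0x1234ABCD: "endpoint-110"}

# Example: a minimal sender report (8-byte header plus 20 bytes of sender info).
example_packet = struct.pack("!BBHI", 0x80, RTCP_SENDER_REPORT, 6, 0x1234ABCD) + bytes(20)
example_source = SSRC_TO_ENDPOINT.get(rtcp_sender_ssrc(example_packet))
```

A deployed system would also need to handle compound RTCP packets and SSRC collisions, which this sketch omits.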
In an example embodiment, the computer server 120 as mentioned above may be an IPICS server. In such an example case, the IPICS server may include a floor control mechanism which is operable to arbitrate the various push-to-talk speakers. Stated differently, the floor control mechanism may be operable to determine when a speaker may and may not speak. For example, if endpoint device 110 is transmitting media from its speaker, the floor control mechanism will not allow the other endpoint devices 112 and 114 to transmit audio, thus ensuring that there is at most one incoming audio stream. The association module 202 may be operable to determine from the floor control mechanism the source of the media (e.g. incoming audio or speech) in order to associate, in similar fashion to examining RTCP packets, each media segment of the recorded media 208 with an endpoint device 110 to 114 or a speaker from which the media segment originated.
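The arbitration described above may be pictured with the following minimal sketch. It is not the IPICS floor control mechanism; the class `FloorControl` and its methods are hypothetical and merely illustrate how granting the floor to one endpoint device at a time also identifies the source of the incoming media.

```python
class FloorControl:
    """Toy floor arbiter: at most one endpoint device may transmit at a time."""

    def __init__(self):
        self._holder = None

    def request(self, endpoint_id):
        """Grant the floor if it is free or already held by the requester."""
        if self._holder is None or self._holder == endpoint_id:
            self._holder = endpoint_id
            return True
        return False

    def release(self, endpoint_id):
        """Free the floor, but only if the requester currently holds it."""
        if self._holder == endpoint_id:
            self._holder = None

    @property
    def current_source(self):
        """Endpoint device whose media is currently being received, if any."""
        return self._holder
```

An association module could read `current_source` whenever a new media segment begins and tag the segment accordingly.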
In an example embodiment, a latecomer (e.g., a person joining a VTG after communications have already commenced), or any other person wishing to hear the recorded media 208, may opt to receive a transmission of the media 208. The computer server 120 may therefore include an IVR (Interactive Voice Response) system to provide a user interface on one or more endpoint devices 110 to 114. This user interface may be operable to transmit information about the media 208 and to receive an input, for example a keystroke (e.g., DTMF audio), from the endpoint device 110 to 114. For example, if the user of endpoint device 110 joins the VTG 260 late, he may wish to hear the media 208 to bring him up to date with the conversation or operation. The calculation module 254 may calculate playback times for the media 208, including a playback time for the media 208 played at normal speed and a playback time for the recorded media 208 played at adjusted speeds in accordance with the priority criteria 258 of the speakers from which the various media segments originated. These playback times may be communicated to the endpoint device 110 via the communication interface 256, for example using an appropriate user interface, e.g., voice prompts, text message, screen popup, etc. The communication interface 256 may then be operable to receive a communication indicative of a keystroke from the endpoint device 110 to indicate the selection of one of the playback options. In an example embodiment, speakers or users may be able to assign priority criteria 258 to the other speakers from their endpoint devices 110 to 114 (described further by way of example below).
Referring now to
While the user of endpoint device 112 is speaking and listening to VTG A 272, it may be inconvenient or impossible for him to pay attention to the conversation occurring in VTG B 274. Thus, in accordance with an example embodiment, the endpoint device 112 records the speech of VTG B 274, for example between endpoint devices 114 and 115. When the user of endpoint device 112 is able to direct his attention away from VTG A 272 towards VTG B 274, he may need to catch up on the conversation which he missed.
In accordance with an example embodiment, the endpoint device 112 (or any other endpoint device) may include a user interface, for example a TUI (Telephony User Interface) or a GUI (Graphical User Interface). Referring now also to
The endpoint device 300 may include a display screen 301 and a plurality of user selectable buttons 302, 304 (e.g. soft keys) on either side of the display screen 301. For example, the buttons 302 on the left-hand side of the display screen may be respectively associated, in use, with other endpoint devices 306 forming part of a VTG, while the buttons 304 on the right-hand side may be associated with a priority or playback speed 308. By first selecting a device 306 and then assigning a priority 308 to the device 306, a user of the endpoint device 300 may select and assign priorities to users or speakers in accordance with his preferences. The user interface thus acts as a receiving arrangement which is operable to receive a user input indicative of priority criteria to be assigned to other speakers. Instead, a user of the endpoint device 300 may use a conventional keypad 312 to input his selection of priority criteria in response to, for example, voice prompts.
Thus, when the user of endpoint device 112 directs his attention towards VTG B 274, he may choose to assign various priority criteria to the other endpoint devices 114, 115 forming part of VTG B 274, so that the user, when hearing playback of the recorded media 208, may decrease the total playback time by fast forwarding through less important users. It should be understood that other user interfaces may be provided. For example, a user of a soft client on a PC may employ richer text, web, pop-up, etc. interfaces to achieve the functions described above.
Example embodiments will now be further described in use with reference to
b shows a low-level flow diagram of a method 330, in accordance with the example embodiment, for controlling playback of recorded media in a push-to-talk communication environment. For ease of description, the method 330 will be further described with reference to the system 250 of
For example, users of two endpoint devices 110 and 112 may join a common VTG 260, via a push-to-talk compatible telecommunications network 102, thereby to communicate with each other in a push-to-talk environment. The VTG 260 may be hosted or presented by computer server 120. By way of example, the VTG 260 may be a safety and security operations channel, for example a channel of a police department. The users of the endpoint devices 110 and 112 therefore may be communicating with each other about police related business or incidents.
The computer server 120 may then receive, at block 332, successive media segments from the endpoint devices 110 and 112, one at a time. The computer server 120 may receive the media in the form of IP packets via communication interface 256 which thus acts as a receiving arrangement.
The association module 202 may be operable to determine, at block 334, a source from which each media segment originated. If the telecommunications network 102 is employing RTCP, the association module 202 may be operable to interrogate an RTCP packet thereby to determine an identifier indicative of the endpoint device 110 or 112 from which the media, audio or data, as contained in RTCP packets, originated. Instead, or in addition, if the computer server 120 is an IPICS server, it may employ a floor control mechanism which is operable to identify the source of incoming media segments.
Once the source endpoint device of an incoming media segment has been identified, the source endpoint device (e.g. endpoint device 110) is associated, at block 336, with that media segment. This association may be done by annotating or tagging the media segment with data indicative of the source of that media segment, or by keeping a log (e.g. in the form of Metadata) of incoming media. The successive media segments are then appended sequentially one after another and recorded, at block 338, on the memory module 206 for later playback. In accordance with one embodiment, the computer server 120 may record and store the associated metadata along with the recorded media 208.
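The logging alternative mentioned above may be sketched as follows, on the assumption that each log entry records the source, the start offset within the single recording, and the duration of a segment. The names `SegmentRecord` and `Recording` are hypothetical, and the audio samples themselves are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentRecord:
    source: str        # identifier of the source endpoint device
    start_s: float     # offset of the segment within the single recording
    duration_s: float  # duration of the segment at normal speed


@dataclass
class Recording:
    """Metadata log kept alongside the sequentially appended media."""

    log: list = field(default_factory=list)

    def total_duration(self):
        """Length of the recording so far, derived from the last log entry."""
        if not self.log:
            return 0.0
        last = self.log[-1]
        return last.start_s + last.duration_s

    def append(self, source, duration_s):
        """Append a segment after the previous one and tag it with its source."""
        record = SegmentRecord(source, self.total_duration(), duration_s)
        self.log.append(record)
        return record

    def segments_from(self, source):
        """Return the log entries that originated from the given source."""
        return [r for r in self.log if r.source == source]
```

Because every entry carries its own offset, the segments of one speaker can later be located within the recording without scanning the audio.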
By way of example, a user of the endpoint device 114 may join the VTG 260 after an initial two users have already exchanged correspondence. He is therefore a latecomer, and may wish to be updated on the progress of the police operation. In response to the latecomer joining the VTG 260, the calculation module 254 calculates, at block 340, playback times of the recorded media 208 based on various playback speeds.
In this example embodiment, the priority criteria 258 are predefined by a system administrator. However, the priority criteria 258 could be assigned by a user (see further below). For example, the user of endpoint device 110 could be the chief of police, and would thus be the principal of the VTG 260. He may be assigned a high priority (1×) and playback of his segments of media or speech may thus be played back at normal speed. The user of endpoint device 112 may be a regular policeman, thus being assigned an average priority (1.5×) or a low priority (2×) and segments of his speech may be played back at increased speed. For illustrative purposes, the segments of speech from the chief of police (from endpoint device 110) may have a total duration of one minute, while the segments of speech from the regular policeman (from endpoint device 112) may have a total duration of two minutes. In such a case, the calculation module 254 may calculate that the total playback time for the recorded media 208 played at normal speed in its entirety would be three minutes (one minute + two minutes). The calculation module 254 may then further calculate that the total playback time for the recorded media 208 played back at a speed adjusted in accordance with the priority criteria 258 would be two minutes: one minute for the chief of police and one minute (two minutes played back at increased (e.g. double) speed) for the regular policeman.
The latecomer may then be presented, for example via prompts from a user interface, with a number of playback options to play back the recorded media 208. A first option may be to play the entire recorded media 208 at normal speed, while a second option may be to play the recorded media 208 at speeds adjusted in accordance with the priority criteria 258. The latecomer may input his response, for example via the keypad 312 of his endpoint device 114, to select one of the presented options.
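The two playback options for the police example above may be assembled as in the following sketch. The keypad digits "1" and "2", the prompt wording, and the names `format_duration`, `playback_options` and `build_prompts` are assumptions made for illustration.

```python
def format_duration(seconds):
    """Format a duration in seconds as minutes:seconds."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"


def playback_options(duration_by_source, speed_by_source):
    """Return the two options keyed by the keypad digit that selects them."""
    normal = sum(duration_by_source.values())
    adjusted = sum(
        duration / speed_by_source.get(source, 1.0)
        for source, duration in duration_by_source.items()
    )
    return {
        "1": ("normal speed", normal),
        "2": ("priority-adjusted speed", adjusted),
    }


def build_prompts(options):
    """Render each option as a line of prompt text."""
    return [
        f"Press {key} for {label} ({format_duration(seconds)})"
        for key, (label, seconds) in options.items()
    ]


# Chief of police: one minute at 1x. Regular policeman: two minutes at 2x.
options = playback_options(
    {"endpoint-110": 60.0, "endpoint-112": 120.0},
    {"endpoint-110": 1.0, "endpoint-112": 2.0},
)
prompts = build_prompts(options)
```

For the durations in the example, the prompts report 3:00 for normal speed and 2:00 for priority-adjusted speed.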
The computer server 120 receives, at block 344, the selected option, for example via a PC based graphical user interface, and the adjustment module 204 adjusts the playback speed of the recorded media 208 accordingly. If the option to play back the recorded media 208 adjusted in accordance with the priority criteria 258 was selected (for a total playback duration of two minutes), the adjustment module 204 may be operable to determine which media segments are associated with each endpoint device 110 and 112 by interrogating the annotated or tagged data and thereafter to adjust, at block 346, the playback speed of those media segments accordingly. The recorded media 208 having adjusted playback speeds is then transmitted, at block 348, to the endpoint device 114 of the latecomer, so that the latecomer can be updated and then contribute to the conversation.
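The adjustment at block 346 may be pictured as building a playback plan from the tagged segments, as in the sketch below. The sketch only computes the speed and output timing of each segment; the audio processing needed to change the speed is outside its scope. The function name `build_playback_plan`, the plan field names, and the way the chief's and the policeman's speech is split into segments are hypothetical.

```python
def build_playback_plan(tagged_segments, speed_by_source, default_speed=1.0):
    """Assign a speed and an output start time to each tagged segment.

    tagged_segments is a list of (source, duration_s) pairs in recorded order.
    """
    plan = []
    offset = 0.0
    for source, duration_s in tagged_segments:
        speed = speed_by_source.get(source, default_speed)
        plays_for = duration_s / speed
        plan.append(
            {
                "source": source,
                "speed": speed,
                "starts_at_s": offset,
                "plays_for_s": plays_for,
            }
        )
        offset += plays_for
    return plan


# Assumed split of the example: the chief and the policeman alternate twice.
tagged = [
    ("endpoint-110", 30.0),
    ("endpoint-112", 60.0),
    ("endpoint-110", 30.0),
    ("endpoint-112", 60.0),
]
plan = build_playback_plan(tagged, {"endpoint-110": 1.0, "endpoint-112": 2.0})
total_s = plan[-1]["starts_at_s"] + plan[-1]["plays_for_s"]
```

Under this assumed split, every segment occupies 30 seconds of output and the plan ends at 120 seconds, matching the two-minute total given above.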
Referring now to
Operations 362 to 368 of method 360 are similar to operations 332 to 338 of method 330; however, in accordance with an example embodiment, the operations 362 to 368 of method 360 are performed by the endpoint device 112. Although not illustrated, some operations could be done by the computer server 120, while other operations could be done by one or more of the endpoint devices.
This example embodiment may find application when the user of endpoint device 112 is simultaneously logged onto two or more independent VTGs. For example, the user could be a dispatcher who needs to listen to multiple channels simultaneously to co-ordinate rescue efforts. Thus, VTG A 272 could be a police services channel, while VTG B 274 could be a fire services channel. While the dispatcher is listening to the conversation of VTG A 272 his attention is diverted away from VTG B 274. However, in accordance with an example embodiment, the speech of both VTGs is being recorded by the endpoint device 112. It will thus be understood that the media of each VTG may be separately recorded and stored on the memory module 206.
When the dispatcher directs his attention to VTG B 274, he needs to know what had transpired when his attention was elsewhere. He thus invokes a user interface similar to that of
Operations 374 to 380 of method 360 are similar to corresponding operations 340 to 348 of method 330, except that they are performed by the endpoint device 112.
The example computer system 400 includes a processor 402 (e.g., a central processing unit (CPU), a graphics processing unit (GPU) or both), a main memory 404 and a static memory 406, which communicate with each other via a bus 408. The computer system 400 may further include a video display unit 410 (e.g., a liquid crystal display (LCD), plasma display, or a cathode ray tube (CRT)). The computer system 400 also includes an alphanumeric input device 412 (e.g., a keyboard), a user interface (UI) navigation device 414 (e.g., a mouse), a disk drive unit 416, a signal generation device 418 (e.g., a speaker) and a network interface device 420.
The disk drive unit 416 includes a machine-readable medium 422 on which is stored one or more sets of instructions and data structures (e.g., software 424) embodying or utilized by any one or more of the methodologies or functions described herein. The software 424 may also reside, completely or at least partially, within the main memory 404 and/or within the processor 402 during execution thereof by the computer system 400, the main memory 404 and the processor 402 also constituting machine-readable media.
The software 424 may further be transmitted or received over a network 426 via the network interface device 420 utilizing any one of a number of well-known transfer protocols (e.g., HTTP, FTP).
While the machine-readable medium 422 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention, or that is capable of storing, encoding or carrying data structures utilized by or associated with such a set of instructions. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media, and carrier wave signals.
The Abstract of the Disclosure is provided to comply with 37 C.F.R. §1.72(b), requiring an abstract that will allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.
The example embodiments may present a time efficient way of listening to recorded media in a push-to-talk communication environment. Playback speed of the various media segments may automatically be adjusted in accordance with priority criteria. Further, the priority criteria may be chosen depending on particular operational requirements of users. Also, expected playback times may be calculated and reported to users, so that they know how long it will take to listen to the playback of the recorded media at various playback speeds.
Number | Date | Country | |
---|---|---|---|
20080114600 A1 | May 2008 | US |