RFC 1889: RTP: A Transport Protocol for Real-Time Applications
Numerous existing audio/video conferencing tools facilitate many-to-many communication, where each conference participant may choose to transmit audio or video at the participant's discretion. This model's limitations become obvious when the size of the collaborating group significantly increases. As the number of active participants increases, the ability of the group to communicate effectively decreases.
Examples of conferencing systems are found in patents and patent applications numbered: U.S. Pat. Nos. 5,608,653; 5,930,473; 6,288,739; US2005/0071427. The complete disclosures of the above applications are herein incorporated by reference for all purposes.
This invention addresses the need to maintain an effective level of communication between participants in a large group by providing a moderator who is responsible for actively managing which conference participants may transmit at his or her own discretion—any number of participants may transmit simultaneously.
The moderator may transmit audio or video at any time and is responsible for managing which of the other participants may transmit audio simultaneously. Other participants may transmit audio at their own discretion but only after requesting and then being granted permission to transmit by the moderator.
In addition to providing a moderator role, this invention provides the ability to record the conference. By having the ability to watch the conference at a later time, users who were unable to attend in real-time gain access to the information shared during the conference.
In another example of this invention, the moderator role is limited and all conference participants have transmit capabilities. This provides a group conference in which a host initially configures the conference at the user interface. In a group conference, an image is provided for each participant so that all participants are visible during the conference, and any participant can record the conference.
The present invention provides an audio-only or audio/video conferencing system, which includes a user interface that displays the moderator's video, a list of invited participants and appropriate media controls (start/stop, audio gain) for each transmitting participant. Only the moderator has the ability to transmit audio-only or both audio and video at any time. All other participants may only transmit audio after requesting and being granted permission by the moderator. The moderator's user interface provides additional controls to display requests to transmit audio, respond to the requests, and revoke the ability to transmit audio. Participants have an additional control which allows them to request permission to transmit audio.
Additionally, the moderator has the ability to record streams in the conference to the local file system for later playback.
Numerous methods and protocols can be used to capture audio and video, send the streams over a network, and have the video render and audio broadcast on a client. The invention's preferred embodiment utilizes Java Media Framework (JMF) and JMF's support for device capture, encoding and decoding, rendering and Real-Time Transport Protocol (RTP).
RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video, or simulation data over multicast or unicast networks. The data transport is augmented by a control protocol, RTCP. RTCP supports the monitoring of data delivery in a manner scalable to large multicast networks, and provides minimal control and identification functionality. RTP and RTCP are designed to be independent of the underlying transport and network layers.
The layers responsible for transmitting and receiving RTP data (RTP connectors) as well as application-specific messages (messaging framework) may utilize a variety of network protocols in order to facilitate transmission and reception of data between conference participants, including peer-to-peer networking frameworks, a centralized server utilizing TCP sockets and/or UDP, or multicast.
Both the messaging framework and RTP connectors are designed to be independent of the underlying transport and network layers. Both the messaging framework and RTP connectors may leverage a number of mechanisms for ensuring that RTP data and application-specific messages are sent and received by all conference participants, including multicast, peer-to-peer frameworks, or one or more intermediary servers—in some configurations, no centralized server infrastructure may be needed to facilitate transmission and reception of RTP data or messages in a conference. In a preferred embodiment, transmission and reception of messages and RTP data is supported by a centralized server utilizing TCP sockets.
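By way of illustration only, the transport independence described above can be sketched as a connector that depends on nothing more than a minimal send interface. The sketch is written in Python for brevity rather than in the Java of the preferred embodiment, and every name in it (`Transport`, `LoopbackTransport`, `RelayTransport`, `RtpConnector`) is hypothetical and not taken from JMF or from the disclosure.

```python
from typing import Callable, List, Protocol


class Transport(Protocol):
    """Hypothetical interface: anything that can carry opaque byte payloads."""

    def send(self, payload: bytes) -> None: ...


class LoopbackTransport:
    """Stand-in transport that records payloads instead of using a network."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send(self, payload: bytes) -> None:
        self.sent.append(payload)


class RelayTransport:
    """Stand-in for an intermediary server: fans each payload out to receivers."""

    def __init__(self) -> None:
        self.receivers: List[Callable[[bytes], None]] = []

    def send(self, payload: bytes) -> None:
        for deliver in self.receivers:
            deliver(payload)


class RtpConnector:
    """Forwards already-packetized RTP data without knowing the transport type."""

    def __init__(self, transport: Transport) -> None:
        self.transport = transport

    def forward(self, rtp_packet: bytes) -> None:
        self.transport.send(rtp_packet)


# The same connector code runs unchanged over either stand-in transport.
loopback = LoopbackTransport()
RtpConnector(loopback).forward(b"packet-1")

relay = RelayTransport()
inbox: List[bytes] = []
relay.receivers.append(inbox.append)
RtpConnector(relay).forward(b"packet-2")
```

In such an arrangement, a multicast, peer-to-peer, or TCP-socket transport could be substituted by supplying another object with the same `send` method, leaving the connector untouched.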
The present invention relies on a messaging framework that is responsible for propagating messages sent by one conference participant to all other participants in the same conference. Examples of messages sent using this framework include participant presence, requests to transmit, grants and denials.
The conferencing system of the present invention provides real-time audio and video in a moderated forum. Audio and video are the two primary components of moderated conferences. At a minimum, active participation requires the ability to capture audio. Passive participation only requires the ability to render audio. Video is optional, and the moderator is the only conference participant who may transmit video.
In a preferred embodiment, data for configuring conferences is stored in a database associated with an application server. When a client logs into the application, this data is accessed by the client. In a preferred embodiment, on entry, the application establishes a connection to a messaging server as a part of the messaging framework. The messaging server monitors conference participants and communicates presence and messages to all conference participants that have joined the conference. The messaging server has no control function and maintains no data other than the list of participants in a conference. State data is maintained at the client. The messaging server sends the list of current participants to newly joining participants. Examples of state data at the client are the current conference id, whether the client is a moderator or attendee in a presentation conference, whether the client has audio permission in a presentation conference, and whether the client is transmitting video or audio only, as well as any others mentioned above. These are examples only and should not be taken as a limitation. There may be more or fewer state data variables in different embodiments.
The client states may change during a conference and the messaging server propagates messages representing these changes in state between the participants. When a participant joins a conference, the messaging server sends a notification to all clients in the conference that a new client is arriving. The messaging server also transmits permissions between clients. The host can grant permission for participants to ask questions and comment by allowing audio streams from individual participants to be transmitted.
In a preferred embodiment, all communication with the messaging server is via Jabber protocols. Jabber is an XML-based protocol that was originally designed for instant messaging, and each conference instance on the messaging server is similar to an IM chat room. The chat room is dynamically created by the first participant to enter the conference. Protocols other than Jabber can be used.
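As a rough sketch of the messaging server behaviour described above, the following Python model keeps only a participant list per conference, creates a room when the first participant enters, sends the current list to each newcomer, notifies existing participants of arrivals, and relays messages without storing them. It is an in-memory illustration under assumed names (`MessagingServer`, `join`, `publish`, and the message dictionaries are all hypothetical); it does not implement Jabber or any other wire protocol.

```python
from typing import Callable, Dict, List

Deliver = Callable[[dict], None]


class MessagingServer:
    """Keeps only per-conference participant lists; other state stays on clients."""

    def __init__(self) -> None:
        self.rooms: Dict[str, Dict[str, Deliver]] = {}

    def join(self, conference_id: str, name: str, deliver: Deliver) -> None:
        # The room is created on demand by the first participant to enter.
        room = self.rooms.setdefault(conference_id, {})
        # Notify participants already in the room that a new client is arriving.
        for other in room.values():
            other({"type": "joined", "from": name})
        # Send the list of current participants to the newcomer.
        deliver({"type": "participants", "names": sorted(room)})
        room[name] = deliver

    def publish(self, conference_id: str, sender: str, message: dict) -> None:
        # Relay to everyone else currently in the room; nothing is stored.
        for name, deliver in self.rooms.get(conference_id, {}).items():
            if name != sender:
                deliver({**message, "from": sender})


server = MessagingServer()
alice_inbox: List[dict] = []
bob_inbox: List[dict] = []
server.join("conf-1", "alice", alice_inbox.append)
server.join("conf-1", "bob", bob_inbox.append)
server.publish("conf-1", "bob", {"type": "request_to_transmit"})
```

Because the server holds no state beyond the room membership, any interpretation of a relayed message (for example, a request to transmit) is left entirely to the receiving clients.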
Connections of the clients or nodes to other clients or nodes in some embodiments of this invention may be peer to peer without an intervening server. In this example routing of audio and video RTP packets is handled by a dedicated conferencing server using sockets set up by application software at the clients. The conferencing server acts only as a router.
In some implementations of this invention, more than one moderator may be provided in a presentation conference. Other implementations of this invention may support the moderation of video as well as audio, or may provide moderation of other forms of communication, such as text-based chat conferencing.
The current embodiment is a Java-based application which runs in an operating system 22. The Java runtime environment 24 initiates the conferencing application 30, passing in any application-specific arguments provided by the user.
One implementation may provide a web-based user interface for selecting the conference to attend and starting the conferencing application 30.
The conferencing application 30 initializes and configures JMF 26 for transmission and reception of streams.
On the transmission side, JMF 26 captures data from audio and video sources. Each device's data is encoded, packetized into RTP format and forwarded to an RTP connector that is responsible for transmitting the device's data. Locally-generated video is rendered in the user interface.
On the reception side, an RTP connector 28 receives RTP data generated by other conference participants or clients 20b and 20c, possibly through conferencing server 34. RTP connector 28 forwards this data to JMF 26. The data is then depacketized, decoded and rendered in the user interface.
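To make the packetizing and depacketizing steps concrete, the following Python sketch builds and parses the 12-byte fixed RTP header (version, marker bit, payload type, sequence number, timestamp, SSRC) around an already encoded payload. In the preferred embodiment this work is performed inside JMF; the function names `packetize` and `depacketize` are hypothetical, and the sketch deliberately ignores padding and header extensions.

```python
import struct

RTP_VERSION = 2
FIXED_HEADER = struct.Struct("!BBHII")  # 12 bytes, network byte order


def packetize(payload: bytes, payload_type: int, seq: int, timestamp: int,
              ssrc: int, marker: bool = False) -> bytes:
    """Prefix an encoded payload with a fixed RTP header (no CSRCs, no extension)."""
    byte0 = RTP_VERSION << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = FIXED_HEADER.pack(byte0, byte1, seq & 0xFFFF,
                               timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def depacketize(packet: bytes) -> dict:
    """Split a packet into header fields and payload (padding/extension ignored)."""
    byte0, byte1, seq, timestamp, ssrc = FIXED_HEADER.unpack_from(packet)
    if byte0 >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    payload_start = FIXED_HEADER.size + 4 * (byte0 & 0x0F)  # skip CSRC entries
    return {
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[payload_start:],
    }


# Payload type 0 is PCMU audio in the RTP audio/video profile.
packet = packetize(b"\x11\x22\x33", payload_type=0, seq=7, timestamp=160, ssrc=0xCAFE)
fields = depacketize(packet)
```

The sequence number and timestamp carried in each header are what allow the receiving side to detect loss and reorder or schedule media for rendering.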
In the current embodiment, a messaging framework 32, which resides in part on the client 20a, is responsible for sending application-defined messages to, and receiving them from, all participants in a conference. Examples of application-defined messages include presence state, moderation state, transmission requests and responses.
Computer 20a is not limited to being a desktop computer and can be any device that can connect to the internet including a personal digital assistant, an enabled cell phone or a laptop. The device need only be capable of acting as a node or client in a peer to peer, server mediated or client mediated network, and receive or send audio, or audio and video.
The messaging framework 32 of the current invention ensures messages are only delivered to participants 20b, 20c who are present—messages are not cached anywhere in the system. In order to ensure the user interface of a late-arriving participant accurately represents the state of all present participants 20b, 20c, an application-defined message must be sent by the late-arriving participant and this message triggers a response from each participant 20b, 20c describing their current state.
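The late-arrival exchange described above might be modelled as follows. This is a simplified Python sketch under stated assumptions: the `Bus` class stands in for the messaging framework and delivers only to clients attached at send time, and the class names, message fields, and state strings are all hypothetical.

```python
from typing import Dict, List


class Bus:
    """Delivers each message only to clients attached at send time; no caching."""

    def __init__(self) -> None:
        self.clients: List["Client"] = []

    def send(self, sender: "Client", message: dict) -> None:
        for client in list(self.clients):
            if client is not sender:
                client.receive(message)


class Client:
    def __init__(self, name: str, bus: Bus, state: str) -> None:
        self.name, self.bus, self.state = name, bus, state
        self.known: Dict[str, str] = {}  # what this user interface shows for others

    def enter(self) -> None:
        self.bus.clients.append(self)
        self.bus.send(self, {"type": "present", "from": self.name, "state": self.state})

    def receive(self, message: dict) -> None:
        self.known[message["from"]] = message["state"]
        if message["type"] == "present":
            # Answer a newcomer's announcement by describing our own current state.
            self.bus.send(self, {"type": "state", "from": self.name, "state": self.state})


bus = Bus()
moderator = Client("moderator", bus, "moderating")
alice = Client("alice", bus, "present")
carol = Client("carol", bus, "present")

moderator.enter()
alice.enter()
alice.state = "granted"  # alice's state changes before carol arrives
carol.enter()            # late arrival triggers state responses from the others
```

After `carol.enter()` returns, carol's view reflects the states that the moderator and alice hold at that moment, even though none of the earlier messages were stored anywhere.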
Application server 42 and other associated hardware and software may be located anywhere. The servers may be part of an intranet inside a firewall for use exclusively by a business or it may be servers available for conferencing use at the application provider's hub. Access could also be a service supplied by an internet service provider. For purposes of this example, conferencing servers 34 are servers the application provider offers for use from their site.
Client A 20a enters the application through the portal web page in the user interface and enters a username and password to access application server 42. Application server 42 verifies the username and password, then queries database 44, which stores the conference configuration, for user data.
On successful retrieval of the conference configuration, the application initializes the JMF framework 82, which attempts to configure a network 86 and acquire any media capture devices required for the conference. If not able to initiate JMF, the application exits 80, or if not able to configure a network the application exits 84. If any required media capture devices are unavailable (for example, the moderator must be able to transmit audio), the user is notified of the error and is prevented from entering the conference.
Once media capture device acquisition has been completed successfully, the application attempts to configure any required network resources, for example joining multicast groups or making socket connections to video conferencing servers 104 required by RTP connectors or the messaging server 102. If the network resources cannot be configured, the user is notified of the error and the application exits 76.
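The start-up sequence above amounts to running a series of checks in order and stopping at the first failure. A minimal Python sketch follows; the step names and the `start_conference` helper are hypothetical, and each real step (framework initialization, device acquisition, network configuration) is replaced by a stub that simply reports success or failure.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], bool]]


def start_conference(steps: List[Step], notify: Callable[[str], None]) -> Optional[str]:
    """Run start-up steps in order; return the first failing step's name, or None."""
    for name, attempt in steps:
        if not attempt():
            notify(f"Unable to enter conference: {name} failed")
            return name
    return None


calls: List[str] = []   # records which steps were actually attempted
errors: List[str] = []  # records notifications shown to the user


def step(name: str, ok: bool) -> Step:
    """Build a stub step that logs the attempt and returns a fixed outcome."""
    def attempt() -> bool:
        calls.append(name)
        return ok
    return (name, attempt)


failed = start_conference(
    [
        step("media framework initialization", True),
        step("capture device acquisition", False),  # e.g. no microphone available
        step("network resource configuration", True),
    ],
    errors.append,
)
```

In this run the network step is never attempted, because the failure to acquire a required capture device already prevents entry to the conference.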
On successful configuration of network resources, a ‘present’ message is sent via the messaging framework to conference participants 88. In response to this ‘present’ message, the other attending conference participants send information about their current state (presence, moderation state and transmission state) 90 from messaging server 102. A participant can take part in the conference 92 and request the ability to transmit only after the ‘present’ message has been received from the moderator.
Once all information about existing conference participants is received, any audio or video streams that are being received are rendered. When the participant chooses to exit the conference 94, a ‘not present’ message is sent to all conference participants and their user interfaces are updated 96.
At conference entry, Java 24 creates four TCP sockets between the client and video conferencing server 104, one for video RTP communications, one for video RTCP communication, one for audio RTP communications and one for audio RTCP communications. The RTCP sockets are for control information associated with transmission and reception of RTP packets. This includes counting lost packets, measuring jitter, and other housekeeping duties defined by the RTP protocol. For clarity, only one connection is shown for each client in
Once the participant enters into the present state 588, choices become available when the moderator is also present. The participant can be a passive participant and remain in the present state 588, listening to the audio transmission and/or watching the video transmission, or the participant can request to transmit audio, which transitions the participant to the requesting state 590.
The moderator has the ability to grant or deny the participant's request to transmit audio 590 to the conference. If the moderator denies 592 the participant's request to participate by transmitting audio 590, then the participant's state returns to the present state 588. From the present state 588, the participant can choose to continue as a passive participant, or can again request to transmit audio 590.
If the moderator grants the participant's request to transmit audio, then the state changes from requesting 590 to granted 594. At that point the participant has the ability to transmit audio, and may begin transmitting 598 to the conference participants at the participant's discretion.
At any time after granting the participant the ability to transmit, the moderator can revoke 596 the previous grant, transitioning the participant to the present 588 state, revoking the participant's ability to transmit. From the present 588 state, the formerly active participant can either become a passive participant or can again request to transmit audio 590.
If the participant who is transmitting audio 598 does not have the audio permission revoked 596, then the transmitting participant can start and stop 599 transmission at his or her own discretion. Stopping transmission results in the participant transitioning to the granted state 594.
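The participant states and transitions described above can be summarized as a small state machine. The following Python sketch is illustrative only: the enum values reuse the reference numerals from the description for readability, while the event names (`request`, `grant`, `deny`, `revoke`, `start`, `stop`) and the `next_state` helper are hypothetical.

```python
from enum import Enum


class State(Enum):
    PRESENT = 588
    REQUESTING = 590
    GRANTED = 594
    TRANSMITTING = 598


# (current state, event) -> next state
TRANSITIONS = {
    (State.PRESENT, "request"): State.REQUESTING,
    (State.REQUESTING, "deny"): State.PRESENT,
    (State.REQUESTING, "grant"): State.GRANTED,
    (State.GRANTED, "start"): State.TRANSMITTING,
    (State.TRANSMITTING, "stop"): State.GRANTED,
    (State.GRANTED, "revoke"): State.PRESENT,
    (State.TRANSMITTING, "revoke"): State.PRESENT,
}


def next_state(state: State, event: str) -> State:
    """Return the state reached by applying an event, rejecting invalid moves."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.name}") from None


# A participant requests, is granted, toggles transmission, then is revoked.
state = State.PRESENT
for event in ["request", "grant", "start", "stop", "start", "revoke"]:
    state = next_state(state, event)
```

Keeping the transitions in a single table makes it easy to confirm that a participant can never reach the transmitting state without first passing through the requesting and granted states.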
The viewing pane 610 is where the moderator's video is displayed during a conference. For participants, the rendering of the moderator's video can be toggled off and on by clicking a button 612 below the viewing pane 610. Below this button the conference moderator's name 614 is displayed. The volume of the moderator's audio transmission can be adjusted by moving the audio slider 616, also directly below the viewing pane 610. If the conference only provides audio, the same user interface is displayed, but no viewing pane is provided.
When a participant wants to transmit audio in the conference and the moderator is present, the request permission to speak button 618 is clicked. The name of the participant requesting permission to speak is added to a list display 620 in the moderator's user interface. If the moderator wants to allow the requestor to transmit audio, then the moderator selects the requestor's name in the list and clicks the grant permission to speak button 622 next to the requestor's name. If the moderator does not want the requestor to speak, then the moderator selects the requestor's name in the list and clicks the deny permission to speak button 624, next to the requestor's name. When a participant requests permission to transmit audio, the request also appears in the status bar 626 at the bottom of all participant user interfaces.
Once a participant is granted permission to transmit audio 622, the participant must click a button to begin transmission of audio 628. This same button also toggles the transmission of audio off 628 when the participant no longer needs to transmit audio.
While a participant is transmitting audio 642, a volume icon 630 is displayed to the right of the transmitting participant's name in the user interface of each participant receiving the audio transmission. The volume icon 630 can be used by participants to adjust the transmitting participant's volume. When a participant has been granted permission to speak, but is not transmitting, this icon 630 appears in an inactive state 632 next to the participant's name in the user interface of all participants receiving the audio transmission.
If a moderator wants to stop a participant from transmitting audio, the moderator can click the revoke permission to speak button 634 next to the transmitting participant's name on the moderator's interface.
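One possible data model behind the moderator's request list and the grant, deny, and revoke buttons is sketched below in Python. The `ModeratorPanel` class and its method names are hypothetical and stand in for whatever the user interface code actually maintains; the reference numerals in the comments tie each operation back to the controls described above.

```python
from typing import List, Set


class ModeratorPanel:
    """Tracks pending requests and current grants on the moderator's side."""

    def __init__(self) -> None:
        self.pending: List[str] = []     # names shown in list display 620, oldest first
        self.granted: Set[str] = set()   # participants currently allowed to transmit

    def on_request(self, name: str) -> None:
        # Ignore duplicate requests and requests from already-granted participants.
        if name not in self.pending and name not in self.granted:
            self.pending.append(name)

    def grant(self, name: str) -> None:   # button 622
        self.pending.remove(name)         # raises ValueError if no request is pending
        self.granted.add(name)

    def deny(self, name: str) -> None:    # button 624
        self.pending.remove(name)

    def revoke(self, name: str) -> None:  # button 634
        self.granted.discard(name)


panel = ModeratorPanel()
panel.on_request("alice")
panel.on_request("bob")
panel.on_request("alice")  # duplicate request is ignored
panel.grant("alice")
panel.deny("bob")
panel.revoke("alice")
```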
The lower half of the interface displays who is invited to a conference. If an invitee has an icon 636 before their name, it indicates that the invitee is present in the conference. When there is no icon 638 in front of an invitee's name, it indicates that the invitee has not yet entered the conference.
When a conference is being recorded, a red light displays in the status bar 640 to show all participants and the moderator that the moderator is recording the conference. When the recording has been stopped, the red light 640 is no longer displayed. The moderator may not start or stop transmission until after recording is stopped. Other embodiments may provide the ability for any conference participant to record the conference.
It is believed that the disclosure set forth above may encompass a distinct invention with independent utility. While this invention has been disclosed in its preferred form, the specific embodiments thereof as disclosed and illustrated herein are not to be considered in a limiting sense as numerous variations are possible. The subject matter described includes all novel and non-obvious combinations and sub-combinations of the various elements, features, functions and/or properties disclosed herein.
Inventions embodied in various combinations and sub-combinations of features, functions, elements and/or properties may be claimed in a related application. Such claims, whether they are directed to a different invention or directed to a same invention, whether different, broader, narrower or equal in scope to any original claims, are also regarded as included within the subject matter of the present disclosure.
This application claims priority to U.S. Provisional Patent Application Ser. No. 60/676,089, filed Apr. 28, 2005 and entitled “Collaborative Conferencing System.” The complete disclosure of the above application is herein incorporated by reference for all purposes.
| Number | Date | Country |
|---|---|---|
| 60676089 | Apr 2005 | US |