1. Field of the Invention
The present invention generally relates to videoconferencing and, more particularly, to a method and apparatus for setting up a videoconference call.
Conventional videoconferencing systems employ a Multi Conferencing Unit (MCU), which is a network element that receives media streams from each videoconferencing participant and combines the streams together to send to the destination clients in a videoconference session. The MCU is costly and its use results in scalability limitations in conventional videoconferencing systems.
Accordingly, it would be desirable and highly advantageous to have a method and apparatus for setting up a videoconference call that does not require the use of an MCU and that does not impose the scalability problems corresponding thereto.
The problems stated above, as well as other related problems of the prior art, are solved by the present invention, a method and apparatus for setting up a videoconference call.
The present invention sets up a videoconference call using a server. In the event that the videoconference call is between an initiating client (client who initially requests the videoconference) and multiple destination clients (clients with whom the initiating client desires to engage in the videoconference), the present invention initiates an IP multicast session between the initiating client and the multiple destination clients. In the event that the videoconference call is between the initiating client and only one destination client, then an IP unicast session is initiated.
According to an aspect of the present invention, there is provided a method for setting up a multicast session between clients. The method includes the steps of assigning a common multicast IP address so that the clients may participate in the videoconference session, and transmitting the common multicast IP address to at least one of the clients.
According to another aspect of the present invention, there is provided a system for setting up a multicast session between clients. The system includes means for assigning a common multicast IP address so that the clients may participate in the session, and means, in data communication with the assigning means, for transmitting the common multicast IP address to at least one of the clients.
According to yet another aspect of the present invention, there is provided a method for setting up a multicast session between clients. The method includes the steps of providing an ability to receive a common multicast IP address for the videoconference session, and providing an ability to receive videoconference content using the common multicast IP address.
These and other aspects, features and advantages of the present invention will become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.
The present invention is directed to an apparatus and method for setting up a videoconference call. The present invention flexibly allows for the set up of a videoconference session from an initiating client (initiating client is the client who initially requests the videoconference) to a single destination client or to multiple destination clients (destination client(s) is the client(s) with whom the initiating client desires to engage in the videoconference). The call set up is controlled through a centralized videoconference server and is not done on a direct peer-to-peer basis.
When video is to be sent to multiple destination clients during a videoconference session, the centralized videoconference server assigns a multicast group IP address to the initiating client in order to send the video content to the destination clients. The destination clients are then requested to join the multicast group by a request sent from the centralized videoconference server.
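By way of illustration only, the assignment step described above can be sketched as follows. The class name `ConferenceServer`, the address pool `239.1.0.0/24`, and the `ASSIGN` and `JOIN` message labels are assumptions made for this sketch and do not appear in the description; an actual embodiment would obtain the group address from an address allocation service.

```python
import ipaddress


class ConferenceServer:
    """Hypothetical server that hands out one multicast group per session."""

    def __init__(self, pool_cidr="239.1.0.0/24"):
        # Assumed pool of group addresses, used only for this sketch.
        self._free = list(ipaddress.ip_network(pool_cidr).hosts())
        self.sessions = {}

    def setup_call(self, initiator, destinations):
        """Return (session mode, messages the server would send)."""
        if len(destinations) == 1:
            # A single destination client: unicast, no group address needed.
            return "unicast", []
        group = str(self._free.pop(0))
        self.sessions[group] = [initiator, *destinations]
        # The initiating client is told where to send its video;
        # each destination client is asked to join the same group.
        messages = [{"to": initiator, "type": "ASSIGN", "group": group}]
        messages += [{"to": d, "type": "JOIN", "group": group}
                     for d in destinations]
        return "multicast", messages
```

Every client in the session receives the same group address, which is the property the multicast set up relies on.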
It is to be appreciated that the following terms are used interchangeably herein: “client”; “client computer”; and “videoconference client application”.
It is to be understood that the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. Preferably, the present invention is implemented as a combination of hardware and software. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage device. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (CPU), a random access memory (RAM), and input/output (I/O) interface(s). The computer platform also includes an operating system and microinstruction code. The various processes and functions described herein may either be part of the microinstruction code or part of the application program (or a combination thereof) which is executed via the operating system. In addition, various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.
It is to be further understood that, because some of the constituent system components and method steps depicted in the accompanying Figures are preferably implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the manner in which the present invention is programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.
A display device 116 is operatively coupled to system bus 104 by display adapter 110. A disk storage device (e.g., a magnetic or optical disk storage device) 118 is operatively coupled to system bus 104 by I/O adapter 112.
A mouse 120 and keyboard 122 are operatively coupled to system bus 104 by user interface adapter 114. The mouse 120 and keyboard 122 are used to input and output information to and from system 100.
At least one speaker (hereinafter “speaker”) 197 is operatively coupled to system bus 104 by sound adapter 199.
A (digital and/or analog) modem 196 is operatively coupled to system bus 104 by network adapter 198.
A description will now be given of policy based network management (PBNM), according to an illustrative embodiment of the present invention. PBNM is a technology that provides the ability to define and distribute policies to manage networks (an example network to which the present invention may be applied is described below with respect to
In further detail, PBNM defines policies for applications and users that consume network resources. For example, business critical applications can be given the highest priority and a percentage of the bandwidth on the network, videoconferencing and voice over IP can be given the next highest priority, and finally web traffic and file transfers that do not have strict bandwidth or time critical constraints can be given the remaining amount of resources on the network. This differentiation of users and applications can be accomplished using PBNM.
The videoconference system ties into a PBNM system by querying a network policy server for the policy that corresponds to the videoconference application. The videoconference server obtains the policy from the network policy server and determines the resources available in the network for videoconferencing based on the received parameters. The policy will typically correspond to, for example, the bandwidth available to this application during certain times of the day or only to certain users. The configuration is readily modified by, for example, adding, deleting, replacing, modifying, etc., policies and/or portions thereof. As a result, the videoconference server will use the information provided in the policy to manage conferencing sessions on the network.
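By way of illustration only, a policy of the kind described above could be evaluated as in the following sketch. The record `VideoPolicy`, its field names, and the function `admit` are hypothetical; the description does not define a policy format, so the fields simply mirror the examples given (a session limit, a bit rate limit, and a time-of-day window).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoPolicy:
    """Hypothetical policy record obtained from the network policy server."""
    max_sessions: int        # number of simultaneous sessions allowed
    max_bitrate_kbps: int    # highest bit rate allowed per session
    start_hour: int = 0      # policy applies from this hour (inclusive)
    end_hour: int = 24       # policy applies until this hour (exclusive)


def admit(policy, active_sessions, requested_kbps, hour):
    """Return True if one more session fits within the policy."""
    if not (policy.start_hour <= hour < policy.end_hour):
        return False
    if active_sessions >= policy.max_sessions:
        return False
    return requested_kbps <= policy.max_bitrate_kbps
```

For example, with `VideoPolicy(max_sessions=2, max_bitrate_kbps=384, start_hour=8, end_hour=18)`, a third simultaneous session or a request outside working hours would be refused.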
A description will now be given of a server architecture, according to an illustrative embodiment of the present invention.
The session management entity 306 is responsible for managing videoconference session setup and teardown. The session management entity 306 also provides most of the main control for the videoconference server 205. The session management entity 306 includes a session manager 320 for implementing functions of the session management entity 306.
The network communications entity 304 is responsible for encapsulating the many different protocols used for the videoconference system. The protocols include Simple Network Management Protocol (SNMP) for remote administration and management, Common Open Policy Service (COPS) or another protocol such as Lightweight Directory Access Protocol (LDAP) for policy management, Multicast Address Dynamic Client Allocation Protocol (MADCAP) for multicast address allocation, Session Initiation Protocol (SIP) for videoconference session management, and Server to Server messaging for distributed videoconferencing server management. Accordingly, the network communications entity 304 includes: an SNMP module 304a; an LDAP client module 304b; a MADCAP client module 304c; a SIP module 304d; and a server-to-server management module 304e. Moreover, the preceding elements 304a-e respectively communicate with the following elements: a remote administration terminal 382; a network policy server (bandwidth broker) 384; a MADCAP server 215; desktop conferencing clients 388; and other videoconferencing servers 390. Such communications may also be implemented using Transmission Control Protocol (TCP), User Datagram Protocol (UDP), and Internet Protocol (IP), collectively represented by protocol module 330. It is to be appreciated that the preceding list of protocols and corresponding elements is merely illustrative and, thus, other protocols and corresponding elements may be readily employed while maintaining the spirit and scope of the present invention.
It is to be further appreciated that the architecture of the videoconference server 205 is also suitable for a user on a portable device to connect into the corporate infrastructure through a Virtual Private Network (VPN) in order to send and receive content from a videoconference session.
The database entity 302 includes the following four databases: a scheduling database 310, an active session database 312, a member database 314, and a network architecture database 316.
The videoconference system server 205 further includes or, at the least, interfaces with, a company LDAP server (user information) 340 and an optional external database 342. The optional external database 342 includes an LDAP client 304b.
A description will now be given of the member database 314 included in the database entity 302 of
A description will now be given of the active session database 312 included in the database entity 302 of
Referring again to
Policy information concerning the number of videoconference sessions that are allowed to take place simultaneously, the videoconference session bit rates, and bandwidth limits can also be defined in the network architecture database 316. The network architecture could be represented as a weighted graph within the network architecture database 316. It is to be appreciated that the network architecture database 316 is an optional database in the videoconference server 205. The network architecture database 316 may be used to cache the policies that are requested from the policy server 210.
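By way of illustration only, the weighted graph representation mentioned above could be queried as in the following sketch, which finds the largest bottleneck bandwidth on any path between two locations. The function name, the link dictionary layout, and the bandwidth figures are assumptions made for this sketch; the description does not specify how the graph is stored or searched.

```python
import heapq


def widest_path_kbps(links, src, dst):
    """Largest bottleneck bandwidth over any path from src to dst (0 if none)."""
    graph = {}
    for (a, b), kbps in links.items():
        # Links are treated as bidirectional edges weighted by bandwidth.
        graph.setdefault(a, []).append((b, kbps))
        graph.setdefault(b, []).append((a, kbps))
    best = {src: float("inf")}
    heap = [(-best[src], src)]  # max-heap on path width via negation
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if node == dst:
            return width
        if width < best.get(node, 0):
            continue  # stale heap entry
        for nxt, kbps in graph.get(node, []):
            cand = min(width, kbps)
            if cand > best.get(nxt, 0):
                best[nxt] = cand
                heapq.heappush(heap, (-cand, nxt))
    return 0
```

A server could compare the returned value against the bit rate of a requested session to decide whether a low bandwidth link lies between two clients.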
A description will now be given of the scheduling database 310 included in the database entity 302 of
A description will now be given of the network communications entity 304 of
A description will now be given of the Simple Network Management Protocol (SNMP) module 304a included in the network communication entity 304 of
The Simple Network Management Protocol (SNMP) client-server architecture 600 includes an SNMP management station 610 and an SNMP managed entity 620. The SNMP management station 610 includes a management application 610a and an SNMP manager 610b. The SNMP managed entity 620 includes managed resources 620a, SNMP managed objects 620b, and an SNMP agent 620c. Moreover, each of the SNMP management station 610 and an SNMP managed entity 620 further include a UDP layer 630, an IP layer 640, a Medium Access Control (MAC) layer 650, and a physical layer 660.
The SNMP agent 620c allows monitoring and administration from the SNMP management station 610. The SNMP agent 620c is the client in the SNMP architecture 600. The SNMP agent 620c basically takes the role of responding to requests for information and actions from the SNMP management station 610. The SNMP management station 610 is the server in the SNMP architecture 600. The SNMP management station 610 is the central entity that manages the agents in a network. The SNMP management station 610 serves the function of allowing an administrator to gather statistics from the SNMP agent 620c and change configuration parameters of the SNMP agent 620c.
Using the SNMP model, the resources in the videoconference server 205 can be managed by representing these resources as objects. Each object is a data variable that represents one aspect of the managed agent. This collection of objects is commonly referred to as a Management Information Base (MIB). The MIB functions as a collection of access points at the SNMP agent 620c for the SNMP management station 610. The SNMP management station 610 is able to perform monitoring by retrieving the value of MIB objects in the SNMP agent 620c. The SNMP management station 610 is also able to cause an action to take place at the SNMP agent 620c or can change the configuration settings at the SNMP agent 620c.
SNMP operates over the IP layer 640 and uses the UDP layer 630 for its transport protocol.
The basic messages used in the SNMP management protocol are as follows: GET; SET; and TRAP. The GET message enables the SNMP management station 610 to retrieve the value of objects at the SNMP agent 620c. The SET message enables the SNMP management station 610 to set the value of objects at the SNMP agent 620c. The TRAP message enables the SNMP agent 620c to notify the SNMP management station 610 of a significant event.
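By way of illustration only, the three message types can be modeled with a small in-memory sketch. This is not an SNMP implementation and uses no SNMP library; the class `Agent`, the object names `activeSessions` and `maxSessions`, and the trap name `sessionLimitExceeded` are hypothetical and stand in for MIB objects that the description does not enumerate.

```python
class Agent:
    """Toy managed entity holding MIB objects as name -> value pairs."""

    def __init__(self, mib, trap_sink):
        self.mib = dict(mib)
        self.trap_sink = trap_sink  # callable standing in for the station

    def get(self, name):
        # GET: the management station retrieves the value of an object.
        return self.mib[name]

    def set(self, name, value):
        # SET: the management station changes the value of an object.
        self.mib[name] = value
        return value

    def session_started(self):
        # TRAP: the agent notifies the station of a significant event.
        self.mib["activeSessions"] += 1
        if self.mib["activeSessions"] > self.mib["maxSessions"]:
            self.trap_sink(("TRAP", "sessionLimitExceeded",
                            self.mib["activeSessions"]))
```

In this sketch GET and SET are initiated by the management station, while TRAP is the only message initiated by the agent.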
A description will now be given of the SNMP managed resources 620a included in the SNMP managed entity 620, according to an illustrative embodiment of the present invention. The remote administration could monitor and/or control the following resources within the videoconference server 205: active sessions and associated statistics; session log; network policy for videoconferencing; Session Initiation Protocol (SIP) parameters and statistics; and MADCAP parameters and statistics.
From the SNMP management station 610, the following three types of SNMP messages are issued on behalf of a management application: GetRequest; GetNextRequest; and SetRequest. The first two are variations of the GET function. All three messages are acknowledged by the SNMP agent 620c in the form of a GetResponse message, which is passed up to the management application 610a. The SNMP agent 620c may also issue a trap message in response to an event that has occurred in a managed resource.
Referring again to
A description will now be given of the Multicast Address Dynamic Client Allocation Protocol (MADCAP) client module 304c included in the network communications entity of
A description will now be given of the Session Initiation Protocol (SIP) module 304d included in the network communications entity 304 of
In a SIP based videoconference system, each client and server is identified by a SIP URL. The SIP URL takes the form of user@host, which is in the same format as an email address, and in most cases the SIP URL is the user's email address.
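By way of illustration only, a SIP URL of the user@host form can be split into its two parts as in the following sketch. The function name `parse_sip_url` is hypothetical, and accepting an optional leading `sip:` scheme is an assumption of this sketch rather than something stated above.

```python
def parse_sip_url(url):
    """Split 'sip:user@host' (or bare 'user@host') into (user, host)."""
    if url.lower().startswith("sip:"):
        url = url[4:]
    user, sep, host = url.partition("@")
    if not sep or not user or not host:
        raise ValueError(f"not a user@host SIP URL: {url!r}")
    # Host names are case-insensitive, so normalize them for lookups.
    return user, host.lower()
```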
A description will now be given of the server-to-server management module 304e included in the network communications entity 304 of
The following messages are defined: QUERY—query an entry in a remote server; ADD—add an entry to a remote server; DELETE—delete an entry from a remote server; and UPDATE—update an entry on a remote server.
The server-to-server messaging can use a TCP based connection between each server. When the status of one server changes, the remaining servers are updated with the same information.
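By way of illustration only, the four messages and the update of the remaining servers can be sketched as follows. The class `PeerServer`, the `handle` method, the `broadcast` helper, and the rule that ADD fails on an existing entry while UPDATE fails on a missing one are assumptions of this sketch; the description defines only the message names and their purpose, and the TCP transport is omitted here.

```python
class PeerServer:
    """Hypothetical remote server holding replicated entries."""

    def __init__(self):
        self.entries = {}

    def handle(self, op, key, value=None):
        if op == "QUERY":
            return self.entries.get(key)
        if op in ("ADD", "UPDATE"):
            # ADD requires a new key; UPDATE requires an existing key.
            if (key in self.entries) != (op == "UPDATE"):
                return False
            self.entries[key] = value
            return True
        if op == "DELETE":
            return self.entries.pop(key, None) is not None
        raise ValueError(f"unknown message: {op}")


def broadcast(peers, op, key, value=None):
    """Send the same message to every remaining server."""
    return [peer.handle(op, key, value) for peer in peers]
```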
A description will now be given of operational scenarios of the videoconference server 205, according to an illustrative embodiment of the present invention. Initially, a description of operational scenarios corresponding to the setting up of a videoconference session is provided, followed by a description of operational scenarios corresponding to resolution and frame rate adjustment during the videoconference session. Session operational scenarios include SIP server discovery, member registration, session setup, session cancel, and session terminate.
A description will now be given of a session operational scenario corresponding to SIP server discovery, according to an illustrative embodiment of the present invention. A user (videoconference client application) can register with a preconfigured (manually provisioned) videoconference server, or can register on startup by sending a REGISTER request to the well-known “all SIP servers” multicast address “sip.mcast.net” (224.0.1.75). The second mechanism (multicast REGISTER request) is preferable because it does not require each user to manually configure the address of the local SIP server in the videoconference client application. In this case, the multicast addresses would need to be scoped correctly in the network to ensure that the user registers with the correct SIP server for the videoconference. As a further way to simplify provisioning, the SIP specification recommends that administrators name their SIP servers using the sip.domainname convention (for example, sip.princeton.tce.com).
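By way of illustration only, the discovery options described above can be collected as in the following sketch. The function name `registration_targets` and the order in which the candidates are listed are assumptions of this sketch; only the multicast address 224.0.1.75 and the sip.domainname convention come from the description.

```python
import ipaddress

# The "all SIP servers" multicast address (sip.mcast.net) given above.
ALL_SIP_SERVERS = ipaddress.ip_address("224.0.1.75")


def registration_targets(configured=None, domain=None):
    """List the places a client could send its REGISTER request."""
    targets = []
    if configured:
        # Manually provisioned server address.
        targets.append(configured)
    if domain:
        # Naming convention: sip.<domainname>.
        targets.append(f"sip.{domain}")
    # Multicast discovery needs no per-user configuration.
    targets.append(str(ALL_SIP_SERVERS))
    return targets
```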
A description will now be given of a session operational scenario corresponding to member registration, according to an illustrative embodiment of the present invention.
In the member registration function, the client 702 sends a SIP REGISTER request to the server 205 (step 710). The server 205 receives this message and stores the IP address and the SIP URL of the client 702 in the member database 314.
The REGISTER request may contain a message body, although its use is not defined in the standard. The message body can contain additional information relating to configuration options of the client 702 that is registering with the server 205.
The server 205 acknowledges the registration by sending a 200 OK message back to the client 702 (step 720).
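By way of illustration only, the registration exchange can be sketched as follows. The class `Registrar` is a hypothetical stand-in for the server and its member database, and storing the optional message body under an `options` key is an assumption of this sketch.

```python
class Registrar:
    """Hypothetical stand-in for the server and its member database."""

    def __init__(self):
        self.members = {}  # SIP URL -> registration record

    def register(self, sip_url, ip_address, body=None):
        """Handle a REGISTER request and return the response status."""
        # Store the client's IP address and SIP URL, plus any optional
        # configuration carried in the message body.
        self.members[sip_url] = {"ip": ip_address, "options": body or {}}
        return 200, "OK"
```

A later session set up can then look up a called user by SIP URL to confirm registration and find the IP address.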
Descriptions will now be given of unicast and multicast videoconference sessions, according to illustrative embodiments of the present invention.
In the unicast example, a unique stream is sent from each client to each other client. Such an approach can consume a large amount of bandwidth as more participants join the network. In contrast, in the multicast approach, only one stream is sent from each client. Thus, the multicast approach consumes less of the network resources such as bandwidth in comparison to the unicast approach.
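By way of illustration only, the difference in the number of streams can be computed as in the following sketch, which counts the streams originated by the clients in a session of n participants. The function names are hypothetical.

```python
def unicast_streams(n):
    """Each of n clients sends a unique stream to each of the other n - 1."""
    return n * (n - 1)


def multicast_streams(n):
    """Each of n clients sends a single stream to the multicast group."""
    return n
```

For a session of four clients this gives 12 originated streams for unicast against 4 for multicast, and the gap grows quadratically as participants are added.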
A description will now be given of a session operational scenario corresponding to a unicast videoconference session set up, according to an illustrative embodiment of the present invention.
An INVITE request is sent from the client #1 802 to the server 205 (step 810). The INVITE request is forwarded from the server 205 to the client #2 806 (step 815).
A 180 ringing message is sent from the client #2 806 to the server 205 (step 820). The 180 ringing message is forwarded from the server 205 to the client #1 802 (step 825).
A 200 OK message is sent from the client #2 806 to the server 205 (step 830). The 200 OK message is forwarded from the server 205 to the client #1 802 (step 835).
An acknowledge message ACK is sent from the client #1 802 to the client #2 806 (step 840). The videoconference session (media session) takes place between the two nodes (clients #1 802 and #2 806) (step 845).
The server 205 initially checks to see if the requesting user (client #1 802) is registered with the server 205 and it also checks to see if the user that is being called (client #2 806) is registered with the server 205 (step 850).
The server 205 determines the location of each user on the network (step 855) and determines if there is a low bandwidth WAN link (e.g., WAN 250) connecting their two locations (if different) (step 860).
If there is no low bandwidth WAN link connecting the two locations, the server 205 proceeds with the call (step 865). However, if there is a low bandwidth link between the two users, then the method proceeds to step 870.
At step 870, the server 205 checks the policy on videoconference sessions on the WAN 250; this basically translates into “X sessions can take place at a maximum bit rate of Y”. The server 205 checks for availability based on this policy (step 875). If there is no availability, then the server 205 rejects the INVITE request by sending any of the following messages, “600—Busy Everywhere”, “486—Busy Here”, “503—Service Unavailable”, or “603—Decline” (step 880), and the method is terminated (without continuation to step 815 of the method of
An INVITE request is sent from client application 1 998 to the SIP module 304d within the videoconference server 205 (step 903). The SIP module 304d decodes the message and forwards the INVITE request to the session manager 320 (step 906). The session manager 320 checks the active session database 312, the member database 314, and the policy database 999 within the network architecture database 316 to ensure that the session can be correctly set up (steps 909, 912, and 915, respectively). If the session can be correctly set up, then the active session database 312, the member database 314, and the policy database 999 transmit an OK message to the session manager 320 (steps 918, 921, and 924, respectively). Once this verification process is completed, the videoconference server 205 will notify other videoconferencing servers of the change in system status (steps 927 and 930).
The session manager 320 will forward an INVITE message to the SIP module 304d (step 933), which will then forward the INVITE message to client application 2 997 (step 936). Upon receiving the INVITE message, client application 2 997 will respond to the SIP module 304d with a 180 Ringing message that indicates that client application 2 997 has received the INVITE message (step 939). The 180 Ringing message is received by the SIP module 304d, decoded and then forwarded to the session manager 320 (step 942). The status of the client is updated (steps 945, 948, 951, 954, 957, and 958) in each of the databases shown in
The 180 Ringing message is forwarded from the session manager 320 to client application 1 998 (steps 960 and 963). A 200 OK message is then sent from client application 2 997 to the SIP module 304d (step 966) and forwarded from the SIP module 304d to the session manager 320 (step 969). The 200 OK message indicates that client application 2 997 is accepting the invitation for the videoconference session.
The status of the client is updated (steps 972, 975, 978, 981, 984, and 985) in each of the databases shown in
A description will now be given of a session operational scenario corresponding to a multicast videoconference session set up, according to an illustrative embodiment of the present invention. To provide multicast session set up, the Session Description Protocol (SDP) is used. The SDP protocol is able to convey the multicast address and port numbers.
The multicast session setup is similar to the unicast session setup except that a multicast address is required. The multicast address is allocated by the MADCAP server 215 in the network.
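By way of illustration only, a session description carrying the multicast address and port numbers could be built as in the following sketch. The function name `build_sdp`, the session name, the TTL value, and the choice of RTP payload types 0 (PCMU audio) and 31 (H.261 video) are assumptions of this sketch; the description states only that SDP conveys the multicast address and port numbers.

```python
def build_sdp(origin_user, origin_ip, group, audio_port, video_port, ttl=16):
    """Return an SDP body announcing one audio and one video stream."""
    lines = [
        "v=0",
        f"o={origin_user} 0 0 IN IP4 {origin_ip}",
        "s=videoconference",
        # Connection line: multicast group address followed by its TTL.
        f"c=IN IP4 {group}/{ttl}",
        "t=0 0",
        f"m=audio {audio_port} RTP/AVP 0",
        f"m=video {video_port} RTP/AVP 31",
    ]
    return "\r\n".join(lines) + "\r\n"
```

Such a body could be carried in the INVITE request so that every invited client learns the same group address and ports.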
An INVITE request is sent from the client #1 1002 to the server 205 (step 1010). A MADCAP request is sent from the server 205 to the MADCAP server 215 (step 1015). An acknowledge message ACK is sent from the MADCAP server 215 to the server 205 (step 1020). The INVITE request is forwarded from the server 205 to the client #2 1006 (step 1025).
A 180 ringing message is sent from the client #2 1006 to the server 205 (step 1030). The 180 ringing message is forwarded from the server 205 to the client #1 1002 (step 1035).
A 200 OK message is sent from the client #2 1006 to the server 205 (step 1040). The 200 OK message is forwarded from the server 205 to the client #1 1002 (step 1045).
An acknowledge message ACK is sent from the client #1 1002 to the client #2 1006 (step 1050). The videoconference session (media session) takes place between the two nodes (clients #1 1002 and #2 1006) (step 1055).
A description will now be given of a session operational scenario corresponding to the cancellation of a videoconference session, according to an illustrative embodiment of the present invention. The CANCEL message is used to terminate pending session set up attempts. A client can use this message to cancel a pending videoconference session set up attempt the client had earlier initiated. The server forwards the CANCEL message to the same locations with pending requests that the INVITE was sent to. The client should not respond to the CANCEL message with a “200 OK” message. If the CANCEL message is unsuccessful, then the session terminate sequence (i.e., BYE message) can be used.
An INVITE request is sent from the client #1 1102 to the server 205 (step 1110). The INVITE request is forwarded from the server 205 to the client #2 1106 (step 1115).
A 180 ringing message is sent from the client #2 1106 to the server 205 (step 1120). The 180 ringing message is forwarded from the server 205 to the client #1 1102 (step 1125).
A CANCEL message is sent from the client #1 1102 to the server 205 (step 1130). The CANCEL message is forwarded from the server 205 to the client #2 1106 (step 1135).
A description will now be given of a session operational scenario corresponding to the termination of a videoconference session, according to an illustrative embodiment of the present invention.
The client #1 1202 decides to discontinue a call with the client #2 1206. Thus, the client #1 1202 sends a BYE message to the server 205 (step 1210). The server 205 forwards the BYE message to client #2 1206 (step 1220).
The client #2 1206 sends a 200 OK message back to the server 205 indicating it (client #2 1206) has disconnected (step 1230). The server 205 forwards the 200 OK message to client #1 1202 indicating a successful disconnect (step 1240).
The client #1 1302 decides to discontinue a call with the client #2 1306 and the client #3 1308; this does not tear down the session between the client #2 1306 and the client #3 1308.
The client #1 1302 sends a BYE message to the server 205 (step 1310). The server 205 interprets the BYE message and understands that the client #2 1306 and the client #3 1308 are involved in the videoconference session with the client #1 1302 and forwards the BYE message to both client #2 1306 and client #3 1308 (steps 1320 and 1330).
The client #2 1306 sends a 200 OK message back to the server 205 (step 1340). The server 205 forwards the 200 OK message back to client #1 1302 (step 1350). The client #3 1308 sends a 200 OK message back to the server 205 (step 1360). The server 205 forwards the 200 OK message back to client #1 1302 (step 1370).
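By way of illustration only, the forwarding of a single BYE message to every other participant can be sketched as follows. The function name `handle_bye`, the session dictionary, and the rule that a session record is dropped once fewer than two members remain are assumptions of this sketch.

```python
def handle_bye(sessions, session_id, leaver):
    """Remove `leaver` and return the BYE messages the server forwards."""
    members = sessions[session_id]
    members.remove(leaver)
    # One forwarded BYE per remaining participant.
    forwards = [("BYE", leaver, other) for other in members]
    if len(members) < 2:
        # Assumed cleanup: a session needs at least two participants.
        del sessions[session_id]
    return forwards
```

The remaining participants stay in the session record, which reflects that the departure of one client does not tear down the session between the others.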
The client #1 1402 decides to discontinue the call with the client #2 1406 and the client #3 1408; this does not tear down the session between the client #2 1406 and the client #3 1408.
The client #1 1402 sends a BYE message to the server 205 intended for the client #2 1406 (step 1410). The server 205 forwards the BYE message to the client #2 1406 (step 1420). The client #1 1402 sends a BYE message to the server 205 intended for client #3 1408 (step 1430). The server 205 forwards the BYE message to the client #3 1408 (step 1440).
The client #2 1406 sends a 200 OK message back to the server 205 (step 1450). The server 205 forwards the 200 OK message back to the client #1 1402 (step 1460). The client #3 1408 sends a 200 OK message back to the server 205 (step 1470). The server 205 forwards the 200 OK message back to the client #1 1402 (step 1480).
In addition to the previous examples described with respect to
A description will now be given of operational scenarios corresponding to resolution and frame rate adjustment, according to an illustrative embodiment of the present invention. Videoconferencing involves transmitting live, two-way interactive video between several users at different locations on a computer network. Real-time interactive video requires transmission of large amounts of information with constrained delay. This requires that the computer network to which the videoconference system is tied be able to provide an adequate amount of bandwidth and quality of service for each user involved in the session. Bandwidth can be a limited resource at times and quality of service cannot always be guaranteed in all networks; therefore, some limitations will exist. In a private corporate network, it is possible to guarantee quality of service, but it is not always possible to guarantee large amounts of bandwidth.
The basic corporate computer network infrastructure includes several high speed local area networks (LANs) connected together through low speed links (see, e.g.,
Recent advances in quality of service over IP based networks are now providing a means for allowing other types of information to be transmitted across these networks. This opens the door for transmitting real-time information (i.e., audio and video) across the infrastructure in addition to the non-real-time data traffic. Video conferencing services that take advantage of network quality of service are well suited to overlay onto this infrastructure. It is now possible for two users at two different geographic locations to take part in a real-time videoconference session. One disadvantage of a videoconference session is that the transmission of real-time video can consume an extremely large amount of bandwidth and easily deplete available network resources. The bit rates of real-time video transmitted across a network mainly depend on the video resolutions and compression algorithms used. Typically, one videoconference session between two, three, or four users at different geographic locations can be properly supported on a network with a reasonable amount of bandwidth. However, it has been the case that, in general, additional users beyond four in a videoconference session could not be supported nor could a second videoconference session be supported due to bandwidth constraints. The limiting factors of the videoconference system are the low speed long haul links between the geographic locations.
One possible solution is to increase the bandwidth of the long haul links between the two geographic locations in order to support more users in the system. The drawback to this approach is that the bandwidth is very expensive. A second solution is to have a system where only a limited number of users (i.e., the active users) in the videoconference session are allowed to transmit at a high resolution and high bit-rate, and the remaining users (i.e., the passive users) in the session can only transmit at a limited bit-rate and limited resolution. The videoconference session organizer will have control of which users will transmit in high resolution and which users will transmit in low resolution. If a user is not actively talking or interacting in the session, then there is no need to send their video in high resolution. Such an approach can provide a tremendous amount of savings in bandwidth.
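By way of illustration only, the bandwidth saving of the second solution can be estimated as in the following sketch. The function name `session_kbps` and the bit rates used in the example (384 kbps for high resolution, 64 kbps for low resolution) are assumed figures chosen for the arithmetic; the description does not give specific rates.

```python
def session_kbps(total_users, active_users, high_kbps, low_kbps):
    """Aggregate bit rate when only the active users send at the high rate."""
    if not 0 <= active_users <= total_users:
        raise ValueError("active_users must be between 0 and total_users")
    passive_users = total_users - active_users
    return active_users * high_kbps + passive_users * low_kbps
```

Under the assumed rates, six users all sending at the high rate require 6 x 384 = 2304 kbps, whereas one active user and five passive users require 384 + 5 x 64 = 704 kbps.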
Referring ahead to the videoconference client application 1800 of
A description will now be given of messages corresponding to resolution and frame rate adjustment, according to an illustrative embodiment of the present invention. In particular, an MSG_WINDOW_SWITCH message and a MSG_ADJUST_CODEC message will be described.
The MSG_WINDOW_SWITCH message is sent from the client to the server indicating a switch between an active user and a passive user; that is, the active user becomes passive, and the passive user becomes active. The videoconference server will acknowledge this request with the client.
The MSG_ADJUST_CODEC message is sent from the server to each client. The MSG_ADJUST_CODEC message will indicate to the client what resolution (i.e., CIF or QCIF) and frame rate the client should be sending. The MSG_ADJUST_CODEC message is acknowledged by each client.
A MSG_WINDOW_SWITCH message is sent from the client 1 1504 to the server 205 (step 1520). An acknowledge message ACK is sent from the server 205 to the client 1 1504 (step 1525).
A MSG_ADJUST_CODEC (low) message is sent from the server 205 to the client 1 1504 (step 1530). An acknowledge message ACK is sent from the client 1 1504 to the server 205 (step 1535).
A MSG_ADJUST_CODEC (high) message is sent from the server 205 to the client 2 1506 (step 1540). An acknowledge message ACK is sent from the client 2 1506 to the server 205 (step 1545).
A MSG_ADJUST_CODEC (low) message is sent from the server 205 to the client 3 1508 (step 1550). An acknowledge message ACK is sent from the client 3 1508 to the server 205 (step 1555).
A MSG_ADJUST_CODEC (low) message is sent from the server 205 to the client 4 1510 (step 1560). An acknowledge message ACK is sent from the client 4 1510 to the server 205 (step 1565).
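The server side of the exchange above can be sketched as follows in Python. This is a minimal sketch under stated assumptions: the AdjustCodec class, the on_window_switch function, the client names, the mapping of "high" to CIF and "low" to QCIF, and the frame rates are all hypothetical and serve only to illustrate that one MSG_ADJUST_CODEC message is produced per client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdjustCodec:
    """Hypothetical in-memory form of a MSG_ADJUST_CODEC message."""
    client: str
    resolution: str   # "CIF" (assumed high) or "QCIF" (assumed low)
    frame_rate: int   # frames per second; the values below are assumptions


def on_window_switch(clients, new_active):
    """Build one MSG_ADJUST_CODEC per client after a MSG_WINDOW_SWITCH."""
    if new_active not in clients:
        raise ValueError(f"unknown client: {new_active}")
    messages = []
    for client in clients:
        if client == new_active:
            messages.append(AdjustCodec(client, "CIF", 30))
        else:
            messages.append(AdjustCodec(client, "QCIF", 10))
    return messages


msgs = on_window_switch(["client1", "client2", "client3", "client4"], "client2")
for m in msgs:
    print(m)
```

In a real exchange each message would be sent over the network and the server would wait for the corresponding ACK; that part is omitted here.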
A “send at low bit-rate/resolution” message is sent from the client 1 1602 to network router 1606 (step 1620). A “send at high bit-rate/resolution” message is sent from the client 3 1608 to network router 1606 (step 1625). A “send at low bit-rate/resolution” message is sent from the client 2 1604 to network router 1606 (step 1630). A “send at high bit-rate/resolution” message is sent from the client 4 1610 to network router 1606 (step 1635).
Data is sent from the network router 1606 to the client 2 1604, the client 3 1608, the client 1 1602, and the client 4 1610, using the multicast address (steps 1640, 1645, 1650, and 1655, respectively).
Proceeding to
Data is sent from the network router 1606 to the client 2 1604, the client 3 1608, the client 1 1602, and the client 4 1610, using the multicast address (steps 1740, 1745, 1750, and 1755, respectively).
A description will now be given of a client application architecture, according to an illustrative embodiment of the present invention. The client application is responsible for interacting with a user, exchanging of multimedia content with other client applications and for managing calls with the server application. Moreover, it is to be appreciated that the client application is also capable of including server functionality within itself.
The videoconference client application 1800 includes the following four basic functional entities: a multimedia interface layer 1802; codecs 1804 (audio codecs 1804a and video codecs 1804b); a network entity 1806; and a user interface 1808.
The multimedia interface layer 1802 is the main controlling instance of the videoconference client application 1800. All intra-system communication is routed through and controlled by the multimedia interface layer 1802. One of the key underlying features of the multimedia interface layer 1802 is the ability to easily interchange different audio and video codecs 1804. In addition to this, the multimedia interface layer 1802 provides an interface to the Operating System (OS) dependent user input/output entity and network sub-systems. The multimedia interface layer 1802 includes a member database 1820, a main control module 1822, an audio mixer 1899, and an echo cancellation module 1898.
The user interface 1808 provides the point of interaction for an end user with the videoconference client application 1800. The user interface 1808 is preferably but not necessarily implemented as an OS dependent module. Many graphical user interfaces are dependent on the particular OS that they are using. The four major functions of the user interface 1808 are video capture, video display, audio capture, and audio reproduction. The user interface 1808 includes an audio/video capture interface 1830, an audio/video playback module 1832, a member view module 1834, a chat module 1836, and user selection/menus 1838. The audio/video capture interface 1830 includes a camera interface 1830a, a microphone interface 1830b, and a file interface 1830c. The audio/video playback module 1832 includes a video display 1832a, an audio playback module 1832b, and a file interface 1832c.
The network entity 1806 represents the communication sub-system of the videoconference client application 1800. The functions of the network entity 1806 are client to server messaging that is based on Session Initiation Protocol (SIP) and the transmission and reception of audio and video streams. The network entity 1806 also includes basic security functions for authentication and cryptographic communication of the media streams between clients. The network entity 1806 includes a security module 1840, a messaging system 1842, a video stream module 1844, an audio stream module 1846, and IP sockets 1848a-c.
The audio codecs 1804a and the video codecs 1804b are the sub-systems that handle the compression and decompression of the digital media. The interfaces to the codecs should be simple and generic in order to make interchanging them easy. A simple relationship between the multimedia interface layer 1802 and the codecs 1804 is defined hereinafter as an illustrative template or guide for implementation. The audio codecs 1804a and video codecs 1804b each include an encoder 1880 and a decoder 1890. The encoder 1880 and decoder 1890 each include a queue 1895.
The videoconference client application 1800 interfaces with, at the least, the videoconference server 205 and other clients 1870.
A description will now be given of the member database 1820 included in the multimedia interface layer 1802 of
A description will now be given of the main control module 1822 included in the multimedia interface layer 1802 of
The main control module 1822 is a very important part of the multimedia interface layer 1802. The main control module 1822 functions as the central management sub-system and provides the following key functions: a synchronization mechanism for audio and video decoders and playback; connection of the destination of a decoder to the screen or to a file for recording purposes; and application layer Quality of Service.
The synchronization of audio and video playback is crucial for an optimal videoconferencing user experience. In order to accurately synchronize the two media streams, timestamps will need to be used and transmitted with the media content. Real-time Transport Protocol (RTP) provides a generic header for including timestamps and sequence numbers for this purpose. The timestamps provided are NOT intended to synchronize the two network node clocks, but are intended to synchronize the audio and video streams for consistent playback. These timestamps will need to be derived from a common clock on the same node at the time of capture. For example, when a video frame is captured, the time when the video frame was captured must be recorded. The same applies to audio. Additional details and guidelines for using RTP are described elsewhere herein.
The function of the main control module 1822 in synchronizing the audio and video is to make the connection between the network entity 1806 and the codecs 1804 in order for proper delivery of the metadata (including timestamps and sequence numbers) and multimedia data. If packets are late, then they can be dropped before or after decoding depending on the current conditions of the system. The RTP timestamps are subsequently used to create the presentation and playback timestamps.
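The use of capture timestamps for playback and for dropping late packets can be sketched in Python as follows. This is a simplified sketch, not the synchronization method of the main control module 1822: the MediaPacket class, the schedule_playback function, the millisecond units, and the 100 ms lateness threshold are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class MediaPacket:
    kind: str          # "audio" or "video"
    capture_ms: int    # capture time on the sender's common clock, in ms
    payload: bytes = b""


def schedule_playback(packets, first_capture_ms, playout_start_ms, now_ms,
                      max_late_ms=100):
    """Split packets into (to_play, dropped).

    Each packet's playback time is its capture offset from the first packet,
    shifted onto the receiver's local playout timeline.
    """
    to_play, dropped = [], []
    for pkt in packets:
        playback_ms = playout_start_ms + (pkt.capture_ms - first_capture_ms)
        if now_ms - playback_ms > max_late_ms:
            dropped.append(pkt)              # too late to be useful
        else:
            to_play.append((playback_ms, pkt))
    to_play.sort(key=lambda item: item[0])   # audio and video interleave by time
    return to_play, dropped


packets = [
    MediaPacket("video", 1000),
    MediaPacket("audio", 1000),
    MediaPacket("audio", 1020),
    MediaPacket("video", 700),    # arrives 300 ms after its playback time
]
to_play, dropped = schedule_playback(
    packets, first_capture_ms=0, playout_start_ms=5000, now_ms=6000)
```

Because only offsets between capture times are used, the sender and receiver clocks never need to agree, which matches the point that the timestamps are not meant to synchronize the two nodes.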
The main control module 1822 is also responsible for directing the output of the audio and video decoders 1890 to the screen for playback, to file for recording, or to both. Each decoder 1890 is treated independently. This allows, for example, the output of one decoder to be displayed on the screen, the output of a second decoder to be recorded in a file, and the output of a third decoder to go to both a file and the screen simultaneously.
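One way to picture this independent routing is the short Python sketch below. The DecoderRouter class and the decoder names are hypothetical, and plain lists stand in for the screen and the recording file.

```python
class DecoderRouter:
    """Routes each decoder's output to any combination of sinks."""

    def __init__(self):
        self._routes = {}   # decoder id -> list of sink callables

    def connect(self, decoder_id, *sinks):
        self._routes[decoder_id] = list(sinks)

    def deliver(self, decoder_id, frame):
        for sink in self._routes.get(decoder_id, []):
            sink(frame)


screen, recording = [], []   # stand-ins for a display and a file writer
router = DecoderRouter()
router.connect("decoder1", screen.append)                     # screen only
router.connect("decoder2", recording.append)                  # file only
router.connect("decoder3", screen.append, recording.append)   # both

for dec in ("decoder1", "decoder2", "decoder3"):
    router.deliver(dec, f"frame-from-{dec}")
```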
In addition to the above-mentioned responsibilities, the main control module 1822 is also involved in application layer quality of service. The main control module 1822 gathers information regarding packet drops and bytes received and sent, and acts accordingly based on this information. This could involve sending a message to another client or to the videoconference server 205 to help remedy a situation that is occurring in the network. RTP Control Protocol (RTCP) can be used for reporting statistics and packet losses, and can also be used for application specific signaling.
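As an illustration of the kind of statistic involved, the following Python sketch estimates a loss fraction from 16-bit sequence numbers. It is a simplified assumption-laden example: the function name is hypothetical, and packets are assumed to arrive in order without duplicates, which a real receiver cannot rely on.

```python
def loss_fraction(received_seqs):
    """Estimate loss from RTP-style 16-bit sequence numbers received in order."""
    if not received_seqs:
        return 0.0
    expected = 1
    for prev, cur in zip(received_seqs, received_seqs[1:]):
        expected += (cur - prev) % 65536   # modulo handles 16-bit wrap-around
    lost = expected - len(received_seqs)
    return lost / expected


# Sequence number 0 is missing across the wrap from 65535 to 1.
print(loss_fraction([65534, 65535, 1, 2]))
```

A client could compare such a value against a threshold of its choosing before deciding to signal another client or the videoconference server 205.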
A description will now be given of interfaces available to the sub-systems of the videoconference client application 1800, according to an illustrative embodiment of the present invention. The interfaces include the points of interaction with the user interface 1808, the network entity 1806, and the codecs 1804. The user interface 1808 provides functions for receiving captured audio and video along with their corresponding timestamps. In addition to this, functions must be provided for sending audio and video to the user interface 1808 for display and reproduction. The network entity 1806 interface provides functions for signaling incoming and outgoing messages for session control and security. The audio and video codecs 1804a,b provide a basic interface for configuration control as well as to send and receive packets for compression or decompression.
A description will now be given of the audio and video codecs 1804a,b, according to an illustrative embodiment of the present invention.
There are several audio and video codecs available for use in videoconferencing. Preferably but not necessarily, the codecs employed in accordance with the present invention are software based. According to one illustrative embodiment of the present invention, H.263 is used for video compression and decompression due to the processing power constraints of typical desktop computers. As desktop computers become more powerful in the future, the ability to use a more advanced codec such as H.26L can be realized and taken advantage of. Of course, the present invention is not limited to the preceding types of codecs and, thus, other types of codecs may be used while maintaining the spirit and scope of the present invention.
A description will now be provided of the interface to the codecs 1804a,b, according to an illustrative embodiment of the present invention. The description will encompass a DataIn function, callback functions, and codec options. The interface to the codecs 1804a,b should be flexible enough and defined in a general sense to allow interchangeability of codecs as well as to allow the addition of new codecs in the future. The proposed interface is a very simple one, with a limited number of functions provided to the user.
The DataIn function is simply used to store a frame or a packet of the encoder or decoder class.
In order to provide a simple connection between the multimedia interface layer 1802 and the multimedia codecs 1804, the data output function should be implemented as a callback. The multimedia interface layer 1802 sets this callback function to the input function of the receiving entity. For example, when the codec has completed encoding or decoding a frame, this function will be called by the codec in order to deliver the intended information from the encode or decode process. Due to the constraints that the codec is not able to do anything while in this callback, this function should return as quickly as possible to prevent waiting and unnecessary delays in the system. The only additional wait that should be performed in this function should be a mutex lock when accessing a shared resource.
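The relationship between the DataIn function and the output callback can be sketched in Python as shown below. Apart from the name DataIn, everything here is hypothetical: the Codec and NetworkSink classes, the process_one method, and the use of an upper-casing function as a stand-in for real encoding are assumptions made for illustration.

```python
import threading
from collections import deque


class Codec:
    """Hypothetical codec shell: DataIn queues input, output leaves via a callback."""

    def __init__(self, transform):
        self._transform = transform      # stand-in for real encode/decode work
        self._queue = deque()
        self.data_out = None             # set by the multimedia interface layer

    def DataIn(self, frame):
        self._queue.append(frame)

    def process_one(self):
        frame = self._queue.popleft()
        if self.data_out is not None:
            self.data_out(self._transform(frame))


class NetworkSink:
    """Receiving entity whose input function is used as the codec's callback."""

    def __init__(self):
        self._lock = threading.Lock()
        self.sent = []

    def data_in(self, packet):
        # Return quickly: the only wait is the lock on the shared list.
        with self._lock:
            self.sent.append(packet)


encoder = Codec(transform=lambda frame: frame.upper())
sink = NetworkSink()
encoder.data_out = sink.data_in      # the interface layer makes the connection
encoder.DataIn("frame-1")
encoder.process_one()
```

Setting the callback to the receiver's own input function keeps the codec unaware of what consumes its output, which is what makes codecs interchangeable.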
The range of options available to different types of codecs will vary. In order to satisfy the requirements for managing these options, a simple interface should be used. A text-based interface is preferred (but not mandated) because of the flexibility that it offers. There should be a common set of commands such as START and STOP, and then codec specific commands. This method offers a simple interface, but adds additional complexity to the codec because a simple interpreter is required. As an example, an Options function can be generic enough to read and write options.
Example: Result=Options(“start”); Result=Options(“resolution=CIF”); etc.
For example, some of the common options between codecs should be standardized as follows: start; stop; pause; quality index (0-100); and resolution.
The quality index is a factor that describes the overall quality of the codec as a value between 0% and 100%. It follows the basic assumption that the higher the value the better the video quality.
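A text-based Options function of this kind might be sketched in Python as follows. The CodecOptions class, the default values, the "OK"/"ERROR" return strings, and the trailing "?" syntax for reading an option back are assumptions; only the start, stop, pause, quality index, and resolution options come from the description above.

```python
class CodecOptions:
    """Hypothetical text-command interpreter for codec options."""

    STATES = {"start": "running", "stop": "stopped", "pause": "paused"}
    RESOLUTIONS = {"CIF", "QCIF"}

    def __init__(self):
        self.state = "stopped"
        self.values = {"quality": 50, "resolution": "CIF"}   # assumed defaults

    def Options(self, text):
        """Apply 'start', 'stop', 'pause', or 'name=value'; read back with 'name?'."""
        text = text.strip()
        if text.lower() in self.STATES:
            self.state = self.STATES[text.lower()]
            return "OK"
        if text.endswith("?"):
            name = text[:-1].strip().lower()
            return str(self.values.get(name, "ERROR unknown option"))
        name, sep, value = text.partition("=")
        name, value = name.strip().lower(), value.strip()
        if not sep:
            return "ERROR unknown command"
        if name == "quality":
            if not value.isdigit() or not 0 <= int(value) <= 100:
                return "ERROR quality must be 0-100"
            self.values["quality"] = int(value)
            return "OK"
        if name == "resolution":
            if value.upper() not in self.RESOLUTIONS:
                return "ERROR unsupported resolution"
            self.values["resolution"] = value.upper()
            return "OK"
        return "ERROR unknown option"


codec = CodecOptions()
print(codec.Options("start"), codec.Options("resolution=QCIF"))
```

The small interpreter is the extra complexity mentioned above; in exchange, new codec specific options can be added without changing the function signature.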
An initialization step (Init) is performed to initialize the decoder 1890 (step 1910). A main loop is executed that waits for a start or exit command (step 1920). If an exit command is received, then the method is exited (step 1922) and a return is made to, e.g., another operation (step 1924).
Data is read out of an input queue 1895 or a wait condition is imposed if the input queue 1895 is empty (step 1930). The data, if read out at step 1930, is decoded (step 1940). The “data out callback” 1995 is provided to step 1920.
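The decoder flow described above can be approximated by the following Python sketch. It is a simplification under stated assumptions: the exit command is modeled as a sentinel placed on the input queue, the start command as a threading.Event, and the names decoder_main and EXIT are hypothetical.

```python
import queue
import threading

EXIT = object()   # sentinel standing in for the exit command


def decoder_main(input_queue, started, data_out, decode):
    """Wait for start, then decode queued data until EXIT arrives."""
    started.wait()                       # main loop: block until the start command
    while True:
        item = input_queue.get()         # blocks while the input queue is empty
        if item is EXIT:
            return
        data_out(decode(item))           # the "data out callback"


frames = []
q = queue.Queue()
start_cmd = threading.Event()
worker = threading.Thread(
    target=decoder_main,
    args=(q, start_cmd, frames.append, lambda data: data.decode("ascii")),
)
worker.start()
q.put(b"frame-1")
q.put(b"frame-2")
start_cmd.set()      # start command
q.put(EXIT)          # exit command
worker.join(timeout=5)
```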
A description will now be given of the communications employed by the network 200, according to an illustrative embodiment of the present invention. The description supplements that provided above with respect to network communications.
The messaging system 1842 (included in the network entity 1806 of
There are several different protocols that govern the functionality of the videoconference client application 1800. For example, Session Initiation Protocol (SIP), Real-time Transport Protocol (RTP), RTP Control Protocol (RTCP), and Session Description Protocol (SDP) may be employed.
The purpose of Session Initiation Protocol (SIP) is session management. SIP is a text based application layer control protocol for creating, modifying and terminating multimedia sessions with one or more participants on IP based networks. SIP is used between the client and the server to accomplish this. SIP is described further above with respect to the videoconference server 205.
Real-time Transport Protocol (RTP) is used for the transmission of real-time multimedia (i.e., audio and video). RTP is an application layer protocol that provides additional details pertaining to the type of multimedia information it is carrying. RTP resides above the transport layer and is usually carried on top of the User Datagram Protocol (UDP). The primary function of RTP in the client application will be transporting timestamps (for audio and video synchronization) and sequence numbers, as well as identifying the type of payload it is encapsulating (e.g., MPEG4, H.263, G.723, etc.).
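For illustration, the 12-byte fixed RTP header that carries the payload type, sequence number, and timestamp can be packed and unpacked in Python as sketched below. The function names and the example field values are assumptions; the sketch omits padding, header extensions, and CSRC entries.

```python
import struct

RTP_VERSION = 2
PT_H263 = 34   # static payload type for H.263 in the RTP audio/video profile


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Pack the 12-byte fixed RTP header (no padding, extension, or CSRCs)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data):
    """Decode the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


header = pack_rtp_header(PT_H263, seq=1, timestamp=90000, ssrc=0x1234ABCD)
fields = unpack_rtp_header(header)
```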
RTP Control Protocol (RTCP) is part of the RTP standard. RTCP is used as a statistics reporting tool between senders and receivers. Each videoconference client application 1800 will gather its statistics and send them to the other clients as well as to the server 205. The videoconference server 205 will record information about problems that may have occurred in the session based on this data.
The main purpose of SDP is to convey information about media streams of a session. SDP includes, but is not limited to, the following items: session name and purpose; time the session is active; the media comprising the session; information to receive the media (i.e., addresses, ports, formats, etc.); type of media; transport protocol (RTP/UDP/IP); the format of the media (H.263, etc.); multicast; multicast address for the media; transport port for the media; unicast; and remote address for the media.
The SDP information is the message body for a SIP message. They are transmitted together.
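A minimal SDP body for a single multicast H.263 video stream might be generated as in the Python sketch below. The function name, the session name, the addresses (taken from documentation and administratively scoped ranges), the port, and the TTL are illustrative assumptions and not values from this description.

```python
def build_sdp(session_name, origin_addr, multicast_addr, video_port, ttl=127):
    """Return a minimal SDP description for one H.263 video stream."""
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {origin_addr}",
        f"s={session_name}",
        f"c=IN IP4 {multicast_addr}/{ttl}",   # multicast address with TTL
        "t=0 0",                              # unbounded session time
        f"m=video {video_port} RTP/AVP 34",   # 34 = H.263 static payload type
        "a=rtpmap:34 H263/90000",
    ]
    return "\r\n".join(lines) + "\r\n"


sdp = build_sdp("Team call", "192.0.2.10", "239.1.2.3", 49170)
print(sdp)
```

The resulting text would be carried as the body of a SIP message, as noted above.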
A further description will now be given of the user interface 1808 of
Referring again to
The web cam should be supported through either the USB or Firewire (IEEE1394) interface using the Video For Windows (VFW) Application Programming Interface (API) provided by the Windows operating system or through an alternative capture driver used under a different operating system such as Linux. Of course, the present invention is not limited to the preceding interfaces, operating systems, or drivers and, thus, other interfaces, operating systems, and drivers may also be used, while maintaining the spirit and scope of the present invention.
The member view module 1834 is used to show the members participating in the ongoing call. The initiator (i.e., Master) of the call can either drop unwanted members or select active members. Every member can select one or more members for a private chat message exchange. In addition, the status of a member is signaled in the member view module 1834. A member can set their own status to, e.g., “Unavailable”, to signal to the other members that they are currently not available but will be back soon.
In addition to the video stream, every member has the opportunity to send chat messages to either all or only some other members using the chat module 1836. The messages are displayed in the chat view and edited in the chat edit view. A scrollbar allows viewing of older messages.
A description will now be given of operational scenarios for the client application 1800, according to an illustrative embodiment of the present invention. The following description is simply a basic guideline of some of the features of the client application 1800 and is not intended to represent a complete list of features. The description will encompass login, initiation of a call, acceptance of a call, and logoff.
The login is done when the client application 1800 is initially started. The login can be done automatically based on the login name provided to the operating system at startup, or a different interface can be used that is independent of that login. The choice depends on the preferred method of authentication for the network that is currently used and how policies are administered. The simplest method would be to use the same login name as that used in the Windows operating system to keep naming consistent and also to have the ability to reuse existing user databases (if applicable).
To initiate a call, the client application 1800 will query the server 205 for a list of available candidates. The user can select the users he or she wishes to engage in a videoconference session. A session will be set up as unicast when two participants are involved; otherwise, when more than two participants are involved, the session is set up as a multicast session.
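The choice between unicast and multicast reduces to a count of the participants, as the following Python sketch shows. The function name and the user names are hypothetical.

```python
def session_mode(initiator, destinations):
    """Return "unicast" for two participants, otherwise "multicast"."""
    participants = {initiator, *destinations}
    if len(participants) < 2:
        raise ValueError("a session needs at least two distinct participants")
    return "unicast" if len(participants) == 2 else "multicast"


print(session_mode("alice", ["bob"]))            # two participants
print(session_mode("alice", ["bob", "carol"]))   # more than two participants
```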
Once the user is invited to a call, a message showing the name of the initiator is displayed on their screen. The user can then either accept or reject the call. If the user accepts the call, then the client application 1800 sends an accept (or acknowledgement) message to the server 205. The server 205 then informs every member currently participating in the call about the new member. If the user declines the call by sending the cancellation message to the server 205, then all other members are also informed about that event.
The logoff will remove the user from the member database 314 included in the database entity 302 of the videoconference server 205. A BYE message is sent to each participating client of the session. This can be done either through multicast or unicast. Multicast is the preferred method for sending this message.
For illustrative purposes, the following description of the present invention is made with respect to the illustrative network 200 shown in
The initiating client 2602 and the destination clients 2604, 2606 are initially registered with the server 205 (step 2610). According to a preferred embodiment of the present invention, the initiating client 2602 and the destination clients 2604, 2606 are registered with the server 205 using the IP address of the corresponding client computer; the IP address may be associated with a unique identifier such as an email address or a user name.
A request for a videoconference session is transmitted from the initiating client 2602 to the server 205 (step 2615). The request specifies the destination clients 2604, 2606 that are to take part in the videoconference session with the initiating client 2602.
The server 205 transmits a multicast IP group address to the initiating client 2602 (step 2620). The initiating client 2602 transmits an Internet Group Management Protocol (IGMP) join request to IP router #1 240 (step 2625).
The server 205 transmits the multicast IP group address and a request to join to each of the destination clients 2604, 2606 (step 2630). Each of the destination clients 2604, 2606 transmits a notification of acceptance or rejection to the server 205 (step 2635). The notification is forwarded from the server 205 to the initiating client 2602 (step 2640).
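The registration and address assignment steps can be sketched in Python as follows. This is a hypothetical model, not the implementation of the server 205: the ConferenceServer class, the address pool 239.255.0.0/24, the user identifiers, and the dictionary form of the invitations are assumptions introduced for illustration.

```python
import ipaddress


class ConferenceServer:
    """Hypothetical server: registers clients and hands out multicast groups."""

    def __init__(self, pool="239.255.0.0/24"):
        self._registry = {}                                      # user id -> client IP
        self._free = iter(ipaddress.ip_network(pool).hosts())    # unused group addresses

    def register(self, user_id, ip):
        self._registry[user_id] = str(ipaddress.ip_address(ip))

    def request_session(self, initiator, destinations):
        """Return (group, invites): the assigned group and one invite per destination."""
        unknown = [u for u in (initiator, *destinations) if u not in self._registry]
        if unknown:
            raise KeyError(f"not registered: {unknown}")
        group = str(next(self._free))
        invites = [{"to": self._registry[d], "from": initiator, "group": group}
                   for d in destinations]
        return group, invites


server = ConferenceServer()
server.register("alice@example.com", "192.0.2.1")
server.register("bob@example.com", "192.0.2.2")
server.register("carol@example.com", "192.0.2.3")
group, invites = server.request_session(
    "alice@example.com", ["bob@example.com", "carol@example.com"])
```

The same group address is returned to the initiator and placed in every invitation, which is what allows all clients to join a common multicast session.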
An IGMP join request is transmitted from the destination clients 2604, 2606 to IP router #2 245 (step 2642) and forwarded from IP router #2 245 to IP router #1 240 (step 2645).
Content (audio and/or video) is transmitted from the initiating client 2602 to IP router #2 245 (step 2647) and forwarded from IP router #2 245 to each of the destination clients 2604, 2606, using the multicast IP group address (step 2650). Further, content is transmitted from the destination clients 2604, 2606 to IP router #2 245 (step 2655).
It is to be appreciated that the references to IP router #1 240 and IP router #2 245 are made to illustrate the role of the network routing architecture in supporting the method of the present invention. Accordingly, other network routing devices may be used in place of and/or in addition to those described herein, while maintaining the spirit and scope of the present invention.
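On a typical host, a client does not build the IGMP join request itself; it asks the operating system to join the group on a UDP socket, and the operating system then issues the IGMP membership report. The Python sketch below illustrates this with the standard socket module. The function names, the group address, and the choice of the default interface are assumptions.

```python
import socket
import struct


def membership_request(group, interface="0.0.0.0"):
    """Build the ip_mreq structure: group address followed by local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def join_group(group, port):
    """Open a UDP socket and join `group`; the OS then sends the IGMP report."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


mreq = membership_request("239.255.0.1")
print(mreq.hex())
```

A client would call join_group with the multicast IP group address received from the server; that call requires a network interface with multicast support and is therefore not executed here.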
It is to be further appreciated that the present invention is not limited to only videoconference sessions, but may also be used for setting up a single-direction multicast session to multiple clients using the same protocol. In this scenario, the application should limit the receiving clients from being allowed to send on the multicast group IP address.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present invention is not limited to those precise embodiments, and that various other changes and modifications may be effected therein by one skilled in the art without departing from the scope or spirit of the invention. All such changes and modifications are intended to be included within the scope of the invention as defined by the appended claims.
This is a non-provisional application claiming the benefit under 35 U.S.C. § 119 of provisional application Ser. No. 60/341,797, entitled “VIDEO CONFERENCING CALL SET UP METHOD”, filed on 15 Dec. 2001, Attorney Docket No.: PU010307, which is incorporated by reference herein. This application is also related to commonly assigned provisional application Ser. No. 60/366,331, entitled “VIDEOCONFERENCE SYSTEM ARCHITECTURE”, filed on 20 Mar. 2002, which is incorporated by reference herein. This application is also related to commonly assigned provisional application Ser. No. 60/341,720, entitled “VIDEO CONFERENCING BANDWITH SELECTION MECHANISM”, filed on 15 Dec. 2001, which is incorporated by reference herein, and commonly assigned provisional application Ser. No. 60/341,671, entitled “QUALITY OF SERVICE SETUP ON A TIME RESERVATION BASIS”, filed on 15 Dec. 2001, which is incorporated by reference herein, and commonly assigned provisional application Ser. No. 60/341,800, entitled “VIDEOCONFERENCE SESSION SWITCHING FROM UNICAST TO MULTICAST”, filed on 15 Dec. 2001, which is incorporated by reference herein, and commonly assigned provisional application Ser. No. 60/341,799, entitled “METHOD AND SYSTEM FOR PROVIDING A PRIVATE CONVERSATION CHANNEL IN A VIDEOCONFERENCING SYSTEM”, filed on 15 Dec. 2001, which is incorporated by reference herein, and commonly assigned provisional application Ser. No. 60/341,801, entitled “VIDEOCONFERENCE APPLICATION USER INTERFACE”, filed on 15 Dec. 2001, which is incorporated by reference herein, and to commonly assigned provisional application Ser. No. 60/341,819, entitled “SERVER INVOKED TIME SCHEDULED VIDEO CONFERENCE”, filed on 15 Dec. 2001, which is incorporated by reference herein.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US02/39868 | 12/12/2002 | WO | | 5/5/2005 |
Number | Date | Country | |
---|---|---|---|
60341797 | Dec 2001 | US | |
60341801 | Dec 2001 | US | |
60341720 | Dec 2001 | US | |
60341671 | Dec 2001 | US | |
60341819 | Dec 2001 | US | |
60341800 | Dec 2001 | US | |
60341799 | Dec 2001 | US | |
60341331 | Dec 2001 | US |