1. Field of the Invention
The invention relates to multimedia communications via a network. More specifically, the invention relates to a method of unbinding call control from device control policy and media services. One embodiment of the invention is particularly suited for videoconferencing.
2. Description of the Related Art
As Voice over IP (VoIP) telephones become increasingly common, there is increased interest in running video on those networks. This may require two different devices, for example, an IP phone and a videoconferencing endpoint. One can attempt to place a video-only call between users already in a voice call, but this requires two complete call control devices with separate addresses, administration control, server infrastructures, etc.
U.S. Pat. No. 6,750,896, by McClure, describes a system wherein video calls between video devices are controlled by presenting video call options and receiving inputs of video call information through a telephone network. A video call application associated with a phone server receives video call information and provides the information to a video launch application that controls video devices accordingly. In one embodiment, IP telephones provide video call options such as initiating and terminating video calls through an IP telephone server to a video network platform using XML formatted data. The video network platform provides video call options based on user code information to simplify the IP telephone interface. The video network platform performs the functions represented by the video call information to establish and terminate video calls as appropriate.
One aspect of an embodiment of the invention is a method of establishing a media call wherein a data stream contains a call control channel and one or more media channels. A network connection between a call control entity and a far end device is established wherein the connection conveys the call control channel. This connection typically utilizes a control protocol such as SIP or H.323, which are known to those skilled in the art. A network connection is established between the call control entity and a media entity, typically using an XML protocol. This connection between the call control entity and the media entity is used to prepare and direct the media entity to receive incoming media. The call control device directs the far end device to establish a media channel network connection between the far end device and the media entity, typically using RTP. By separating the call control channel from the media channel(s), a non-media device, such as an IP telephone, can be integrated into a media conferencing experience. This allows the user's station to appear as a single destination device with a single address and a single point of administration.
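The ordering of the three connections described above can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the class names `MediaEntity` and `FarEnd`, the function `establish_media_call`, and the RTP port 5004 are assumptions introduced for the example, and no real SIP, H.323, XML, or RTP traffic is generated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MediaEntity:
    """Hypothetical media entity that is told where to expect incoming media."""
    address: str
    prepared_ports: List[int] = field(default_factory=list)

    def prepare_receive(self, port: int) -> str:
        # Stands in for a command sent over the (typically XML) control connection.
        self.prepared_ports.append(port)
        return f"{self.address}:{port}"


@dataclass
class FarEnd:
    """Hypothetical far end device reached over the call control channel."""
    media_target: Optional[str] = None

    def open_media_channel(self, target: str) -> None:
        # A real device would start an RTP stream toward `target` here.
        self.media_target = target


def establish_media_call(far_end: FarEnd, media_entity: MediaEntity,
                         rtp_port: int = 5004) -> List[str]:
    """Act as the call control entity and return the steps taken, in order."""
    steps = ["call control channel established with far end"]
    # Prepare the media entity before any media is directed at it.
    target = media_entity.prepare_receive(rtp_port)
    steps.append(f"media entity prepared at {target}")
    # Direct the far end to send media straight to the media entity.
    far_end.open_media_channel(target)
    steps.append(f"far end directed to send media to {target}")
    return steps
```

Note that in this sketch the media target handed to the far end is the address of the media entity, not of the call control entity, which is the separation of call control and media that the paragraph describes.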
The following definitions and abbreviations are used in the disclosure:
Associations—The binding between peer binding companions.
Media entity—An element of a decomposed videoconferencing system that may aggregate any of the several types of devices and services supported by the present invention. A media entity typically generates and/or decodes an RTP stream.
Call Control Entity—The entity responsible for managing various call setup parameters at an end of a multimedia call, typically using standard call control protocols such as H.323 or SIP. The call control entity can be any network-based device such as a PC-based client, a stand-alone appliance phone, PDA, or cell phone. The Call Control entity may be viewed as a network proxy or bridge for communicating information from a far end to the devices within the setup association Control Point. The call control entity is also typically used to query the capabilities of associated media entities, select an appropriate media entity for a given media call, and control the media entity.
Logging Entity—The entity responsible for handling synchronization of logging from various media entities.
Event Management—Media entities may generate asynchronous events. The protocol disclosed herein provides event management, i.e., it allows devices to register for and receive events.
Network Application Message Framing—Every message must be “framed” so that the receiver of the message can do first pass validity checking on the message. The lowest layer of the present protocol has a framing mechanism that permits simultaneous and independent exchanges of messages between peers and quick parsing of the message.
Reply Codes—Reply codes are messages with specific information regarding a previous command.
Session—The time from the underlying transport protocol connection to disconnection. In TCP, from connection to BYE.
TLS—Transport Layer Security.
ALG—Application Level Gateway.
SCTP—Stream Control Transmission Protocol (IETF RFC 2960).
SIP—Session Initiation Protocol.
The present disclosure provides a way to add video thereby extending the audio-only capability of a call control entity such as a voice phone. This allows the user's station to appear as a single device with a single address and a single point of administration. This allows a media entity to act essentially as a peripheral to a call control entity such as a standard VoIP device.
According to the teaching herein, call control is unbound from device control policy and media services. Also disclosed is an application layer device control protocol that allows reliable exchanges of control messages, media stream descriptions, configuration and state information between peer call control entities and media entities.
The call control entity and a media entity can communicate over a network connection. The media entity, for example a video codec (coder/decoder), can be implemented as a computer application, or as one of an array of DSP accelerated devices such as an integrated DSP and flat panel display, a snap-on panel for the back of an IP phone, or a traditional set-top or rack mount CODEC. The call control entity and media entity communicate using a protocol, as described below. Decomposing a traditional videoconferencing device into separate media devices allows devices that are uniquely and historically suited for their various purposes, such as an audio telephone, to be integrated into the videoconferencing experience.
The systems described herein use a connection-oriented control protocol that allows a peer connection between two media devices to exchange device and media control commands and responses using textual XML messages. Layered on top of the control protocol are media entity services for controlling various aspects of the media. The protocol supports standard device control semantics and media control semantics that can be signaled between the near-end call control entity and far-end call control entity. These semantics include, for example, media stream starting, stopping, pausing, refreshing, muting; camera control; and security and encryption. The protocol also supports semantics that allow synchronization between various media devices for services such as logging and provisioning.
Examples of media services include live video feed such as in a video conference, content video such as a video presentation, audio media, camera control, logging, provisioning, etc. Systems embodying the teachings herein can aggregate several services within a single device, for example, call control and logging in a single device, and audio capture and audio encoding in a single device. However, a media entity need not have all of the aforementioned services.
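As an illustration of a textual XML control message, the sketch below builds and parses a media control command using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`command`, `stream`, `service`, `action`, `id`) are hypothetical; the disclosure does not specify the actual message vocabulary, so this shows only the general shape such a message might take.

```python
import xml.etree.ElementTree as ET

# Media control actions named in the text; the spellings are assumptions.
MEDIA_ACTIONS = {"start", "stop", "pause", "refresh", "mute"}


def build_command(service: str, action: str, stream_id: int) -> str:
    """Serialize a hypothetical media control command as XML text."""
    if action not in MEDIA_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    root = ET.Element("command", {"service": service, "action": action})
    ET.SubElement(root, "stream", {"id": str(stream_id)})
    return ET.tostring(root, encoding="unicode")


def parse_command(text: str):
    """Parse a command produced by build_command into (service, action, stream_id)."""
    root = ET.fromstring(text)
    stream = root.find("stream")
    if root.tag != "command" or stream is None:
        raise ValueError("not a command message")
    return root.get("service"), root.get("action"), int(stream.get("id"))
```

A call control entity might, for example, call `build_command("video", "pause", 7)` and send the resulting text to the media entity, which recovers the fields with `parse_command`.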
An example of a media exchange embodying aspects of the present disclosure is illustrated in
The protocol can be based on XML Schema. This provides the ability to extend Schema without affecting existing implementations. Using XML for describing messages, commands, responses, properties, configuration information, and logging information allows for use of standard web technology like XSLT and XCAP for controlling a media device. XML allows platform developers to reuse already existing XML parsing libraries or use special-built XML parsers for a particular service. Also, XML schemas allow platform developers the ability to choose validating parsers, which guard against syntax vulnerabilities that exist in other text-based network protocols.
Media entities require reasonable security to prevent attacks on them, for example, in the form of media eavesdropping, barge-in, device hijack for DDoS attacks, unauthorized use, and playback attacks. In the case of secure media, the devices must be able to securely pass the stream keys back and forth between the call control entity and the media entities. Some form of authentication for binding between media entity devices is preferably used. A variety of security authentication schemes known to those of skill in the art are supported, for example: One-Time-Password Mechanism (RFC 2444); Plaintext user/password (RFC 2595); and anonymous binding (RFC 2245).
Logical services are associated with various media entities. For example, a media entity might provide a service for transmitting live video, such as a video conference feed, and a service for transmitting content video, such as a recorded video presentation. Most media entity services support media streams in some form, for example they can: create transmit and receive streams independently (logical independence); create transmit and receive streams in any order (temporal independence); create transmit and receive streams “simultaneously”, etc. These services are addressed within particular messages within the protocol. The services are defined using XML Schemas.
Associations describe the mapping between the control entities and media entities. Associations have two dimensions. The first dimension reflects the control point to media entity mapping, for example: one call control device to one media device; one call control device to many media devices; or many call control devices to many media devices. The second dimension of association is duration. There are two types of duration: promiscuous and monogamous. A typical example of a promiscuous association is a content encoder in a conference room. In this mode, various users would connect their content source to the encoder for a short period of time and then leave. An example of monogamous association would be a desktop phone controlling a video media entity on the same desktop. The difference between these two associations requires that the association and authentication models be relatively lightweight. Associations have time durations from a single session to infinite.
Standard network device management such as SNMP is typically too heavy for some lightweight media entities according to some embodiments of the invention. It is desirable that some device management be present. It is unlikely that a modern enterprise network manager would allow networked devices onto their network in this day of worms, Trojan horses and viruses without being able to identify and manage such devices from a central location. This requirement is extended for ISP and IP Centrex-like environments where these devices are actually owned by third parties. The approach, according to the present invention, is to view the provisioning and management information present on the device as a single unified XML document. This “document” is reflected in an XML schema that describes the tree. The XML syntax for modifying this “document” is described in XCAP (XML Configuration Access Protocol). XCAP allows a client to read, write and modify device and service configuration data, represented in XML format on the media device.
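The two dimensions of an association can be captured in a small data structure. The sketch below is one possible representation under stated assumptions: the names `Association`, `Duration`, `max_sessions`, and `cardinality` are invented for illustration and do not come from the disclosure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Duration(Enum):
    PROMISCUOUS = "promiscuous"  # short-lived, e.g. a shared conference room encoder
    MONOGAMOUS = "monogamous"    # long-lived, e.g. a desk phone and its video unit


@dataclass(frozen=True)
class Association:
    control_entities: Tuple[str, ...]
    media_entities: Tuple[str, ...]
    duration: Duration
    # Number of sessions the association lasts; None models "infinite".
    max_sessions: Optional[int] = None

    def cardinality(self) -> str:
        """Describe the control-to-media mapping, e.g. 'one-to-many'."""
        left = "one" if len(self.control_entities) == 1 else "many"
        right = "one" if len(self.media_entities) == 1 else "many"
        return f"{left}-to-{right}"
```

For example, a desktop phone bound indefinitely to one video unit would be `Association(("phone",), ("video-unit",), Duration.MONOGAMOUS)`, whose `cardinality()` is `"one-to-one"`.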
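The idea of treating all provisioning data as one XML document whose nodes are read and modified by path can be illustrated locally. The sketch below does not implement XCAP itself (which operates over HTTP); it only shows path-based reads and writes against an in-memory tree using the standard `xml.etree.ElementTree` module. The document layout and the helper names `read_node` and `write_node` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical unified configuration document for a media device.
CONFIG = """
<device>
  <network><hostname>codec-1</hostname></network>
  <video><bitrate>768</bitrate></video>
</device>
"""


def read_node(root: ET.Element, path: str):
    """Return the text of the node at `path`, or None if it does not exist."""
    node = root.find(path)
    return None if node is None else node.text


def write_node(root: ET.Element, path: str, value: str) -> None:
    """Replace the text of an existing node; unknown paths are rejected."""
    node = root.find(path)
    if node is None:
        raise KeyError(path)
    node.text = value
```

A management client in this model would parse the document once with `ET.fromstring(CONFIG)`, then address individual settings such as `"video/bitrate"` without needing a separate management protocol for each service.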
The protocol of the present invention provides two logging services: a LogServer service that might be a front end to a WINDOWS® event log or syslog, and a LogClient service that produces logging information. The LogServer Service allows formatted messages to be sent to it. The service synchronizes messages from various sources into a single log. This single point is then exposed to allow LogClients to read the synchronized logs. The LogServer service supports an interface that looks similar to Log4J that allows various log clients to read logs separated by service as well as message severity.
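A minimal sketch of the LogServer behavior described above follows: messages from several sources are merged into one time-ordered log, which clients then read filtered by service and severity. The class and field names and the four severity levels are assumptions chosen to resemble common logging libraries, not details taken from the disclosure.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed severity ordering, lowest to highest.
SEVERITY = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}


@dataclass(order=True)
class LogRecord:
    timestamp: float  # the only field used for ordering
    service: str = field(compare=False)
    severity: str = field(compare=False)
    message: str = field(compare=False)


class LogServer:
    def __init__(self) -> None:
        self._records: List[LogRecord] = []

    def submit(self, record: LogRecord) -> None:
        # Insert in timestamp order so records from different sources interleave.
        bisect.insort(self._records, record)

    def read(self, service: Optional[str] = None,
             min_severity: str = "DEBUG") -> List[LogRecord]:
        """Return records in time order, filtered by service and severity."""
        floor = SEVERITY[min_severity]
        return [r for r in self._records
                if (service is None or r.service == service)
                and SEVERITY[r.severity] >= floor]
```

Keeping the list sorted on insertion means a LogClient always sees a single synchronized view, even when media entities deliver their messages out of order.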
The transport layer is responsible for the actual transmission of requests and responses over network transports. This includes determination of the connection to use for a request or response in the case of connection-oriented transports. The transport allows devices to communicate using reliable connection-oriented (e.g., TCP, SCTP) transport protocols. When entities use a connection-oriented protocol (such as TCP or SCTP) to send a request, they typically originate their connections from an ephemeral port. The transport allows easy traversal of firewalls and gateways and allows reuse and sharing of the connection mechanism. According to some embodiments, the connection sharing mechanism allows entities to reuse existing connections for requests and responses originated from either peer in the connection; allows entities to reuse existing connections with closely coupled nodes that act as a single system entity; and prevents unauthorized hijacking of other connections.
In using a connection-oriented transport such as TCP or SCTP, individual messages must be framed within the packet stream. The framing information should allow the lowest level host application code to weakly validate the message. A message frame must contain: an easily identifiable (and unique) starting character sequence; the service that the message is bound for; a monotonically increasing message number that uniquely identifies this message across all services; a monotonically increasing sequence number that uniquely identifies this message within the particular service; a continuation identifier if the message runs across physical packet boundaries; a payload size that specifies the exact number of octets in the payload; an easily identifiable ending character sequence; the sender; TTL for the message; and version.
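The framing fields listed above can be illustrated with a simple text header followed by a length-delimited payload. The wire layout below (the `@@FRAME ` and `@@END` markers, the space-separated field order, and the names `Frame`, `encode_frame`, and `decode_frame`) is entirely hypothetical; the disclosure names the required fields but not their encoding. The sketch also assumes that the service and sender names contain no spaces.

```python
from dataclasses import dataclass
from typing import Tuple

START = b"@@FRAME "  # assumed starting character sequence
END = b"@@END\n"     # assumed ending character sequence


@dataclass
class Frame:
    version: str
    service: str
    message_number: int   # identifies the message across all services
    sequence_number: int  # identifies the message within its service
    continued: bool       # True if the message continues in a later frame
    sender: str
    ttl: int
    payload: bytes


def encode_frame(f: Frame) -> bytes:
    header = " ".join([
        f.version, f.service, str(f.message_number), str(f.sequence_number),
        "1" if f.continued else "0", str(len(f.payload)), f.sender, str(f.ttl),
    ]).encode("ascii")
    return START + header + b"\n" + f.payload + END


def decode_frame(data: bytes) -> Tuple[Frame, bytes]:
    """Weakly validate and decode one frame; return it with the unread bytes."""
    if not data.startswith(START):
        raise ValueError("missing start sequence")
    header_end = data.find(b"\n")
    if header_end == -1:
        raise ValueError("incomplete header")
    fields = data[len(START):header_end].decode("ascii").split(" ")
    if len(fields) != 8:
        raise ValueError("malformed header")
    version, service, msg_no, seq_no, more, size, sender, ttl = fields
    body_start = header_end + 1
    body_end = body_start + int(size)
    # The end marker must sit exactly where the declared payload size says.
    if data[body_end:body_end + len(END)] != END:
        raise ValueError("missing end sequence or wrong payload size")
    frame = Frame(version, service, int(msg_no), int(seq_no), more == "1",
                  sender, int(ttl), data[body_start:body_end])
    return frame, data[body_end + len(END):]
```

Because the payload is delimited by its declared size rather than by scanning for the end marker, the receiver can reject a damaged frame on the first pass and can pull successive frames out of a single byte stream by calling `decode_frame` repeatedly on the returned remainder.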
A system and method have been shown in the above embodiments for the effective implementation of media devices over IP. While various preferred embodiments have been shown and described, it will be understood that there is no intent to limit the invention by such disclosure, but rather, it is intended to cover all modifications and alternate constructions falling within the spirit and scope of the invention, as defined in the appended claims. For example, the present invention should not be limited by software/program, computing environment, specific computing hardware or specific multimedia transmission protocols. Existing and future input/output devices are envisioned within the scope of the present invention.