A media application server may be used in connection with serving media for a variety of different purposes including, for example, audio and/or video conferencing. The media application server may reside on a server system in connection with servicing various media requests in accordance with the particular media and associated operations that may be performed by the media application server. Each media application server generally includes code for performing the particular application logic as well as code for performing media processing operations that may also be performed more generally by other media application servers. In other words, media application servers may perform a common set of media processing operations independent of the particular application logic. In some existing systems, the code for the common set of operations performed by a media application server may be included in each media application server. One drawback with the foregoing is that this may be inefficient due to possibly recoding a same portion of code for different media application servers. Additionally, including the same code portions for common operations in the different media application servers may lead to problems with code maintenance due to the duplicate copies of code.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Techniques are described for media processing. A media processor receives one or more input media streams and provides an output media stream to one or more endpoints. A media controller issues commands to the media processor for controlling the media streams. The media controller and the media processor communicate in accordance with a defined protocol allowing for independent control of each of the media streams.
Features and advantages of the present invention will become more apparent from the following detailed description of exemplary embodiments thereof taken in conjunction with the accompanying drawings in which:
Referring now to
The techniques set forth herein may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, and the like, that perform particular tasks or implement particular abstract data types. Typically the functionality of the program modules may be combined or distributed as desired in various embodiments.
Included in
It will be appreciated by those skilled in the art that although the server computer and client computer are shown in the example as communicating in a networked environment, the computers may communicate with other components utilizing different communication mediums. For example, the server computer 12 may communicate with one or more components utilizing a network connection, and/or other type of link known in the art including, but not limited to, the Internet, an intranet, or other wireless and/or hardwired connection(s).
Referring now to
Depending on the configuration and type of server computer 12, memory 22 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.) or some combination of the two. Additionally, the server computer 12 may also have additional features/functionality. For example, the server computer 12 may also include additional storage (removable and/or non-removable) including, but not limited to, USB devices, magnetic or optical disks, or tape. Such additional storage is illustrated in
By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Memory 22, as well as storage 30, are examples of computer storage media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by server computer 12. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
The server computer 12 may also contain communications connection(s) 24 that allow the server computer to communicate with other devices and components such as, by way of example, input devices and output devices. Input devices may include, for example, a keyboard, mouse, pen, voice input device, touch input device, etc. Output device(s) may include, for example, a display, speakers, printer, and the like. These and other devices are well known in the art and need not be discussed at length here. The one or more communications connection(s) 24 are an example of communication media.
In one embodiment, the server computer 12 may operate in a networked environment as illustrated in
One or more program modules and/or data files may be included in storage 30. During operation of the server computer 12, one or more of these elements included in the storage 30 may also reside in a portion of memory 22, such as, for example, RAM for controlling the operation of the server computer 12. The example of
The media server application 42 may be used in connection with providing various services in connection with one or more types of media. For example, the media server application 42 may be used in connection with providing audio and/or video conferencing services, media relay services, gateways, and the like. The server 12 may include a multipoint control unit (MCU) with one or more media server applications thereon. The MCU may be used to establish conference calls between multiple participants for converged voice, video and/or data conferences. An MCU can provide audio-only services or any combination of audio, video and data, depending on the capabilities of each participant's terminal and the functionality of the particular MCU's hardware and/or software. It should be noted that the techniques described herein may be used in connection with other media application servers such as, for example, media relay servers.
As will be described in more detail in following paragraphs in connection with the techniques described herein, a media server application may be partitioned into two basic components, a media controller (MC) and a media processor (MP), with an abstraction layer between these components to facilitate communication therebetween. The MC performs signaling and control processing and provides instructions to the MP to perform media processing operations. The MC may be characterized as that portion of the media server application which is customized or tailored for the particular application. The processing performed by the MP may be characterized as a common set of operations for processing and serving the media to a requestor independent of the particular application and logic performed by the MC component. For example, the MP component may perform all operations for sending and receiving a media stream in connection with the particular application. The MC may issue commands for controlling operation of the media streams using the abstraction layer. Also using the abstraction layer, the MP may respond to the MC with response messages and also report any occurrences of asynchronous events to the MC. In one embodiment, the abstraction layer may be implemented using an API (Application Programming Interface) and a protocol which is described in more detail herein.
It should be noted that the client computer 16 may also include hardware and/or software components as illustrated in connection with
Referring now to
Each of the MCs may interface with clients, for example, indirectly utilizing a conference control protocol, directly or indirectly using SIP (Session Initiation Protocol), a 1st party call control protocol, and the like. The MP may also communicate with clients using various protocols such as, for example, RTP (Real-time Transport Protocol)/RTCP (RTP Control Protocol). The MP may also interface as a client with media relay servers, for example, using protocols such as STUN (Simple Traversal of UDP through NAT (Network Address Translation)) and TURN (Traversal Using Relay NAT). The abstraction layer, as described in more detail herein, resides in the MCs 102a and 102b and the MP 104. Each of the MCs 102a and 102b communicates with the MP 104 using a communication connection. In this example, MC 102a may communicate with the MP 104 over 120a and MC 102b may communicate with MP 104 over 120b.
In an embodiment, each of the MCs and/or MP may reside on the same or a different computer system and may communicate using the techniques described herein. In one embodiment in which all of the MCs and the MP reside on the same system, the MC and MP may communicate using API functions and callbacks.
As mentioned above, the MP 104 may be used in constructing one or more logical MPs 106a, 106b and 106c. It should be noted that although a number of MCs and logical MPs are included in
A logical MP may service a single MC. An instance of a logical MP may be constructed and utilized by the single MC. The single MC may create and be serviced by one or more logical MPs. For example, with reference to
An MC, such as MC 102a, may issue control and signal commands to the one or more logical MPs, such as logical MP 106a, associated with that particular MC. A logical MP may perform common operations such as mixing multiple audio streams to generate a combined audio stream based on control commands issued by an MC. The logical MPs may also perform encoding and/or decoding operations as instructed by the MC.
As an example with audio conferencing with three participants, in one arrangement, an MC may provide an initial trigger by sending a JOIN or INVITATION message to each of the participants at a scheduled time. Each of the participants may have a client computer connected to the server computer. Each participant may respond with a message from his/her client computer to the MC indicating they will join the conference. The MC may then utilize the techniques described herein to output the appropriate media stream to each of the participants. The MP may combine or mix the incoming audio streams and generate an output stream as appropriate for each participant. Additionally, during the conference, commands may be issued to the server from one or more client computers. For example, a conference may have a few presenters and many passive listeners. The techniques described herein may be used to exclude from a generated audio output stream any input stream from a passive listener, and also include in the generated audio output stream the input stream from only the currently active speaker. During the conference, the active presenter may change and the techniques described herein may be used to appropriately allow the logical MP to notify the MC of the change in active speaker, and have the MC respond by issuing commands to the logical MP servicing the MC to accordingly modify the generated combined output stream to the conference participants.
What will now be described are the structures created and used in connection with the techniques described herein. Reference may be made to particular examples or uses for purposes of illustration of the techniques described herein and should not be construed as a limitation regarding the applicability of these techniques.
Once the MC has received a reply from one or more of the participants, a context structure may be defined. A context may be defined for each set of one or more input streams (e.g., audio and/or video) that interact with each other. In connection with a conferencing example, a first context may be defined for a main conference between all participants. A second context may be defined for a side conference between only two participants who wish to have a side conference while the main conference is going on.
A termination structure may be defined for each of the logical communication endpoints. As described herein with one example, an endpoint may be, for example a single client application on a client computer. The termination structure associates multiple streams that are sent to/received from the same logical endpoint. Such a logical endpoint may also be referred to herein as a termination associated with a termination structure. In one embodiment, all streams that are output and sent to the same termination are synchronized. A logical endpoint or termination may also be, for example, another MC. Referring back to the audio conference example, a single termination structure may be created for each client application on the client computer of each conference participant.
A stream structure may be defined for each single media (e.g., audio, video) that is sent to/received from a single termination. A stream can be full duplex (sent and received) or half duplex (sent or received) or inactive. For each stream, multiple descriptors may be defined. In one embodiment, the following descriptors may be associated with each stream: a local descriptor, a remote descriptor, an ingress (incoming) filter descriptor, and an egress (outgoing) filter descriptor. Collectively, the descriptors associated with a stream may be used to describe the various attributes of the incoming and outgoing streams and how the stream interacts with any other streams of the same context. Referring to the audio conference example, a single stream structure may be defined and associated with the audio stream for each conference participant.
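The relationship between the context, termination, and stream structures may be easier to follow as a small data model. The following is a minimal sketch only, assuming Python dataclasses; the class names, field names, and the participant names are hypothetical and are not taken from the protocol described herein.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Direction(Enum):
    """Stream modes named in the text: full duplex, half duplex, inactive."""
    SEND_RECEIVE = "full duplex"
    SEND_ONLY = "half duplex (send)"
    RECEIVE_ONLY = "half duplex (receive)"
    INACTIVE = "inactive"


@dataclass
class Stream:
    """A single media (e.g., audio) sent to/received from one termination."""
    stream_id: str
    direction: Direction = Direction.SEND_RECEIVE
    local: dict = field(default_factory=dict)    # local descriptor (MP side)
    remote: dict = field(default_factory=dict)   # remote descriptor (endpoint side)
    ingress_filter: Optional[set] = None         # None: default behavior applies
    egress_filter: Optional[set] = None          # None: default behavior applies


@dataclass
class Termination:
    """A logical endpoint; groups the streams of that endpoint."""
    termination_id: str
    streams: Dict[str, Stream] = field(default_factory=dict)


@dataclass
class Context:
    """A set of terminations whose streams interact with each other."""
    context_id: str
    terminations: Dict[str, Termination] = field(default_factory=dict)


# Audio conference example: one termination and one audio stream per participant.
main = Context("main-conference")
for name in ("alice", "bob", "carol"):
    main.terminations[name] = Termination(name, {"audio": Stream("audio")})
```

In this sketch a filter value of `None` stands for "no filter specified", so that the default behavior of the particular embodiment can be applied when the streams are processed.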
A local descriptor defines the attributes of the ingress stream (e.g., stream received from the endpoint). The local descriptor describes the MP environment or side of the communication. The local descriptor may include, for example, encoding and decoding parameters, transport parameters, port address, transmission speed, and the like.
A remote descriptor defines the attributes of the egress stream (e.g., stream sent to the endpoint). The remote descriptor describes the remote side or the endpoint location. The remote descriptor may include parameters similar to that as described above for the local descriptor except that the parameters apply to the endpoint or termination. If the endpoint represents a file, for example, used for archiving, then the remote descriptor may include the file name and how to access the file.
An ingress filter descriptor defines what terminations receive the associated stream. In one embodiment, the ingress filter descriptor may be optional. If an ingress filter is not specified, then a default behavior may be defined. In one embodiment the default behavior may be that all terminations in the context receive the associated stream. The ingress filter enables muting an ingress stream from all other terminations or particular terminations. For example, in large conferences with only a few presenters and many passive listeners, ingress filters may be used to block ingress streams for all passive listeners and block/open the active presenter as needed. As another example, in an audio conference call, if a participant mutes his/her voice resulting in a command to the MC, the MC may in turn cause the ingress filter descriptor associated with the participant to be accordingly updated.
An egress filter descriptor defines what terminations are selected as ingress streams (source media) for the egress stream of this termination. In addition it defines what media processing, such as switch or mix, may be used. In one embodiment, the egress filter descriptor may be optional. If the egress filter for a stream is not defined, a default behavior may be specified. In one embodiment, the default behavior may be that all terminations in the context are selected, except the stream's own termination (e.g., no loop). In addition, a default media processing option may be defined in accordance with the particular media. For example, a default media processing for voice is mixing and for video is switching, based on active speaker. If active speaker does not contribute any video, then the previous speaker may be selected.
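The default filter behaviors described above can be illustrated with a short sketch. It assumes, as a simplification not stated in the text, that a source contributes to a receiver only when the receiver's egress filter selects the source and the source's ingress filter admits the receiver. The function name and the dictionary layout are hypothetical.

```python
from typing import Dict, Optional, Set


def resolve_sources(
    receiver: str,
    terminations: Set[str],
    ingress_filters: Dict[str, Optional[Set[str]]],
    egress_filters: Dict[str, Optional[Set[str]]],
) -> Set[str]:
    """Return the terminations whose ingress stream feeds the receiver's egress stream."""
    egress = egress_filters.get(receiver)
    if egress is None:
        # Default egress behavior: every termination in the context except
        # the receiver's own termination (no loop).
        selected = terminations - {receiver}
    else:
        selected = egress & terminations

    sources = set()
    for source in selected:
        ingress = ingress_filters.get(source)
        # Default ingress behavior: all terminations receive the stream.
        if ingress is None or receiver in ingress:
            sources.add(source)
    return sources


# Carol is a passive listener: her ingress filter is empty, so nobody hears her.
terminations = {"alice", "bob", "carol"}
ingress_filters = {"carol": set()}
egress_filters = {}

alice_hears = resolve_sources("alice", terminations, ingress_filters, egress_filters)
carol_hears = resolve_sources("carol", terminations, ingress_filters, egress_filters)
```

Here `alice_hears` contains only bob, while `carol_hears` contains both alice and bob, which corresponds to the presenter/passive listener arrangement described above.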
It should be noted that communications over 120a and 120b between the MP 104 and each MC may be two-way communication connections. As described in more detail herein, commands may be sent from the MC to the MP 104 in accordance with a defined messaging protocol and API. The MP 104, or logical MP included therein, may respond to the MC with response messages. The messages originating from the MC may be commands or control messages to manage the structures and descriptors such as, for example, to create a context, modify an existing context or element associated with an existing context. The commands sent from the MC to the MP 104 may be in response to the MC receiving an external message, such as from a conference participant making a modification to an option, a new participant joining an existing conference, and the like. Additionally, the MP 104, or logical MP included therein, may originate messages in the form of asynchronous event reporting to the MC such as, for example, regarding the currently active speaker. This is also described in more detail herein.
Referring now to
Referring now to
What will now be described is an example of an MC-MP communication protocol. It should be noted that the MC may utilize the protocol to communicate directly with the particular logical MP in connection with command requests. In connection with this protocol, the MC sends requests to the MP 104, and the MP 104 sends response messages to the MC. The MP 104 also reports particular events to the MC in an asynchronous fashion. If a request from the MC to the MP 104 fails, the MP 104 returns the structures and descriptors to the state that existed prior to execution of the request. As will be described herein, the protocol may include messages directed to the MP component 104 as well as to a particular logical MP. Similarly, the protocol may include messages sent from the MP component 104 to the MC as well as from a logical MP to the MC. In one embodiment, messages exchanged between the MC and the MP 104, or logical MP, may be XML messages although other message formats may be used. It should be noted that a more detailed XML schema that may be used is included in following paragraphs.
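The requirement that a failed request leave the structures and descriptors in their prior state may be sketched with a snapshot-and-restore approach. This is only one possible way to obtain that behavior and is an assumption of this sketch; the class name, the exception type, and the callable-based request representation are hypothetical.

```python
import copy


class RequestFailed(Exception):
    """Raised to the caller after the prior state has been restored."""


class LogicalMPState:
    def __init__(self):
        # context id -> {termination id -> [stream ids]}
        self.contexts = {}

    def execute(self, request):
        """Apply a request; on any failure, restore the state that existed before it."""
        snapshot = copy.deepcopy(self.contexts)
        try:
            request(self.contexts)
        except Exception as exc:
            self.contexts = snapshot
            raise RequestFailed(str(exc)) from exc


def failing_add(contexts):
    # The request partially updates the structures and then fails.
    contexts["context1"] = {"termination1": ["audio"]}
    raise ValueError("simulated failure while allocating a transport address")


state = LogicalMPState()
state.execute(lambda contexts: contexts.setdefault("context0", {}))

try:
    state.execute(failing_add)
    failed = False
except RequestFailed:
    failed = True
```

After the failed request, `state.contexts` still holds only the entry created by the earlier successful request.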
Referring now to
A construct MP request 402 initiates an instance of a logical MP based on a description that is included in the request. The information may identify, for example, the type of service to be performed by the logical MP instance being created. An example of a construct MP request 402 may be:
As a result, the MP 104 instantiates an instance of a logical MP. The MP 104 sends a response back to the requesting MC that includes a logical MP identifier. An example of such a response message may be:
A destruct MP request 404 destructs or deallocates an active logical MP. Such a request may be sent from the MC to the MP 104 to free resources. As previously described, a “destruction” of a logical MP, or other element includes deallocation of associated resources for reuse. An example of a request message 404 may be:
As a result the MP destructs and frees the resources of logical MP mp1.1 in the example. The MP returns a response to the MC with the logical MP information that shows the status of the logical MP before the request. Such information may include statistics for the duration of the lifetime of the logical MP. Examples of statistics that may be obtained in an embodiment may include, for example, the number of contexts, statistics about each context such as maximum and average number of terminations, maximum and average bandwidth, and the like. An example of a response message sent from the MP to the MC in response to a request of type 404 may be:
A snapshot MP request 406 returns the current status of a logical MP. The response includes a detailed description of state and usage of resources. The snapshot may include, for example, current values for one or more of the statistics described in connection with message type 404. An example of a message of type 406 may be:
As a result, the logical MP returns a response with logical MP data that shows the current status of the requested logical MP. An example of a response message of type 406 may be:
A delete context request 408 deletes a context with all its terminations and streams. In one embodiment, a context may be deleted implicitly when the last stream in the context is deleted. Accordingly, in normal operation, a request of type 408 may not be used. An embodiment of the MC may use this request, for example, when there is an immediate need to abort a context. As an example, delete context may be the result of a command from a conference organizer such as when the organizer leaves the conference and does not want the other participants to continue using currently allocated resources for the conference. An example of a request of type 408 may be:
In connection with this particular example, the logical MP destructs and frees the resources of context1 in logical MP mp1.1 and returns a response with information, such as statistical information, about the deleted context. Statistical information returned in an embodiment may include, for example, start time, end time, average bandwidth, lost packets, and the like. The statistical information may be used, for example, for management purposes such as when a user calls a help desk regarding the quality of a specific call. The statistical information may be used in connection with measuring different quality aspects.
It should be noted that if the context is deleted implicitly as a result of deleting a last stream in the context, the logical MP managing that context may fire a context event that includes similar information that may otherwise be returned in connection with the delete context response. An example of a response message of type 408 may be:
In one embodiment, a context may also be constructed implicitly when the first stream is added to the context, such as using the add stream request described below. The context may also be destructed implicitly when the last stream in the context is deleted, such as using the delete stream request as described below.
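The implicit destruction described above, in which deleting the last stream removes its termination and, in turn, an emptied context, may be sketched as follows. The function name, the nested dictionary layout, and the shape of the returned value are hypothetical.

```python
def delete_stream(contexts, context_id, termination_id, stream_id):
    """Delete a stream and implicitly delete an emptied termination and context."""
    terminations = contexts[context_id]
    streams = terminations[termination_id]
    streams.remove(stream_id)  # raises KeyError if the stream does not exist

    result = {"stream": stream_id, "termination_deleted": False, "context_deleted": False}
    if not streams:
        # Last stream in the termination: the termination goes away too.
        del terminations[termination_id]
        result["termination_deleted"] = True
        if not terminations:
            # Last termination in the context: the context goes away too.
            del contexts[context_id]
            result["context_deleted"] = True
    return result


# context id -> {termination id -> set of stream ids}
contexts = {"context1": {"termination1": {"voice"}, "termination2": {"voice"}}}

first = delete_stream(contexts, "context1", "termination1", "voice")
second = delete_stream(contexts, "context1", "termination2", "voice")
```

The first call removes only termination1, while the second call removes termination2 and, because no terminations remain, context1 as well.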
A move termination request 410 moves a termination from one context to another in a single operation (e.g., vs. delete and add in two steps). In one embodiment, by default a logical MP may preserve all termination attributes except the filter descriptors, which by default may be removed. The MC may overwrite termination parameters, including filters, in the move termination command. These changes may be applied immediately after the termination is moved to the new context. As an example, a participant of a conference may move from one conference to another and a move termination request may be used to reflect this conference move. The move command may be characterized as a compound command to delete and add a termination in a single request in an atomic operation. An example of a request of type 410 may be:
As a result of the foregoing example request, the logical MP deletes the termination from context1 and adds it to context2. The streams field in this example request may be optional and may be used to modify stream descriptors if needed. By default, filters are removed. Therefore, if the streams field is not included in the request, the new termination is connected by default to all other terminations in context2 based on any existing default rules. Upon completion, the logical MP sends back a response that includes the termination status as it existed before the termination was removed. As mentioned above, this command may be characterized as a compound command for performing a delete and add operation. In one embodiment, the statistics returned may be similar to those returned in connection with a delete termination as described elsewhere herein. Below is an example of a response message of type 410:
A delete termination request 412 sent from the MC to a logical MP deletes a termination with all its streams. In normal operation processing, a termination may be deleted implicitly when the last stream in the termination is deleted. The MC may use this request type when it needs to abnormally abort a termination. Such a circumstance may occur, for example, when a user leaves a conference or is otherwise ejected from a conference. An example of a request of type 412 may be:
As a result of the foregoing example request, the logical MP deletes termination1 from context1 in mp1.1, including all the streams of termination1, and sends back a response that includes information such as, for example, various statistics. Examples of such statistics may include statistics about a particular user such as start time, end time, bandwidth, errors, and the like. Such statistical information may be used, for example, to evaluate the connection for a particular user in a conference in connection with quality of service determination. If the termination is the last termination in the context then the context is deleted as well and a context event is fired to the MC that includes context statistics. An example of a response message of type 412 sent from the logical MP to the MC may be:
An add stream request 414 adds a stream to an existing termination and/or context. As described below, this request may also result in creation of a new context and/or termination. If the termination key is set to ‘choose’ (e.g., by setting the value to ‘*’), then the logical MP creates a new termination and returns its value to the MC in the add stream response. Similarly, a new context may be created in connection with the add stream request and a pointer or identifier for the newly created context returned in the corresponding response. An add stream request 414 may include a remote descriptor (e.g., egress stream to remote endpoint) and a local descriptor (e.g., ingress stream from endpoint) without transport address parameters, and may also include filter descriptors. The transport address of the local descriptor is generated by the logical MP and returned to the MC via the add stream response.
An example of a request of type 414 may be:
In the foregoing, note that the attribute ‘Display Text’ may be used to define what text, (using bitmap), may be displayed inside a video window of a display, such as user's name. As a result of the foregoing example request, the logical MP constructs a new context and termination and adds the stream to the termination. The logical MP assigns identifiers to the new context and termination and accordingly returns the values in the response. An example of a response of type 414 may be:
Each context has a global unique identifier within a logical MP, which may be assigned by the logical MP in connection with the first add stream request with Context ID (e.g., associated with contextEntity in the previous example) set to ‘*’, (e.g., which means choose), and received by the MC via an add stream response. The MC may add more streams to the same context by setting a specific Context ID in an add stream request.
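The use of the ‘*’ (choose) value in an add stream request may be sketched as shown below. The class, the method signature, and the generated identifier format (`context1`, `termination1`) are hypothetical; the sketch also assumes that a specific identifier must refer to an existing context or termination.

```python
import itertools

CHOOSE = "*"  # "choose" value: the logical MP assigns the identifier


class LogicalMPSketch:
    def __init__(self):
        self.contexts = {}  # context id -> {termination id -> set of stream ids}
        self._context_numbers = itertools.count(1)
        self._termination_numbers = itertools.count(1)

    def add_stream(self, context_id, termination_id, stream_id):
        """Add a stream, creating the context and/or termination when '*' is given."""
        if context_id == CHOOSE:
            context_id = f"context{next(self._context_numbers)}"
            self.contexts[context_id] = {}
        elif context_id not in self.contexts:
            raise KeyError(f"unknown context: {context_id}")

        terminations = self.contexts[context_id]
        if termination_id == CHOOSE:
            termination_id = f"termination{next(self._termination_numbers)}"
            terminations[termination_id] = set()
        elif termination_id not in terminations:
            raise KeyError(f"unknown termination: {termination_id}")

        terminations[termination_id].add(stream_id)
        # The assigned identifiers are returned so the MC can reuse them.
        return {"context": context_id, "termination": termination_id, "stream": stream_id}


mp = LogicalMPSketch()
first = mp.add_stream(CHOOSE, CHOOSE, "audio")             # new context and termination
second = mp.add_stream(first["context"], CHOOSE, "audio")  # same context, new termination
third = mp.add_stream(first["context"], first["termination"], "video")
```

The MC in this sketch learns the assigned Context ID from the first response and supplies it in later requests to add further streams to the same context.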
A modify stream request 416 may be used to modify stream attributes. The request and response format may be as described in connection with add stream requests and responses with the modification that stream-keys and local descriptor are specified in the request by the MC in order to specify the modifications to the stream.
It should be noted that each stream has a unique stream identifier within a logical MP. By default all streams within a context that share the same stream ID interact with each other, for example mixed or switched. The default behavior can be changed by setting filter descriptors (for details see filter descriptors below). The default behavior may be modified in accordance with the particular media such as, for example, mix stream with all other streams associated with the same source/destination, or switch based on active speaker. The ingress and egress filter descriptors may be used to indicate such changes.
A delete stream request 418 deletes a stream from a termination. If the stream is the last stream in the termination, then the termination may be implicitly deleted as well. If the termination is the last termination in the context then the context may be implicitly deleted as well. An example of a request of type 418 may be:
As a result of the foregoing example request, the logical MP “mp1.1” deletes stream “voice-type-1” from termination1/context1/mp1.1 and sends back to the MC a response that includes information about the deleted stream. Such information may include statistics. In an embodiment, the information may include statistics about a specific stream such as audio or video. Such statistics may include, for example, bandwidth, error type and number of errors, and the like. If the stream is the last stream in the termination, then the termination “termination1” is deleted as well. In addition if “termination1” is the last termination in the context “context1”, then the context “context1” is deleted as well. An example of a response of type 418 may be:
A signal stream request 420 sends a signal to a selected list of streams in a context. The particular defined signals in an embodiment may vary. For example, in one embodiment, the types of defined signals are announcements and sequences of DTMF (Dual Tone Multi Frequency) tones. A sequence of DTMF tones may represent, for example, a PIN dialed from a keypad. The following is an example of a request of type 420:
As a result of the foregoing example request, the logical MP sends an announcement to all the streams in the context, regardless of the media state (e.g., even streams having states of inactive and send have the announcement sent). The announcement may be mixed with any egress media if in the process of being transmitted. The logical MP sends a response to the MC without waiting for the announcement to be played. An example of a response of type 420 may be:
In this example, if the modify-when-done field has a value set to true in the request, then the logical MP also sends a stream event indicating the announcement is done after the announcement is played. An announcement may be triggered, for example, in response to the MC receiving an external message from a conference participant, such as a conference leader, which is to be communicated to all participants.
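The ordering described above, in which the response is returned before playback and the stream event follows playback, may be sketched with a simple queue in place of real media playback. The class name, the method names, and the event dictionary keys are hypothetical; only the modify-when-done field name is taken from the text.

```python
from collections import deque


class AnnouncingMP:
    def __init__(self, on_event):
        self._on_event = on_event  # callback used to report events to the MC
        self._pending = deque()

    def signal_stream(self, context_id, announcement, modify_when_done=False):
        """Queue an announcement and respond without waiting for playback."""
        self._pending.append((context_id, announcement, modify_when_done))
        return {"status": "accepted", "context": context_id}

    def play_pending(self):
        """Stand-in for playback; fires a stream event when one was requested."""
        while self._pending:
            context_id, announcement, notify = self._pending.popleft()
            # Actual mixing of the announcement into egress media would occur here.
            if notify:
                self._on_event({
                    "kind": "stream",
                    "event": "announcement-done",
                    "context": context_id,
                    "announcement": announcement,
                })


events = []
mp = AnnouncingMP(on_event=events.append)
response = mp.signal_stream("context1", "welcome", modify_when_done=True)
events_before_playback = list(events)  # still empty: the response came first
mp.play_pending()
```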
In connection with events occurring on the MP side, each of the logical MPs may report events asynchronously to the MC. The particular events that may be reported to the MC may vary with embodiment. In one embodiment, an MP event notification message may be sent to the MC when a logical MP is out of service or almost out of service. An “out of service” state may occur, for example, due to an inability to add contexts, terminations and/or streams because insufficient resources are available. Upon receiving an indication of such an event, the MC may perform processing to reject any subsequently received commands requiring such additional resources, or otherwise use a different logical MP if available. A context event notification message may be sent to the MC upon the occurrence of a context event. One example of a context event is when the currently active speaker in a context changes. In response to receiving such a notification, the MC may send a notification to conference participants, for example, using a conference control protocol as known in the art.
A termination event notification message may be sent to the MC upon the occurrence of a defined termination event. As an example, an endpoint may be associated with a phone, and a conference participant may press a phone button; the button press is reported to the MC using the termination event notification message.
A stream event notification message may be sent to the MC upon the occurrence of a stream event. An example of a stream event which may be reported to the MC may be an announcement done event. As described above in connection with a signal stream, an announcement may be sent to all streams in a context. Once the announcement has been played, a stream event notification message may be sent to the MC.
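As a rough illustration of how an MC might react to the four kinds of asynchronous notifications (MP, context, termination, and stream events), consider the following Python sketch. The class `MediaController`, the event names, and the method signatures are assumptions made for this example and are not defined by the protocol.

```python
class MediaController:
    """Toy MC reacting to event notifications from logical MPs (hypothetical)."""

    def __init__(self, mp_ids):
        self.in_service = {mp_id: True for mp_id in mp_ids}
        self.participant_notices = []  # what the MC would forward to participants
        self.log = []                  # every notification received, in order

    def on_event(self, mp_id, level, name, detail=None):
        # level is one of "mp", "context", "termination", "stream" (assumed labels).
        self.log.append((mp_id, level, name, detail))
        if level == "mp" and name in ("out_of_service", "almost_out_of_service"):
            # Stop routing resource-consuming commands to this logical MP.
            self.in_service[mp_id] = False
        elif level == "context" and name == "active_speaker_changed":
            # Would be relayed to participants via a conference control protocol.
            self.participant_notices.append(("active_speaker", detail))

    def pick_mp(self):
        # Return the first logical MP still in service, or None to signal rejection.
        for mp_id, ok in self.in_service.items():
            if ok:
                return mp_id
        return None


mc = MediaController(["mp1", "mp2"])
mc.on_event("mp1", "mp", "out_of_service")
mc.on_event("mp2", "context", "active_speaker_changed", "alice")
mc.on_event("mp2", "stream", "announcement_done")
```

Termination and stream events are only logged here; an embodiment would attach whatever application logic is appropriate to each.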
Using the foregoing protocol, different structures and descriptors may be constructed and/or destructed implicitly, although an embodiment may also include explicit construction and/or destruction operations. In one embodiment using the foregoing protocol, a context may be constructed implicitly when the first stream is added to the context, for example, using the add stream request. The context may be destructed implicitly when the last stream in the context is deleted, for example, using the delete stream request. The MC can explicitly destruct a context at any time, for example, using the delete context request, which automatically destructs all the objects within the context. A termination may be constructed implicitly when the first stream is added to the termination, such as using the add stream request. The termination may be destructed implicitly when the last stream is deleted from the termination, for example, using the delete stream request. The MC can explicitly destruct a termination at any time, for example, using the delete termination request, which automatically destructs all the objects within the termination. In addition, the MC can move a termination to another context, for example, using the move termination request, which may be characterized as a compound request that can alternatively be performed in two steps by using delete and add termination requests. A stream may be constructed explicitly, for example, using the add stream request, and may be destructed explicitly, for example, using the delete stream request. All streams in a termination may be destructed implicitly when the termination or context to which they belong is destructed.
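The implicit and explicit lifecycle rules above can be sketched in Python using a nested mapping of contexts to terminations to streams. The class `LogicalMPState` and its method names are hypothetical and merely mirror the request names in the text. The sketch assumes that deleting the last termination in a context also destructs the context, since no streams remain in it, and that a move preserves the termination's streams.

```python
class LogicalMPState:
    """Toy model of contexts, terminations and streams (hypothetical)."""

    def __init__(self):
        # context id -> termination id -> set of stream ids
        self.contexts = {}

    def add_stream(self, ctx, term, stream):
        # Context and termination are constructed implicitly on first use.
        self.contexts.setdefault(ctx, {}).setdefault(term, set()).add(stream)

    def delete_stream(self, ctx, term, stream):
        terms = self.contexts[ctx]
        terms[term].discard(stream)
        if not terms[term]:
            del terms[term]          # last stream gone: termination destructed
        if not terms:
            del self.contexts[ctx]   # last stream in context gone: context destructed

    def delete_termination(self, ctx, term):
        terms = self.contexts[ctx]
        del terms[term]              # destructs every stream in the termination
        if not terms:
            del self.contexts[ctx]   # assumption: an empty context is destructed

    def delete_context(self, ctx):
        self.contexts.pop(ctx, None)  # destructs all terminations and streams

    def move_termination(self, src, term, dst):
        # Compound request: a delete from src followed by an add to dst.
        streams = set(self.contexts[src][term])
        self.delete_termination(src, term)
        for stream in streams:
            self.add_stream(dst, term, stream)


state = LogicalMPState()
state.add_stream("c1", "t1", "s1")
state.add_stream("c1", "t2", "s2")
state.move_termination("c1", "t1", "c2")
```

After these calls, termination `t1` and its stream live in the implicitly constructed context `c2`, while `c1` still holds `t2`.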
Referring back to
Using the techniques described herein, a server computer may use a second MC for failover purposes in the event a primary MC experiences a failure. For example, a first or primary MC may be on a first system included in the server 12. A second or failover MC may be on a second system included in the server 12. The MP 104 may be on a third system of the server 12. In the event that the primary MC fails, the second MC may handle servicing of requests rather than the primary MC. The particular state information about the logical MPs may be communicated to the second MC, for example, using the snapshot MP request. The second MC may request information about the logical MPs servicing the primary MC. In one embodiment, when the second MC takes over, the second MC uses the logical MPs that were serviced by the primary MC. Information regarding the particular logical MPs servicing the primary MC may be stored in a location available to the second MC in the event that the primary MC experiences a failure. The second MC may then use the snapshot request or other techniques known in the art as may be included in an embodiment to obtain information about the logical MPs in order to assume the role of the failed primary MC.
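A minimal Python sketch of the takeover step follows. It assumes the identifiers of the logical MPs servicing the primary MC are kept in a shared store under a key named `primary_mp_ids`, and that the snapshot MP request can be modeled as a callable returning the state of one logical MP. Both the key name and the function names are hypothetical.

```python
def take_over(shared_store, send_snapshot_request):
    """Rebuild the failed primary MC's view of its logical MPs (hypothetical sketch).

    shared_store: mapping readable by the second MC after the primary fails.
    send_snapshot_request: callable taking a logical MP id and returning its state.
    """
    mp_ids = shared_store["primary_mp_ids"]
    # Issue one snapshot request per logical MP and collect the returned state.
    return {mp_id: send_snapshot_request(mp_id) for mp_id in mp_ids}


# Usage with a stand-in for the real request/response exchange.
fake_mp_state = {"mp1": {"contexts": ["c1"]}, "mp2": {"contexts": []}}
store = {"primary_mp_ids": ["mp1", "mp2"]}
recovered = take_over(store, lambda mp_id: fake_mp_state[mp_id])
```

Keeping the list of logical MPs outside the primary MC is what allows the second MC to know which logical MPs to query once the primary is unavailable.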
The techniques described herein may be used with a variety of different services. Examples used herein may include conferencing and a server providing services as a communication gateway, for example, in which the MC issues commands to a logical MP to convert one or more input streams from one client into a form usable by a second different client. Following is an example of an XML schema that may be used in connection with the example message formats described herein.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.