Benefit is claimed under 35 U.S.C. 119(a) to Indian Provisional Application Ser. No. 2887/CHE/2010, entitled “Technique for providing in-built n-way audio/video bridge on endpoints capable of IP video communication,” by Ittiam Systems (P) Ltd., filed on Sep. 29, 2010.
Embodiments of the present invention relate to the field of audio/video bridging. More particularly, embodiments of the present invention relate to providing an audio/video bridge on endpoints that are capable of Internet protocol (IP) video communication.
Video conferencing is a powerful tool for communication and collaboration and helps improve productivity and reduce costs for global companies. Further, video conferencing facilitates audio/video communication between geographically distributed teams in organizations.
With the rapid growth of packet-based Internet protocol (IP) infrastructure, IP-based video conferencing between multiple (typically 3 or more) participating locations is gaining prominence. Deployment of IP-based video conferencing provides numerous advantages, such as lower cost, easier access, rich media integration, network convergence, web-collaboration capabilities and the like.
Existing video conferencing systems include one or more IP video communication terminals (VCTs), one or more voice over IP communication terminals (VoCTs), and a dedicated bridge or multipoint control unit (MCU) external to the VCTs and the VoCTs. The VCTs and the VoCTs are generally referred to as endpoints. Exemplary VCTs include any terminals capable of video communication over IP, such as desktop video phones, mobile or cell phones, video conferencing units and the like. Typically, participants at different locations use the VCTs to call into a common number or address assigned to the dedicated bridge in order to hear and view each other. Participants can also use the VoCTs to call into the dedicated bridge to participate in the conference; however, they will only be able to hear each other's voices.
The existing dedicated bridge is high performance, specialized, and typically centralized equipment that resides in an enterprise or is subscribed to from a service provider located external to the endpoints. The dedicated bridge may receive audio/video streams from the participating VCTs and VoCTs, process the received audio/video streams, combine them in one or more ways, and send them back to the VCTs and the VoCTs.
Generally, the dedicated bridge receives audio/video streams from the participating endpoints. Further, the dedicated bridge can decode the video streams and encode them into a single video format or multiple video formats. Furthermore, the dedicated bridge mixes the incoming audio streams into as many audio streams as there are endpoints. In addition, the composed audio/video streams are transmitted back to the endpoints.
In such a setup, the video conference calls between multiple VCTs and VoCTs are dependent on the availability of the external dedicated bridge. This may limit the ability to conference with multiple people as and when required. Also, the number of participating locations in the video conference call may depend on the audio/video processing capacity of the external dedicated bridge.
Embodiments of the present invention are illustrated by way of an example and not limited to the figures of the accompanying drawings, in which like references indicate similar elements and in which:
Other features of the present embodiments will be apparent from the accompanying drawings and from the detailed description that follows.
A system and method for providing in-built audio/video bridge on endpoints capable of video communication over Internet protocol (IP) is disclosed. In the following detailed description of the embodiments of the invention, reference is made to the accompanying drawings that form a part hereof, and in which are shown, by way of illustration, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
The term “endpoints” refers to video communication terminals (VCTs) and voice over IP communication terminals (VoCTs). Exemplary VCTs include terminals capable of video communication over IP, including desktop video phones, mobile or cell phones, video conferencing units and the like. The VoCTs include terminals capable of audio communication over IP. The term “bridge” refers to conferencing among more than two endpoints capable of communication over IP.
The terms “signal” and “stream” are used interchangeably throughout the document. Also, the terms “endpoints” and “participants” are used interchangeably throughout the document.
The present invention provides an in-built audio/video bridge on endpoints that are capable of video communication over IP.
In the example embodiment shown in
In this embodiment, the AVBM 150 includes an audio receive module (ARM) 315, an audio decode module (ADM) 320, an audio processing and mixing module (APMM) 340, an audio encode module (AEM) 350 and an audio send module (ASM) 355 to receive, decode, process, encode and send the audio streams. Further in this embodiment, the AVBM 150 includes a video receive module (VRM) 325, a video decode module (VDM) 330, a video processing and composing module (VPCM) 345, a video encode module (VEM) 360 and a video send module (VSM) 365 to receive, decode, process, encode and send the video streams. Furthermore in this embodiment, the AVBM 150 includes an audio/video synchronizing module (AVSM) 335 to synchronize the audio and the video streams. Also, the AVBM 150 includes an audio/video transmission control module (AVTCM) 370 to control parameters of the audio/video streams, such as the resolution, bit rate, frame rate, and the like, from each of the participants connected to the MVCT 110. This enables bridging more participants than would otherwise be possible, by reducing the processing power needed by the MVCT 110 to bridge the participants or by reducing the effective bit rate required at the MVCT 110.
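A minimal sketch of how the AVBM sub-modules chain for one audio bridging cycle (receive, decode, mix-minus, encode, send) may clarify the module ordering. The codec stubs and dictionary layout below are illustrative assumptions, not part of the disclosure; real streams would pass through de-jitter buffers and actual audio codecs.

```python
def decode(packet):          # ADM stand-in: payload is treated as decoded audio
    return packet["payload"]

def encode(samples, fmt):    # AEM stand-in: tag the mixed audio with a codec name
    return {"fmt": fmt, "payload": samples}

def bridge_cycle(inbound, codec_for):
    """inbound: dict endpoint -> packet. Returns, for each endpoint, an
    outbound packet carrying the mix of all *other* endpoints' audio,
    encoded in the format that endpoint negotiated."""
    decoded = {p: decode(pkt) for p, pkt in inbound.items()}    # ARM/ADM stage
    outbound = {}
    for p in decoded:                                           # APMM stage
        mix = sum(v for q, v in decoded.items() if q != p)
        outbound[p] = encode(mix, codec_for[p])                 # AEM/ASM stage
    return outbound

inbound = {"ep1": {"payload": 10}, "ep2": {"payload": 20},
           "ep3": {"payload": 30}}
out = bridge_cycle(inbound, {"ep1": "G.711", "ep2": "AAC", "ep3": "G.722"})
# ep1 receives the mix of ep2 and ep3 encoded in its own format, and so on
```

Each endpoint's output excludes its own contribution, matching the per-endpoint stream production described for the APMM below.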
In one embodiment, the ARM 315 enables the MVCT 110 to receive multiple audio streams in different formats from the one or more VCTs 120A-M and the one or more VoCTs 130A-N and, if required, de-jitters each audio stream independently. Further, the ADM 320 enables decoding, fully or partially, each of the de-jittered audio streams. The VRM 325 enables the MVCT 110 to receive multiple video streams in different formats and resolutions from the one or more VCTs 120A-M and, if required, de-jitters each video stream independently. Further, the VDM 330 enables decoding, fully or partially, each of the de-jittered video streams.
Further in this embodiment, the AVSM 335 synchronizes each of the decoded audio/video streams of the participants connected to the MVCT 110 before local play out. Furthermore, the AVSM 335 synchronizes the audio/video streams before encoding and streaming out for each of the one or more VCTs 120A-M and the one or more VoCTs 130A-N connected to the MVCT 110. Also, the AVSM 335 works across all the other sub-components of the AVBM 150 to track and re-timestamp the audio/video streams as required, in order to achieve the audio/video synchronization of the transmitted streams.
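One way the re-timestamping performed by the AVSM can be pictured is as mapping each stream's RTP timestamp onto a common wall clock using that stream's clock rate and a shared epoch, so that audio and video captured at the same instant align. The clock rates below follow common RTP usage (8 kHz narrowband audio, 90 kHz video); the function and its parameters are an illustrative assumption, not the disclosed implementation.

```python
def to_wallclock_ms(rtp_ts, rtp_base, clock_hz, epoch_ms):
    """Convert an RTP timestamp to milliseconds on a shared clock.
    rtp_base is the RTP timestamp corresponding to epoch_ms."""
    return epoch_ms + (rtp_ts - rtp_base) * 1000 // clock_hz

# An audio frame and a video frame captured at the same instant should
# map to (nearly) the same wall-clock time after re-timestamping.
audio_ms = to_wallclock_ms(rtp_ts=16160, rtp_base=16000,
                           clock_hz=8000, epoch_ms=0)    # 8 kHz audio
video_ms = to_wallclock_ms(rtp_ts=91800, rtp_base=90000,
                           clock_hz=90000, epoch_ms=0)   # 90 kHz video
```

Once both streams share a timeline, the bridge can delay whichever stream is ahead before local play out or re-encoding, which is the lip-sync behavior the AVSM provides.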
Furthermore in this embodiment, the APMM 340 enables post processing of the audio stream coming from each connected VCT 120A-M and/or VoCT 130A-N before playback and/or re-encoding. Exemplary post-processing includes mixing the incoming audio streams based on a weighted averaging for adjusting the loudness of the audio stream coming from each connected VCT 120A-M or VoCT 130A-N. Moreover, the APMM 340 produces a separate audio stream specific to each connected VCT 120A-M and VoCT 130A-N by removing the audio stream originating from that VCT or VoCT and mixing the audio streams coming from the other connected VCTs 120A-M and/or VoCTs 130A-N.
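The weighted "mix-minus" behavior described above can be sketched as follows: every participant receives a weighted mix of all other participants' audio with their own stream removed, and samples are clipped to the PCM range. Frames are modeled as short lists of 16-bit samples; the function name, weights, and frame sizes are illustrative assumptions only.

```python
def mix_minus(frames, weights, clip=32767):
    """frames: dict participant -> list of PCM samples (equal length).
    Returns one mixed frame per participant that excludes that
    participant's own audio, with per-stream loudness weights applied."""
    frame_len = len(next(iter(frames.values())))
    out = {}
    for me in frames:
        others = [p for p in frames if p != me]
        mixed = []
        for i in range(frame_len):
            # Weighted sum adjusts the loudness contribution of each stream
            s = sum(weights[p] * frames[p][i] for p in others)
            mixed.append(max(-clip, min(clip, int(s))))
        out[me] = mixed
    return out

frames = {"A": [1000, -1000], "B": [2000, 2000], "C": [0, 500]}
weights = {"A": 1.0, "B": 0.5, "C": 1.0}   # B is attenuated by half
mixes = mix_minus(frames, weights)
# A's mix contains only B and C; B's mix contains only A and C; etc.
```

Producing one stream per endpoint in this way is what prevents participants from hearing an echo of their own voice.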
In addition in this embodiment, the VPCM 345 enables processing the decoded video streams received from the VDM 330. The processing of the decoded video streams includes processes such as resizing the video streams and composing the video streams. Exemplary composing of the video streams includes tiling the video streams. Furthermore in this embodiment, the AEM 350 enables encoding each of the audio streams coming from the APMM 340, separately, in a format required by each of the associated and connected VCTs 120A-M and VoCTs 130A-N. In addition in this embodiment, the ASM 355 enables receiving each of the audio streams from the AEM 350 and sending the encoded audio streams to each of the associated VCTs 120A-M and VoCTs 130A-N.
Moreover in this embodiment, the VEM 360 enables encoding each of the composed video streams coming from the VPCM 345 in a format and resolution supported by each of the associated and connected one or more VCTs 120A-M. Further in this embodiment, the VSM 365 enables receiving each of the encoded video streams from the VEM 360 and sending them to associated one or more VCTs 120A-M.
In this embodiment, the AVTCM 370 can control parameters, such as resolution, bit rate and frame rate, of the audio/video streams coming from each of the endpoints connected to the MVCT 110. Further, the AVTCM 370 can request an endpoint to reduce the bit rate and/or resolution of transmission of the audio/video streams to reduce the bandwidth requirement and the processing power required at the MVCT 110, thereby increasing the number of participating endpoints at the MVCT 110 without compromising on the bridging experience. For example, requesting, receiving and decoding four low-resolution Quarter Video Graphics Array (commonly known as QVGA) streams and composing them into one higher resolution Video Graphics Array (commonly known as VGA) stream at the MVCT 110, for display as well as re-encoding, achieves the same effect as decoding four VGA streams and resizing each to QVGA before composing the images to VGA resolution for display/re-encoding, but with a significant reduction of the processing requirement at the MVCT 110.
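The QVGA-to-VGA example above amounts to a 2x2 tiling: four 320x240 frames placed in quadrants of one 640x480 frame, with no decoding or downscaling of full-resolution streams needed. The sketch below illustrates the tiling on tiny stand-in frames (rows of pixel values) purely for brevity; the composition logic is the same at real dimensions.

```python
def compose_2x2(tiles, h):
    """tiles: four frames of equal size, each a list of h pixel rows
    (top-left, top-right, bottom-left, bottom-right). Returns one frame
    with twice the width and height, each quadrant holding one tile."""
    tl, tr, bl, br = tiles
    top = [tl[r] + tr[r] for r in range(h)]       # join rows side by side
    bottom = [bl[r] + br[r] for r in range(h)]
    return top + bottom                           # stack the two halves

# Tiny 2x2 "frames" standing in for 320x240 QVGA tiles
a = [[1, 1], [1, 1]]
b = [[2, 2], [2, 2]]
c = [[3, 3], [3, 3]]
d = [[4, 4], [4, 4]]
vga = compose_2x2([a, b, c, d], h=2)
# Result is a 4x4 frame (standing in for 640x480 VGA):
# each quadrant holds one source tile unchanged
```

Because each tile is used at its received resolution, the bridge avoids both the four full-resolution decodes and the four downscaling passes of the alternative approach.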
In this embodiment, the number of participating endpoints at the MVCT 110 is limited by the audio/video processing capability of the MVCT 110. Further, in this embodiment, the MVCT 110 supports asymmetric audio/video streams received from the one or more VCTs 120A-M and the one or more VoCTs 130A-N. In one embodiment, the number of participating endpoints that can be supported by the MVCT 110 can be increased beyond the audio/video processing capacity of the MVCT 110 to enable virtual n-way audio/video bridging in the MVCT 110. This is explained in more detail with reference to
In this embodiment, the CSM 410 enables automatic and/or manual selection of a participant or a list of participants based on preselected criteria to enable virtual n-way audio/video bridging capability in the MVCT 110. This automatic and/or manual selection of the participant or the list of participants helps keep the processor-intensive audio/video processing during audio/video bridging within the processing capability of the MVCT 110 without limiting the number of participants calling into the MVCT 110.
In this embodiment, the automatic selection of the participant or the list of participants by the CSM 410 takes control inputs from the MVCT 110 based on selection parameters and selection criteria. Further in this embodiment, the CSM 410 monitors all the participating endpoints at the MVCT 110 and based on the selection parameters and the selection criteria, the CSM 410 selects one or more active participants.
Exemplary selection parameters or selection criteria include specific participants to be decoded and displayed, the number of participants who are active (e.g., the participant at the endpoint is speaking), participants who were active just before the currently active participant, and so on. The CSM 410 can also select a participant as an active participant if that participant has remained active for a predefined duration of time. Further, the audio/video streams from the selected active participants are decoded and displayed to the other endpoints at the MVCT 110.
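The automatic selection described above can be sketched as ranking participants by how long they have been continuously active and keeping only the top N, treating a participant as active only after speaking for a minimum duration. All thresholds, field names, and the ranking rule below are assumptions for illustration; the disclosure leaves the exact criteria configurable.

```python
def select_active(participants, max_active, min_active_ms):
    """participants: dict name -> milliseconds of continuous speech.
    Returns up to max_active names, longest-active first. Participants
    below min_active_ms (e.g. brief noise) are never selected."""
    eligible = {p: ms for p, ms in participants.items()
                if ms >= min_active_ms}
    ranked = sorted(eligible, key=eligible.get, reverse=True)
    return ranked[:max_active]

talkers = {"alice": 4200, "bob": 150, "carol": 900, "dave": 2600}
active = select_active(talkers, max_active=2, min_active_ms=500)
# Only the selected participants' video streams are then decoded and
# composed; the others are skipped, bounding the processing load
```

Capping the selection at a fixed N is what decouples the bridge's processing load from the total number of callers.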
In another embodiment, the manual selection of the participant or the list of participants by the CSM 410 includes selection through signaling via standard protocols, such as dual tone multi-frequency (DTMF), from the participants who want to be selected as an active participant, or through manual selection at the MVCT 110. Further, the number of active participants that can be chosen using the manual selection is significantly higher than the number of active participants that can be chosen using the automatic selection. In an extreme use case scenario, only the audio/video from one of the participating endpoints is selected at a time. Therefore, the number of participating endpoints is independent of the processing capability of the MVCT 110.
Further in this embodiment, the processing of the audio/video streams by the MVCT 110 is reduced by the CSM 410 by limiting the number of audio signals that are to be encoded and streamed to all or a subset of the participating endpoints based on certain predefined criteria. Furthermore, the CSM 410 limits the number of audio signals from the participating endpoints to be mixed, encoded and sent to all other participants based on the trend in the number of simultaneous speakers in the conference call.
In this embodiment, the EAVTCM 420 allows reduction/management of the bandwidth needed for conferencing the participants connected via the one or more VCTs 120A-M and/or the one or more VoCTs 130A-N. In an example embodiment, an inactive participant is requested to switch off or scale down the video resolution and/or bit rate, thereby decreasing the overall bandwidth requirements. Further, this allows any other active participant to transmit video at a higher resolution and/or bit rate. In addition, the EAVTCM 420 can request an active participant to reduce the bit rate and/or resolution of transmission of the audio/video streams to reduce the bandwidth requirement and the processing power required at the MVCT 110, thereby increasing the number of active participants at the MVCT 110. Furthermore, the EAVTCM 420 allows the inactive participants to request or re-negotiate for video re-transmission when they become active. The EAVTCM 420 also enables synchronization frame requests for a faster video synchronization response. One skilled in the art can envision having the AVBM 150 and/or one or more of the associated blocks inside and/or outside the MVCT 110.
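One simple policy consistent with the paragraph above is to assign inactive endpoints a small fixed allowance (audio-only or thumbnail video) and split the remaining bandwidth among the active senders. The function, rates, and message semantics below are illustrative assumptions; in practice such requests could be signaled through standard rate-control mechanisms (e.g., RTCP codec control messages).

```python
def bandwidth_plan(participants, active, total_kbps, idle_kbps=64):
    """Split total_kbps across endpoints: each inactive endpoint is
    asked to scale down to idle_kbps, and the freed bandwidth is shared
    equally by the active senders."""
    inactive = [p for p in participants if p not in active]
    remaining = total_kbps - idle_kbps * len(inactive)
    per_active = remaining // max(len(active), 1)
    return {p: (per_active if p in active else idle_kbps)
            for p in participants}

plan = bandwidth_plan(["A", "B", "C", "D"], active={"A", "B"},
                      total_kbps=2048)
# Inactive C and D are throttled to 64 kbps each, letting the two
# active speakers A and B share the remaining 1920 kbps
```

When an inactive participant becomes active, the plan is simply recomputed and new rate requests are signaled, which corresponds to the re-negotiation of video transmission described above.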
In block 530, each of the decoded streams of each participant connected to the MVCT is synchronized before local play out, in the AVSM included in the AVBM. Further, the AVSM synchronizes the audio/video streams before encoding and streaming out to each connected VCT and/or VoCT. In block 535, the audio stream coming from each connected VCT or VoCT is post processed before playback and/or re-encoding, in the APMM included in the AVBM. Further, the APMM produces a separate audio stream specific to each connected VCT or VoCT by removing the audio stream originating from that VCT or VoCT and mixing the audio streams coming from one or more other VCTs and/or VoCTs. In block 540, the decoded video stream received from the VDM, in block 525, is processed in the VPCM included in the AVBM. For example, processing the decoded video streams includes processes such as resizing the video streams and composing the video streams.
In block 545, each of the audio streams coming from the APMM, in block 535, is encoded in a format required by each of the associated and connected VCTs and VoCTs, in the AEM included in the AVBM. Further, each of the encoded audio streams received from the AEM is sent to each of the associated VCTs and VoCTs. In block 550, each of the composed video streams coming from the VPCM, in block 540, is encoded in a format supported by each of the associated and connected VCTs, in the VEM included in the AVBM. Further, each of the encoded video streams received from the VEM is sent to the respective VCTs. Furthermore, the AVBM, using the AVTCM, controls parameters such as resolution, bit rate, frame rate and the like of the audio/video streams coming from each of the participants connected to the MVCT. This is explained in more detail with reference to
In this embodiment, the audio/video bridging of the asymmetric audio/video streams from the one or more VCTs and the one or more VoCTs is enabled by the AVBM in the MVCT. Further, the number of endpoints that can participate is limited by the processing capability of the MVCT.
In this embodiment, in block 655, the selection of the participant or the list of participants is enabled automatically based on preselected criteria or manually in the CSM included in the AVBM to enable virtual n-way audio/video bridging capability. In block 660, the reduction/management of bandwidth needed for conferencing the participants connected via the VCTs and/or the VoCTs is allowed in the EAVTCM included in the AVBM. This is explained in more detail with reference to
In this embodiment, the automatic/manual selection of the participant or the list of participants and the reduction/management of bandwidth enable n number of endpoints to participate in the virtual n-way audio/video bridging, irrespective of the processing capability of the MVCT.
In one embodiment, the MVCT is configured to send instructions to each of the connected VCTs to encode and stream lower resolution and/or lower bit rate video, which is then composed to create and send a higher resolution video stream. This reduces the processing power required by the audio/video bridge by avoiding the decoding of multiple higher resolution video streams from the VCTs and, at the same time, reduces the required bandwidth. In addition to reducing the processing power and the bandwidth required by the in-built audio/video bridge, the MVCT can also alleviate the need for resizing the one or more incoming video streams to a smaller size.
In various embodiments, the systems and methods described in
In addition, it will be appreciated that the various operations, processes, and methods disclosed herein may be embodied in a machine-readable medium and/or a machine accessible medium compatible with a data processing system (e.g., a computer system), and may be performed in any order (e.g., including using means for achieving the various operations). Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
Number | Date | Country | Kind |
---|---|---|---|
2887/CHE/2010 | Sep 2010 | IN | national |