This invention relates generally to packet-based media communications and more specifically to media conferencing within a packet-based communication network.
Prior to the use of packet-based voice communications, telephone conferences were a service option available within standard non-packet-based telephone networks such as Pulse Code Modulation (PCM) telephone networks. As depicted in
One such algorithm used to control a conference session, referred to as a “party line” approach, comprises the steps of mixing the voice communications received from each telephone handset 22 within the conference session and further distributing the result to each of the telephone handsets 22 for broadcasting. A problem with this algorithm is the amount of noise that is combined during the mixing step, this noise comprising a background noise source corresponding to each of the telephone handsets 22 within the conference session.
An improved algorithm for controlling a conference session is disclosed within U.S. patent application Ser. No. 08/987,216 entitled “Method of Providing Conferencing in Telephony” by Dal Farra et al, filed on Dec. 9, 1997, assigned to the assignee of the present invention, and herein incorporated by reference. This algorithm comprises the steps of selecting primary and secondary talkers, mixing the voice communications from these two talkers and forwarding the result of the mixing to all the participants within the conference session except for the primary and secondary talkers; the primary and secondary talkers receiving the voice communications corresponding to the secondary and primary talkers respectively. The selection and mixing of only two talkers at any one time can reduce the background noise level within the conference session when compared to the “party line” approach described above.
In a standard PCM telephone network as is depicted in
Currently, packet-based voice communications are being utilized more frequently as Voice-over-Internet Protocol (VoIP) becomes increasingly popular. In these standard VoIP voice communications, voice data in PCM form is encapsulated with a header and footer to form voice data packets; the header in these packets having, among other things, a Real-time Transport Protocol (RTP) header that contains a time stamp corresponding to when the packet was generated. One area that requires considerable improvement is the use of packet-based voice communications to perform telephone conferencing capabilities.
As depicted within
The inputting block 32 comprises, for each participant within the voice conference, a protocol stack (P.S.) 38 coupled in series with a jitter buffer (J.B.) 40 and a decompression block (DECOMP.) 42, each of the decompression blocks 42 further being coupled to the talker selection and mixing block 34. The protocol stacks 38 in this design perform numerous functions including receiving packets comprising compressed voice signals, hereinafter referred to as voice data packets; stripping off the packet overhead required for transmitting the voice data packet through the IP network 28; and outputting the compressed voice signals contained within the packets to the respective jitter buffer 40. The jitter buffers 40 receive these compressed voice signals; ensure that the compressed voice signals are within the proper sequence (i.e. time ordering signals); buffer the compressed voice signals to ensure smooth playback; and ideally implement packet loss concealment. The output of each of the jitter buffers 40 is a series of compressed voice signals within the proper order that are then fed into the respective decompression block 42. The decompression blocks 42 receive these compressed voice signals, convert them into standard PCM format and output the resulting voice signals (that are in Pulse Code Modulation) to the talker selection and mixing block 34.
The talker selection and mixing block 34 preferably performs almost identical functionality to the central conference bridge 24 within
The outputting block 36 comprises three compression blocks 44 and a plurality of transmitters 46. The compression blocks 44 receive respective ones of the three outputs from the talker selection and mixing block 34, compress the received voice signals, and independently output the results to the appropriate transmitters 46. In this case, the mixed voice signals, after being compressed, are forwarded to all the transmitters 46 with the exception of the transmitters directed to the primary and secondary talkers. The transmitters directed to the primary and secondary talkers receive the appropriate unmixed voice signals. Each of the transmitters 46, after receiving a compressed voice signal, subsequently encapsulates this compressed voice signal within the packet-based format required for transmission on the IP network 28 and transmits a voice data packet comprising the compressed voice signal to the appropriate VoIP handset 26 within the conference session.
The well-known handsets 26, as depicted in
One key problem with the setup depicted within
Hence, a new design within a packet-based voice communication network is required to implement voice conferencing functionality. In this new design, a reduction in transcoding, latency, and/or required signal processing power within the central conference bridge is needed.
The present invention is directed to packet-based central conference bridges and other packet-based components, such as packet-based network interfaces and packet-based terminals, that could be used for media communications over a packet-based network, these media communications preferably being voice communications. The apparatus of the present invention can preferably allow for voice conferences as well as point-to-point communications to be established within the packet-based network with a reduction in transcoding, latency and/or signal processing requirement.
Some embodiments of the present invention decrease the latency within a voice conference by selecting the talkers prior to the decompression of the voice signals, hence making the decompression and subsequent compression operations in a conference bridge unnecessary in some circumstances. Further, the removal of the jitter buffers within the conference bridges and the moving of the mixing operation to the individual packet-based components are both included within embodiments of the present invention. These modifications preferably make for increased performance within the system by decreasing transcoding and latency within a conference session and result in decreased costs by reducing the required signal processing power for the system. Yet further, the modifications within the conference bridge allow for increased functionality such as an interlocking configuration of conference bridges and three way calling without the use of a conference bridge at all.
The present invention, according to a first broad aspect, is a conference bridge, including a receiver and an energy detection and talker selection unit. The receiver is capable of being coupled to a network and operates to receive at least one media data packet from at least two sources forming a media conference, each media data packet defining a compressed media signal. The energy detection and talker selection unit is coupled to the receiver and operates to determine at least one speech parameter corresponding to each of the compressed media signals and select a set of the sources within the media conference as talkers based on the determined speech parameters.
According to a second broad aspect, the present invention is a conference bridge that includes a receiver, an energy detection and talker selection unit and an output unit. The receiver is capable of being coupled to a network and operates to receive at least one media data packet from at least two sources forming a media conference, each media data packet defining a compressed media signal. The energy detection and talker selection unit is coupled to said receiver and operates to process the received compressed media signals including selecting a set of the sources within the media conference as talkers, one of the talkers being a lead talker. And, the output unit is coupled to the energy detection and talker selection unit and operates to output media data packets that correspond to compressed media signals received from the talkers. In this aspect, the media data packets corresponding to the lead talker are always output from the conference bridge in the same order as the media data packets which are received from the lead talker.
Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.
The preferred embodiment of the present invention is described with reference to the following figures, in which:
The present invention is directed to a number of different methods and apparatuses that can be utilized within a packet-based voice communication system. Primarily, the embodiments of the present invention are directed to methods and apparatus used for voice conferences within packet-based communication networks, but this is not meant to limit the scope of the present invention.
One skilled in the art would understand that there are two essential sectors for the operations of a telephone session. These sectors include a control plane that performs administrative functions such as access approval and build-up/tear-down of telephone sessions and/or conference sessions and a media plane which performs the signal processing required on media (voice or video) streams such as format conversions and mixing operations. As described below, the present invention is applicable to modifications within the media plane which could be implemented with a variety of different control planes while remaining within the scope of the present invention.
One significant aspect of the present invention described herein below is directed to a packet-based central conference bridge coupled to a packet-based network for enabling voice conferences between numerous sources of media signals. These sources of media signals can be any terminal through which a person can output media data for transmission to the conference bridge and can input media data from the conference bridge. In preferred embodiments, these sources of media signals are packet-based terminals coupled to a packet-based network, such as is illustrated for the VoIP handsets 26 coupled to the IP network 20 within
In the following description, it should be understood that despite referring to the sources of media signals as packet-based terminals throughout this document, such references could alternatively be directed to another form of media signal source. Further, the following description of the preferred embodiments of the present invention is specific to voice data packets that contain compressed voice signals, though this should not limit the scope of the present invention as is described in further detail herein below.
As depicted in
The operations of the packet receipt block 50 and the energy detection and talker selection block 60, according to both the first and second preferred embodiments, will be described with reference to
The first step 80, as depicted in
Next, as seen at step 81, the packet receipt block 50 removes the packet overhead from the received voice data packet. This overhead may include the actual packet header and footer utilized, as well as any other transport protocol wrapper. The removal of the packet overhead results in only the compressed voice signal within the received packet being forwarded on for further processing. It is noted though that information contained within the packet overhead, such as the source address, is still preferably used by the control plane to identify the source terminal and the voice conference to which this particular voice signal corresponds. Further, it is noted that a time stamp within an RTP header of the packet header is preferably extracted and used in later processing within the media plane as described below.
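As a minimal sketch of step 81, the following Python function strips an RTP header and returns the time stamp together with the compressed voice signal. It assumes that the lower-layer wrappers (for example IP and UDP) have already been removed, that the packet carries no RTP header extension or padding, and that the function name `strip_rtp_overhead` is purely illustrative; none of these details are specified in the description above.

```python
import struct

RTP_FIXED_HEADER_LEN = 12  # bytes in the fixed part of an RTP header


def strip_rtp_overhead(rtp_packet: bytes):
    """Return (timestamp, compressed_signal) for one RTP packet.

    Assumption: no header extension and no padding are present.
    """
    if len(rtp_packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the fixed RTP header")
    # Fixed header layout: flags byte, marker/payload-type byte,
    # 16-bit sequence number, 32-bit timestamp, 32-bit SSRC (network order).
    first_byte, _marker_pt, _seq, timestamp, _ssrc = struct.unpack(
        "!BBHII", rtp_packet[:RTP_FIXED_HEADER_LEN]
    )
    csrc_count = first_byte & 0x0F  # low 4 bits: number of 32-bit CSRC entries
    header_len = RTP_FIXED_HEADER_LEN + 4 * csrc_count
    return timestamp, rtp_packet[header_len:]
```

The returned time stamp is the value that later steps would compare when pairing primary and secondary voice signals.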
The compressed voice signal is subsequently processed by the energy detection and talker selection block 60 as depicted at steps 82 through 90. Firstly within this processing, the block 60 determines if the compressed voice signal contains speech at step 82 by performing an energy detection operation. A compressed voice signal containing speech indicates that the source of the corresponding voice data packet has a local participant who is speaking.
This energy detection operation can be performed in a number of different manners. In one preferred embodiment, a Voice Activity Detection (VAD) operation is enabled at the packet-based terminal that sent the voice data packet; the VAD operation alternatively being enabled at the packet-based network interface if the source of media signals is a non-packet-based telephone terminal. In this preferred embodiment, packets (and therefore compressed voice signals) that can contain speech can be distinguished from packets that do not by the number of bytes contained within the packet. In other words, the size of the compressed voice signal can determine whether it contains speech. For example, in the case that the G.723.1 VoIP standard is utilized, voice data packets containing voice would contain a compressed voice signal of 24 bytes while voice data packets containing essentially silence would contain a compressed voice signal of 4 bytes.
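The size-based check described above can be sketched as follows. The two byte counts are taken from the G.723.1 example in the text; the function name and the decision to reject any other size are assumptions made for illustration only.

```python
G7231_SPEECH_BYTES = 24   # compressed voice signal carrying voice (from the text)
G7231_SILENCE_BYTES = 4   # compressed voice signal carrying essentially silence


def contains_speech_by_size(compressed_signal: bytes) -> bool:
    """Classify a compressed signal by its length, with VAD enabled at the sender."""
    size = len(compressed_signal)
    if size == G7231_SPEECH_BYTES:
        return True
    if size == G7231_SILENCE_BYTES:
        return False
    # Assumption: any other size is unexpected in this sketch.
    raise ValueError(f"unexpected compressed signal size: {size} bytes")
```

Because only the length is inspected, the bridge does not need to decompress the signal to make this decision.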
In another preferred embodiment, in which a VAD operation is not enabled at the packet-based terminal (or packet-based network interface) sending the voice data packet, the block 60 determines if there is speech within the compressed voice signal by monitoring a pitch-related sector within the corresponding voice data packet. For example, within the G.723.1 VoIP standard, the pitch sector is an 18-bit field that contains pitch lag information for all subframes. In this particular embodiment, the block 60 uses the pitch sector to generate a pitch value for each subframe. If the pitch value is within a particular predetermined range, the corresponding compressed voice signal is said to contain speech. If not, the compressed voice signal is said to not contain speech. This predetermined range can be determined by experimentation or alternatively calculated mathematically. It is noted that many current VoIP standard codecs include pitch information as part of the transmitted packet and a similar comparison of pitch values with a predetermined range can be used with these standards. It is further noted that the energy determination operations which determine whether a particular compressed voice signal contains speech should not be limited to the above described embodiments.
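The pitch-based check can be sketched in the same way. The sketch below assumes that the pitch values have already been derived from the pitch sector, that every subframe must fall inside the range for the signal to count as speech, and that the range bounds shown are placeholders; the text leaves the range to experimentation or calculation and does not state how the subframe results are combined.

```python
# Hypothetical bounds; the description leaves the actual range to
# experimentation or mathematical calculation.
HYPOTHETICAL_PITCH_RANGE = (40, 120)


def contains_speech_by_pitch(subframe_pitch_values, pitch_range=HYPOTHETICAL_PITCH_RANGE):
    """Return True if every subframe pitch value lies within the range.

    Assumption: requiring all subframes (rather than any) is a design
    choice of this sketch, not something stated in the description.
    """
    low, high = pitch_range
    values = list(subframe_pitch_values)
    if not values:
        return False
    return all(low <= value <= high for value in values)
```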
If the compressed voice signal at step 82 is deemed to not contain speech, the particular signal is discarded at step 83. The frequency with which signals from a signal source are discarded based upon their lack of speech affects the de-selection of talkers for the voice conference as will be described herein below. If the compressed voice signal at step 82 does contain speech, the energy detection and talker selection block 60 proceeds to determine at step 84 whether the compressed voice signal is from a packet-based terminal (more generally a source of media data packets) selected to be a talker; voice signals from talkers being the only voice signals heard by the participants within the voice conference.
The selection and de-selection of terminals as talkers is performed by a talker selection algorithm within the block 60. Although it is the terminal that is referenced as the source for the voice data packets containing speech, for simplicity herein below, the description will refer to the talker selection algorithm determining which participants are speaking rather than referring to which terminals have participants that are speaking. It should be recognized that a reference to a participant speaking indicates that the voice data packet received from the terminal corresponding to the particular participant has been deemed to contain speech.
There are three main situations, according to preferred embodiments, which would result in different operations for the talker selection algorithm, these situations being no participants speaking, only one participant speaking, and two or more participants speaking at once. For the first case in which there are no participants speaking, the talker selection algorithm preferably has no terminals selected as talkers, thus preventing the sending of any voice data packets from the packet-based central conference bridge and further removing the need for any further processing to take place. Alternatively, the talker selection algorithm could transmit empty voice data packets to the terminals within the voice conference when there are no talkers selected in order to maintain continuous packet transmission.
When considering the second case in which only one participant is speaking, the talker selection algorithm preferably has only one terminal selected as a talker, that terminal being the one corresponding to the speaking participant. In this situation, the single talker is hereinafter referred to as a “lone talker”.
In the third case in which two or more participants at different terminals are speaking at the same time, the talker selection algorithm preferably has one terminal selected as a “primary talker” and a second terminal selected as a “secondary talker” for the voice conference. When considering this situation, the talker selection algorithm, according to preferred embodiments, selects the primary and secondary talkers using a predetermined selection parameter. In one preferred embodiment, this selection parameter is the order in which the participants began to speak. In another embodiment, the selection parameter takes into consideration the volume level of the participants (i.e. comparing the energy levels of the talkers). In yet another embodiment, a control mechanism is in place that automatically selects a participant to be the primary or secondary talker. This control mechanism could be utilized in cases that there is a moderator and/or a scheduled speaker for the voice conference.
The above described selection parameters are not meant to limit the scope of the present invention. In fact, the key to this portion of the preferable packet-based central conference bridge is the selection of talkers while the parameter used for this selection and the number of talkers selected is not directly relevant to the present invention.
Preferably, the talker selection algorithm comprises a software algorithm that is continuously operating during a voice conference with the determination of those speaking and the selection of no talkers, a lone talker, or primary and secondary talkers being dynamic during the receiving of voice data packets as will be described with reference to steps 84 through 90. As well, the talker selection algorithm preferably performs operations to de-select talkers continuously during the voice conference. These de-selection operations preferably include the steps of determining the length of time between voice data packets containing speech coming from the talker(s) and de-selecting any talker if the length of time between voice data packets containing speech exceeds a threshold level. Of course, other de-selection techniques could be utilized as the actual de-selection operation being used is not critical to the present invention.
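The timeout-based de-selection described above might be sketched as follows. The function name, the use of seconds, and the dictionary keyed by talker are assumptions for illustration; the description only requires that a talker be de-selected once the gap since its last speech-bearing packet exceeds a threshold.

```python
def deselect_idle_talkers(last_speech_time, now, threshold_s):
    """Split talkers into those kept and those de-selected.

    last_speech_time maps each talker to the arrival time (in seconds) of
    its most recent voice data packet that contained speech.
    """
    kept, deselected = {}, []
    for talker, last_seen in last_speech_time.items():
        if now - last_seen > threshold_s:
            deselected.append(talker)   # gap exceeded the threshold level
        else:
            kept[talker] = last_seen
    return kept, deselected
```

A bridge could call such a routine periodically, or on each packet arrival, passing the current time.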
Referring back to
If, at step 84, the compressed voice signal does not correspond to a talker selected by the talker selection algorithm, the talker selection algorithm proceeds to determine if there are currently two talkers selected at step 86. If there are two talkers already selected, the compressed voice signal is discarded at step 83. If there are not two talkers already selected at step 86, the talker selection algorithm determines if there is currently a lone talker selected at step 87. If there is not a lone talker already selected at step 87, the talker selection algorithm selects the participant corresponding to the particular compressed voice signal as the lone talker at step 88. If there is a lone talker currently selected at step 87, the talker selection algorithm proceeds to set the participant corresponding to the particular compressed voice signal as the secondary talker at step 89 and to set the lone talker as the primary talker at step 90. The output generator 70, as described below, then processes the compressed voice signal as if it were received from the particular talker that its corresponding participant is now set as.
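Steps 84 through 90 can be sketched as a small state update. The class and function names are hypothetical, participants are identified by plain strings for illustration, and the sketch covers only selection; de-selection transitions are outside its scope.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TalkerState:
    lone: Optional[str] = None
    primary: Optional[str] = None
    secondary: Optional[str] = None


def on_speech_signal(state: TalkerState, source: str) -> Optional[str]:
    """Return the role under which the signal is processed, or None if discarded."""
    # Step 84: is the source already a selected talker?
    if source == state.lone:
        return "lone"
    if source == state.primary:
        return "primary"
    if source == state.secondary:
        return "secondary"
    # Step 86: two talkers already selected, so discard (step 83).
    if state.primary is not None and state.secondary is not None:
        return None
    # Step 87: no lone talker yet, so select this source (step 88).
    if state.lone is None:
        state.lone = source
        return "lone"
    # Steps 89 and 90: the new source becomes secondary, the lone talker primary.
    state.secondary = source
    state.primary = state.lone
    state.lone = None
    return "secondary"
```

For example, with participants A, B, and C speaking in that order, A is selected as the lone talker, B then becomes the secondary talker while A becomes the primary talker, and signals from C are discarded until a talker is de-selected.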
The procedure that occurs within the output generator 70, according to the first preferred embodiment, if the compressed voice signal corresponds to one of a lone talker, a primary talker, and a secondary talker will now be described with reference to
If it is determined that the compressed voice signal corresponds to the primary talker, the output generator 70, as shown at step 102, encapsulates the voice signal, hereinafter referred to as the primary voice signal, within a packet format satisfactory for transmission on a packet-based network and further transmits the resulting voice data packet to the secondary talker via the packet-based network. Subsequently, at step 104, it is determined whether there is a secondary voice signal currently saved within the output generator 70 with a corresponding time stamp.
If there is no corresponding secondary voice signal currently saved, it is determined at step 106 whether a predetermined time T has expired. This predetermined time T is a waiting period in which the output generator 70 will not transmit the primary voice signal as the procedure returns to step 104. This compensates for minor delays caused in the network by providing the voice data packets arriving from the secondary talker a limited amount of leeway after the arrival of a voice data packet corresponding to the primary talker. Preferably, if no voice data packets arrive from the secondary talker after the time T expires, the voice data packets corresponding to the primary talker are not subsequently delayed by this delay mechanism. If the predetermined time T has expired at step 106, a voice signal is generated for the secondary talker at step 108 with the use of a packet loss concealment algorithm. This generated voice signal is an approximation of what the secondary talker is saying based upon previous secondary voice data packets that were received. One such packet loss concealment algorithm is disclosed within U.S. patent application Ser. No. 09/353,906 entitled “Apparatus and Method of Regenerating a Lost Audio Segment” by Gunduzhan, filed on Jul. 15, 1999, assigned to the assignee of the present invention and herein incorporated by reference.
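The decision made at steps 104 through 108 could be sketched as below. The sketch assumes that a "corresponding" secondary voice signal is one saved under exactly the same time stamp, and it represents the packet loss concealment algorithm by a caller-supplied `conceal` callback; both choices, like the function name, are illustrative assumptions rather than details taken from the description.

```python
def resolve_secondary(primary_ts, saved_secondary, elapsed_s, wait_limit_s, conceal):
    """Return the secondary signal to pair with a primary signal, or None to keep waiting.

    saved_secondary maps time stamps to saved secondary voice signals.
    """
    # Step 104: is a secondary signal with a corresponding time stamp saved?
    if primary_ts in saved_secondary:
        return saved_secondary.pop(primary_ts)
    # Step 106: has the predetermined time T expired?
    if elapsed_s < wait_limit_s:
        return None  # not yet; the procedure returns to step 104
    # Step 108: generate an approximation of the secondary signal.
    return conceal()
```

A caller would invoke this repeatedly for the same primary signal, with a growing `elapsed_s`, until it returns something other than `None`.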
After the generation of a secondary voice signal at step 108 or if there was a corresponding secondary voice signal currently saved at step 104, a number of operations, as depicted at step 110, are preferably performed by the output generator 70 according to the first preferred embodiment. These operations include decompressing the compressed primary voice signal (and secondary voice signal if previously not done), hence converting it into an uncompressed voice signal that is preferably a PCM signal; mixing the primary voice signal with the secondary voice signal using a well-known mixing algorithm as is currently used for combining two uncompressed voice signals such as PCM signals, the primary and secondary voice signals being combined into a single uncompressed voice signal (preferably a PCM signal); compressing the resulting mixed voice signal; encapsulating the compressed mixed voice signal within a packet format capable of transmission on a packet-based network, this packet format preferably including a new Real-time Transport Protocol (RTP) header with a time stamp; and transmitting the resulting voice data packet containing the compressed mixed voice signal to all the participants within the voice conference with the exception of the primary and secondary talkers. The transmitting of the resulting voice data packet preferably includes a unicast transmission to each participant that is to receive the particular voice data packet, a unicast transmission being a single transmission that travels from point A to point B. In an alternative embodiment, a single multicast transmission is sent in place of the plurality of unicast transmissions, the multicast transmission including the mixed voice signal, the unmixed primary and secondary voice signals, and an indication of which terminals should broadcast which voice signals. In this alternative, steps 94 and 102 would be removed.
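The description does not name a particular mixing algorithm. As one common possibility, the sketch below mixes two frames of 16-bit linear PCM samples by sample-wise addition with saturation; the sample format, the function name, and the choice of saturating addition are all assumptions made for illustration.

```python
PCM16_MIN = -32768
PCM16_MAX = 32767


def mix_pcm16(primary_samples, secondary_samples):
    """Mix two equal-length frames of 16-bit linear PCM samples.

    Each output sample is the sum of the two inputs, clamped to the
    16-bit range so that loud passages saturate instead of wrapping.
    """
    if len(primary_samples) != len(secondary_samples):
        raise ValueError("frames must contain the same number of samples")
    return [
        max(PCM16_MIN, min(PCM16_MAX, a + b))
        for a, b in zip(primary_samples, secondary_samples)
    ]
```

In the first preferred embodiment, a step of this kind would sit between the decompression of the two voice signals and the compression of the mixed result.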
If the compressed voice signal was determined to correspond to a lone talker, the output generator 70 preferably, as depicted at step 112, encapsulates the compressed voice signal in a packet format suitable for transmission on a packet-based network and subsequently transmits the voice data packet to all the participants within the voice conference with the exception of the lone talker. Similar to the description above, this voice data packet would preferably be transmitted using one or more unicast transmissions.
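The unicast fan-out for the first preferred embodiment can be summarized in a short routing sketch. The function name, the string labels for each kind of signal, and the use of a plain list of participant identifiers are hypothetical; the routing itself follows the description: unmixed primary and secondary signals are exchanged between the two talkers, the mixed signal goes to everyone else, and a lone talker's signal goes to everyone but the lone talker.

```python
def recipients(kind, participants, primary=None, secondary=None, lone=None):
    """Return the participants that should receive a given outgoing signal."""
    if kind == "lone":
        return [p for p in participants if p != lone]
    if kind == "primary":
        return [secondary]      # unmixed primary signal goes to the secondary talker
    if kind == "secondary":
        return [primary]        # unmixed secondary signal goes to the primary talker
    if kind == "mixed":
        return [p for p in participants if p not in (primary, secondary)]
    raise ValueError(f"unknown signal kind: {kind}")
```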
One of the keys to the packet-based central conference bridge according to the first preferred embodiment as described herein above is that the voice data packets received from the primary talker drive the transmission of the voice data packets mixed with the primary and secondary voice signals. This, along with the operation of the jitter buffers within the packet-based terminals as seen in
The problem with out-of-order voice data packets from the lone or primary talkers being received at the conference bridge can be dealt with in a number of ways without the use of a jitter buffer. It is noted that out-of-order voice data packets from the secondary talker are already compensated for within the procedure of
As can be seen in
In the logical block diagram of
The talker selection block 64 preferably receives the determinations of which of the received voice signals contain speech and, in the case of two or more speakers, determines which participants are the primary and secondary talkers.
This results, within the output generator 70, in compressed voice signals from participant A being sent to the participant B transmitter 72 and one of the decompression blocks 74 while the compressed voice signals from participant B are sent to the participant A transmitter 71 and the other decompression block 74. The transmitters 71,72 subsequently encapsulate the received compressed voice signals into voice data packets, preferably including adding an RTP header with a timestamp, and transmit the packets to the appropriate participants. Assuming that the compressed voice signal corresponding to participant B arrives within the predetermined time T of the voice signal corresponding to participant A, the compressed voice signals of participants A and B are decompressed such that they are preferably in PCM format, mixed together, compressed, and subsequently encapsulated and transmitted to the other participants within the voice conference (those being participants C through Z), the encapsulation similarly including an RTP header with a timestamp in preferred embodiments. It is noted that the transmitters 71,72,79 together preferably comprise a single transmitting algorithm that is run for each of the participants in the voice conference.
Although the first preferred embodiment of the present invention is as described above with reference to
There are numerous advantages to the packet-based central conference bridge according to the first preferred embodiment over the well-known conference bridge depicted in
A further advantage of the first preferred embodiment results since the design depicted in
The packet-based central conference bridge according to the second preferred embodiment of the present invention will now be described with reference to
The packet-based central conference bridge according to the second preferred embodiment, as previously described, is consistent with the simplified block diagram of
The procedure that occurs within the output generator 70, according to the second preferred embodiment, if the compressed voice signal corresponds to one of a lone talker, a primary talker, and a secondary talker will now be described with reference to
In the case that a compressed secondary voice signal is received at the output generator 70, the generator 70 proceeds through steps 94 and 96 as previously described. If the secondary voice signal had not previously been regenerated at step 96, the voice signal is temporarily saved within the output generator 70 at step 114. The difference between step 100 (first preferred embodiment) and step 114 (second preferred embodiment) is the lack of a decompression operation within step 114. Once saved, the conferencing control logic returns to step 80 of
In the case that a compressed primary voice signal is received at the output generator 70, the generator proceeds through steps 102 through 108 as previously described. If there was a secondary voice signal saved at step 104 or if a secondary voice signal was generated at step 108, the output generator proceeds through a number of operations as depicted at step 116. These operations include both the compressed primary and secondary voice signals being encapsulated within a packet format suitable for transmission on a packet-based network, this packet format preferably including an RTP header with a time stamp, and the resulting voice data packet(s) being transmitted to all the participants within the voice conference with the exception of the primary and secondary talkers. The encapsulation of the primary and secondary voice signals preferably entails placing the two signals within the same data section of a single packet with no mixing. The bandwidth efficiency of the voice communication system is increased using this technique when compared to an alternative in which the primary and secondary voice signals are transmitted in separate packets, each with its own packet overhead. This increase in bandwidth efficiency is due to the large proportion of packet overhead bytes that are required within a typical packet format. Hence, only requiring a single packet overhead rather than two can significantly increase the bandwidth efficiency. Similar to the transmission in the first preferred embodiment, the transmission of these voice data packets is preferably a unicast transmission corresponding to each participant that is to receive the voice data packet or alternatively could be a single multicast transmission if each individual terminal can determine whether it should broadcast only one of the compressed voice signals (if the terminal is the primary or secondary talker) or both (if it is not the primary or secondary talker).
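The bandwidth argument can be illustrated with a short calculation. The overhead figures below are assumptions not stated in the description: an IPv4 header without options (20 bytes), a UDP header (8 bytes), and an RTP header without CSRC entries (12 bytes), with link-layer framing ignored. The 24-byte payload is the G.723.1 speech frame size mentioned earlier in the text.

```python
# Assumed per-packet overhead: IPv4 (no options) + UDP + RTP (no CSRC entries).
IPV4_HEADER_BYTES = 20
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12
OVERHEAD_BYTES = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES

SIGNAL_BYTES = 24  # one compressed G.723.1 speech frame, per the text


def bytes_on_wire(signals_per_packet: int, packet_count: int) -> int:
    """Total bytes sent for packet_count packets, each carrying the given number of signals."""
    return packet_count * (OVERHEAD_BYTES + signals_per_packet * SIGNAL_BYTES)


separate = bytes_on_wire(signals_per_packet=1, packet_count=2)  # two packets, two overheads
combined = bytes_on_wire(signals_per_packet=2, packet_count=1)  # one packet, one overhead
saving_fraction = (separate - combined) / separate
```

Under these assumptions, sending the primary and secondary signals separately costs 128 bytes per frame interval, while combining them in one packet costs 88 bytes, a saving of 40 bytes or about 31 percent per recipient.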
In the case that a compressed voice signal from a lone talker is received at the output generator 70 of the second preferred embodiment, the operation at step 112 is the same as previously described for the first preferred embodiment. In this case, the voice signal is encapsulated and transmitted to all the participants in the voice conference with the exception of the lone talker, this transmission being either one or more unicast transmissions or alternatively a single multicast transmission.
There are numerous alternatives to the packet-based central conference bridge according to the second preferred embodiment. For one, step 106 in which a primary voice signal is possibly delayed by a predetermined time T is removed in some embodiments, thus resulting in the immediate generation of a secondary voice signal in the case that there is no saved secondary voice signal during the arrival of a primary voice signal. Further, other alternative embodiments do not have the option of generating secondary voice signals or sending the primary and secondary signals within a single voice data packet. In these embodiments, upon the arrival of a primary voice signal, the output generator 70 simply encapsulates the signal and transmits the resulting voice data packet to all of the participants within the voice conference except the primary talker. The same operation is performed in the case that a secondary voice packet arrives at the output generator 70 except with the secondary talker being the only participant not to receive the corresponding voice data packet.
Yet further alternative embodiments have more than two participants selected as talkers, resulting in voice signals corresponding to more than two talkers being forwarded to the other participants within the voice conference. In one such alternative, a third talker is selected similar to that described for an alternative to the first preferred embodiment.
A packet-based terminal and a packet-based network interface that can operate with the packet-based central conference bridge of the second preferred embodiment are now described with reference to
The packet receipt block 120 preferably receives a voice data packet containing one or two voice signals (one voice signal if from a lone talker or two voice signals if from primary and secondary talkers) from the packet-based central conference bridge of the second preferred embodiment. The packet receipt block 120 performs a number of logical operations to the received packets as can be seen in
The output generator 130 preferably receives these set(s) of compressed voice signals and processes them so that an uncompressed set of voice signals are sent to a speaker (not shown) in the case of the packet-based apparatus being a packet-based terminal or to a non-packet-based telephone terminal (not shown) such as a PCM terminal, via a non-packet-based telephone network (not shown) such as a PCM telephone network, in the case of the packet-based apparatus being a packet-based network interface. As can be seen within
There are alternative embodiments to the packet-based terminal and packet-based network interface of
In another alternative embodiment, the packet-based apparatus of
There are numerous advantages of using the packet-based central conference bridge and packet-based apparatus according to the second preferred embodiment within a voice conference. For one, the advantages stated above regarding the reduction in latency and required signal processing power that results from the removal of the jitter buffers within the conference bridge apply here as well. Some of the other advantages of the first preferred embodiment also apply equally to the second preferred embodiment, including the possible reduction in latency, transcoding and required signal processing power when selecting the talkers prior to decompressing the voice signals.
The second preferred embodiment is essentially the same as the first preferred embodiment except with the mixing of the primary and secondary voice signals being performed at the packet-based terminals and/or packet-based network interfaces rather than at the conference bridge. This change results in advantages and disadvantages for the voice communication system of the second preferred embodiment when compared to the system of the first preferred embodiment. One disadvantage with the moving of the mixing algorithm is that a plurality of packet-based terminals and packet-based network interfaces must perform the mixing rather than one central DSP within the conference bridge. Essentially, this will require an increase in the required signal processing power within all of the applicable packet-based terminals and packet-based network interfaces.
One advantage of the voice communication system of the second preferred embodiment over the voice communication system of the first preferred embodiment is the removal of any need to decompress and then subsequently compress again, that being transcoding as described previously. Decompression of the voice signals, as depicted in
The overall effect of the above described lack of decompression and compression operations and the removal of the mixing operation is that the central conference bridge according to the second preferred embodiment requires fewer computational resources and therefore has increased capacity in terms of ports. The simplicity of the conference bridge makes it more amenable to general purpose microprocessor devices, reducing the need for highly specialized DSPs that add significant costs. Therefore, the central conference bridge according to the second preferred embodiment does not have to be a specially designed apparatus but could be implemented within any device containing a microcontroller capable of running software operations, such as a server, a call processor, a router, or an end user personal computer.
Some of the key advantages of the second preferred embodiment relate to the possibility of making the packet-based central conference bridge relatively simple by moving the mixing operation to the packet-based terminals and/or packet-based network interfaces. This reduction in complexity within the conference bridges can allow for increased flexibility and additional operations when it comes to the use of these apparatuses.
One such additional operation concerns interlocking a plurality of conference bridges as will now be described with reference to
As depicted in
As depicted in
There are a number of advantages to the interlocked conference bridge configuration of
Another key advantage that could occur with the use of interlocked conference bridges is a reduction in bandwidth requirements within the packet-based network when establishing voice conferences between participants in dispersed locations. In traditional conference bridges such as the one depicted in
It is noted that it would not be possible for previous conference bridge designs, such as that depicted in
Although the interlocked conference bridge configuration in
There are large numbers of yet further possible alternative embodiments to the interlocked configuration described herein above, many of which have yet further additional advantages. One such alternative has the conference bridges prevent the re-forwarding of identical packets back to the best (earliest arriving) source of the particular voice data packets. Hence, if a conference bridge has voice data packets arriving from another interlocked conference bridge which are subsequently selected as the earliest arriving packets corresponding to the primary or secondary talker, the particular packets are not forwarded back to the conference bridge source in this alternative. This alternative effectively reduces the amount of voice data packets being exchanged between the conference bridges, hence decreasing the load on the packet-based network.
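A minimal sketch of this alternative follows, assuming that each voice data packet can be identified by a (talker, sequence number) pair and that later-arriving duplicates of an already handled packet are simply dropped. The class name `InterlockedBridge`, the packet identifier, and the duplicate handling are illustrative assumptions rather than details of the described configuration.

```python
class InterlockedBridge:
    """Tracks peer bridges and avoids echoing packets back to their source."""

    def __init__(self, name, peers):
        self.name = name
        self.peers = list(peers)   # names of the interlocked peer bridges
        self.seen = set()          # identifiers of packets already handled

    def forward_targets(self, packet_id, source_bridge=None):
        """Return the peer bridges that should receive this packet.

        source_bridge is the peer the packet arrived from, or None when it
        came from a participant attached directly to this bridge.
        """
        if packet_id in self.seen:
            # A later-arriving copy: the earliest copy was already handled.
            return []
        self.seen.add(packet_id)
        # Never send the packet back to the bridge that supplied it.
        return [p for p in self.peers if p != source_bridge]


bridge_a = InterlockedBridge("A", ["B", "C"])
first_copy = bridge_a.forward_targets(("alice", 1), source_bridge="B")
late_copy = bridge_a.forward_targets(("alice", 1), source_bridge="C")
local_packet = bridge_a.forward_targets(("bob", 1))
```

In this sketch the earliest copy, which arrived from bridge B, is forwarded only to bridge C, which is the reduction in inter-bridge traffic that the alternative aims for.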
Another alternative embodiment of the interlocked configuration depicted in
Yet further, other alternative embodiments to the interlocked conference bridge configuration of
Another additional operation that is possible with the use of conference bridges according to the second preferred embodiment is the defining of all packet-based voice communications as a conference session, whether there are two participants or hundreds. In this design, all voice data packets within a packet-based network traverse a conference bridge with each participant treated independently at the conference bridge. This allows each packet-based voice session, whether point-to-point or a conference situation, to have a control mechanism operated with the use of conference bridges. This can allow for additional functionality within the control plane of a typical telephone session such as allowing participants to join the telephone session without having to be initiated by a current participant, essentially giving the initiation control to a new participant. This is useful for people who wish to make a quick comment to one of the participants or for people who wish to join the conference session while it is in progress. For instance, one participant in a conference session could suggest to another person to join the conference session when he/she gets a chance; the person in this case is able to join at his/her will without disturbing the other participants. Additionally, the flexibility of the second preferred embodiment allows for a voice conference to expand from a point-to-point voice communication to a larger conference session with ease, as every packet-based voice communication is easily scalable in this setup.
Yet another additional operation that is possible with the use of packet-based terminals or packet-based network interfaces of the second preferred embodiment is the ability to perform three way voice conferencing without the use of a central conference bridge. In the case of three participants within a voice conference, the central conference bridge of the second preferred embodiment can be seen to be performing an unnecessary function, since the selection of talkers is not necessary in the case that the packet-based terminals and/or packet-based network interfaces can mix the voice signals from two sources, that being the maximum number of sources that the apparatus could possibly receive voice data packets from at one time if only three participants are in the voice conference.
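The mixing performed at each terminal in the three way case can be sketched as below. The sketch assumes that both received signals have already been decompressed to linear 16-bit PCM samples and that mixing is a sample-wise sum clipped to the 16-bit range. The clipping strategy and the function name `mix_pcm16` are assumptions made for illustration, since the mixing method is not specified here.

```python
PCM16_MIN = -32768
PCM16_MAX = 32767


def mix_pcm16(a, b):
    """Mix two sequences of 16-bit PCM samples by saturating addition.

    If one sequence is shorter, the missing samples are treated as silence.
    """
    length = max(len(a), len(b))
    mixed = []
    for i in range(length):
        sample_a = a[i] if i < len(a) else 0
        sample_b = b[i] if i < len(b) else 0
        total = sample_a + sample_b
        # Clip to the 16-bit range instead of letting the sum wrap around.
        mixed.append(max(PCM16_MIN, min(PCM16_MAX, total)))
    return mixed


# Frames received from the two other participants (hypothetical values).
from_second = [1000, 30000, -30000]
from_third = [2000, 10000, -10000]
to_speaker = mix_pcm16(from_second, from_third)
```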
Overall the present invention as described herein above has considerable advantages over the well-known voice conferencing techniques. These embodiments as described allow for the operations within the central conference bridge to have decreased latency, decreased computational requirements, and an increased signal quality due to a reduction in transcoding.
There are a number of features that can be added to any one of the above embodiments of the present invention that have not previously been discussed in detail. For one, a modified control plane is used such that a number of operations could be controlled with the transmission of control packets between participants and possibly a moderator. One such operation could have a moderator established as a permanent talker throughout the voice conference, possibly as a permanent secondary talker or possibly as a third selected talker. Another operation that could be controlled through use of a modified control plane is the manual selection of primary and/or secondary talkers. This may be useful in cases where a particular participant is scheduled to speak. Yet another possible operation that could be maintained with use of a modified control plane is a sidebar operation. In a sidebar operation, at least two of the participants within a voice conference can form a subset of participants smaller than the set that defines the entire voice conference. With this setup, one participant within the subset can choose to communicate with the entire voice conference or with only the members of the subset.
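The sidebar operation can be sketched as a choice of audience made per talker. The function name `audience`, the use of Python sets to model the conference and the sidebar, and the participant names are all hypothetical; the control packets that would carry this choice are not modeled.

```python
def audience(conference, sidebar, talker, sidebar_only):
    """Return, in sorted order, the participants who should hear the talker.

    conference and sidebar are sets of participant names. The sidebar must
    be a smaller subset of the conference and must contain the talker.
    """
    if not sidebar < conference:
        raise ValueError("sidebar must be a smaller subset of the conference")
    if talker not in sidebar:
        raise ValueError("talker must be a member of the sidebar")
    group = sidebar if sidebar_only else conference
    return sorted(group - {talker})


conference = {"alice", "bob", "carol", "dave"}
sidebar = {"alice", "bob"}

private = audience(conference, sidebar, "alice", sidebar_only=True)
public = audience(conference, sidebar, "alice", sidebar_only=False)
```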
Another feature that could be added to any one of the embodiments of the present invention described herein above is the sending of video streams via video data packets within the packet-based network. In these embodiments, the video data packets would replace or supplement the voice data packets within the above described implementations. Embodiments with this feature would operate the same as described herein above, with these video signals preferably corresponding to the primary talker. Alternatively, a manual control within the control plane could be added so that each participant or a moderator could select which video stream to view. Further, a picture-in-picture feature could be used such that two or more video streams could be shown at once. In the case of there being primary and secondary talkers, the picture-in-picture operation could be equivalent to the mixing of the corresponding voice signals.
In general, although the operation of the present invention was described herein above with use of the terms voice data packets and voice signals, these packets and signals can be referred to broadly as media data packets and media signals respectively. In this case, media data packets are any data packets that are transmitted via the media plane, these media data packets preferably being either audio or audio/video data packets. It is noted that use of the term voice data packets above is specific to the preferred embodiments in which the audio signals are voice. Further, it should be understood that video data packets may incorporate audio data packets.
Although the present invention herein above described has a single voice conference being established with the use of a central conference bridge, it should be understood that the central conference bridge would preferably be capable of handling a plurality of voice conferences simultaneously.
Persons skilled in the art will appreciate that there are yet more alternative implementations and modifications possible for implementing the present invention, and that the above implementation is only an illustration of this embodiment of the invention. The scope of the invention, therefore, is only to be limited by the claims appended hereto.