The system and method relate to adjusting audio signal volume levels and, in particular, to adjusting audio signal volume levels based on who is speaking.
During various audio communications, different speakers talk at different volume levels. For example, during one call the speaker may talk softly, causing the listener to turn up the volume. Conversely, on a second call, a different speaker may talk loudly, causing the listener to turn down the volume. This problem can also exist in conference calls where participants in the conference call speak at different levels. Moreover, different speakers speak in different frequency ranges while the listener may hear in a different frequency range. The result is that one speaker may sound louder or softer depending on who is listening. These problems may require the listener to make periodic adjustments in the volume level based on who is speaking. These problems can be exacerbated based on the device or the quality of the communication channel of the call.
There are some systems that attempt to address the aforementioned issue. There are, for example, systems that adjust the volume level of participants in a conference call prior to mixing the signals of the conference call. In such systems, however, the volume of all speakers in the conference call is adjusted uniformly, without consideration of the individual participant's preferences or hearing abilities. That is, a listener has no control over the relative characteristics of the inputs into the mixed audio signal, only over the volume of the mixed signal itself.
In U.S. Patent Publication No. 2005/0250553, there is described a system in which speaker volume for push-to-talk calls can be adjusted depending on how the user is holding a phone or whether the user is listening on an earpiece. A disadvantage associated with this system is that the volume cannot be adjusted based on who is speaking and/or calling. Again, the listener must adjust the volume up or down based on who is speaking on the call.
The system and method are directed to solving these and other problems and disadvantages of the prior art. A speech characteristic such as a volume level of a call participant is derived; the derived speech characteristic is associated with an identifier such as a caller ID number. The speech characteristic and identifier are stored in a call participant profile. An adjustment of the volume level of an audio signal of the call participant is made based on the derived speech characteristic and the identifier in the call participant profile.
In a second embodiment, the system and method can be further adapted to identify a speech characteristic of a participant(s) in a conference call. A determination is made when the participant of the conference call is speaking during the conference call. An adjustment is made to a mixed audio signal of the conference call based on the speech characteristic of the participant in the conference call.
These and other features and advantages of the system and method will become more apparent from considering the following description of an illustrative embodiment of the system and method together with the drawing, in which:
An audio communication device 102 further comprises a call participant profile 120, a user profile 140, an audio interface 122, an audio adjustment module 124, and an audio analyzer 126. The call participant profile 120 and the user profile 140 each reside in a memory 128. The call participant profile 120 (see
A call is established between a call participant at communication terminal 101 and the audio communication device 102. The call can be any type of call that involves an audio signal such as an analog audio communication, a digital audio communication, a video communication with audio, an audio stream, a video stream with audio, and the like. The call could be live or a recording (e.g., an audio/video stream opened up from a web page). The call can be established from communication terminal 101, the audio communication device 102, a network device, a Private Branch Exchange (PBX), a bridge, a central office switch, a router adapted to establish the call, an auto-dialer in a contact center, and the like.
In the example in
The audio adjustment module 124 gets an identifier of the call participant of communication terminal 101A during the call. The identifier could be a caller ID number, a speech pattern of the call participant of communication terminal 101A determined from voice recognition, and the like. The identifier can be any type of communication address such as a telephone number, a Universal Resource Locator (URL), a speech pattern, an avatar, or any unique identifier/number/image to identify the call participant. For example, the audio adjustment module 124 can get a speech pattern from the audio analyzer 126, which created the speech pattern using voice recognition of the call participant from communication terminal 101A. The audio adjustment module 124 can get the identifier using known techniques such as caller ID, and the like.
The audio analyzer 126 derives information of a speech characteristic(s) of the call participant at communication terminal 101A. The derived speech characteristic(s) can be a volume level of the call participant, an offset volume level of the call participant, a volume level of the call participant at a frequency range(s), and the like. The audio analyzer 126 can derive a speech characteristic based on a user changing a volume level on audio communication device 102, user input, and the like. The speech characteristic(s) can be determined during the call, in a prior call with communication terminal 101A, by processes unrelated to a call, and the like. The audio analyzer 126 can measure the audio signal from the call participant at communication terminal 101A to determine an offset to adjust the audio signal. The offset can be a relative or a fixed value. The offset can be relative to a predefined value, an average value, and the like.
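The derivation of a relative offset described above can be illustrated with a minimal sketch. The function names and the choice of an RMS level measurement relative to a reference level are assumptions for illustration only; the specification leaves the exact measurement technique open.

```python
import math

def rms_level(samples):
    """Root-mean-square level of a block of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def derive_offset_db(participant_samples, reference_level):
    """Offset (in dB) of the participant's measured level relative to a
    reference level (e.g., a predefined or average value)."""
    level = rms_level(participant_samples)
    if level == 0 or reference_level == 0:
        return 0.0
    # Positive offset means the participant is quieter than the reference
    # and the signal should be boosted by that many dB.
    return 20 * math.log10(reference_level / level)
```

For example, a participant measuring at half the reference level would yield an offset of roughly +6 dB.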
The audio adjustment module 124 stores in the memory 128 the derived speech characteristic(s) and the identifier of the call participant of communication terminal 101A in the call participant profile 120. The association of the speech characteristic and the identifier can be accomplished at the time of the call or any time prior to the call.
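The association of a derived speech characteristic with an identifier in a call participant profile could take a shape like the following sketch. The field names (`identifier`, `offset_db`, `speech_pattern`) and the in-memory dictionary store are hypothetical stand-ins for the call participant profile 120 held in memory 128.

```python
# Hypothetical in-memory stand-in for the call participant profile store.
profiles = {}

def store_profile(identifier, offset_db, speech_pattern=None):
    """Associate a derived speech characteristic (offset) with an
    identifier such as a caller ID number or a speech pattern."""
    profiles[identifier] = {
        "identifier": identifier,        # e.g. caller ID number or URL
        "offset_db": offset_db,          # derived volume offset
        "speech_pattern": speech_pattern,
    }

def lookup_profile(identifier):
    """Retrieve the stored profile for a call participant, if any."""
    return profiles.get(identifier)
```

As the specification notes, this association can happen at the time of the call or at any time prior to the call.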
When the call is established between communication terminal 101A and audio communication device 102, an audio signal from the call participant of communication terminal 101A is received by audio communication device 102. The audio adjustment module 124 initiates an adjustment to a volume level of the received audio signal based on the derived speech characteristic in the user's call participant profile 120, and optionally also on the identity of the user of audio communication device 102. The adjusted audio signal is then used by the audio interface 122 to play the received audio signal. The audio interface 122 can comprise a variety of devices, such as a handset, a headset, a speaker, a transceiver, and a Bluetooth interface.
The adjustment to the volume level of the audio signal can be determined in a variety of ways, such as determining whether or not a speaker's volume exceeds or is below a threshold value for a predetermined duration based on Root Mean Square (RMS) and/or peak-to-peak volume measurements based on one or more frequency ranges, and/or in other known ways of determining a signal strength/volume or spectral content. The audio adjustment module 124 can adjust the volume based on samples of the audio signal during a portion of the call, during all of the call, during multiple calls, and the like. The audio adjustment module 124 can adjust the volume based on parameters defined in the user profile 140 (see
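The threshold-for-a-predetermined-duration test mentioned above can be sketched as follows. Representing the duration as a count of consecutive measurement blocks is an assumption for illustration; the specification does not fix how the duration is measured.

```python
def below_threshold_for_duration(block_levels, threshold, min_blocks):
    """True if the measured level stays below `threshold` for at least
    `min_blocks` consecutive blocks (a stand-in for a predetermined
    duration), indicating a consistently soft-spoken participant."""
    run = 0
    for level in block_levels:
        run = run + 1 if level < threshold else 0
        if run >= min_blocks:
            return True
    return False
```

A symmetric check with `>` would detect a consistently loud participant whose signal should be attenuated.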
The audio adjustment module 124 can adjust the audio signal volume level based on a derived speech characteristic taken during a previous communication with the call participant at communication terminal 101A. The audio adjustment module 124 can adjust the audio signal volume level by receiving an indication of the audio signal volume level from communication terminal 101A or a device in the network 110. The information on how to adjust the audio signal volume level could be part of the information in a Virtual Business Card (Vcard) that is sent during the call and/or any combination of the above.
The audio adjustment module 124 can adjust the audio signal volume level by comparing the audio signal volume level and the user's volume level 347 (See
The above process can be repeated by deriving a second measurement of the speech characteristic during a second call from a second call participant using a second communication terminal 101. The process gets a second identifier (e.g., a telephone number from the second communication terminal 101). The second speech characteristic and the second identifier are associated with each other and are stored in a second call participant profile 120 (see
The above process can also be repeated for a call from a second call participant on a second communication terminal 101. This would result in the generation of a second profile for the second call participant.
In this illustrative example, the call participant profile 120, the user profile 140, the audio analyzer 126, and the audio adjustment module 124 are shown as being distributed between the network device/bridge 220 and the audio communication device 202. However, the call participant profile 120, the user profile 140, the audio analyzer 126, and the audio adjustment module 124 can all be in the network device/bridge 220, the audio communication device 202, and/or any combination of the network device/bridge 220 and the audio communication device 202.
A conference call (e.g., a video or audio conference call) is established between communication terminal 101C, communication terminal 101D, and the audio communication device 202. The conference call is established through mixer 222 (e.g., a mixer 222 in an audio bridge or video bridge 220). As the conference call is established, the mixer 222 determines the communication terminals' (101C and 101D) identification numbers using, for example, caller ID.
When the conference call is established, the audio signals from each of the call participants of communication devices 101C and 101D are mixed by the mixer 222. The audio analyzer 126 determines when a call participant (calling from communication terminal 101C and/or 101D) is speaking. The audio analyzer 126 determines when the call participant is speaking based on voice recognition, from an identifier, and/or the like. The audio analyzer 126 derives a speech characteristic of a participant (e.g., how loudly/softly the call participant is speaking) in the conference call while the call participant is speaking during the conference call in the mixed audio stream. The audio adjustment module 124 initiates an adjustment to the mixed audio signal based on the speech characteristic and when the call participant is speaking.
Consider the following example to illustrate how this works. A conference call is established between communication terminals 101C, 101D, and audio communication device 202. The audio signals from communication terminals 101C and 101D are mixed by the mixer 222. The call participant using communication terminal 101C speaks. The audio analyzer 126 determines from the mixed audio signal when the call participant using communication terminal 101C is speaking using voice recognition software/hardware. The audio analyzer 126 also measures how loudly or softly (speech characteristic) the call participant using communication terminal 101C is speaking to produce a relative offset (e.g., relative to the volume level of the communication device 202). The communication terminal's 101C identification number (identifier), the offset, and a sample of a speech pattern (identifier) of the call participant using communication terminal 101C are stored and associated in the call participant profile 120 for use on additional conference calls and/or the current conference call.
The audio adjustment module 124 initiates the adjustment of the mixed audio signal using the offset (which is sent from the network device/bridge 220) when the call participant using communication terminal 101C is speaking. This could be done by sending a marker in the mixed audio stream indicating the offset and when to adjust the mixed audio signal using the offset. The offset could be used in conjunction with a user-defined offset and/or an offset for a particular audio interface 122 such as a speakerphone or Bluetooth device. In another exemplary embodiment, the audio adjustment module 124 could be in the network device/bridge 220 and adjust the mixed audio signal before sending the mixed audio signal to the audio communication device 202. In yet another exemplary embodiment, the call participant profile 120, the user profile 140, the audio analyzer 126, and the audio adjustment module 124 can all be in the audio communication device 202.
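The marker-driven adjustment of the mixed signal can be sketched as below. Representing markers as a per-block speaker identifier and offsets as a dB-to-linear-gain conversion are illustrative assumptions; the specification does not prescribe the marker encoding.

```python
def apply_marked_offsets(mixed_blocks, speaker_markers, offsets_db):
    """Scale each block of the mixed audio signal by the stored offset of
    whichever participant the marker indicates is speaking; unknown
    speakers get a neutral offset of 0 dB."""
    adjusted = []
    for block, speaker in zip(mixed_blocks, speaker_markers):
        gain = 10 ** (offsets_db.get(speaker, 0.0) / 20)  # dB -> linear gain
        adjusted.append([sample * gain for sample in block])
    return adjusted
```

With an offset of about −6 dB for one participant, that participant's blocks are attenuated to roughly half amplitude while other blocks pass through unchanged.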
In another example, a call is made from a communication terminal 101 to an audio communication device 102, where the communication terminal 101 is a device capable of conferencing multiple call participants. The audio adjustment module 124 can initiate an adjustment of the audio signal from the conferenced participants using voice recognition of individual call participants. The audio adjustment module 124 can then adjust the conferenced audio signal up or down based on who is speaking on the conferenced audio signal.
The name, or other identifier, of the call participant 331 and the identifier 332 can be passed to the audio communication device 102/202 at any time during and/or prior to the communication (e.g., using known caller ID parameters sent during ringing). The type 333 can be user-defined or sent to the audio communication device 102/202 during the communication and/or prior to the communication. The communication terminal level offset 334 is a relative volume level (e.g., in decibels). The offset 334 can be determined by comparing the audio signal volume level to a user's volume level 347. In this example, the offset 334 is a delta between the call participant's audio signal volume level and the user's volume level 347 (e.g., a current volume level, average volume level, or defined volume level). In
As an example, assume that USER A is in his/her office and places a telephone call to the owner of the user profile 140 at his/her home phone. From measurements of audio signals gathered during one or more previous calls placed by USER A from the same telephone number 332 to the home of the owner of the user profile 140, it has been determined that USER A is relatively soft-spoken and an offset of +3 is determined to compensate for USER A's low speech volume. The next time USER A calls from work, the system increases the volume using the offset of +3 in relation to the user's volume level 347. In addition, the user profile 140 has defined an offset 346 of 0 for calls to home, which in this case does not change the volume level. The offset 346 for the home audio communication device 342 can be user defined, defined using a default value, and the like.
In another example, USER B has an exceptionally deep and/or loud voice. The system has determined, based on prior measurements of an audio signal(s) from USER B's communication terminals 101, an offset range of −5 to −6. If a call is placed from USER B's home telephone to the cell phone 343 of the owner of the user profile 140 using the Bluetooth audio interface 122, the system will decrease the volume level of the call by a −8 offset (−6 for USER B's home phone offset and −2 for the cell phone 343 using the Bluetooth offset) in relation to the user's volume level 347.
In a third example, USER C uses his cell phone to place a call to the owner of the user profile 140. Since USER C has an East Coast accent, the user profile 140 has assigned a +2 offset to make sure the owner can understand what USER C is saying. In addition, the user profile 140 has defined a +2 offset in the 1 kHz to 12 kHz frequency range because the owner is hard of hearing. When a call from USER C is answered by the owner of the user profile 140 using his/her speakerphone at work, the offsets used for the call are +1 (USER C's cell phone), +2 (the profile user-defined offset 335 for USER C), 0 (the profile user's work phone speaker offset), and +2 for the 1 kHz to 12 kHz frequency range. The total would be +5 for the 1 kHz to 12 kHz range and +3 for frequency ranges outside 1 kHz to 12 kHz for the call with USER C. The offsets are added in relation to the user's volume level 347.
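The offset arithmetic in the USER C example above can be reproduced with a small sketch. The representation of band-specific offsets as a frequency-range-to-dB mapping is an illustrative assumption.

```python
def total_offset(base_offsets, band_offsets, freq_hz):
    """Sum the applicable offsets for a given frequency: component
    offsets (device, per-participant, interface) always apply, while
    band-specific offsets apply only inside their frequency range."""
    total = sum(base_offsets)
    for (lo_hz, hi_hz), db in band_offsets.items():
        if lo_hz <= freq_hz <= hi_hz:
            total += db
    return total
```

With base offsets of +1 (cell phone), +2 (participant), and 0 (work speakerphone), plus +2 in the 1 kHz to 12 kHz band, this yields +5 inside the band and +3 outside it, matching the worked example.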
The process begins when a call is established 400 between a call participant at the communication terminal 101 and a call participant at the audio communication device 102 with the call participant profile 120 and the user profile 140. The call can be initiated by or to the call participant having the user profile 140. The audio analyzer 126 derives 402 information from a speech characteristic (e.g., measuring a volume level of the call participant) of the call participant at the communication terminal 101. The audio adjustment module 124 gets or assigns 404 the identifier during the call. The identifier can be a call participant speech pattern used/created by the audio analyzer 126 to identify the call participant; the call participant identifier can be a caller ID number, a telephone number, and the like.
The audio adjustment module 124 stores 406 and associates information derived from the measurement of the speech characteristic and the identifier of the call participant in the call participant profile 120. The audio adjustment module 124 initiates 408 an adjustment to a volume level of an audio signal received during the call from the call participant. The adjustment can be based on a determined offset that is the difference between the volume level of the audio signal and a user's volume level 347.
One variation is an additional offset that deals with environmental noise. For example, if an individual, "Chris," is traveling in an airport and wants to select another offset (positive) to deal with the fact that the ambient noise is high, he can manually select it. Alternatively, if his device has the ability to measure or cancel the ambient noise, he can utilize these device features in association with the profiles. Another variation is the ability to have the system detect when a user changes phones during a communication session, automatically detect the change in routing, and select the appropriate profile for the new device. Yet another variation would be the ability to apply this idea to avatars, where the sender has defined a voice, level, etc., for the avatar and the user wishes to adjust them. Still another variation would be the video equivalent of this idea, where the luminance and chrominance of the video signal can be preferentially adjusted to deal with differences in cameras or displays.
The phrases “at least one”, “one or more”, and “and/or” are open-ended expressions that are both conjunctive and disjunctive in operation. For example, each of the expressions “at least one of A, B and C”, “at least one of A, B, or C”, “one or more of A, B, and C”, “one or more of A, B, or C” and “A, B, and/or C” means A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B and C together.
The term “a” or “an” entity refers to one or more of that entity. As such, the terms “a” (or “an”), “one or more” and “at least one” can be used interchangeably herein. It is also to be noted that the terms “comprising”, “including”, and “having” can be used interchangeably.
Of course, various changes and modifications to the illustrative embodiment described above will be apparent to those skilled in the art. These changes and modifications can be made without departing from the spirit and the scope of the system and method and without diminishing its attendant advantages. The above description and associated Figures teach the best mode of the invention. The following claims specify the scope of the invention. Note that some aspects of the best mode may not fall within the scope of the invention as specified by the claims. Those skilled in the art will appreciate that the features described above can be combined in various ways to form multiple variations of the invention. As a result, the invention is not limited to the specific embodiments described above, but only by the following claims and their equivalents.