This disclosure relates generally to the field of communications and, more specifically, to audio bandwidth extension for conferencing.
For some conferences or meetings, all the attendees or participants may not be in the same location. For example, some of the participants may be in one conference room, while other participants may be in another conference room and/or at various separate remote locations. Participants may join the conference using communication equipment of varying capabilities. For example, some equipment may be capable of producing and/or capturing higher quality audio than other equipment. Participants may wish to seamlessly participate in a conference, regardless of the particular characteristics of the communication equipment used by each participant.
For a more complete understanding of the present disclosure, reference is now made to the following description, taken in conjunction with the accompanying drawings, in which:
Overview
In one embodiment, a method includes extracting, by a processor, components from an audio signal to generate a modulating signal. The audio signal is generated by an endpoint operable to capture audio proximate the endpoint. The method also includes filtering, by the processor, the audio signal to generate a band-limited audio signal. The method also includes modulating, by the processor, the band-limited audio signal by the modulating signal to generate an enhancement signal. The method also includes combining, by the processor, the audio signal and the enhancement signal to generate an enhanced audio signal.
In another embodiment, a system includes a processor. The system also includes a non-transitory computer-readable storage medium embodying software. The software is operable when executed by the processor to receive an audio signal from a first endpoint. The first endpoint is operable to capture audio proximate the first endpoint. The software is further operable when executed to filter the audio signal to generate a band-limited audio signal. The software is further operable when executed to modulate, by a carrier signal at a selected frequency, the band-limited audio signal to generate an enhancement signal. The software is further operable when executed to combine the audio signal and the enhancement signal to generate an enhanced audio signal. The software is further operable when executed to transmit the enhanced audio signal to a second endpoint remote from the first endpoint.
Description
Some of the endpoints 112 may capture and/or produce higher quality audio than other endpoints 112. Users 116 joining the conference via a higher quality endpoint 112 may expect high quality audio, even if other users 116 are using lower quality endpoints 112. Conferencing system 100 may enhance audio received from lower quality endpoints 112 to improve the conferencing experience for users 116 using higher quality endpoints 112. For example, conferencing system 100 may use audio bandwidth extension methods to improve the perceived quality of a relatively narrowband audio signal received from a lower quality endpoint 112. In certain embodiments, conferencing system 100 may perform the enhancement using non-linear time-domain methods, allowing for relatively low computational complexity and real-time implementation.
A conference may represent any meeting, conversation, or discussion between users 116. For example, conferencing system 100 may allow each user 116 to hear what remote users 116 are saying. Conference locations 110 may be any location from which one or more users 116 participate in a conference. In the example of
Each conference location 110 may include an endpoint 112. Endpoint 112 may refer to any device that connects a conference location 110 to a conference. Endpoint 112 may be operable to capture audio and/or video from conference location 110 (e.g. using one or more microphones and/or cameras) and transmit the audio or video signal 160 to endpoints 112 at other conference locations 110 (e.g. through controller 120). Endpoint 112 may also be operable to play audio or video signals 162 received from controller 120. In some embodiments, endpoint 112 may include a speakerphone, conference phone, telephone, computer, workstation, Internet browser, electronic notebook, Personal Digital Assistant (PDA), cellular or mobile phone, pager, or any other suitable device (wireless, wireline, or otherwise), component, or element capable of receiving, processing, storing, and/or communicating information with other components of conferencing system 100. Endpoint 112 may also comprise any suitable user interface such as a display, microphone, speaker, keyboard, or any other appropriate terminal equipment usable by a user 116. Conferencing system 100 may comprise any suitable number and combination of endpoints 112.
In the example of
In certain embodiments, network 130 may refer to any interconnecting system capable of transmitting audio, video, signals, data, messages, or any combination of the preceding. Network 130 may include all or a portion of a public switched telephone network (PSTN), a public or private data network, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a local, regional, or global communication or computer network such as the Internet, a wireline or wireless network, an enterprise intranet, or any other suitable communication link, including combinations thereof.
In some embodiments, controller 120 may refer to any suitable combination of hardware and/or software implemented in one or more modules to process data and provide the described functions and operations. In some embodiments, controller 120 and/or logic 152 may include a communication solution such as WebEx, available from Cisco Systems, Inc. In some embodiments, the functions and operations described herein may be performed by multiple controllers 120. In some embodiments, controller 120 may include, for example, a mainframe, server, host computer, workstation, web server, file server, a personal computer such as a laptop, or any other suitable device operable to process data. In some embodiments, controller 120 may execute any suitable operating system such as IBM's zSeries/Operating System (z/OS), MS-DOS, PC-DOS, MAC-OS, WINDOWS, UNIX, OpenVMS, or any other appropriate operating systems, including future operating systems. In some embodiments, controller 120 may be a web server running, for example, Microsoft's Internet Information Server™.
In general, controller 120 communicates with endpoints 112 to facilitate a conference between users 116. In some embodiments, controller 120 may include a processor 140 and memory 150. Memory 150 may refer to any suitable device capable of storing and facilitating retrieval of data and/or instructions. Examples of memory 150 include computer memory (for example, Random Access Memory (RAM) or Read Only Memory (ROM)), mass storage media (for example, a hard disk), removable storage media (for example, a Compact Disk (CD) or a Digital Video Disk (DVD)), database and/or network storage (for example, a server), and/or any other volatile or non-volatile computer-readable memory devices that store one or more files, lists, tables, or other arrangements of information. Although
Memory 150 is generally operable to store logic 152 and enhanced audio signal 156. Logic 152 generally refers to logic, rules, algorithms, code, tables, and/or other suitable instructions for performing the described functions and operations. Enhanced audio signal 156 may represent the result of processing one or more audio signals 160 to improve the perceived sound quality of the audio signals.
Memory 150 is communicatively coupled to processor 140. Processor 140 is generally operable to execute logic 152 stored in memory 150 to facilitate a conference between users 116 according to the disclosure. Processor 140 may include one or more microprocessors, controllers, or any other suitable computing devices or resources. Processor 140 may work, either alone or with components of conferencing system 100, to provide a portion or all of the functionality of conferencing system 100 described herein. In some embodiments, processor 140 may include, for example, any type of central processing unit (CPU).
In operation, logic 152, when executed by processor 140, facilitates a conference between users 116. Logic 152 may receive audio and/or video signals 160 from endpoints 112. In the example of
Thus, audio signals 160a-b are relatively wideband signals as compared to audio signal 160c, a relatively narrowband signal. In particular, audio signals 160a-b contain spectral energy between 4 kHz and 8 kHz, while audio signal 160c does not. The lack of such high frequency content in audio signal 160c may be audibly detectable to a user 116 listening to playback of audio signals 160a-c.
Logic 152 may be able to detect the bandwidth of an audio signal 160. If logic 152 detects that an audio signal 160 is a narrowband signal and/or lacks high frequency content, logic 152 may use the lower-frequency content (e.g. between 0 kHz and 4 kHz) already present in the audio signal 160 to enhance it. For example, logic 152 may generate an enhancement signal to add to audio signal 160c based on the lower-frequency content already present in audio signal 160c. The enhancement signal may include high frequency content (e.g. between 4 kHz and 8 kHz). Logic 152 may then combine the audio signal 160 with the generated enhancement signal to produce an enhanced audio signal 156. The enhanced audio signal 156 may have spectral energy between 0 kHz and 8 kHz (i.e. a relatively wideband signal as compared to the source audio signal 160). Example methods for generating the enhanced audio signal 156 are described in more detail in connection with
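Neither the detection method nor its parameters are specified above; one minimal sketch is to compare the fraction of spectral energy at or above 4 kHz against a small threshold. The 16 kHz sample rate, the threshold value, and the function name below are illustrative assumptions, not values from this disclosure:

```python
import numpy as np

FS = 16000  # sample rate in Hz; assumed for illustration

def is_narrowband(signal, fs=FS, cutoff_hz=4000.0, ratio_threshold=0.01):
    """Classify a signal as narrowband if the fraction of its spectral
    energy at or above cutoff_hz falls below ratio_threshold."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    total = spectrum.sum()
    if total == 0.0:
        return True  # silence: nothing to extend
    high = spectrum[freqs >= cutoff_hz].sum()
    return high / total < ratio_threshold

t = np.arange(FS) / FS
narrow = np.sin(2 * np.pi * 1000 * t)               # energy only at 1 kHz
wide = narrow + 0.5 * np.sin(2 * np.pi * 6000 * t)  # adds 6 kHz content
```

Here `is_narrowband(narrow)` is true while `is_narrowband(wide)` is false, since the second signal carries substantial energy above the 4 kHz cutoff.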
Although the example of
Logic 152 may transmit audio and/or video signals 162 to endpoints 112. In the example of
If logic 152 determines that an audio signal 160 needs to be enhanced using audio bandwidth extension (e.g. audio signal 160 has a bandwidth below a particular threshold), logic 152 may use the enhanced audio signal 156 rather than the original audio signal 160 when producing audio signals 162. For example, logic 152 may determine that audio signal 160c is relatively narrowband and should be enhanced. Logic 152 may produce an enhanced audio signal 156 corresponding to audio signal 160c. Logic 152 may then produce audio signal 162a by combining audio signal 160b and enhanced audio signal 156. Logic 152 may transmit audio signal 162a to endpoint 112a. Logic 152 may also produce audio signal 162b by combining audio signal 160a and enhanced audio signal 156. Logic 152 may transmit audio signal 162b to endpoint 112b.
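The substitution and mixing just described can be sketched as a simple loop in which each receiving endpoint gets the sum of every other endpoint's signal, with an enhanced version used wherever one exists. The function name, endpoint labels, and dictionary-based interface below are illustrative assumptions, not an API from this disclosure:

```python
import numpy as np

def mix_for_endpoint(receiver, signals, enhanced):
    """Sum the signals from every endpoint except the receiver,
    substituting an enhanced version where one is available."""
    parts = [enhanced.get(endpoint, signal)
             for endpoint, signal in signals.items()
             if endpoint != receiver]
    return np.sum(parts, axis=0)

# Constant "signals" stand in for captured audio frames.
signals = {
    "112a": 1.0 * np.ones(4),
    "112b": 2.0 * np.ones(4),
    "112c": 3.0 * np.ones(4),   # relatively narrowband source
}
enhanced = {"112c": 4.0 * np.ones(4)}  # enhanced replacement for 112c

out_a = mix_for_endpoint("112a", signals, enhanced)  # mixes 112b + enhanced 112c
```

In this toy example the signal sent toward endpoint 112a sums the frame from 112b with the enhanced frame substituted for 112c.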
Thus, as a result of the audio enhancement performed by logic 152, a higher quality endpoint 112 may receive wideband audio signals 162 for each of the other endpoints 112, even if some of those endpoints are lower quality endpoints 112 that produce a more narrowband audio signal 160.
Although in the example of
In certain embodiments, the audio enhancement may be performed by endpoints 112, rather than logic 152 of controller 120. As one example, a higher quality endpoint 112a may receive audio signals 160b-c from endpoints 112b-c (either directly or via controller 120). Endpoint 112a may determine that audio signal 160c is relatively narrowband and should be enhanced. Endpoint 112a may produce an enhanced audio signal corresponding to audio signal 160c. Endpoint 112a may then produce audio signal 162a by combining audio signal 160b and the enhanced audio signal created using audio signal 160c. In creating audio signals 162, endpoints 112 may mix any suitable number and combination of audio signals 160 with any suitable number and combination of enhanced audio signals, according to particular needs.
At block 210, the system receives an input signal. For example, the input signal may be an audio signal 160 (e.g. from an endpoint 112) that controller 120 determines needs to be enhanced using audio bandwidth extension. In the example of
At block 220, the input signal is filtered using a band-pass filter to generate a band-limited audio signal. In the example of
At block 230, the band-limited audio signal is modulated by a carrier signal 235 at a first selected frequency to generate a first enhancement signal. In the example of
At block 240, the band-limited audio signal is modulated by a carrier signal 245 at a second selected frequency to generate a second enhancement signal. In the example of
At block 250, the first and second enhancement signals are summed to produce an enhanced audio signal. In certain embodiments, a weighted sum may be used. The weights to be used for each of the first and second enhancement signals may be determined based on the power in the input signal, the power in the band-limited audio signal, the power in the carrier signal 235, the power in the carrier signal 245, the power in the first enhancement signal, the power in the second enhancement signal, statistical analysis of reference speech signals, empirical analysis based on qualitative evaluations (e.g. of the naturalness of the bandwidth extension), and/or by any other suitable method. The weightings may be fixed, or they may be adaptively determined. For example, the weighting on the first enhancement signal may be selected based on the power in the input signal (either instantaneous power or an average power over some period of time). The weighting on the second enhancement signal may then be selected based on the weighting of the first enhancement signal. For instance, the weighting on the second enhancement signal may be selected to be half the weighting on the first enhancement signal.
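As one concrete reading of the adaptive option above (a first weight derived from the input signal's average power, with the second weight set to half the first), consider this sketch. The square-root mapping from power to weight and the `base_gain` scale factor are illustrative assumptions, not values from this disclosure:

```python
import numpy as np

def enhancement_weights(input_signal, base_gain=0.3):
    """Derive the first-enhancement weight from the input's average
    power (via its RMS value) and set the second weight to half the
    first. base_gain is an illustrative tuning constant."""
    avg_power = np.mean(np.square(np.asarray(input_signal, dtype=float)))
    w1 = base_gain * np.sqrt(avg_power)
    w2 = 0.5 * w1
    return w1, w2

t = np.arange(16000) / 16000.0
w1, w2 = enhancement_weights(np.sin(2 * np.pi * 440 * t))
```

A full-scale sine has an average power of 0.5, so this sketch yields roughly w1 = 0.21 and w2 = 0.11.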
Because the first enhancement signal has spectral energy between 4 kHz and 6 kHz and the second enhancement signal has spectral energy between 6 kHz and 8 kHz, the enhanced audio signal has spectral energy between 4 kHz and 8 kHz. Thus, the enhanced audio signal contains high frequency content (between 4 kHz and 8 kHz) generated based on the lower frequency content present in the input signal (between 0 kHz and 4 kHz). At block 260, the enhanced signal may be filtered using a high-pass filter with a cut-off frequency of approximately 4 kHz. This may minimize the presence of any lower frequency content in the enhanced audio signal. In certain embodiments, this filtering may be omitted. This disclosure contemplates selection of any suitable filter using any suitable parameters, according to particular needs.
At block 270, the original input signal is added to the enhanced audio signal. As a result, the enhanced audio signal will have spectral energy between 0 kHz and 8 kHz. The lower frequency components come from the input signal (0 kHz to 4 kHz), while the higher frequency components may be generated using the process just described (4 kHz to 8 kHz). In certain embodiments, a weighted sum may be used. The weights to be used for the input signal and the enhanced audio signal may be determined based on the power in the input signal, the power in the enhanced audio signal, statistical analysis of reference speech signals, empirical analysis based on qualitative evaluations (e.g. of the naturalness of the bandwidth extension), and/or by any other suitable method. The weightings may be fixed, or they may be adaptively determined. At block 280, the enhanced audio signal is output from the system.
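Putting blocks 220 through 270 together, the carrier-based variant can be sketched end to end. The 16 kHz sample rate, the 2 to 4 kHz band-limit, the 8 kHz and 6 kHz carriers, the fixed weights, and the brick-wall FFT filters are all illustrative assumptions, chosen so the two enhancement signals land in the 4 to 6 kHz and 6 to 8 kHz ranges named above:

```python
import numpy as np

FS = 16000            # sample rate in Hz; assumed
t = np.arange(FS) / FS

def band_filter(x, lo, hi, fs=FS):
    """Brick-wall band-pass via FFT masking, standing in for the
    unspecified band-pass/high-pass filters in the flow."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def band_power(x, lo, hi, fs=FS):
    """Total spectral energy of x between lo and hi Hz."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return spectrum[(freqs >= lo) & (freqs <= hi)].sum()

# Block 210: a narrowband input with content only below 4 kHz
x = (np.sin(2 * np.pi * 1000 * t)
     + np.sin(2 * np.pi * 2500 * t)
     + np.sin(2 * np.pi * 3500 * t))

# Block 220: band-limit the input (2-4 kHz band assumed)
band = band_filter(x, 2000, 4000)

# Block 230: at FS = 16 kHz, an 8 kHz carrier mirrors the 2-4 kHz band
# into 4-6 kHz
enh1 = band * np.cos(2 * np.pi * 8000 * t)

# Block 240: a 6 kHz carrier places sum products (folded about the 8 kHz
# Nyquist limit) in 6-8 kHz; its difference products fall below 4 kHz
# and are removed by the later high-pass
enh2 = band * np.cos(2 * np.pi * 6000 * t)

# Block 250: weighted sum (fixed illustrative weights, second = half of first)
enhanced = 1.0 * enh1 + 0.5 * enh2

# Block 260: high-pass at 4 kHz to suppress residual low-frequency products
enhanced = band_filter(enhanced, 4000, FS / 2)

# Block 270: add the original input back; the result spans 0-8 kHz
output = x + enhanced
```

Inspecting `output` with `band_power` confirms that the input had essentially no energy above 4 kHz while the output does.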
Although the example of
At block 310, the system receives an input signal. For example, the input signal may be an audio signal 160 (e.g. from an endpoint 112) that controller 120 determines needs to be enhanced using audio bandwidth extension. In the example of
At block 315, the input signal is filtered using a band-pass filter to generate a first modulating signal. In the example of
At block 325, the first modulating signal is squared to produce a squared signal. This operation may introduce higher frequency content, such as in a frequency band around 4 kHz. This may be advantageous because, in certain embodiments, the input signal may not contain spectral energy around 4 kHz. As an example, some telephone systems may filter out most frequencies above 3.5 kHz.
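The frequency-doubling effect of squaring can be verified directly: squaring a tone at frequency f yields a DC term plus a component at 2f, which is how the squaring step can manufacture content near 4 kHz from content near 2 kHz. The 1.75 kHz tone and 16 kHz sample rate below are illustrative:

```python
import numpy as np

FS = 16000                       # sample rate in Hz; assumed
t = np.arange(FS) / FS
tone = np.sin(2 * np.pi * 1750 * t)

# sin^2(2*pi*f*t) = 1/2 - cos(2*pi*(2f)*t)/2: a DC term plus a 2f component
squared = tone ** 2

spectrum = np.abs(np.fft.rfft(squared))
peak_hz = int(np.argmax(spectrum[1:])) + 1  # skip the DC bin; 1 Hz resolution
```

With one second of audio the FFT bins fall on integer hertz, so `peak_hz` comes out to 3500, twice the 1750 Hz input frequency.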
At block 330, the squared signal is filtered using a band-pass filter to generate a second modulating signal. In the example of
At block 320, the input signal is filtered using a band-pass filter to generate a band-limited audio signal. In the example of
At block 335, the band-limited audio signal is modulated by the first modulating signal (generated by block 315) to generate a first enhancement signal. In the example of
At block 340, the band-limited audio signal is modulated by the second modulating signal (generated by block 330) to generate a second enhancement signal. In the example of
At block 345, the first and second enhancement signals are summed to produce an enhanced audio signal. In certain embodiments, a weighted sum may be used. The weights to be used for each of the first and second enhancement signal may be determined based on the power in the input signal, the power in the band-limited audio signal, the power in the first modulating signal, the power in the second modulating signal, the power in the first enhancement signal, the power in the second enhancement signal, statistical analysis of reference speech signals, empirical analysis based on qualitative evaluations (e.g. of the naturalness of the bandwidth extension), and/or by any other suitable method. The weightings may be fixed, or they may be adaptively determined. For example, the weighting on the first enhancement signal may be selected based on the power in the input signal (either instantaneous power or an average power over some period of time). The weighting on the second enhancement signal may then be selected based on the weighting of the first enhancement signal. For instance, the weighting on the second enhancement signal may be selected to be half the weighting on the first enhancement signal.
Because the first enhancement signal has spectral energy between 4 kHz and 6 kHz and the second enhancement signal has spectral energy between 6 kHz and 8 kHz, the enhanced audio signal has spectral energy between 4 kHz and 8 kHz. Thus, the enhanced audio signal contains high frequency content (between 4 kHz and 8 kHz) generated based on the lower frequency content present in the input signal (between 0 kHz and 4 kHz). At block 350, the enhanced signal may be filtered using a high-pass filter with a cut-off frequency of approximately 4 kHz. This may minimize the presence of any lower frequency content in the enhanced audio signal. In certain embodiments, this filtering may be omitted. This disclosure contemplates selection of any suitable filter using any suitable parameters, according to particular needs.
At block 355, the original input signal is added to the enhanced audio signal. As a result, the enhanced audio signal will have spectral energy between 0 kHz and 8 kHz. The lower frequency components come from the input signal (0 kHz to 4 kHz), while the higher frequency components may be generated using the process just described (4 kHz to 8 kHz). In certain embodiments, a weighted sum may be used. The relative weights to be used for the input signal and the enhanced audio signal may be determined based on the power in the input signal, the power in the enhanced audio signal, the power in the band-limited audio signal, the power in the modulating signals, the power in the enhancement signals, statistical analysis of reference speech signals, empirical analysis based on qualitative evaluations (e.g. of the naturalness of the bandwidth extension), and/or by any other suitable method. The weightings may be fixed, or they may be adaptively determined. At block 360, the enhanced audio signal is output from the system.
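Blocks 315 through 360 can be sketched end to end in the same style. Unlike the carrier-based variant, both modulating signals here are extracted from the input itself. The band edges (1.5 to 2 kHz, 3.5 to 4 kHz, and 2 to 4 kHz), the tone frequencies, the fixed weights, and the brick-wall FFT filters are illustrative assumptions, not parameters from this disclosure:

```python
import numpy as np

FS = 16000            # sample rate in Hz; assumed
t = np.arange(FS) / FS

def band_filter(x, lo, hi, fs=FS):
    """Brick-wall band-pass via FFT masking, standing in for the
    unspecified filters in the flow."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

# Block 310: a narrowband input with content only below 4 kHz
x = (np.sin(2 * np.pi * 800 * t)
     + np.sin(2 * np.pi * 1900 * t)
     + np.sin(2 * np.pi * 3200 * t))

# Block 315: first modulating signal drawn from the input's own low band
mod1 = band_filter(x, 1500, 2000)

# Blocks 325/330: squaring doubles mod1's frequencies (1.9 kHz -> 3.8 kHz);
# a band-pass around 3.5-4 kHz isolates the second modulating signal
mod2 = band_filter(mod1 ** 2, 3500, 4000)

# Block 320: band-limit the input
band = band_filter(x, 2000, 4000)

# Blocks 335/340: modulate the band-limited signal by each modulating signal
enh1 = band * mod1
enh2 = band * mod2

# Block 345: weighted sum (fixed illustrative weights)
enhanced = enh1 + 0.5 * enh2

# Block 350: high-pass at 4 kHz, keeping only the new high-frequency content
enhanced = band_filter(enhanced, 4000, FS / 2)

# Block 355: add the original input back; block 360 would output the result
output = x + enhanced
```

With these choices, `enh1` contributes content in the 4 to 6 kHz range and `enh2` in the 6 to 8 kHz range, mirroring the spectral targets described above.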
Although the example of
Although the present disclosure describes or illustrates particular operations as occurring in a particular order, the present disclosure contemplates any suitable operations occurring in any suitable order. Moreover, the present disclosure contemplates any suitable operations being repeated one or more times in any suitable order. Although the present disclosure describes or illustrates particular operations as occurring in sequence, the present disclosure contemplates any suitable operations occurring at substantially the same time, where appropriate. Any suitable operation or sequence of operations described or illustrated herein may be interrupted, suspended, or otherwise controlled by another process, such as an operating system or kernel, where appropriate. The acts can operate in an operating system environment or as stand-alone routines occupying all or a substantial part of the system processing.
Although the present disclosure has been described in several embodiments, a myriad of changes, variations, alterations, transformations, and modifications may be suggested to one skilled in the art, and it is intended that the present disclosure encompass such changes, variations, alterations, transformations, and modifications as fall within the scope of the appended claims.
Number | Name | Date | Kind |
---|---|---|---|
5455888 | Iyengar et al. | Oct 1995 | A |
6889182 | Gustafsson | May 2005 | B2 |
6895375 | Malah et al. | May 2005 | B2 |
6988066 | Malah | Jan 2006 | B2 |
7216074 | Malah et al. | May 2007 | B2 |
7359854 | Nilsson et al. | Apr 2008 | B2 |
7546237 | Nongpiur et al. | Jun 2009 | B2 |
7613604 | Malah et al. | Nov 2009 | B1 |
7630881 | Iser et al. | Dec 2009 | B2 |
7912729 | Nongpiur et al. | Mar 2011 | B2 |
7916876 | Helsloot et al. | Mar 2011 | B1 |
8069038 | Malah et al. | Nov 2011 | B2 |
20020128839 | Lindgren et al. | Sep 2002 | A1 |
20020138268 | Gustafsson | Sep 2002 | A1 |
20030009327 | Nilsson et al. | Jan 2003 | A1 |
20030093278 | Malah | May 2003 | A1 |
20030093279 | Malah et al. | May 2003 | A1 |
20040243402 | Ozawa | Dec 2004 | A1 |
20050004803 | Smeets et al. | Jan 2005 | A1 |
20050187759 | Malah et al. | Aug 2005 | A1 |
20060106619 | Iser et al. | May 2006 | A1 |
20070150269 | Nongpiur et al. | Jun 2007 | A1 |
20080126081 | Geiser et al. | May 2008 | A1 |
20080208572 | Nongpiur et al. | Aug 2008 | A1 |
20080300866 | Mukhtar et al. | Dec 2008 | A1 |
20090030699 | Iser et al. | Jan 2009 | A1 |
20100042408 | Malah et al. | Feb 2010 | A1 |
20100057476 | Sudo et al. | Mar 2010 | A1 |
20100063827 | Gao | Mar 2010 | A1 |
20100228543 | Kabal et al. | Sep 2010 | A1 |
20100246803 | Tashiro et al. | Sep 2010 | A1 |
20110054885 | Nagel et al. | Mar 2011 | A1 |
20110153318 | Rossello et al. | Jun 2011 | A1 |
20110231195 | Nongpiur et al. | Sep 2011 | A1 |
20110257980 | Gao | Oct 2011 | A1 |
20110288873 | Nagel et al. | Nov 2011 | A1 |
20120010880 | Nagel et al. | Jan 2012 | A1 |
20120070007 | Kim et al. | Mar 2012 | A1 |
20120095757 | Gibbs et al. | Apr 2012 | A1 |
20120095758 | Gibbs et al. | Apr 2012 | A1 |
20120106742 | Bharitkar et al. | May 2012 | A1 |
20120116769 | Malah et al. | May 2012 | A1 |
Entry |
---|
Larsen, et al.; John Wiley & Sons, Ltd.; Audio Bandwidth Extension; Application of Psychoacoustics, Signal Processing and Loudspeaker Design; 301 pages, 2004. |
Arttu Laaksonen; Helsinki University of Technology; Bandwidth Extension in High-Quality Audio Coding; 69 pages, May 30, 2005. |
Number | Date | Country | |
---|---|---|---|
20140169542 A1 | Jun 2014 | US |