1. Field of the Invention
This invention relates generally to microphone audio signal processing, and more particularly to multiplexed microphone signals with multiple signal processing paths.
2. Description of the Related Art
A microphone is a basic and essential element of an audio system. Audio systems serve many different applications. The most common audio systems include at least the following types: a teleconference system, a public address (PA) system, a recording studio, or some combination of the three.
The simplest teleconference system is a telephone. Two people at two physically separate locations may talk to each other through a telephone network and two telephone sets.
In a more advanced telephone, the processor module 106 may have more circuitry or more processing power to perform many functions. One state-of-the-art telephone is the Polycom SoundStation® VTX-1000 speakerphone, available from the assignee of the current invention. The VTX-1000 has many more features and functions. For example, it is a speakerphone that allows a full-duplex mode of operation. In full-duplex mode, talkers at both sites of the conference call can speak at the same time. To allow full-duplex operation, the VTX-1000 has an advanced acoustic echo canceller (AEC). Without an AEC, annoying echo-like sounds will circulate between the two sites. If AEC is not implemented, the speech signal 172 from a talker at the far site is transmitted through the network 130 to the near site telephone 110 as signal 134. The speech signal 134 is reproduced by the loudspeaker 104. Since the telephone is operating in full-duplex mode, the microphone 102 is active while the loudspeaker 104 is working. The microphone 102 generates a signal 132, which contains contributions due to the far end speech signal 172 from the loudspeaker 104. This far end signal embedded in signal 132 is transmitted back to the far end together with the near site speech signal also in signal 132. The entire signal 132 becomes a loudspeaker signal 174 at the far end and is reproduced by loudspeaker 154. This way, the far end talker hears his own voice back from the loudspeaker 154, like an echo. This echo speech signal produced by the loudspeaker 154 can again be picked up by microphone 152, transmitted through network 130, reproduced by loudspeaker 104, picked up by microphone 102 and transmitted back to loudspeaker 154. If nothing is done about it, the echo signal can circulate between the two sites for a long time until it dissipates into the background noise, which is itself increased by such echoes. Without AEC, full-duplex operation in a speakerphone is not practical because of the echoes and the noise.
When a process module 106 performs echo cancellation, it estimates the contribution of echo in the microphone signal 132 and subtracts that portion from the microphone signal 132. This way, signal 132 contains only signals due to the speech of near site talkers. Therefore, what a far end talker hears is the speech of near site talkers alone, without the echo of his own voice. At the far end, another process module 156 may perform similar acoustic echo cancellation. To best solve the echo problem, echo suppression and noise fill may be used in addition to acoustic echo cancellation, in order to minimize the residual echo heard by participants at the far site.
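The estimate-and-subtract step described above can be sketched with a normalized least-mean-squares (NLMS) adaptive filter. This is only an illustrative assumption: the text does not say which algorithm the process module uses, and the function names, tap count, step size and echo path below are hypothetical.

```python
import random


def simulate_echo(far_end, path):
    """Convolve the far-end signal with a hypothetical room echo path."""
    return [
        sum(path[k] * far_end[n - k] for k in range(len(path)) if n - k >= 0)
        for n in range(len(far_end))
    ]


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic (NLMS)."""
    weights = [0.0] * taps   # running estimate of the loudspeaker-to-microphone path
    history = [0.0] * taps   # most recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # what remains: near-site speech plus residual echo
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        out.append(error)
    return out
```

With a microphone signal that contains only echo, the output of `nlms_echo_cancel` should decay toward zero as the filter converges. In a real room the echo path is far longer than eight taps, so this is a toy-sized sketch.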
The process modules 106 and 156 may also perform other audio signal processing. For example, such processing may include parametric equalization (PEQ). A particular microphone element may not respond to sound with uniform gain at all frequencies. To compensate for this non-uniformity, the process module may apply different filters at different frequencies, boosting or attenuating each as needed to achieve uniform gain across the spectrum. The process module may also adjust the gain to change the characteristic of the speech or to achieve other acoustic objectives.
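One band of such an equalizer can be sketched as a peaking biquad filter. The coefficient formulas follow the widely used Audio EQ Cookbook form. The choice of this filter, and the centre frequency, gain and Q values used in the example call, are assumptions for illustration and are not taken from the text.

```python
import cmath
import math


def peaking_biquad(f0, gain_db, q, fs):
    """Return normalized (b, a) coefficients that boost or cut around f0."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin]
    return [c / a[0] for c in b], [c / a[0] for c in a]


def apply_biquad(b, a, samples):
    """Filter samples with a direct-form I biquad (a[0] must be 1)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def gain_db_at(b, a, f, fs):
    """Magnitude response of the biquad at frequency f, in dB."""
    z = cmath.exp(-1j * 2 * math.pi * f / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20 * math.log10(abs(h))
```

For example, `peaking_biquad(1000.0, 6.0, 1.0, 16000.0)` yields a filter whose response is raised by the requested 6 dB at 1 kHz and left unchanged at DC. A negative `gain_db` gives the corresponding cut.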
The process modules may also include automatic gain control (AGC) to accommodate the different loudness of speech from different talkers. Various factors may affect the level of speech picked up by a microphone, such as the loudness of the talker, the distance between the talker and the microphone, or the orientation of the microphone relative to the talker. AGC avoids wide fluctuations in the level of the speech reproduced by a loudspeaker.
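A minimal AGC sketch follows, assuming a running mean-square level estimate and a capped gain. The target level, smoothing constant and gain limit are hypothetical parameters chosen for illustration.

```python
import math


def agc(samples, target_rms=0.1, attack=0.01, max_gain=20.0):
    """Scale samples so that a running level estimate tracks target_rms."""
    level_sq = target_rms ** 2  # running mean-square estimate of the input level
    out = []
    for s in samples:
        level_sq = (1.0 - attack) * level_sq + attack * s * s
        gain = min(max_gain, target_rms / max(math.sqrt(level_sq), 1e-9))
        out.append(gain * s)
    return out
```

A quiet talker and a loud talker fed through `agc` should come out at roughly the same level once the estimate has settled. The gain cap keeps background noise from being amplified without limit during silence.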
Another application of microphone signals is a public address system or a sound reinforcement system, as illustrated in
As illustrated in
As discussed above, different applications of microphone signals may require different processes. Some of the processes are similar; for example, most systems use AGC and PEQ. Other processes differ, for example AEC and feedback elimination (FBE). Some processes necessary for one application may conflict with the purpose of another. For example, feedback elimination is necessary in a sound reinforcement application but can degrade the acoustic quality, so it should not be used in a sound recording application.
For clarity, systems 100, 200 and 300 are described separately and apply to different applications. But in actual applications, these systems may be used together in a single setting. For example, in a distance learning application as illustrated in
Currently, even if a microphone system or audio system is installed for one particular application, the system still has to be modified or adjusted extensively for that application. This is time-consuming, costly and confusing. Custom-manufacturing or configuring a microphone system or audio system for only one particular application is possible, but it increases the cost and is not desirable.
It is more desirable to have a system or method that can adapt easily to a particular application. It is very desirable to have a system that can accommodate all application goals at the same time and avoid the apparent conflicts between them.
The current invention uses a process module that can route a microphone signal to different processing paths. Each path is customized to achieve the goal for a particular application. The identical processes within different paths may be performed by the same process module to avoid duplication and save processing power. When installing the system, a process path is selected for a particular application. No complicated configuration is required. All potentially conflicting processes are accommodated within the same processor.
A better understanding of the invention can be obtained when the following detailed description of the preferred embodiment is considered in conjunction with the following drawings, in which:
The current invention includes devices and methods to multiplex microphone signals, where each signal is used for a particular application. Each signal path is independent of the others, so conflicting signal processes may be applied to different signals. When a process is used in several signal paths, it may be shared among those paths.
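The idea of independent paths that share common processes can be sketched as a small router that runs each shared leading stage once and reuses its output. The class name, the placeholder stages and, in particular, which stages belong to which path and in what order are assumptions made for illustration, not the arrangement of the described embodiment.

```python
from typing import Callable, Dict, List, Tuple

Signal = List[float]
Stage = Callable[[Signal], Signal]


def scale(factor: float) -> Stage:
    """Placeholder stage; a real system would run PEQ, AEC, etc. here."""
    return lambda signal: [factor * s for s in signal]


class PathRouter:
    """Route one microphone block through several named processing paths."""

    def __init__(self, stages: Dict[str, Stage], paths: Dict[str, List[str]]):
        self.stages = stages
        self.paths = paths
        self.calls = {name: 0 for name in stages}  # how often each stage ran

    def process(self, mic_block: Signal) -> Dict[str, Signal]:
        cache: Dict[Tuple[str, ...], Signal] = {}  # keyed by stages applied so far
        outputs = {}
        for path_name, stage_names in self.paths.items():
            signal = mic_block
            applied: Tuple[str, ...] = ()
            for name in stage_names:
                applied += (name,)
                if applied not in cache:  # run a shared prefix only once
                    cache[applied] = self.stages[name](signal)
                    self.calls[name] += 1
                signal = cache[applied]
            outputs[path_name] = signal
        return outputs


# Hypothetical stage set and path layout.
STAGES = {"peq": scale(0.5), "aec": scale(0.9), "snf": scale(0.8), "fbe": scale(0.7)}
PATHS = {
    "ungated": ["peq"],
    "gated": ["peq", "aec", "snf"],
    "reinforcement": ["peq", "aec", "fbe"],
}
```

Calling `PathRouter(STAGES, PATHS).process(block)` returns one output per path. The `calls` counter shows that a stage appearing at the start of several paths runs once per block rather than once per path. Selecting an application then amounts to picking one key from the returned dictionary.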
The ungated signal 433 is configured to be used in an open-loop system, such as a sound recording system. The signal 433 is processed to achieve the highest quality and reliability. Any sound picked up by the microphone 402 is present in signal 433 with high fidelity. Typically, only one or a few microphone signals are mixed for each output 433. Signal 433 may be recorded by a high quality sound recorder or broadcast to others.
A second path generates a gated signal 453. The gated signal 453 is configured to be used in a closed-loop system, more particularly a conferencing system. The echo suppression and noise fill process (SNF) 442 complements an AEC 414 to reduce the echo heard by people at a far site. A noise fill is typically necessary to avoid dead silence at the far site when people at the near site are not talking. Because of the echo suppression and noise fill process, the gain of the local microphone can vary dynamically depending on whether anyone is talking. In a conference setting, local speech is not reproduced by the local loudspeaker, so it does not matter whether the gain varies. If a gated signal 453 is reproduced by a local loudspeaker, such as in a local sound reinforcement system, then the variation caused by SNF 442 can be noticeable and sometimes annoying.
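The suppress-then-fill behaviour can be sketched block by block. The decision rule (far-end energy dominating the residual), the block size and the noise level are hypothetical simplifications. The text does not specify how SNF 442 decides when to suppress.

```python
import random


def suppress_and_fill(residual, far_end, block=160, threshold=2.0,
                      noise_rms=0.001, seed=0):
    """Replace blocks dominated by far-end echo with low-level noise."""
    rng = random.Random(seed)
    out = []
    for start in range(0, len(residual), block):
        res = residual[start:start + block]
        far = far_end[start:start + block]
        res_energy = sum(s * s for s in res)
        far_energy = sum(s * s for s in far)
        if far_energy > threshold * res_energy:
            # Assumed far-end-only condition: treat the residual as leftover
            # echo, and send noise fill instead of dead silence.
            out.extend(rng.gauss(0.0, noise_rms) for _ in res)
        else:
            # Near-site speech is present (or the far end is silent): pass it.
            out.extend(res)
    return out
```

The abrupt switch between passing and filling is exactly the kind of gain variation that goes unnoticed at the far site but would be audible if the same signal were reproduced locally.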
A third signal path generates a sound reinforcement signal 473. The sound reinforcement signal 473 is configured for use in a sound reinforcement system. SNF 442 is not used. The main reason for this is the doubletalk problem. In an audio conference, there are times when only people at one conference site are talking, i.e., single-talk, and there are times when people at more than one site are talking, i.e., doubletalk. SNF 442 works differently depending on whether there is single-talk or doubletalk in the conference. This is not a problem in a conference application, as discussed above in relation to the second signal path. But when local speech is reproduced by local loudspeakers, the fluctuation in the gain of the local speech can be noticeable and problematic. It is as if someone were mischievously turning the amplifier volume dial down or up as soon as you start or stop speaking. By removing SNF 442, the associated doubletalk problem is eliminated and the gain of the speech remains stable. Instead, FBE 462 is used. FBE reduces the feedback problem by attenuating a frequency that the FBE predicts is likely to cause howling. Because of this attenuation, the sound spectrum is artificially altered and the resulting sound quality is lower. The particular frequency that is attenuated may vary with time, so the overall degradation of the sound quality may be minor. Even so, at any particular time and at a particular frequency, the distortion can be substantial. If that particular frequency at that time is significant for some reason, then the signal 473 could be unacceptable. That is why signal 473 is not suitable for use in a court reporting application, where reliability is paramount.
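How an FBE might pick the frequency to attenuate can be sketched with the Goertzel algorithm: measure power at a few candidate frequencies and flag one that stands far above the rest. This detection rule, the candidate list and the ratio are assumptions for illustration. The text does not describe how FBE 462 makes its prediction.

```python
import math


def goertzel_power(block, freq, fs):
    """Signal power at a single frequency, computed with the Goertzel recursion."""
    coeff = 2 * math.cos(2 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in block:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def find_howl_candidate(block, fs, candidates, ratio=10.0):
    """Return the candidate frequency that dominates the others, else None.

    Needs at least two candidate frequencies.
    """
    powers = {f: goertzel_power(block, f, fs) for f in candidates}
    peak = max(powers, key=powers.get)
    others = [p for f, p in powers.items() if f != peak]
    mean_others = sum(others) / len(others)
    return peak if powers[peak] > ratio * mean_others else None
```

A frequency returned by `find_howl_candidate` would then be attenuated, for example with a narrow notch or cut filter. That attenuation is what alters the spectrum of signal 473.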
In both the gated and sound reinforcement paths, automatic microphone mixing (AM) 448 and 468 are used. Where multiple microphones generate a single signal, an AM shuts off each microphone where no speech is detected and opens only the microphones where speech is detected. This way, noise signals from microphones that do not have speech signals are not mixed into the final speech signal, and the signal-to-noise ratio (SNR) of the resulting mixed speech signal is improved. In a single signal processing situation, AM is essentially an on/off switch. When no speech signal is detected at the microphone, the AM turns the signal off, so that the noise from this microphone is not supplied to downstream signal processing. When a speech signal is detected, the signal is turned on and supplied to downstream processes. This improves signal quality for both versions. It also improves gain before feedback in the sound reinforcement version. AM is not used in the ungated version, to avoid possible attenuation of the local speech. And by definition, the ungated version is typically used for an application where there is minimal background noise (e.g., a recording studio) or where all "noises" are "signals" (e.g., court reporting).
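A minimal sketch of this gating mixer follows, assuming a simple RMS threshold stands in for speech detection. The threshold value and function name are hypothetical. The text does not say how speech is detected.

```python
def auto_mix(mic_blocks, open_threshold=0.01):
    """Sum only the microphone blocks whose RMS level exceeds open_threshold.

    mic_blocks is a list of equal-length sample lists, one per microphone.
    Returns the mixed block and a per-microphone list of open/closed flags.
    """
    length = len(mic_blocks[0])
    mixed = [0.0] * length
    open_flags = []
    for block in mic_blocks:
        rms = (sum(s * s for s in block) / length) ** 0.5
        is_open = rms > open_threshold  # assumed stand-in for speech detection
        open_flags.append(is_open)
        if is_open:
            for i, s in enumerate(block):
                mixed[i] += s
    return mixed, open_flags
```

With a single microphone the function reduces to the on/off switch described above: the block is either passed through or replaced by zeros.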
Referring back to the setting illustrated in
The gated signal 536 is the output signal from the gated path. It is transmitted through a network 530 to the far site. This signal is substantially echo free.
The local sound reinforcement signal 534 is the output signal from the sound reinforcement path. It is combined with the loudspeaker signal 537 from the far site at a mixer 541 to form a local loudspeaker signal 539. Local loudspeaker signal 539 is reproduced by loudspeaker 504. So at the near site, both the local speech 532 and the far site speech 537 are amplified and can be heard by people at the near site of the conference.
The audio system 550 at the far site can be similar to the audio system 510 at the near site as discussed above, but it is not necessary. For example as shown in
Most of the data processes can be implemented in a single data processor, such as a DSP.
In prior art systems that include an adequate DSP, the current invention can be practiced by changing the process module in an existing audio system or reprogramming the processor in such a system. Such an upgrade can expand the capabilities of audio systems at very small incremental cost.
The current invention may also be practiced using a prior art system with limited capabilities, such as a Peavey Media Matrix and a Polycom Vortex conference unit. One such application is shown in
According to the embodiments of the current invention, a microphone signal can go through several different processing paths. Each path is configured for a particular application. Different paths share common processes to reduce the computational load. The individual processes may also be combined differently by a user to build customized signal processing for a highly specialized application. The above discussion has focused on three common audio system applications that are distinct and that sometimes have conflicting objectives or priorities. There are many other applications and processes not mentioned here. The current invention, in which a signal can go through different processing paths that share common processes, is still applicable to them.
While illustrative embodiments of the invention have been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention.