Aspects of the present disclosure provide for a system and a method for correcting for distortion, e.g., non-linear distortion, from an audio signal transducer in a linear echo cancellation system.
Acoustic devices are used to project sound and send audio signals to remote devices to allow people to communicate with each other. Echoes and other unwanted signals can interfere with the quality of the acoustic signals being exchanged.
The sound from a loudspeaker can be reflected or coupled back to a microphone after some finite delay, producing an echo. In an ideal situation, the production of the echo (sound) which corresponds to the electrical signal in the apparatus is a linear process. The echo cancellation systems are considered linear systems and can remove distortion that is produced by linear processes. However, transducers, such as loudspeakers, may also create non-linear distortion. Linear echo cancellation systems have historically struggled with the problem of non-linear distortion and are unable to directly remove this distortion from the echo.
An overdriven amplifier causes nonlinear distortion by creating harmonics and inter-modulation distortion from the clipping of large amplitude signals; see U.S. Pat. No. 4,809,336 (Pritchard), incorporated herein by reference. Enclosure vibration due to mechanical coupling between a loudspeaker and an enclosure, especially at lower voice frequencies, also causes significant nonlinear distortion that is picked up by the microphone. The loudspeaker itself is a major source of nonlinear distortion. The nonlinearities can be acoustic, electromagnetic, or mechanical, such as distortion of the cone or diaphragm or the voice coil traveling in non-uniform magnetic fields in the pole gaps or even hitting an end of travel mechanical constraint.
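As a rough illustration of the clipping mechanism described above, the following sketch hard-clips a sine wave and measures the level at the fundamental and a few harmonics. The frame length, tone frequency, drive level and the helper names `clip` and `tone_magnitude` are assumptions made for this example; they do not come from the disclosure or from the cited patent.

```python
import math

def clip(x, limit):
    """Hard-clip a sample to the range [-limit, +limit]."""
    return max(-limit, min(limit, x))

def tone_magnitude(signal, cycles):
    """Amplitude of the component with `cycles` whole periods per frame (one DFT bin)."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * cycles * i / n) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * cycles * i / n) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n

N = 1024   # frame length in samples (assumed)
F0 = 8     # fundamental: 8 whole cycles per frame (assumed)

clean = [math.sin(2 * math.pi * F0 * i / N) for i in range(N)]
clipped = [clip(1.5 * s, 1.0) for s in clean]   # amplifier driven 1.5x past its limit

# Compare the level at the fundamental and at harmonics 2, 3 and 5.
for h in (1, 2, 3, 5):
    print(h, tone_magnitude(clean, h * F0), tone_magnitude(clipped, h * F0))
```

In this sketch the clean tone has energy only at the fundamental, while the clipped tone also has energy at odd harmonics. Symmetric clipping leaves the even harmonics essentially empty; an asymmetric non-linearity would populate them as well.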
An audio device is described that can reduce the effects of nonlinear distortion and/or echo. The audio device includes a first microphone configured to produce a first signal and a loudspeaker assembly having a loudspeaker enclosure, a loudspeaker associated with the loudspeaker enclosure, and a second microphone associated with the loudspeaker. The second microphone is configured to produce a second signal based on output from the loudspeaker. A canceller, e.g., circuitry, is configured to receive the first signal and the second signal and can use the second signal as a canceller reference signal to reduce the non-linear loudspeaker distortion that is part of the first signal to produce an output signal.
In an example, the second microphone is a high pressure microphone positioned within the interior of the loudspeaker enclosure.
In an example, the first microphone is configured to sense an acoustic signal outside the device.
In an example, the first microphone is a high signal-to-noise microphone and the second microphone is a high pressure microphone.
In an example, the canceller is configured to cancel an echo signal produced by the loudspeaker emitting an acoustic signal that is at least partially sensed by the first microphone.
In an example, the canceller includes an output to send the output signal outside the device to a communication network, another communication device, or both.
In an example, the canceller includes a first state with no signal being output from the loudspeaker and no talk signal being sensed by the first microphone, a second state with no signal being output from the loudspeaker and a talk signal being sensed by the first microphone, a third state with a signal being output from the loudspeaker and a talk signal being sensed by the first microphone, and a fourth state with a signal being output from the loudspeaker and no talk signal being sensed by the first microphone.
In an example, the canceller is trained in the fourth state to linearly predict the echo including the nonlinear distortion produced by the loudspeaker.
In an example, the canceller includes a blocking matrix and a filter bank, both of which are trained, at least in part, using the second signal.
In an example, the canceller includes a summing circuit to subtract the predicted echo including nonlinear distortion, which is derived from the second signal, from the first signal.
In an example, the second signal is filtered by an adaptive filter to produce an echo estimate. The canceller includes a summing circuit to subtract the echo estimate from the first signal.
In an example, the loudspeaker enclosure includes a back cavity. The second microphone is positioned in the back cavity.
In an example, the canceller outputs a signal, which has the echo and the non-linear distortion removed, to a voice recognition circuit that produces a voice recognized signal that can provide information or control another device or control the present device.
In an example, the first microphone is configured to sense a near talker to produce the first signal.
In an example, the loudspeaker outputs an acoustic signal from a far talker received over a communication network.
The audio device as described herein may be a personal data assistant, a mobile phone, a music player, a digital assistant speaker,
Any of the above examples can be combined together in any combination.
Various methods are described to remove or reduce non-linear distortion. A non-linear distortion removal method may include sensing a first acoustic signal at a microphone remote from a loudspeaker, sensing a second acoustic signal at the loudspeaker that contains loudspeaker distortion, and removing the second acoustic signal from the first acoustic signal to remove non-linear distortion produced by the loudspeaker.
In an example, sensing the second acoustic signal at the loudspeaker includes sensing the second acoustic signal in the loudspeaker enclosure or in the loudspeaker back cavity.
In an example, sensing the second acoustic signal includes sensing using a high pressure microphone.
In an example, subtracting removes any echo sensed by the microphone remote from the loudspeaker.
A non-linear distortion removal method includes sensing a first acoustic signal at a microphone remote from a loudspeaker, sensing a second acoustic signal at the loudspeaker, training an echo filter and a blocking matrix using the sensed second acoustic signal from inside a loudspeaker enclosure, and enhancing an output signal using the echo filter as well as the blocking matrix to remove echo including non-linear distortion from the sensed first acoustic signal.
In an example, the method further trains an echo prediction filter using the sensed second acoustic signal from inside a loudspeaker enclosure as a reference signal.
In an example, the method further includes filtering a loudspeaker signal using the echo filter to produce a filtered signal.
In an example, the method further includes summing the filtered signal with the sensed first signal to produce a difference signal with the echo including non-linear distortion removed.
In an example, the method further includes applying analysis filter banks to produce a time-frequency transformation representation signal of the first and second signals.
In an example, the method further includes applying a blocking matrix on the time-frequency representation signal to produce a blocking matrix output.
In an example, the method further includes applying a beam former to the time-frequency representation signals and the blocking matrix output to produce a beam former output.
In an example, the method further includes estimating the noise power using the time-frequency representation signals, the blocking matrix output, and the beam former output.
In an example, the method further includes post filtering the beam former output using the estimated noise power to produce a post filter signal.
In an example, the method further includes applying a synthesis filter to the post filter signal to produce an enhanced time domain output signal.
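The blocking matrix and beam former steps listed above can be sketched for a single time-frequency bin as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: it assumes two microphones, a single-column blocking matrix, a generalized sidelobe canceller style update, and hypothetical steering vectors `d` (target) and `a` (interferer). The helper names `hdot`, `gsc_weights` and `adapt` are invented for the example.

```python
import random

def hdot(a, b):
    """Hermitian inner product a^H b of two equal-length complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

d = [1 + 0j, 1 + 0j]        # assumed target steering vector (M = 2 microphones)
b = [1 + 0j, -1 + 0j]       # single-column blocking matrix (N = 1), chosen so b^H d = 0
w0 = [0.5 + 0j, 0.5 + 0j]   # fixed reference beam former, chosen so w0^H d = 1

def gsc_weights(q):
    """Beam former weights w = w0 - B q for a scalar adaptive coefficient q."""
    return [w0_i - b_i * q for w0_i, b_i in zip(w0, b)]

def adapt(frames, mu=0.5, eps=1e-12):
    """Normalized LMS update of q that minimizes the beam former output power."""
    q = 0j
    for y in frames:
        z = hdot(b, y)                   # blocking matrix output, target removed
        out = hdot(gsc_weights(q), y)    # beam former output for this frame
        q += mu * z * out.conjugate() / (abs(z) ** 2 + eps)
    return q

random.seed(1)
a = [1 + 0j, -0.5 + 0j]     # assumed interferer steering vector
noise_frames = []
for _ in range(200):        # frames containing only the interferer
    v = complex(random.gauss(0, 1), random.gauss(0, 1))
    noise_frames.append([a_i * v for a_i in a])

q = adapt(noise_frames)
w = gsc_weights(q)
print(abs(hdot(b, d)), abs(hdot(w, d)), abs(hdot(w, a)))
```

Because the blocking matrix output contains no target component, adapting `q` changes only how the interferer is treated; the response toward the assumed target direction stays at unity.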
In any of the above examples, there may be a plurality of loudspeakers and corresponding plurality of microphones associated with the plurality of loudspeakers. An echo canceller may receive signals based on signals from the plurality of microphones and be configured to reduce or remove the echo including the non-linear distortions in the signal input into the system. In an example, one echo/distortion canceller receives a signal from one of the plurality of microphones. In an example, loudspeakers in mobile devices, e.g., phones, headphones, digital music players and the like, may have problems with non-linearities.
The embodiments of the present disclosure are pointed out with particularity in the appended claims. However, other features of the various embodiments will become more apparent and will be best understood by referring to the following detailed description in conjunction with the accompanying drawings in which:
The present disclosure is provided in the context of the acoustic echo in loudspeaker-microphone systems which also implement echo cancellers.
As indicated, echo cancelling systems are generally not well suited to remove nonlinear distortion caused by a loudspeaker transducer particularly in compact, hands-free kits for cellphones and other mobile devices. Many of the problems associated with hands-free kits have been attributed to inexpensive, smaller loudspeakers. When such a loudspeaker is overdriven, saturation effects associated with the loudspeaker and its amplifier distort sound in a nonlinear manner. An acoustic echo of such sound contains a mixture of linear signal and nonlinear harmonic and intermodulation components. A typical acoustic echo canceller estimates only the linear acoustic impulse response of the loudspeaker-enclosure-room environment and microphone system. The remaining nonlinear components in the system can be large and audible when compared in level to the near end talker that is not as close to the microphone, particularly at high volume.
Detailed embodiments are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention that may be embodied in various and alternative forms. The figures are not necessarily to scale; some features may be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present disclosure.
The embodiments of the present disclosure generally provide for a plurality of circuits or other electrical devices. All references to the circuits and other electrical devices and the functionality provided by each, are not intended to be limited to encompassing only what is illustrated and described herein. While particular labels may be assigned to the various circuits or other electrical devices disclosed, such labels are not intended to limit the scope of operation for the circuits and the other electrical devices. Such circuits and other electrical devices may be combined with each other and/or separated in any manner based on the particular type of electrical/operational implementation that is desired. It is recognized that any circuit or other electrical device disclosed herein may include any number of microprocessors, integrated circuits, memory devices (e.g., FLASH, random access memory (RAM), read only memory (ROM), electrically programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), or other suitable variants thereof) and instructions (e.g., software) which co-act with one another to perform operation(s) disclosed herein. In addition, any one or more of the electric devices may be configured to execute a computer-program that is embodied in a computer readable medium that is programmed to perform any number of the functions and features as disclosed. The computer readable medium may be non-transitory or in any form readable by a machine or electrical component. For ease of description the various circuit elements may not be described in detail but are part of the structural elements described. Examples of structural elements that include circuitry include the echo canceller, microphones, filters, amplifiers and communication connection devices.
Aspects disclosed herein may decrease the effect of the distortions in the acoustic signal produced by a loudspeaker. Echo cancellers may operate to reduce the effect of the echo that occurs in the physical space of the loudspeaker. Echo cancellers work to learn the room acoustics system impulse response and remove predictable echoes, e.g., linear echoes, to improve the signal sent to a remote listener. However, loudspeakers may have non-linear distortions and echo cancellers cannot remove non-linear distortions using a linear system. Such non-linear distortions may further interfere with the training of the noise canceller or the echo canceller, causing its room impulse response estimation to diverge away from a quality solution if the echo canceller trains using the residual error signal that contains non-linear distortion.
A digital representation of the signal from microphone 101 is coupled to the echo canceller 105.
The echo canceller 105 operates on both original far end sound and near end sound, which can include an echo. The echo canceller can now also reduce echo including non-linear distortion caused by the loudspeaker. The echo canceller can subtract the estimated echo derived from signal 112 from the near end signal 113. The echo component of near end signal 113 now only has echo that is linearly derivable from reference signal 112, in addition to the local original sound. Original sound can include, for example, near-end speech and background noise. “Near-end” refers to one end of a two-channel communication link between two parties to a telephone call. “Far end” refers to conditions on the telephone lines, including “line out” and “line in,” and signals from the telephone of the other party.
An example of an echo canceller system 105 is described in US Patent Publication No. 2014/0056435, which is hereby incorporated by reference, and can be used with the presently described microphone associated with the loudspeaker.
An echo canceller can have a plurality of states of operation. There may be four states: Idle (neither side is talking), Transmit (a user who is at the speakerphone or audio system 100 is talking), Receive (the person at the far end of the conversation is talking, e.g., a person at device 3001, see
The microphone 124 is in the cavity with the loudspeaker 122. The microphone 124 is in the back cavity of the loudspeaker housing, e.g., adjacent the coil driving the loudspeaker cone. Preferably the microphone is mounted in the inside wall of the loudspeaker housing. The microphone 124 can be a high acoustic overload point microphone as it is adjacent the loudspeaker 122 and in the back cavity or loudspeaker enclosure. The microphone 124 must be able to operate in a high decibel environment in the loudspeaker back cavity or enclosure, where acoustic pressure is high. The microphone 124 is not sensitive to the environmental acoustics or the area, e.g., a room, as the sound power in the loudspeaker cavity is significantly greater than the sound power in the environment outside the loudspeaker cavity. The mass of the loudspeaker cone also provides some additional isolation between the outside and the inside of the loudspeaker enclosure or back cavity. The sound level in the loudspeaker cavity can be 160 dB SPL or more. The sound level in the loudspeaker cavity will be greater than the sound level from the loudspeaker in the room or the external environment.
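To give a sense of scale for the sound levels mentioned above, the following sketch converts a sound pressure level to RMS pressure using the standard 20 micropascal reference for dB SPL in air. The function name `spl_to_pascals` and the 94 dB comparison level (a common microphone calibration level of about 1 Pa) are choices made for this example, not values from the disclosure.

```python
P_REF = 20e-6  # reference RMS pressure for dB SPL in air, in pascals

def spl_to_pascals(level_db):
    """Convert a sound pressure level in dB SPL to RMS pressure in pascals."""
    return P_REF * 10 ** (level_db / 20)

cavity = spl_to_pascals(160)        # level cited for the loudspeaker cavity
calibration = spl_to_pascals(94)    # common calibration level, roughly 1 Pa
print(cavity, calibration, cavity / calibration)
```

A level of 160 dB SPL corresponds to roughly 2000 Pa RMS, about 66 dB above the 94 dB calibration level, which is consistent with the need for a microphone with a high acoustic overload point in the cavity.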
The signal from the microphone 124 is sent to a signal processor 140, which can include an analog to digital converter and filters. The signal from the signal processor 140 can be fed to the echo canceller 105. Signal processor 140 can further amplify the signal. In an example, the signal processor of the canceller 105 can include a frequency or time domain adaptive filter, e.g., a finite impulse response (FIR) filter.
The signal from the microphone 124 now includes any non-linearities generated by loudspeaker 122 or any amplification of the signal to the loudspeaker by the signal processor 111.
Echo canceller 105 can include processing circuitry and can estimate the linear response of loudspeaker-enclosure-microphone assembly 120. Echo canceller 105 may model the linear acoustic impulse response because the signal from the microphone 124 is the already nonlinearly distorted signal. In a conventional acoustic echo canceller, an adaptive filter can only model the linear response of the system and, typically, does not model the nonlinear responses.
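The benefit of taking the reference after the non-linearity can be illustrated with a small simulation. The sketch below is an assumption-laden toy model, not the disclosed canceller: the loudspeaker is modeled as a memoryless `tanh` non-linearity, the cavity microphone is assumed to capture the loudspeaker output exactly, the echo path is a short made-up impulse response, and the adaptive filter is a plain normalized LMS filter. The names `nlms`, `power_db`, `erle_drive` and `erle_cavity` are hypothetical.

```python
import math
import random

def nlms(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Normalized LMS: linearly predict `mic` from `reference`, return the residual."""
    w = [0.0] * taps
    buf = [0.0] * taps
    residual = []
    for r, d in zip(reference, mic):
        buf = [r] + buf[:-1]                           # newest reference sample first
        estimate = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - estimate                               # microphone minus echo estimate
        norm = sum(bi * bi for bi in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual

def power_db(x):
    """Mean power in dB, with a tiny floor so log10 never sees zero."""
    return 10 * math.log10(sum(v * v for v in x) / len(x) + 1e-30)

random.seed(0)
drive = [random.uniform(-1, 1) for _ in range(4000)]   # electrical drive signal
speaker_out = [math.tanh(3 * v) for v in drive]        # assumed loudspeaker non-linearity
room = [0.6, 0.3, -0.2, 0.1, 0.05]                     # assumed linear echo path
mic = [sum(room[k] * speaker_out[n - k] for k in range(len(room)) if n - k >= 0)
       for n in range(len(drive))]                     # echo at the outside microphone

tail = slice(-1000, None)                              # measure after convergence
erle_drive = power_db(mic[tail]) - power_db(nlms(drive, mic)[tail])
erle_cavity = power_db(mic[tail]) - power_db(nlms(speaker_out, mic)[tail])
print(erle_drive, erle_cavity)
```

With the drive signal as reference, the linear filter cannot predict the distortion products, so the echo reduction is limited. With the already distorted signal as reference, the remaining path is linear and the same filter can model it almost exactly in this noise-free toy setup.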
The loudspeaker 122 can produce non-linear distortions in the acoustic signal being generated from the signal input into the loudspeaker 122. The loudspeaker 122 can be an electroacoustic transducer and operates by converting an electrical audio signal into a corresponding sound from the loudspeaker. An alternating current electrical audio signal is applied to the voice coil, a coil of wire suspended in a circular gap between the poles of a permanent magnet. The coil is forced to move rapidly back and forth by the Lorentz force that the magnetic field exerts on the current-carrying coil, which causes a diaphragm (e.g., a loudspeaker cone) attached to the coil to move back and forth, thereby pushing on the air to create sound waves. Non-linear distortions can result from the magnetic field not being uniform in the gap. The more the coil moves out of the gap, the greater the change in the magnetic field; thus there are greater non-linearities when the coil moves to a greater extent. The non-linear distortions can be harmonic and intermodulation distortions. These non-linearities can be a function of the type of sound (speech, music and the like) being played and at what volume the sound is being played. These distortion components are very difficult to predict and are usually eliminated by using echo suppression, where the signal below a certain level is significantly reduced with additional loss, or even zeroed out completely. Unfortunately, this can often distort the near end talker signal as well.
While shown in
The signal energy levels of the receive signals, and the audio (external microphone) signal after the echo canceller has removed the predicted echo are compared, and a decision is made on which is the appropriate state the system should be in. This residual signal when in the receive state is also used to train the echo canceller, changing its filter coefficients to produce a better echo prediction, thus lowering the echo heard by the far end user.
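The energy comparison and training decision described above might be sketched as follows. This is only an illustration: the fixed threshold, the frame-based energy measure, and the state name `DOUBLE_TALK` are assumptions introduced here, and a practical detector would likely use smoothed levels and hysteresis.

```python
from enum import Enum

class State(Enum):
    IDLE = "idle"                 # neither side is talking
    TRANSMIT = "transmit"         # only the near end talker is active
    RECEIVE = "receive"           # only the far end signal is active
    DOUBLE_TALK = "double talk"   # both active (label assumed for this sketch)

def frame_energy(frame):
    """Mean squared value of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def classify(receive_frame, residual_frame, threshold=1e-4):
    """Pick a state from the receive energy and the post-canceller residual energy."""
    far_active = frame_energy(receive_frame) > threshold
    near_active = frame_energy(residual_frame) > threshold
    if far_active and near_active:
        return State.DOUBLE_TALK
    if far_active:
        return State.RECEIVE
    if near_active:
        return State.TRANSMIT
    return State.IDLE

def may_train(state):
    """Allow filter coefficient updates only when the residual should be pure echo."""
    return state is State.RECEIVE

quiet = [0.0] * 160
loud = [0.1] * 160
print(classify(loud, quiet), may_train(classify(loud, quiet)))
```

Restricting adaptation to the receive state keeps near end speech from being treated as echo prediction error.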
A blocking matrix $B(l,k)$ 203 of dimensions $M$ rows by $N$ columns, where $1 \le N < M$, is applied by the operation $Z(l,k) = B^H(l,k)\,Y(l,k)$. The blocking matrix is designed to attenuate the target signal, while at the same time having full rank, i.e., the $N$ columns are linearly independent. The blocking matrix may in an embodiment be predetermined. In a further embodiment the blocking matrix can be adaptive, in order to track a target that changes position. An embodiment may use Eq. 2 of US Patent Publication No. 2014/0056435 for calculating a blocking matrix. A beam former 204 processes the $M$ communication signals to obtain an enhanced beam formed signal by means of a set of beam former weights $w(l,k)$ so that $Y_w(l,k) = w^H(l,k)\,Y(l,k)$. The beam former may in some embodiments have predetermined weights. In other embodiments the beam former may be adaptive. A common method is a Generalized Sidelobe Canceller (GSC) structure where the blocking matrix signal $Z(l,k)$ is adaptively filtered with coefficients $q(l,k)$ and subtracted from a predetermined reference beam former $w_0(k)$, to minimize the beam former output, e.g., $w(l,k) = w_0(k) - B(l,k)\,q(l,k)$. The noise power estimator 205 provides an estimate $\hat{\phi}_{VV}(l,k)$ of the power of the noise component of the enhanced beam formed signal. The noise power estimate is used by the post filter 206 to yield a time-frequency dependent gain $g(l,k)$ which is applied to the enhanced beam formed signal. The gain may be derived by means of a gain function, e.g., as a function of the estimated signal-to-noise-ratio (SNR) value $\xi(l,k)$, as $g(l,k) = G(\xi(l,k))$, which in some embodiments can be a bounded Wiener filter to reduce audible artifacts. In some embodiments, other functions may contribute to or process the gain value, such as equalization, dynamic compression, feedback control, or a volume control. In an embodiment, the gain function is a bounded spectral subtraction rule.
The estimated SNR value may in a further embodiment be derived from a decision-directed approach.
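One way the post filter gain could be computed for a single frequency bin is sketched below. The disclosure names a bounded Wiener filter and a decision-directed approach but does not give formulas; the sketch assumes the widely used decision-directed recursion with smoothing factor `alpha` and a gain floor `g_min`, both of which are illustrative values chosen here.

```python
def bounded_wiener_gain(snr, g_min=0.1):
    """Wiener gain snr / (1 + snr), bounded below by g_min to limit artifacts."""
    return max(g_min, snr / (1.0 + snr))

def decision_directed_snr(prev_clean_power, noisy_power, noise_power, alpha=0.98):
    """Blend the previous frame's clean-power estimate with the current excess power."""
    posterior = noisy_power / noise_power
    return (alpha * prev_clean_power / noise_power
            + (1.0 - alpha) * max(posterior - 1.0, 0.0))

def post_filter_gains(beamformed_powers, noise_power, g_min=0.1):
    """Gain per frame for one bin, given beam former output powers and a noise power."""
    prev_clean = 0.0
    gains = []
    for p in beamformed_powers:
        xi = decision_directed_snr(prev_clean, p, noise_power)
        g = bounded_wiener_gain(xi, g_min)
        gains.append(g)
        prev_clean = g * g * p      # power of the gain-weighted output for this frame
    return gains

print(post_filter_gains([1.0] * 5, 1.0))      # output power equal to the noise power
print(post_filter_gains([100.0] * 50, 1.0))   # output power well above the noise power
```

When the beam former output power matches the noise estimate, the gain stays at the floor; when it is well above the noise estimate, the gain rises toward one over a few frames.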
The post filter 206 outputs a time-frequency weighted signal $X(l,k) = Y_w(l,k)\,g(l,k)$ to a synthesis filter bank 207 which produces an enhanced time domain signal where the target signal is preserved and noise signals are attenuated. The synthesis filter bank 207 may apply an overlap-sum scheme so that an enhanced output signal 208 is output. The enhanced signal 208 may in some embodiments be used for transmission to the remote part or remote device. In other embodiments, an automated speech recognition system or a voice control system may receive the signal for processing.
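A minimal sketch of an analysis and synthesis filter bank pair is given below, assuming the overlap-sum scheme behaves like conventional weighted overlap-add with 50% overlap. The frame length, the square-root Hann window and the naive DFT are assumptions chosen to keep the example short and dependency-free; a real implementation would use an FFT and apply the post filter gain to each spectrum between the two stages.

```python
import cmath
import math

FRAME = 16
HOP = FRAME // 2
# Square-root periodic Hann window, applied at both analysis and synthesis so that
# the squared windows of overlapping frames sum to one.
WINDOW = [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * n / FRAME)) for n in range(FRAME)]

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def analysis(signal):
    """Split the signal into windowed, half-overlapping frames and transform each."""
    return [dft([WINDOW[n] * signal[start + n] for n in range(FRAME)])
            for start in range(0, len(signal) - FRAME + 1, HOP)]

def synthesis(frames):
    """Inverse-transform each frame, window it again, and overlap-add the results."""
    out = [0.0] * (HOP * (len(frames) - 1) + FRAME)
    for l, spectrum in enumerate(frames):
        block = idft(spectrum)
        for n in range(FRAME):
            out[l * HOP + n] += WINDOW[n] * block[n]
    return out

signal = [math.sin(0.3 * n) for n in range(64)]
rebuilt = synthesis(analysis(signal))
```

With no gain applied, the interior of the signal is reconstructed to within floating-point error; only the first and last half frames, which are covered by a single window, are tapered.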
The microphone 124 may be used to control the training of the acoustic echo prediction filter bank 210 or the blocking matrix 203. For example, the signal from the microphone 124 can move the filter bank or blocking matrix to and from a training mode. Still further, the signal from the microphone 124 can be used to capture both linear and non-linear components from the distortion of the loudspeaker output before the echo (or non-linearities) is cancelled. The microphone 124 is adjacent the loudspeaker 209, e.g., in the same enclosure or in the back cavity adjacent the loudspeaker driver. The signal sensed by the microphone 124, as well as signals Y, determines when the system 200 is in a mode where the system 200 can be trained, e.g., update the blocking matrix 203 or the echo prediction filter 210.
While shown in
An outside microphone 404 picks up a voice signal with loudspeaker echoes, which it inputs into a summing circuit 406. The summing circuit 406 removes the linearly predicted echo from the voice signal from the outside microphone 404 and outputs the voice output signal 407. The output from the summing circuit may be used to control the echo canceller 403.
In an example operation of system 500, the far end talker 503 will say something. That utterance will be transferred, through the system (microphone 502, device 501, communication link 505A and circuitry of device 510), to the electrical signal driving the near end loudspeaker 520. The circuitry, e.g., amplifier 511, in the device 510 will provide a linear signal to drive the near end loudspeaker 520. The near end loudspeaker 520 recreates that sound from the far end talker 503 and plays it out for the near end talker 531 to hear. The near end talker 531 will respond and this utterance will be picked up by the near end microphone 528 in front of the near end talker 531. The device 510 processes the signal and sends it, through the communication link 505B, to the loudspeaker 540 at the far end talker 503. Unfortunately, the output from the loudspeaker 520 at the near end will also be picked up by the near end microphone 528 and would be sent to the far end talker 503 but for the echo canceller 525 and processing circuitry in the device 510. Absent this processing, the far end talker 503 will not only hear the near end talker 531 but also hear his own voice, which has been delayed by the inherent nature of the system 500. This makes effective communication nearly impossible.
The canceller 625 receives the signal from the microphone 124 and subtracts the signal from the loudspeaker, including the non-linear signal components, from the signal from the microphone 602. The conditioned signal from the canceller 625 is sent to the voice recognition circuit 640.
The device 601 also includes a voice recognition circuit 640 that receives the echo and non-linear distortion cancelled signal from canceller 625, which includes a signal from the microphone 602 that is conditioned by the signal from microphone 124. Thus, the signal at the voice recognition circuit 640 is a purer signal, e.g., with reduced non-linear echo distortion and reduced echo. This will allow the voice recognition circuit 640 to operate better to recognize the actual spoken voice.
The device 601 can also include an input/output device 650, e.g., an antenna, hard wire, to allow the device 601 to communicate to another device connected to device 601 through the I/O device 650. The I/O device 650 can be connected to the cloud, e.g., a computer network. The voice recognized signal can be processed or stored in the cloud, e.g., a remote computer or memory. The voice recognized signal can be processed at a remote location, e.g., the SIRI service from Apple Corp. of Cupertino, Calif. or Cortana from Microsoft Corp. of Redmond, Wash. Such a voice recognized signal can be used to change operational modes of an audio device, control the music (change volume, change song/track, fast forward, rewind, and the like), request information, request directions for navigation, place telephone calls, send electronic messages and the like.
In an example scenario using system 600, the device 601 can be playing voice or music from the device loudspeaker 620. The user 603 will attempt to talk to the device 601 through microphone 602 in order to access some information or direct the device 601 to move to another mode or operation. Unlike the operation of the
The noise canceller 625 may rely on the loudspeaker drive signal 621 being subtracted out and may use a model of the non-linearities as well to suppress the non-linearities. However, this example runs into the same issues as in the
In an example, a loudspeaker with a microphone within the cavity of the loudspeaker can be claimed for use with an echo canceller, with the sensed signal from the loudspeaker microphone being used as the echo canceller reference. This effectively places the non-linear distortion producing elements before the point at which the echo canceller reference signal, which is used to remove the echo from the audio signal, is obtained, rather than after it.
The present disclosure describes the microphone being in a cavity in which a loudspeaker is mounted to emit sound waves from the loudspeaker. The loudspeaker can be a sound transducer mounted in a housing, e.g., a mobile phone case, a box, a case and the like. The housing can form a substantially sealed air space back cavity acoustically coupled to the sound transducer. The back cavity can be defined by the loudspeaker cone and also contain the loudspeaker driver. The back cavity can be sealed, without ports. The back cavity may also include at least one port through the housing to the exterior of the housing, or possibly a passive radiator diaphragm.
The audio devices 100, 200 or 400 can also be used to allow automated human-to-machine voice command and control. The audio devices 100, 200, 300 or 400 can also play music. For example, music being played by the device 100, 200 or 400 may interfere with voice command and control. In human to human communications, audio from the far end talker may echo back from the loudspeaker of a device back into the microphone of the same device and go back to the far end talker with some delay, interfering with the far end talker's ability to communicate.
The audio devices 100, 200 or 400 can be used in a conference phone or loudspeaker phone, as well as rooms that have both loudspeakers and microphones, or other audio systems. The devices can be a telephone that includes a microphone and loudspeaker in a sculptured case. The internal microphone is placed in the back cavity of the loudspeaker. The present description can be used with a hands-free kit for providing audio coupling to a cellphone or other mobile device such as tablets, netbooks, and portable computers. The audio systems 100, 200, 400 and 600 can also be used in vehicles.
The present inventors have discovered that prior echo cancellation systems cannot accurately account for non-linear distortions, e.g., distortion in the loudspeaker. In some uses, distortion from the loudspeaker can actually be louder than the near end user's voice, e.g., a voice command, for use by a vehicle or other electronic system, which in turn creates problems in capturing the voice acoustic signal (e.g., a command) given that the microphone also captures the distortion from the loudspeaker. The distortion can thereby interfere with processing the user's voice acoustic signal. An example of the present disclosure includes a microphone, e.g., a high pressure microphone, in the back of a loudspeaker cavity to sense the distorted signal produced by the loudspeaker. That is, a microphone monitors the loudspeaker. The sensed signal plus any distortion can then be used in processing (e.g., circuitry, including processors and memory) to remove the loudspeaker output and its distortion. In an example, the signal from the microphone in the back cavity of the loudspeaker is fed into the adaptive filter. The received signal from a microphone inside the loudspeaker cavity, in conjunction with the output of the echo canceller's summer, can be used to decide what state the echo canceller is in, and the original receive signal will no longer be fed into the adaptive filter.
The presently described systems and methods can also be used to allow automated human-to-machine voice command and control with improved echo cancellation. For example, music being played by the device may interfere with voice command and control. In human-to-human communications, audio from the far end talker may echo back from the loudspeaker of a device back into the microphone of the same device and go back to the far end talker with some delay, interfering with the far end talker's ability to communicate. The present disclosure improves the operation of both human-to-human communication and human-to-machine communication.
While exemplary embodiments are described above, it is not intended that these embodiments describe all possible forms of the invention. Rather, the words used in the specification are words of description rather than limitation, and it is understood that various changes may be made without departing from the spirit and scope of the invention. Additionally, the features of various implementing embodiments may be combined to form further embodiments of the invention.
This application claims benefit of U.S. Provisional Ser. No. 62/162,210, filed May 15, 2015, the disclosure of which is hereby incorporated in its entirety by reference herein.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2016/032318 | 5/13/2016 | WO | 00

Number | Date | Country
---|---|---
62162210 | May 2015 | US