An embodiment of the invention relates to personal listening audio devices such as earphones and telephone handsets, and in particular the use of acoustic noise cancellation or active noise control (ANC) to improve the user's listening experience by attenuating external or ambient background noise. Other embodiments are also described.
It is often desirable to use personal listening devices when listening to music and other audio material, or when participating in a telephone call, in order to not disturb others that are nearby. When a compact profile is desired, users often elect to use in-ear earphones or headphones, sometimes referred to as earbuds. To provide a form of passive barrier against ambient noise, earphones are often designed to form some level of acoustic seal with the ear of the wearer. In the case of earbuds, silicone or foam tips of different sizes can be used to improve the fit within the ear and also improve passive noise isolation.
With certain types of earphones, such as loose fitting earbuds, as well as telephone handsets, there is significant acoustic leakage between the atmosphere or ambient environment and the user's ear canal, past the external surfaces of the earphone or handset housing and into the ear. This acoustic leakage could be due to the loose fitting nature of the earbud housing, which promotes comfort for the user. However, the additional acoustic leakage does not allow for enough passive attenuation of the ambient noise at the user's eardrum. The resulting poor passive acoustic attenuation can lead to a lower quality user experience of the desired user audio content, due to either a low signal-to-noise ratio or reduced speech intelligibility, especially in environments with high ambient or background noise levels. In such a case, an ANC mechanism may be effective to reduce the background noise and thereby improve the user's experience.
ANC is a technique that aims to “cancel” unwanted noise, by introducing an additional, electronically controlled sound field referred to as anti-noise. The anti-noise is electronically designed so as to have the proper pressure amplitude and phase that destructively interferes with the unwanted noise or disturbance. An error sensor (typically an acoustic error microphone) is provided in the earphone housing to detect the so-called residual or error noise. The output of the error microphone is used by a control system to adjust how the anti-noise is produced, so as to reduce the ambient noise that is being heard by the wearer of the earphone. In some cases, there is also a reference microphone that is positioned some distance away from the error microphone, and whose signal is used by certain ANC algorithms. The ANC controller operates while the user is, for example, listening to a digital music file that is stored in a local audio source device, or while the user is conducting a conversation with a far-end user of a communications network in an audio or video phone call, or during another audio application that may be running in the audio source device. The ANC controller implements digital signal processing operations upon the microphone signals so as to produce an anti-noise signal, where the anti-noise signal is then converted into sound by the speaker driver system.
The implementation of an adaptive ANC system can benefit from a mechanism that automatically detects near-end speech (or close-talk), which is the situation in which the user of the personal listening device is talking, for example during a phone call. Due to the proximity of the various microphones (used by the ANC system in a personal listening device) to the user's mouth, the near-end speech can be picked up by for example both the reference and error microphones. This speech signal, which appears in the outputs of the reference and error microphones, has been found to act as a disturbance to the adaptive filter algorithms running in the ANC system. The disturbance can cause the divergence of the algorithms which are adapting one or more adaptive filters, namely a control filter (e.g., W(z), or G(z)) and in some cases a so-called S_hat(z) filter. A close-talk detector may automatically detect such a speech signal and in response help prevent the digital filter control signals, which serve to adjust their adaptive filters, from being corrupted, thereby reducing the risk of the adaptive filters diverging. For example, upon detecting speech using a signal from a vibration sensor that is inside the personal listening device, in combination with one or more of the microphone signals, the detector may assert a signal that slows down, or even freezes or halts, the adaptation of one or more of the adaptive filters in the ANC system. The signal may be de-asserted when no close-talk is being detected, thereby allowing the adaptive ANC processes to resume their normal updating of their adaptive filters. The close-talk detector may continuously operate in this manner during for example a phone call, as the near-end user talks and then pauses and then resumes talking to a far-end user.
The above summary does not include an exhaustive list of all aspects of the present invention. It is contemplated that the invention includes all systems and methods that can be practiced from all suitable combinations of the various aspects summarized above, as well as those disclosed in the Detailed Description below and particularly pointed out in the claims filed with the application. Such combinations have particular advantages not specifically recited in the above summary.
The embodiments of the invention are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that references to “an” or “one” embodiment of the invention in this disclosure are not necessarily to the same embodiment, and they mean at least one. Also, in the interest of conciseness, a single figure is sometimes used to illustrate multiple embodiments of the invention; in that case, it may be that some of the elements shown in the figure are not necessary to certain embodiments.
Several embodiments of the invention with reference to the appended drawings are now explained. Whenever the shapes, relative positions and other aspects of the parts described in the embodiments are not clearly defined, the scope of the invention is not limited only to the parts shown, which are meant merely for the purpose of illustration. Also, while numerous details are set forth, it is understood that some embodiments of the invention may be practiced without these details. In other instances, well-known circuits, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
In addition, the housing contains a vibration sensor that may be rigidly mounted to the housing so as to perform non-acoustic pick up of the user's voice, such as through bone conduction. Examples of the vibration sensor include a multi-axis accelerometer, a gyroscopic sensor, and an inertial sensor that can provide output signals (e.g., digital signals) representing vibration pickup due to the user's talking. A close-talk detector uses the vibration sensor signal and one or more microphone signals, which microphone signals are also being used by an ANC controller, to control different aspects of the ANC controller.
Signals from the error microphone 7 and optionally one or more reference microphones are produced in or converted into digital form, for use by the ANC controller. The latter performs digital signal processing operations upon the microphone signals to produce an anti-noise signal, where the anti-noise signal is then converted into sound by the speaker driver system 9 (as shown in
In a feedback type of ANC system, a signal representing the disturbance as picked up by the error microphone 7 is fed to the control filter, which in turn produces the anti-noise. The control filter in that case is sometimes designated G(z). The control filter G(z) may be adapted, or adaptively controlled or varied, so that its output causes a sound field referred to as anti-noise to be produced that destructively interferes with the disturbance (which has arrived at the eardrum through the primary acoustic path). In an ANC system that has a feed forward algorithm, the control filter is sometimes designated W(z). An input signal to the control filter W(z) is derived from the output of a reference microphone (not shown in
In some cases, the frequency response of the overall sound producing system, which includes the electro-acoustic response of the speaker driver system 9 and the physical or acoustic features of the user's ear up to the eardrum, can vary substantially during normal end-user operation, as well as across different users. Thus, it is desirable for improved performance to implement a digital ANC system that has a processor which is programmed with an adaptive filter algorithm, such as the filtered-x least mean squares (FXLMS) algorithm, which programmed processor can be viewed as a means for adapting the programmable digital filter (referred to as the control filter). In such an algorithm, the residual error (as picked up by the error microphone 7) is continually being used to monitor the performance of the ANC system, aiming to reduce the error (and hence the ambient noise that is being heard by the user of the earphone or telephone handset). The reference microphone may also be used, to help pick up the ambient noise or disturbance. In such algorithms, adaptive identification of the secondary path S(z) may also be required. Thus, in such cases, there may be two adaptive filter algorithms operating simultaneously for each channel, namely one that adapts the control filter W(z) or G(z) to produce the anti-noise, and another that adapts an estimate of the secondary path, namely a filter S_hat(z). This process takes place while user audio content, e.g. a downlink communications signal, a media playback signal from a locally stored media file or a remotely stored media file that is being streamed, or a training audio signal, is being converted into sound by the speaker driver system 9.
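By way of illustration only, the feed forward adaptation of the control filter W(z) described above might be sketched as follows. This is a minimal, textbook-style FXLMS loop and not necessarily the algorithm of any particular embodiment. The function names fir and fxlms, the tap count, the step size mu, and the use of simulated primary and secondary paths are all assumptions introduced for the example; in a real device the error sample would come from the error microphone rather than from a simulated path.

```python
def fir(coeffs, history):
    """Output of an FIR filter given its sample history (newest sample first)."""
    return sum(c * h for c, h in zip(coeffs, history))


def fxlms(reference, disturbance, s_true, s_hat, num_taps=8, mu=0.01):
    """Simulate feed forward FXLMS and return the residual error samples.

    reference   -- reference microphone samples x[n]
    disturbance -- ambient noise d[n] as it would arrive at the error microphone
    s_true      -- simulated secondary path S(z) (assumption: known only to the simulation)
    s_hat       -- estimate S_hat(z) used to filter the reference
    """
    w = [0.0] * num_taps             # control filter W(z) coefficients
    x_hist = [0.0] * num_taps        # reference history for W(z)
    xs_hist = [0.0] * len(s_hat)     # reference history for S_hat(z)
    fx_hist = [0.0] * num_taps       # filtered-x history
    y_hist = [0.0] * len(s_true)     # anti-noise history for S(z)
    errors = []
    for x, d in zip(reference, disturbance):
        x_hist = [x] + x_hist[:-1]
        xs_hist = [x] + xs_hist[:-1]
        y = fir(w, x_hist)                        # anti-noise sample
        y_hist = [y] + y_hist[:-1]
        e = d + fir(s_true, y_hist)               # residual at the error microphone
        fx_hist = [fir(s_hat, xs_hist)] + fx_hist[:-1]
        # Gradient descent step on e^2 using the filtered reference.
        w = [wi - mu * e * fxi for wi, fxi in zip(w, fx_hist)]
        errors.append(e)
    return errors
```

In this sketch the residual power falls as W(z) converges, provided S_hat(z) is a reasonable estimate of S(z) and the step size is small enough for stability.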
As mentioned above, when an adaptive ANC process is operating in a personal listening device such as an earphone or a phone handset, the user's speech is often picked up by the error microphone 7 (and by a reference microphone, if present). This speech signal disturbs the adaptation of the filters W(z) and S_hat(z), possibly causing one or both of these adaptive filters to diverge from a solution or become unstable. In order to prevent the divergence of these adaptive filters during user speech, the close-talk detector (see
In one embodiment, the close talk detector performs a digital signal processing-based cross-correlation function between the vibration sensor signal and at least one or both of the error microphone 7 and reference microphone signals, to thereby create a detection statistic or detection metric. This statistic is then evaluated for declaring a close-talk event. For example, the detection statistic can be computed using the L2 norm of the cross-correlation vector between the vibration sensor and microphone signals. This may be performed using either time domain vectors or frequency bin vectors. The L2 norm of the cross-correlation vector may be normalized by dividing it by a computed energy of the vibration sensor and microphone signals, for the time window (or the frequency bins) for which the cross-correlation is computed. The detection statistic is then compared to a fixed or variable preset threshold, and close-talk is declared if the statistic is greater than the threshold.
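As one possible illustration of the detection statistic described above, the following time domain sketch computes the L2 norm of the cross-correlation vector over a window and normalizes it. The function names, the lag range max_lag, the threshold value, and the particular normalization (the square root of the product of the two signal energies) are assumptions made for this example; the description above only requires that the norm be divided by a computed energy of the two signals.

```python
import math


def cross_correlation(a, b, max_lag):
    """Cross-correlation of two equal-length windows for lags -max_lag..max_lag."""
    n = len(a)
    result = []
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += a[i] * b[j]
        result.append(acc)
    return result


def detection_statistic(vib, mic, max_lag=16, eps=1e-12):
    """Normalized L2 norm of the vibration/microphone cross-correlation vector."""
    r = cross_correlation(vib, mic, max_lag)
    l2_norm = math.sqrt(sum(v * v for v in r))
    # Assumed normalization: geometric mean of the two window energies.
    energy = math.sqrt(sum(v * v for v in vib) * sum(m * m for m in mic))
    return l2_norm / (energy + eps)


def is_close_talk(vib, mic, threshold=0.5, max_lag=16):
    """Declare close-talk when the statistic exceeds the (hypothetical) threshold."""
    return detection_statistic(vib, mic, max_lag) > threshold
```

With this normalization the statistic is insensitive to the absolute level of either signal, so a single threshold can serve across loud and quiet talkers; a frequency bin version would follow the same pattern with the correlation computed per bin.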
In one embodiment, when an initial close-talk event is declared, the declaration may then be held for a predefined minimum period of time (hold interval) during which the adaptation of the filters S_hat(z) and/or W(z) is slowed down or frozen, regardless of having detected during the hold interval that user speech has stopped. When the hold interval then expires, and a subsequent instance of computing the detection statistic is found to be lower than a fixed or variable preset threshold (which may be the same as or different from the threshold that was used for declaring the close-talk event), then the close-talk event is declared to be over.
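The hold interval and the separate release threshold described above can be pictured as a small state machine. The sketch below is illustrative only: the class name CloseTalkGate, the measurement of the hold interval in detector frames, and the choice not to restart the hold interval when speech is re-detected during it are assumptions made for the example.

```python
class CloseTalkGate:
    """Tracks a close-talk event with a hold interval and a release threshold."""

    def __init__(self, on_threshold, off_threshold, hold_frames):
        self.on_threshold = on_threshold    # statistic above this declares the event
        self.off_threshold = off_threshold  # statistic below this ends the event
        self.hold_frames = hold_frames      # minimum event duration, in frames
        self.active = False
        self.hold_remaining = 0

    def update(self, statistic):
        """Consume one detection statistic; return True while the event is declared."""
        if not self.active:
            if statistic > self.on_threshold:
                self.active = True
                self.hold_remaining = self.hold_frames
        elif self.hold_remaining > 0:
            # Inside the hold interval the statistic is ignored.
            self.hold_remaining -= 1
        elif statistic < self.off_threshold:
            self.active = False
        return self.active
```

For example, with on_threshold=0.5, off_threshold=0.3 and hold_frames=3, the statistic sequence 0.1, 0.7, 0.1, 0.1, 0.1, 0.4, 0.2, 0.4 yields an event that starts on the second frame, is held through the next three frames, persists on the sixth frame because 0.4 is not below the release threshold, and ends on the seventh.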
The adaptation may be slowed down by for example reducing the step size parameter of a gradient descent-type adaptive filter algorithm. This may be done while maintaining the same sampling rate for the digital microphone signal, and perhaps also for the vibration sensor signal. Alternatively, or in addition, the update interval for actually updating the coefficients of the adaptive filter can be changed, for example from 20 microseconds to several milliseconds. Of course, the adaptation may be frozen in that the coefficients of the digital adaptive filters are kept essentially unchanged upon the occurrence of the close talk event and then are only allowed to be updated once the close talk event is determined to be over. In one embodiment, the adaptive filter algorithm may be allowed to continue to run during a holding interval, immediately following the declaration of a close talk event, i.e. the controller continues to produce new coefficient lists, though the adaptive filter is not actually being updated with the new coefficients.
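A simple way to picture the slowing or freezing of the adaptation is a coefficient update whose step size depends on the close-talk signal. The following is a sketch under stated assumptions: the function name update_coefficients, the mode strings, and the slow_factor value are hypothetical, and the gradient is assumed to have been computed elsewhere by the adaptive filter algorithm.

```python
def update_coefficients(weights, gradient, base_mu, close_talk,
                        mode="slow", slow_factor=0.1):
    """Return updated adaptive filter coefficients for one gradient descent step.

    weights     -- current coefficients of the adaptive filter
    gradient    -- gradient estimate for this update
    base_mu     -- step size used when no close-talk event is declared
    close_talk  -- True while a close-talk event is declared
    mode        -- "slow" reduces the step size, "freeze" leaves coefficients unchanged
    slow_factor -- assumed scaling of the step size during a close-talk event
    """
    if close_talk and mode == "freeze":
        return list(weights)                 # coefficients kept unchanged
    mu = base_mu * slow_factor if close_talk else base_mu
    return [w - mu * g for w, g in zip(weights, gradient)]
```

Because only the step size changes, the sampling rate of the microphone and vibration sensor signals is unaffected, and normal adaptation resumes as soon as the close-talk signal is de-asserted.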
Referring now to
In the case of a feed forward algorithm such as the one shown in
The close-talk detector described above may also be designed to detect when the close-talk event should be ended, i.e. a condition where the user of the personal listening device has stopped talking. The same digital signals from the vibration sensor and the one or more microphone signals that were used to detect the close talk condition can also be used here to detect when the user speech pauses. In one embodiment, the same statistic that was used for declaring a close-talk event can be recomputed and compared to a threshold (which may be different than the threshold used for declaring the close-talk event, such as when applying hysteresis in transitioning between declaring a close-talk event and declaring the close-talk is over). Movement of the statistic in the opposite direction in this case (relative to the threshold) means that the detector will signal an end to the close-talk event, where insufficient user speech is being detected (that is, a level which is expected to be insufficient to disturb the normal adaption process for the control filter, and, optionally, the adaption process for the S_hat filter). In one embodiment, while the ANC process is active but is updating its adaptive control filter slowly or has frozen the updating, the ANC controller responds to the ending of a close talk event by speeding up or unfreezing its continuing adaptation of the control filter.
As described above, an embodiment of the invention may be implemented as a machine-readable medium (such as microelectronic memory) having stored thereon instructions, which program one or more data processing components (generically referred to here as a “processor”) to perform the digital signal processing operations described above upon the vibration sensor signal and the microphone signals, including conversion from discrete time domain to frequency domain, cross correlation and L2 norm calculations, and comparisons and decision making, for example. In other embodiments, some of these operations might be performed by specific hardware components that contain hardwired logic (e.g., dedicated digital filter blocks). Those operations might alternatively be performed by any combination of programmed data processing components and fixed hardwired circuit components.
While certain embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that the invention is not limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those of ordinary skill in the art. For example, although some numerical values may have been given above, these are only examples used to illustrate some practical instances; they should not be used to limit the scope of the invention. In addition, other cross correlation techniques for computing the detection statistic may be used. The description here in general is to be regarded as illustrative instead of limiting.
This non-provisional application claims the benefit of the earlier filing date of provisional application No. 61/937,919 filed Feb. 10, 2014.
Number | Date | Country
---|---|---
61937919 | Feb 2014 | US