With the advancement of technology, the use and popularity of electronic devices have increased considerably. Electronic devices may be connected to headphones that generate output audio. Disclosed herein are technical solutions to improve output audio generated by headphones while reducing acoustic feedback.
For a more complete understanding of the present disclosure, reference is now made to the following description taken in conjunction with the accompanying drawings.
Some electronic devices may include an audio-based input/output interface. A user may interact with such a device—which may be, for example, a smartphone, tablet, computer, or other speech-controlled device—partially or exclusively using his or her voice and ears. Exemplary interactions include listening to music or other audio, communications such as telephone calls, audio messaging, and video messaging, and/or audio input for search queries, weather forecast requests, navigation requests, or other such interactions. The device may include one or more microphones for capturing voice input and hardware and/or software for converting the voice input into audio data. As explained in greater detail below, the device may further include hardware and/or software for analyzing the audio data and determining commands and requests therein and/or may send the audio data to a remote device for such analysis. The device may include an audio output device, such as a speaker, for outputting audio that in some embodiments responds to and/or prompts for the voice input.
For a variety of reasons, a user may prefer to connect headphones to the device to generate output audio. Headphones may also be used by a user to interact with a variety of other devices. As the term is used herein, “headphones” may refer to any wearable audio input/output device and includes headsets, earphones, earbuds, or any similar device. For added convenience, the user may choose to use wireless headphones, which communicate with the device—and optionally each other—via a wireless connection, such as Bluetooth, Wi-Fi, near-field magnetic induction (NFMI), Long-Term Evolution (LTE), 5G, or any other type of wireless connection.
In certain configurations, headphones may deliberately isolate a user's ear (or ears) from an external environment. Such isolation may include, but is not limited to, earbuds that sit at least partially within a user's ear canal, potentially creating a seal between the earbud device and the user's ear that effectively blocks the inner portion of the ear canal from the external environment. Such isolation may also include earcups that envelop a user's ear, blocking the ear off from the external environment. Isolation of this kind creates significant physical separation between the ear and one or more external noise sources and may provide certain benefits, such as shielding the user from external noises and effectively improving the quality of the audio being output by the headphone, earbud, or the like. Such isolation may also assist in improving the performance of active noise cancellation (ANC) or other cancellation/noise reduction technology, whose purpose is to reduce the amount of external noise that is detectable by a user.
To further reduce an amount of external noise that is detectable by the user, devices, systems and methods are disclosed that offer a wearable audio output device (e.g., headphones, earphones, and/or the like) configured to perform adaptive ANC processing. Specifically, the wearable audio output device may adaptively determine a feed-forward ANC filter by maximizing a ratio of A:B, where A corresponds to a microphone-ear coherence and B corresponds to a microphone-microphone coherence between feed-forward microphones. By maximizing this ratio, the wearable audio output device may determine weighted gain values used to combine the feed-forward microphone signals. In addition, the wearable audio output device may (i) apply a fixed feed-forward ANC filter profile selected based on a geometry of the wearable audio output device and a generalized ear response, (ii) monitor a secondary path to select from a plurality of feed-forward ANC filter profiles based on an individual user's ear response, or (iii) adaptively update the feed-forward ANC filter based on a feedback microphone signal.
In some examples, the primary and secondary earbuds may include similar hardware and software; in other instances, the secondary earbud contains only a subset of the hardware/software included in the primary earbud. If the primary and secondary earbuds include similar hardware and software, they may trade the roles of primary and secondary prior to or during operation. In the present disclosure, the primary earbud may be referred to as the “first device,” the secondary earbud may be referred to as the “second device,” and the smartphone or other device may be referred to as the “third device.”
As illustrated in
The present disclosure may refer to particular Bluetooth protocols, such as classic Bluetooth, Bluetooth Low Energy (“BLE” or “LE”), Bluetooth Basic Rate (“BR”), Bluetooth Enhanced Data Rate (“EDR”), synchronous connection-oriented (“SCO”), and/or enhanced SCO (“eSCO”), but the present disclosure is not limited to any particular Bluetooth or other protocol. In some embodiments, however, a first wireless connection 124a between the first device 110a and the second device 110b is a low-power connection such as BLE; the second wireless connection 124b may include a high-bandwidth connection such as EDR in addition to or instead of a BLE connection.
In addition, the first, second, and/or third devices may communicate with one or more supporting device(s) 120, which may be server devices, via a network 199, which may be the Internet, a wide- or local-area network, or any other network. The first device 110a may output first output audio 15a, and the second device 110b may output second output audio 15b. The first device 110a and second device 110b may capture input audio 11 from a user 5, process the input audio 11, and/or send the input audio 11 and/or processed input audio to the third device 122 and/or the supporting device(s) 120, as described in greater detail below.
In the example illustrated in
In some examples, the first device 110a may be configured to perform active noise cancellation (ANC) processing to reduce an amount of ambient noise perceived by the user 5. For example, the device 110 may include one or more feed-forward microphones and/or one or more feedback microphones that enable the first device 110a to perform feed-forward ANC processing, feedback ANC processing, and/or hybrid ANC processing. Such ANC (or other cancellation/noise reduction operations) may be manually activated (and deactivated) by a user controlling the headphones (or a connected device) and/or may be automatically activated by the headphones (or a connected device) depending on system configuration. To illustrate an example, the first device 110a may perform ANC processing to reduce the user's perception of a noise source in an environment of the first device 110a. In some examples, the ANC processing may detect ambient noise generated by the noise source and may cancel at least a portion of the ambient noise (e.g., reduce a volume of the ambient noise). For example, the ANC processing may identify the ambient noise and generate a signal that mirrors the ambient noise with a phase mismatch, which cancels/reduces the ambient noise due to destructive interference.
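The phase-inversion principle behind destructive interference can be illustrated with a minimal sketch (illustrative only; a real ANC path must also account for the acoustic transfer functions, filters, and latency described throughout this disclosure):

```python
import numpy as np

def anti_noise(noise: np.ndarray) -> np.ndarray:
    """Return a signal that mirrors the noise with inverted phase.

    Summing the noise with its anti-noise yields destructive
    interference (silence in the idealized case of a perfect
    estimate and zero latency).
    """
    return -noise

# Toy demonstration: a pure-tone "ambient noise" and its anti-noise cancel.
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
noise = 0.5 * np.sin(2.0 * np.pi * 100.0 * t)
residual = noise + anti_noise(noise)
```

In practice the anti-noise must be derived from an estimated, not directly observed, noise signal, which is why the adaptive processing described below is needed.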
As illustrated in
The first device 110a may determine (132) power spectral density (PSD) estimates (e.g., Sm
The first device 110a may determine (134) cross-PSD estimates for each pair of microphone signals, which will be described in greater detail below with regard to
Using the cross-PSD estimates, the first device 110a may solve (136) an optimization problem to determine weighted gain values for the feed-forward microphones 112b/112c, as will be described in greater detail below with regard to
After determining the weighted gain values, the first device 110a may generate (138) first audio data using the feed-forward microphone signals and the weighted gain values. For example, the first device 110a may generate the first audio data by combining a first product of the second microphone signal and the first weighted gain values with a second product of the third microphone signal and the second weighted gain values. However, the disclosure is not limited thereto and in some examples the first device 110a may generate the first audio data using additional feed-forward microphone signals without departing from the disclosure.
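The combining step described above can be sketched as a weighted sum, assuming the weighted gain values have already been determined and the microphone signals share a common frame layout (both assumptions for illustration):

```python
import numpy as np

def combine_feedforward(mic_signals, weights):
    """Weighted sum of per-microphone signals into a single channel.

    mic_signals: array of shape (num_mics, num_samples) (hypothetical layout)
    weights:     array of shape (num_mics,), the weighted gain values
    """
    mic_signals = np.asarray(mic_signals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ mic_signals  # sum over i of w_i * x_i(n)

# Two feed-forward microphone signals combined with equal gains.
two_mics = np.array([[1.0, 2.0, 3.0],
                     [3.0, 2.0, 1.0]])
combined = combine_feedforward(two_mics, [0.5, 0.5])
```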
After generating the first audio data (e.g., single-channel feed-forward microphone signal), the first device 110a may determine (140) a feed-forward ANC filter profile, as will be described in greater detail below with regard to
In other examples, the first device 110a may select from a plurality of feed-forward ANC filter profiles based on an individual user's ear response. For example, the first device 110a may estimate a transfer function between a driver and the first microphone 112a (e.g., internal microphone) and can linearly map a magnitude of the transfer function at discrete frequencies to an optimum feed-forward ANC filter profile. Thus, the plurality of feed-forward ANC filter profiles may be pre-computed and stored on the first device 110a and the first device 110a may select from the plurality of feed-forward ANC filter profiles based on the magnitude values at the discrete frequencies. Additionally or alternatively, the feed-forward ANC filter may be an adaptive filter and the first device 110a may adaptively update (e.g., perform adaptation on) the feed-forward ANC filter without departing from the disclosure. For example, the first device 110a may monitor a secondary path impulse response and calculate the feed-forward ANC filter adaptively based on the feedback microphone signal.
The first device 110a may generate (142) second audio data using the feed-forward ANC processing and the first audio data and may generate (144) playback audio using a loudspeaker and the second audio data. For example, the first device 110a may apply the feed-forward ANC filter to the first audio data to generate the second audio data. In addition, the first device 110a may combine the second audio data with media content audio data representing media content and/or third audio data generated by a feedback ANC filter to generate playback audio data, which the first device 110a may send to the loudspeaker.
As illustrated in
In some examples, however, the first device 110a and the second device 110b may communicate and/or coordinate ANC processing without departing from the disclosure. For example, the first device 110a may control a first adaptation rate and/or first parameters associated with the first ANC processing based on a second adaptation rate and/or second parameters associated with the second ANC processing, such that an amount of active noise cancellation is similar between the first device 110a and the second device 110b. Additionally or alternatively, the first device 110a and/or the second device 110b may send one or more microphone signals, adaptive filter coefficients, noise ANC output signals, and/or the like to the other device without departing from the disclosure. In some examples, the first device 110a may determine the weighted gain values and/or determine the feed-forward ANC filter profile using information received from the second device 110b. In other examples, the first device 110a may compare the weighted gain values, the adaptive filter coefficients, the feed-forward ANC filter profile, and/or the like with similar data generated by the second device 110b without departing from the disclosure.
An audio signal is a representation of sound and an electronic representation of an audio signal may be referred to as audio data, which may be analog and/or digital without departing from the disclosure. For ease of illustration, the disclosure may refer to either audio data (e.g., microphone audio data, input audio data, etc.) or audio signals (e.g., microphone audio signal, input audio signal, etc.) without departing from the disclosure. Additionally or alternatively, portions of a signal may be referenced as a portion of the signal or as a separate signal and/or portions of audio data may be referenced as a portion of the audio data or as separate audio data. For example, a first audio signal may correspond to a first period of time (e.g., 30 seconds) and a portion of the first audio signal corresponding to a second period of time (e.g., 1 second) may be referred to as a first portion of the first audio signal or as a second audio signal without departing from the disclosure. Similarly, first audio data may correspond to the first period of time (e.g., 30 seconds) and a portion of the first audio data corresponding to the second period of time (e.g., 1 second) may be referred to as a first portion of the first audio data or second audio data without departing from the disclosure. Audio signals and audio data may be used interchangeably, as well; a first audio signal may correspond to the first period of time (e.g., 30 seconds) and a portion of the first audio signal corresponding to a second period of time (e.g., 1 second) may be referred to as first audio data without departing from the disclosure.
In some examples, the audio data may correspond to audio signals in a time-domain. However, the disclosure is not limited thereto and the device 110 may convert these signals to a subband-domain or a frequency-domain prior to performing additional processing, such as active noise cancellation (ANC) processing, acoustic feedback cancellation (AFC) processing, acoustic echo cancellation (AEC), adaptive interference cancellation (AIC), noise reduction (NR) processing, and/or the like. For example, the device 110 may convert the time-domain signal to the subband-domain by applying a bandpass filter, a Goertzel filter, and/or other filtering to select a portion of the time-domain signal within a desired frequency range. Additionally or alternatively, the device 110 may convert the time-domain signal to the frequency-domain using a Short-Term Fourier Transform (STFT), a Fast Fourier Transform (FFT), and/or the like without departing from the disclosure.
As used herein, audio signals or audio data (e.g., microphone audio data, or the like) may correspond to a specific range of frequency bands. For example, the audio data may correspond to a human hearing range (e.g., 20 Hz-20 kHz), although the disclosure is not limited thereto.
As used herein, a frequency band (e.g., frequency bin) corresponds to a frequency range having a starting frequency and an ending frequency. Thus, the total frequency range may be divided into a fixed number (e.g., 256, 512, etc.) of frequency ranges, with each frequency range referred to as a frequency band and corresponding to a uniform size. However, the disclosure is not limited thereto and the size of the frequency band may vary without departing from the disclosure.
The device 110 may include multiple microphones 112 configured to capture sound and pass the resulting audio signal created by the sound to a downstream component for further processing. Each individual piece of audio data captured by a microphone may be in a time domain. To isolate audio from a particular direction, the device may compare the audio data (or audio signals related to the audio data, such as audio signals in a sub-band domain) to determine a time difference of detection of a particular segment of audio data. If the audio data for a first microphone includes the segment of audio data earlier in time than the audio data for a second microphone, then the device may determine that the source of that audio is located closer to the first microphone than to the second microphone, the shorter distance resulting in the audio being detected by the first microphone before being detected by the second microphone.
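The time-difference comparison can be sketched with a cross-correlation peak (whole-sample resolution only; practical systems typically interpolate for finer delay estimates):

```python
import numpy as np

def lead_samples(sig_a, sig_b):
    """Number of samples by which the event in sig_a precedes the same
    event in sig_b (positive means it was detected first at microphone A),
    estimated from the peak of the cross-correlation."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)
    return -lag

# An impulse reaching microphone A three samples before microphone B.
mic_a = np.zeros(16)
mic_a[5] = 1.0
mic_b = np.zeros(16)
mic_b[8] = 1.0
delta = lead_samples(mic_a, mic_b)
```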
Using such direction isolation techniques, a device 110 may isolate directionality of audio sources. For example, a particular direction may be associated with azimuth angles divided into bins (e.g., 0-45 degrees, 46-90 degrees, and so forth). To isolate audio from a particular direction, the device 110 may apply a variety of audio filters to the output of the microphones where certain audio is boosted while other audio is dampened, to create isolated audio corresponding to a particular direction, which may be referred to as a beam. While in some examples the number of beams may correspond to the number of microphones, the disclosure is not limited thereto and the number of beams may be independent of the number of microphones 112. For example, a two-microphone array may be processed to obtain more than two beams, thus using filters and beamforming techniques to isolate audio from more than two directions. Thus, the number of microphones may be more than, less than, or the same as the number of beams. The beamformer unit of the device may have an adaptive beamformer (ABF) unit/fixed beamformer (FBF) unit processing pipeline for each beam, as explained below.
Beamforming systems isolate audio from a particular direction in a multi-directional audio capture system. As the terms are used herein, an azimuth direction refers to a direction in the XY plane with respect to the system, and elevation refers to a direction in the Z plane with respect to the system. One technique for beamforming involves boosting target audio received from a desired azimuth direction and/or elevation while dampening noise audio received from a non-desired azimuth direction and/or non-desired elevation.
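One common realization of this boost-and-dampen idea is a delay-and-sum beam. The sketch below uses whole-sample steering delays for illustration; practical beamformers use fractional delays or the filter-based approaches described above:

```python
import numpy as np

def delay_and_sum(mic_signals, steering_delays):
    """Advance each microphone signal by its steering delay (in samples)
    and average; audio from the direction matching the delays adds
    coherently (boosted) while other directions partially cancel."""
    out = np.zeros(len(mic_signals[0]))
    for sig, d in zip(mic_signals, steering_delays):
        out += np.roll(np.asarray(sig, dtype=float), -d)
    return out / len(mic_signals)

# The same impulse arrives two samples later at the second microphone;
# steering delays of [0, 2] re-align the two signals.
mic_a = np.zeros(8)
mic_a[2] = 1.0
mic_b = np.zeros(8)
mic_b[4] = 1.0
beam = delay_and_sum([mic_a, mic_b], [0, 2])
```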
The devices 110a/110b may include one or more loudspeaker(s) 114 (e.g., loudspeaker 202a/202b), one or more external microphone(s) 112 (e.g., first microphones 204a/204b and second microphones 205a/205b), and one or more internal microphone(s) 112 (e.g., third microphones 206a/206b). The loudspeaker 114 may be any type of loudspeaker, such as an electrodynamic speaker, electrostatic speaker, diaphragm speaker, or piezoelectric loudspeaker; the microphones 112 may be any type of microphones, such as piezoelectric or MEMS microphones. Each device 110a/110b may include one or more microphones 112.
As illustrated in
One or more batteries 207a/207b may be used to supply power to the devices 110a/110b. One or more antennas 210a/210b may be used to transmit and/or receive wireless signals over the first connection 124a and/or second connection 124b; an I/O interface 212a/212b contains software and hardware to control the antennas 210a/210b and transmit signals to and from other components. A processor 214a/214b may be used to execute instructions in a memory 216a/216b; the memory 216a/216b may include volatile memory (e.g., random-access memory) and/or non-volatile memory or storage (e.g., flash memory). One or more sensors 218a/218b, such as accelerometers, gyroscopes, or any other such sensor may be used to sense physical properties related to the devices 110a/110b, such as orientation; this orientation may be used to determine whether either or both of the devices 110a/110b are currently disposed in an ear of the user (i.e., the “in-ear” status of each device).
As illustrated in
In the example illustrated in
The device 110 may perform ANC processing 500 using feed-forward ANC processing, feedback ANC processing, hybrid ANC processing, and/or a combination thereof. To illustrate an example of feed-forward ANC processing, the device 110 may capture the ambient noise as first audio data using the feed-forward microphone(s) 520 and may apply a feed-forward filter to the first audio data to estimate the ambient noise signal received by the ear 504. For example, the device 110 may determine a transfer function and/or filters that correspond to a difference between first ambient noise captured by the feed-forward microphone(s) 520 and second ambient noise detected by the ear 504.
In the example illustrated in
To illustrate an example of feedback ANC processing, the device 110 may capture the ambient noise as fourth audio data using a feedback microphone 530, although the disclosure is not limited thereto and the device 110 may include multiple feedback microphones 530 without departing from the disclosure. As the feedback microphone 530 is located in close proximity to the ear 504, the device 110 does not need to estimate the ambient noise signal received by the ear 504, as the fourth audio data already corresponds to this ambient noise signal. However, unlike the first audio data generated by the feed-forward microphone(s) 520, the fourth audio data generated by the feedback microphone 530 is not limited to the ambient noise. Instead, due to its proximity to the ear 504, the fourth audio data includes the ambient noise and a representation of playback audio generated by the driver 570.
In order to perform feedback ANC processing, the device 110 may remove the playback audio recaptured by the feedback microphone 530 (e.g., by performing echo cancellation and/or the like) and generate fifth audio data that corresponds to the ambient noise. In the example illustrated in
As illustrated in
While not illustrated in
In the example illustrated in
The adaptive ANC system 600 is not identical to the ANC processing 500, however, as the adaptive ANC system 600 includes additional components that increase complexity and enable the adaptive ANC system 600 to perform adaptive ANC processing. For example, the adaptive ANC system 600 includes a fourth audio path associated with media content, as well as a number of gain components that enable the adaptive ANC system 600 to balance relative gain values and/or control an amount of gain applied in each audio path. Additionally or alternatively, the adaptive ANC system 600 includes two feed-forward microphones 730a/730b and a combiner component 620 configured to generate a combined single-channel feed-forward microphone signal.
As illustrated in
A feed-forward gain component 630 may apply a first gain to the combined feed-forward microphone signal to generate first audio data and output the first audio data to the feed-forward ANC component 540. As described above, a difference between first ambient noise captured by the feed-forward microphones 520a/520b and second ambient noise detected by the ear 504 can be modeled by a noise transfer function (e.g., Noe) between the feed-forward microphones 520a/520b and the ear 504 in diffuse noise. The feed-forward ANC component 540 may approximate the noise transfer function using adaptive filters configured to generate an estimated noise transfer function (e.g., N̂oe). Thus, the feed-forward ANC component 540 may use the first audio data and the estimated noise transfer function (e.g., N̂oe) to generate second audio data that estimates the ambient noise signal received by the ear 504. To cancel this estimate, the device 110 may generate third audio data that mirrors the second audio data but has a phase mismatch, canceling or reducing the estimated ambient noise through destructive interference.
A combiner component 640 may combine the third audio data with an output of a combiner component 690, which will be described below, to generate fourth audio data. The combiner component 640 may output the fourth audio data to a loudspeaker gain component 650, which may be configured to apply a second gain to the fourth audio data to generate loudspeaker audio data 655. The loudspeaker audio data 655 may be sent to the driver 570 and the driver 570 may generate output audio using the loudspeaker audio data 655.
A portion of the output audio may be recaptured by the feedback microphone 530. For example, the feedback microphone 530 may generate third microphone audio data 606 representing the portion of the output audio. The feedback microphone 530 may output the third microphone audio data 606 to a feedback gain component 660 and the feedback gain component 660 may apply a third gain to generate fifth audio data. The feedback gain component 660 may output the fifth audio data to the feedback ANC component 550 and the feedback ANC component 550 may perform feedback ANC filter processing to generate sixth audio data.
As illustrated in
After generating the combined feed-forward microphone signal, the device 110 may determine the feed-forward ANC filter. In some examples, the device 110 may be configured to apply a fixed feed-forward ANC filter without departing from the disclosure. For example, the device 110 may be configured with a predetermined feed-forward ANC filter selected based on a geometry of the device 110 and a generalized ear response determined by measuring a plurality of ear responses.
In other examples, the device 110 may monitor the secondary path and select from a plurality of feed-forward ANC filter profiles based on an individual user's ear response. For example, when the device 110 is generating output audio for the user, the device 110 may estimate a transfer function between the driver 570 and the ear 504 (or feedback microphone 530) and can linearly map a magnitude of the transfer function at discrete frequencies to an optimum feed-forward ANC filter profile. Thus, the plurality of feed-forward ANC filter profiles may be pre-computed and stored on the device 110 and the device 110 may select from the plurality of feed-forward ANC filter profiles based on the magnitude values at the discrete frequencies. Additionally or alternatively, the feed-forward ANC filter may be an adaptive filter and the device 110 may adaptively update the feed-forward ANC filter without departing from the disclosure. For example, the device 110 may monitor a secondary path impulse response and calculate the feed-forward ANC filter adaptively based on the feedback microphone signal (e.g., third microphone audio data 606).
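The profile-selection idea can be sketched as a lookup against pre-computed profiles. Note the table entries and profile names below are hypothetical, and nearest-neighbour matching stands in for the linear magnitude-to-profile mapping described above:

```python
import numpy as np

def select_profile(hdi_mags, profile_table):
    """Pick the pre-computed feed-forward ANC filter profile whose stored
    secondary-path magnitudes (at the tuned discrete frequencies) are
    closest to the currently estimated magnitudes."""
    hdi_mags = np.asarray(hdi_mags, dtype=float)
    dists = [np.linalg.norm(hdi_mags - np.asarray(ref, dtype=float))
             for ref, _ in profile_table]
    return profile_table[int(np.argmin(dists))][1]

# Hypothetical table: (reference magnitudes at f1..f3, profile id).
table = [([1.0, 0.8, 0.5], "shallow_fit"),
         ([2.0, 1.5, 1.0], "sealed_fit")]
choice = select_profile([1.9, 1.4, 1.1], table)
```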
As used herein, an adaptive filter is a digital filter with self-adjusting characteristics, such that the adaptive filter is capable of adjusting its filter coefficient values automatically. For example, an adaptive filter may have a transfer function controlled by variable parameters and a means of adjusting those parameters according to an optimization procedure or an optimization problem. This may involve the use of a cost function (e.g., loss function), which is a criterion for optimum performance of the adaptive filter, to feed an optimization procedure, which determines how to modify the filter transfer function in order to minimize the cost on the next iteration. In some examples, an optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and calculating the value of the function. For example, the adaptive filter may perform adaptation and iteratively update the adaptive filter coefficient values in order to converge on an optimized solution.
To illustrate an example, a closed loop adaptive filter may use feedback in the form of an error signal to refine its transfer function. For example, the adaptive filter may receive first data as an input and may generate second data using the first data and the adaptive filter coefficient values. The error signal may be generated using the second data and fed back into the adaptive filter, enabling the adaptive filter to perform adaptation and update the adaptive filter coefficient values to maximize or minimize the error signal. For example, the adaptive filter may use the error signal to create updated weights (e.g., adaptive filter coefficients) for the filters, and these updated weights may be used to weight future signals. However, the disclosure is not limited thereto and the adaptive filter may vary without departing from the disclosure.
A speed at which the adaptive filter adapts one weight to an updated weight (e.g., rate of adaptation or adaptation rate) may be a function of a step-size or time constant associated with the adaptive filter. In some examples, the adaptive filter may vary the step-size or time constant in order to modulate the adaptation rate based on system conditions. For example, the adaptive filter may increase the adaptation rate to reduce an amount of time required for the adaptive filter to update the adaptive filter coefficient values, enabling the adaptive filter to converge more quickly. Additionally or alternatively, the adaptive filter may decrease the adaptation rate to increase an amount of time required for the adaptive filter to update the adaptive filter coefficient values, which may improve stability. In some examples, the adaptive filter may cease to update the adaptive filter coefficient values for a duration of time, which may be referred to as freezing adaptation of the adaptive filter. For example, the adaptive filter may freeze adaptation in response to voice activity being detected, wind being detected, and/or the like without departing from the disclosure.
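A minimal least-mean-squares (LMS) sketch illustrates the roles of the error signal and the step size (the adaptation-rate knob discussed above). The filter length, step size, and test signals are illustrative, and LMS is one common adaptation rule rather than the specific filter of this disclosure:

```python
import numpy as np

def lms_identify(x, d, num_taps=4, step=0.02):
    """Adapt FIR weights so the filter output tracks the desired signal d.

    The error signal drives each weight update; the step size sets the
    adaptation rate (larger step converges faster but with less
    stability margin).
    """
    w = np.zeros(num_taps)
    for n in range(num_taps - 1, len(x)):
        frame = x[n - num_taps + 1:n + 1][::-1]  # x[n], x[n-1], ...
        err = d[n] - w @ frame                   # error signal
        w += step * err * frame                  # LMS weight update
    return w

# Identify a known 2-tap system from white-noise excitation.
rng = np.random.default_rng(0)
x = rng.standard_normal(4000)
d = np.convolve(x, [0.5, -0.25])[:len(x)]
w = lms_identify(x, d)
```

Freezing adaptation, as described above, corresponds to simply skipping the weight-update line while conditions such as voice activity or wind persist.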
In some examples, the device 110 may convert from the time domain to the subband domain using a Goertzel filter. For example, a Goertzel filter may calculate the complex spectra of each signal, recursively smoothed over successive time frames. The Goertzel filter may be configured to estimate a single desired Discrete Fourier Transform (DFT) frequency bin (e.g., subband), and can be implemented as a first-stage recursive filter followed by a single feed-forward stage. While this is similar to performing a Short-Term Fourier Transform (STFT), the Goertzel filter may be more computationally efficient when only a small number of subbands are needed. However, the disclosure is not limited thereto and in other examples the device 110 may include an analysis filterbank without departing from the disclosure. For example, an analysis filterbank may include a uniform discrete Fourier transform (DFT) filterbank to convert the microphone signal from the time domain into the subband domain, which may include converting to the frequency domain and then separating different frequency ranges into a plurality of individual subbands.
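The Goertzel structure (a two-pole recursive stage followed by a single feed-forward step) can be sketched as follows; for an integer bin index it reproduces the corresponding DFT bin exactly:

```python
import numpy as np

def goertzel_bin(x, k, n_dft):
    """Estimate the single DFT bin k of an n_dft-point frame via the
    Goertzel recursion: a recursive first stage over the samples, one
    final zero-input step, then a single feed-forward stage."""
    w = 2.0 * np.pi * k / n_dft
    coeff = 2.0 * np.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in x[:n_dft]:              # recursive first stage
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    s = coeff * s_prev - s_prev2          # final step with zero input
    return s - np.exp(-1j * w) * s_prev   # feed-forward stage

# A tone exactly at bin 16 of a 256-point frame.
frame = np.sin(2 * np.pi * 16 * np.arange(256) / 256.0)
bin16 = goertzel_bin(frame, 16, 256)
```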
After converting to the subband domain, the audio signal from the i-th microphone may be represented as Xi(n, k), where i denotes the microphone, n denotes the frame index, and k denotes the sub-band index. Using an i-th microphone signal Xi(n,k), in some examples the device 110 may generate a PSD estimate (e.g., PSD function) using the following equation:
Sxixi(n,k)=α·Sxixi(n−1,k)+(1−α)·|Xi(n,k)|2 [1]
where Sxixi(n,k) is the PSD estimate for the i-th microphone at frame index n and sub-band index k, and α is a smoothing constant controlling the recursive averaging over successive time frames.
The device 110 may calculate a cross-PSD estimate (e.g., CPSD function) using the following equation:
Sxixj(n,k)=α·Sxixj(n−1,k)+(1−α)·Xi(n,k)·Xj*(n,k) [2]
where ( )* is the complex conjugate and Sxixj(n,k) is the cross-PSD estimate between the i-th and j-th microphone signals at frame index n and sub-band index k.
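Recursively smoothed PSD and cross-PSD estimates of this kind can be sketched as follows (the frame layout and smoothing constant are illustrative):

```python
import numpy as np

def smooth_psd(frames_i, frames_j, alpha=0.9):
    """Recursively smoothed (cross-)PSD estimate per subband.

    frames_i, frames_j: complex spectra of shape (num_frames, num_bins).
    Passing the same frames twice yields the auto-PSD estimate.
    """
    s = np.zeros(frames_i.shape[1], dtype=complex)
    for xi, xj in zip(frames_i, frames_j):
        s = alpha * s + (1.0 - alpha) * xi * np.conj(xj)
    return s

# Constant spectra of magnitude 2 drive the auto-PSD estimate toward 4.
frames = np.full((200, 4), 2.0 + 0j)
auto_psd = smooth_psd(frames, frames)
```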
The primary path transfer function NOI represents a transfer function from an outer microphone (e.g., feed-forward microphone) to an inner microphone (e.g., feedback microphone) in diffuse noise, while the secondary path transfer function HDI represents a transfer function from the driver to the inner microphone (e.g., feedback microphone). For ease of illustration, in some examples a position of the ear canal of the user 5 may be approximated by the location of the feedback microphone. For example, in order to distinguish the feedback microphone from multiple feed-forward microphones located outside the ear canal, some equations may reference the feedback microphone using a symbol (e.g., “e”) associated with the ear without departing from the disclosure.
Assuming a perfect filter, the amount of cancellation will be a function of the magnitude squared coherence between the outer microphone (O) and the inner microphone (I) per frequency, which may be referred to as coherence limit 820:
Coherence Limit=10*log10(1−COI(jw)) [4]
If the device 110 includes multiple feed-forward microphones, the device 110 may assess the coherence limit of each microphone individually. Using the solution to the generalized Rayleigh quotient, the device 110 may predict the coherence limit for combining multiple microphones together as follows. For each frequency, the device 110 may calculate a cross-power spectral density matrix 830 for the inner ear microphone (e) and the feed-forward microphones, where the feed-forward microphones are labeled [m1, m2, . . . mN]. While the following example only illustrates two feed-forward microphones, the disclosure is not limited thereto:
To maximize FF ANC performance, the device 110 may find the optimal weighting (w) of the feed-forward microphones to maximize the magnitude squared coherence between the ear (E) and the sum of microphones (Y). For example, the device 110 may use weighting 840 to solve optimization 850:
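The weighting 840 and optimization 850 referenced above are likewise not reproduced in this text. Plausible forms, consistent with the surrounding description (with Sme denoting the vector of cross-PSDs between the feed-forward microphones and the ear microphone, Smm the CPSD matrix among the feed-forward microphones, and [6]/[7] numbered here by position between Equations [4] and [8]), are:

Y(n,k)=w^H*[Xm1(n,k), Xm2(n,k)]^T [6]

w_opt=argmax_w Cey(jw)=argmax_w |w^H*Sme(jw)|^2/((w^H*Smm(jw)*w)*See(jw)) [7]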
The optimal weights may be found by solving a generalized Rayleigh quotient, where the solution is the eigenvector associated with the maximum eigenvalue.
After using the weighting to combine the feed-forward microphone signals to a single channel, the device 110 may calculate the theoretical best FF ANC by determining a combined coherence limit 870:
Coherence Limit=10*log10(1−Cey(jw)) [8]
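The generalized Rayleigh quotient and combined coherence limit [8] can be sketched as follows. This is an illustrative sketch, not code from the disclosure: it uses the fact that, when the numerator of the quotient is rank-one, the maximizing eigenvector has the closed form w = Smm^{-1}*s_me; the function names and variable shapes are assumptions.

```python
import numpy as np

def optimal_weights(S_mm, s_me):
    """Weights maximizing the generalized Rayleigh quotient
    |w^H s_me|^2 / (w^H S_mm w).  With a rank-one numerator, the
    eigenvector of the maximum eigenvalue is w = S_mm^{-1} s_me
    (up to scale, which does not affect the coherence)."""
    return np.linalg.solve(S_mm, s_me)

def combined_coherence_limit_db(S_mm, s_me, S_ee):
    """Combined coherence limit per Equation [8] for the weighted sum
    of feed-forward microphones versus the ear microphone."""
    w = optimal_weights(S_mm, s_me)
    num = np.abs(np.vdot(w, s_me)) ** 2          # |w^H s_me|^2
    den = np.real(np.vdot(w, S_mm @ w)) * S_ee   # (w^H S_mm w) * S_ee
    C_ey = num / den
    return 10.0 * np.log10(1.0 - C_ey)
```

Because the coherence is invariant to scaling of w, the unnormalized closed-form weights suffice; the same limit would result from solving the generalized eigenproblem explicitly.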
Using the DFT audio data 915 generated by the Goertzel filter, an HDI estimation component 920 may estimate the secondary path transfer function HDI (e.g., transfer function from the driver to the inner microphone). As described below with regard to
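The Goertzel filter evaluates the DFT at individual frequency bins, which is efficient when the transfer function estimate is only needed at a few frequency points rather than across a full FFT. A minimal stdlib sketch of the standard Goertzel algorithm (not code from the disclosure) is:

```python
import math
import cmath

def goertzel(samples, k, n_fft):
    """Evaluate the n_fft-point DFT of `samples` at bin k only,
    using the Goertzel second-order recurrence."""
    n = len(samples)
    w = 2.0 * math.pi * k / n_fft
    coeff = 2.0 * math.cos(w)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2   # second-order resonator update
        s2, s1 = s1, s0
    # Combine the two final states into the complex DFT value,
    # with phase referenced to the first sample.
    return (s1 - s2 * cmath.exp(-1j * w)) * cmath.exp(-1j * w * (n - 1))
```

Running one such filter per tuned frequency point yields the DFT audio data needed for the transfer function estimate at those points.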
In some examples, the system 100 may pre-compute optimum ANC filters via data collection measurements and the device 110 may store these optimum ANC filters as a plurality of ANC filter profiles. The number of frequency points fn is tunable, with a tradeoff between complexity and performance.
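One way the stored ANC filter profiles could be selected at runtime is by comparing the current estimated secondary path response, sampled at the fn frequency points, against a reference response stored with each profile. The least-squares matching criterion below is an assumption for illustration, not a method stated in the disclosure.

```python
import numpy as np

def select_anc_profile(h_estimated, profiles):
    """Pick the stored ANC filter profile whose reference secondary-path
    response (sampled at the same fn frequency points) is closest to the
    current HDI estimate, by least-squares distance.

    h_estimated: complex array of length fn.
    profiles:    list of (reference_response, filter_coefficients) pairs.
    """
    dists = [np.sum(np.abs(h_estimated - ref) ** 2) for ref, _ in profiles]
    return profiles[int(np.argmin(dists))][1]
```

Increasing fn makes the match more discriminative at the cost of more Goertzel filters and comparisons per update, which is the complexity/performance tradeoff noted above.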
As illustrated in
In some examples, the device 110 may perform additional steps to ensure that the estimated secondary path transfer function HDI is reliable. For example, the device 110 may determine a magnitude-squared coherence, which measures how similar two signals are and is used as a reliability metric for trusting the estimated transfer function HDI. In addition, the device 110 may include a voice activity detection (VAD) component 930, a wind detection component 940, and a clipping detection component 950. If the device 110 detects voice activity by the user 5, wind activity, and/or clipping (e.g., discrete sound events that exceed a desired range), the HDI estimation component 920 may slow adaptation, freeze adaptation, and/or ignore the estimated transfer function HDI, although the disclosure is not limited thereto.
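The gating logic described above can be sketched as a small decision function. The coherence thresholds are illustrative assumptions, not values from the disclosure.

```python
def adaptation_mode(msc, voice, wind, clipping,
                    msc_trust=0.9, msc_slow=0.5):
    """Gate HDI adaptation on reliability signals.

    msc:  magnitude-squared coherence for the current estimate (0..1).
    voice/wind/clipping: detector outputs (booleans)."""
    if voice or wind or clipping:
        return "freeze"   # discrete disturbance: stop adapting entirely
    if msc >= msc_trust:
        return "adapt"    # estimate is trustworthy: adapt normally
    if msc >= msc_slow:
        return "slow"     # partially coherent: slow adaptation
    return "ignore"       # unreliable: discard the estimated HDI
```

The returned mode would then control the step size (or bypass) of the HDI estimation component 920.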
The system 100 may include one or more controllers/processors 1204 that may each include a central processing unit (CPU) for processing data and computer-readable instructions, and a memory 1206 for storing data and instructions. The memory 1206 may include volatile random-access memory (RAM), non-volatile read only memory (ROM), non-volatile magnetoresistive (MRAM) and/or other types of memory. The system 100 may also include a data storage component 1208, for storing data and controller/processor-executable instructions (e.g., instructions to perform operations discussed herein). The data storage component 1208 may include one or more non-volatile storage types such as magnetic storage, optical storage, solid-state storage, etc. The system 100 may also be connected to removable or external non-volatile memory and/or storage (such as a removable memory card, memory key drive, networked storage, etc.) through the input/output device interfaces 1202.
Computer instructions for operating the system 100 and its various components may be executed by the controller(s)/processor(s) 1204, using the memory 1206 as temporary “working” storage at runtime. The computer instructions may be stored in a non-transitory manner in non-volatile memory 1206, storage 1208, and/or an external device. Alternatively, some or all of the executable instructions may be embedded in hardware or firmware in addition to or instead of software.
The system may include input/output device interfaces 1202. A variety of components may be connected through the input/output device interfaces 1202, such as the loudspeaker(s) 114, the microphone(s) 112, and a media source such as a digital media player (not illustrated). The input/output interfaces 1202 may include A/D converters (not shown) and/or D/A converters (not shown).
The input/output device interfaces 1202 may also include an interface for an external peripheral device connection such as universal serial bus (USB), FireWire, Thunderbolt or other connection protocol. The input/output device interfaces 1202 may also include a connection to one or more networks 199 via an Ethernet port, a wireless local area network (WLAN) (such as WiFi) radio, Bluetooth, and/or wireless network radio, such as a radio capable of communication with a wireless communication network such as a Long Term Evolution (LTE) network, WiMAX network, 3G network, etc. Through the network(s) 199, the system 100 may be distributed across a networked environment.
The concepts disclosed herein may be applied within a number of different devices and computer systems, including, for example, general-purpose computing systems, multimedia set-top boxes, televisions, stereos, radios, server-client computing systems, telephone computing systems, laptop computers, cellular phones, personal digital assistants (PDAs), tablet computers, wearable computing devices (watches, glasses, etc.), other mobile devices, etc.
The above aspects of the present disclosure are meant to be illustrative. They were chosen to explain the principles and application of the disclosure and are not intended to be exhaustive or to limit the disclosure. Many modifications and variations of the disclosed aspects may be apparent to those of skill in the art. Persons having ordinary skill in the field of digital signal processing and echo cancellation should recognize that components and process steps described herein may be interchangeable with other components or steps, or combinations thereof, and still achieve the benefits and advantages of the present disclosure. Moreover, it should be apparent to one skilled in the art that the disclosure may be practiced without some or all of the specific details and steps disclosed herein.
Aspects of the disclosed system may be implemented as a computer method or as an article of manufacture such as a memory device or non-transitory computer readable storage medium. The computer readable storage medium may be readable by a computer and may comprise instructions for causing a computer or other device to perform processes described in the present disclosure. The computer readable storage medium may be implemented by a volatile computer memory, non-volatile computer memory, hard drive, solid-state memory, flash drive, removable disk and/or other media. In addition, components of the system may be implemented in firmware and/or hardware, such as an acoustic front end (AFE), which comprises, among other things, analog and/or digital filters (e.g., filters configured as firmware to a digital signal processor (DSP)).
Conditional language used herein, such as, “can,” “could,” “might,” “may,” “e.g.,” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements, and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without other input or prompting, whether these features, elements, and/or steps are included or are to be performed in any particular embodiment. The terms “comprising,” “including,” “having,” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list.
Disjunctive language such as the phrase “at least one of X, Y, Z,” unless specifically stated otherwise, is understood with the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present. As used in this disclosure, the term “a” or “one” may include one or more items unless specifically stated otherwise. Further, the phrase “based on” is intended to mean “based at least in part on” unless specifically stated otherwise.