Wireless audio devices, such as earbuds or headphones, may be used to communicate wirelessly with a user device, such as a smartphone, smartwatch, or similar device, and with each other. The wireless earbuds may be used to output audio sent from the user device, such as music, as part of two-way communications, such as telephone calls, and/or to receive audio for speech recognition.
For a more complete understanding of the present disclosure, reference is now made to the following description taken in conjunction with the accompanying drawings.
Some electronic devices may include an audio-based input/output interface. A user may interact with such a device—which may be, for example, a smartphone, smart speaker, tablet, computer, or other speech-controlled device—partially or exclusively using his or her voice and ears. Exemplary interactions include listening to music or other audio, communications such as telephone calls, audio messaging, and video messaging, and/or audio input for search queries, weather forecast requests, navigation requests, or other such interactions. The device may include one or more microphones for capturing voice input and hardware and/or software for converting the voice input into audio data. The device may include an audio output device, such as a loudspeaker, for outputting audio that in some embodiments responds to and/or prompts for the voice input.
Use of the above-described electronic device via its audio-based input/output interface may, at times, be inconvenient, difficult, or impossible. Sometimes, such as while exercising, working, or driving, the user's hands may be occupied, and the user may not be able to hold the device in such a fashion as to effectively interact with the device's speech interface. Other times, the level of ambient noise may be too high for the device to accurately detect speech from the user or too high for the user to understand audio output from the device. In these situations, the user may prefer to connect headphones to the device and interact with the audio-based input/output interface therewith. As the term is used herein, “headphones” may refer to any hands-free, wearable audio input/output device and includes headsets, earphones, earbuds, or any similar device. For added convenience, the user may choose wireless headphones, which communicate with the device—and optionally each other—via a wireless connection, such as Bluetooth, Wi-Fi, near-field magnetic induction (NFMI), cellular Long-Term Evolution (LTE), or any other type of wireless connection. Wireless earbuds may be more desirable and/or convenient to users because the earbuds do not require a wire or cord connecting them; such a cord may be distracting and/or uncomfortable.
In the present disclosure, for clarity, headphone components that are capable of communication with both a third device (such as a phone, tablet, etc.) and each other are referred to as “wireless earbuds,” but the term “earbud” does not limit the present disclosure to any particular type of wired or wireless headphones. The present disclosure may further differentiate between a “right earbud,” meaning a headphone component disposed in or near a right ear of a user, and a “left earbud,” meaning a headphone component disposed in or near a left ear of a user. A “primary” earbud communicates with a “secondary” earbud using a first wireless connection (such as a Bluetooth or NFMI connection); the primary earbud further communicates with a third device (such as a smartphone, smart watch, or similar device) using a second connection (such as a Bluetooth connection). The secondary earbud communicates directly only with the primary earbud and does not communicate using a dedicated connection directly with the smartphone; communication therewith may pass through the primary earbud via the first wireless connection.
The primary and secondary earbuds may include similar hardware and software; in other instances, the secondary earbud contains hardware and/or software different from that included in the primary earbud. If the primary and secondary earbuds include similar hardware and software, they may trade the roles of primary and secondary prior to or during operation. In the present disclosure, the primary earbud may be referred to as the “first device,” the secondary earbud may be referred to as the “second device,” and the smartphone or other device may be referred to as the “third device.” The first, second, and/or third devices may communicate over a network, such as the Internet, with one or more server devices, which may be referred to as “remote device(s).”
Each of the primary and secondary earbuds may also include a loudspeaker; the loudspeaker may include a single audio-output device or a plurality of audio-output devices. As the term is used herein, a loudspeaker refers to any audio-output device; in a system of multiple audio-output devices, however, the system as a whole may be referred to as a loudspeaker while the plurality of audio-output devices therein may each be referred to as a “driver.”
Each driver may output different ranges of frequencies of sound. For example, a first, low-frequency driver may output sound having frequencies mostly below a cutoff frequency, and a second, high-frequency driver may output sound having frequencies mostly above a cutoff frequency. A type of filter called a “crossover filter” may be used to divide audio data into separate frequency ranges, and each frequency range may be sent to a different driver. A high-frequency driver may be referred to as a “tweeter,” and a low-frequency driver may be referred to as a “woofer” or “subwoofer.” The present disclosure is not, however, limited to any particular number or type of drivers.
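A crossover of this kind can be sketched as a complementary pair of FIR filters, in which the high-pass response is an impulse minus the low-pass response, so the two bands sum back to the (delayed) input. The moving-average low-pass used below is purely illustrative; a practical crossover would use a filter designed around the intended cutoff frequency:

```python
import numpy as np

def complementary_crossover(x, n_taps=31):
    """Split audio into low- and high-frequency bands whose sum
    reconstructs the (delayed) input: a simple complementary FIR crossover."""
    # Illustrative low-pass: a moving-average FIR (a real design would use
    # a windowed-sinc or similar filter tuned to the cutoff frequency).
    h_lp = np.ones(n_taps) / n_taps
    # Complementary high-pass: an impulse minus the low-pass response.
    h_hp = -h_lp.copy()
    h_hp[n_taps // 2] += 1.0
    low = np.convolve(x, h_lp)[: len(x)]    # send to the woofer
    high = np.convolve(x, h_hp)[: len(x)]   # send to the tweeter
    return low, high

x = np.random.default_rng(0).standard_normal(1000)
low, high = complementary_crossover(x)
# Apart from the filters' shared group delay, the two bands sum to x.
```

Because the two branch responses sum to a pure delay, whatever energy the low-pass removes appears in the high-pass branch, and the acoustic sum at the listener's ear reconstructs the original signal.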
In some embodiments, a loudspeaker including a single audio-output device may include a “dynamic driver” as the audio-output device. A loudspeaker including multiple audio-output devices may include “balanced-armature drivers” as the audio-output devices. The present disclosure is not limited, however, to any particular type or combination of audio-output devices.
A balanced-armature driver may include a coil of electric wire wrapped around an armature; the coil is disposed between two magnets, and changes in the current in the coil cause attraction and/or repulsion between the armature and the magnets, thereby creating sound from variations in the current. A balanced-armature driver may be referred to as “balanced” because there may be no net force on the armature when it is centered in the magnetic field generated by the magnets and when the current is not being varied.
A dynamic driver may include a diaphragm attached to a voice coil. When a current is applied to the voice coil, the voice coil moves between two magnets, thereby causing the diaphragm to move and produce sound. Dynamic drivers may thus be also known as “moving-coil drivers.” Dynamic drivers may have a greater frequency range of output sound when compared to balanced-armature drivers but may be larger and/or more costly.
Active-noise cancellation (ANC), also referred to as active-noise control, refers to systems and methods for reducing or cancelling unwanted ambient sound or “noise” by producing a waveform, referred to herein as “anti-noise” or “cancellation data,” having an opposite or negative amplitude—but similar absolute value—compared to the noise. For example, if a noise signal corresponds to sin Θ, the anti-noise signal corresponds to −sin Θ. The anti-noise is output such that it cancels the noise at a point of interest, such as a point at or near where an ear of a user is disposed. The anti-noise may instead or in addition be combined with audio output or playback, such as audio output corresponding to music or voice, such that when the audio output collides with the noise, the noise is cancelled from the audio output. ANC may not function adequately when using some loudspeakers, such as balanced-armature drivers, however, and undesired noise may not be cancelled at least because latency associated with the ANC is too high and/or because high-frequency noise is not adequately cancelled.
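The sin Θ / −sin Θ relationship can be illustrated numerically; the second half of this sketch also hints at why latency matters, since anti-noise that arrives even slightly late no longer sums to silence:

```python
import numpy as np

# Noise waveform sin(theta) and its anti-noise -sin(theta): when both
# reach the same point, their superposition is (ideally) silence.
theta = np.linspace(0, 4 * np.pi, 1000)
noise = np.sin(theta)
anti_noise = -np.sin(theta)
residual = noise + anti_noise          # perfect cancellation: all zeros

# A latency (phase) error leaves residual noise uncancelled, which is
# one reason high processing latency degrades ANC performance.
delayed_anti_noise = -np.sin(theta - 0.1)   # anti-noise arriving late
residual_late = noise + delayed_anti_noise  # no longer zero
```

The late anti-noise leaves a residual of amplitude 2·sin(0.05) ≈ 0.1, i.e., even a small delay converts complete cancellation into audible leakage; the effect grows with frequency, consistent with the observation above that high-frequency noise is harder to cancel.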
The present disclosure offers a system and method for cancelling noise in a multi-driver system. An error microphone receives audio from a first, low-frequency driver and a second, high-frequency driver; the error microphone further receives noise audio. The path between the drivers and the microphone may be referred to herein as the “secondary path,” and may be denoted by the transfer function S(z). An adaptive controller may be used to configure an estimation Ŝ(z) of the secondary path S(z) by configuring a filter, such as a finite-impulse response (FIR) or infinite-impulse response (IIR) filter, such that an error between the noise and generated anti-noise is minimized. The anti-noise or cancellation data may be generated by applying playback audio data to an input of the estimation Ŝ(z) and subtracting the resulting output of the estimation Ŝ(z) from audio data received by the error microphone. The anti-noise or cancellation data may be modified by a feedback controller; the anti-noise and/or modified anti-noise may then be used to modify playback audio data before it is sent to the low-frequency driver for output. The playback audio is sent directly to the high-frequency driver without modification by the anti-noise.
In various embodiments, the first and/or second devices 110a/110b output (130), using a first, low-frequency driver, first audio and output (132) using a second, high-frequency driver, second audio corresponding to first output audio data. The first and/or second devices 110a/110b receive (134), from a microphone, input audio data corresponding to a representation of the first audio, a representation of the second audio, and a representation of noise audio. As explained further herein, the first audio and/or second audio may be modified by a transfer function S(z) corresponding to a cavity extending between the drivers and microphone; the amount and type of modification may be defined by, for example, the size and shape of the cavity disposed between the drivers and the microphone, the type of drivers and microphone, the material defining the walls of the cavity, or any other such attribute of the transfer function S(z). The transfer function S(z) may also be referred to herein as the “secondary path.”
The first and/or second devices 110a/110b determine (136) a transfer function Ŝ(z) corresponding to an estimation of the transfer function S(z) and generate (138), based at least in part on the determined transfer function Ŝ(z), estimated audio data. As explained further herein, the estimated audio data may be produced by the transfer function Ŝ(z) implemented by, for example, an FIR filter that may be updated by an adaptive controller, which may use, for example, a least-mean-squares (LMS) algorithm to determine an error between noise and anti-noise and/or between the estimation and the output audio data. The adaptive controller may update the FIR filter by changing one or more parameters/coefficients of the filter in accordance with the error. For example, if a gradient associated with the error is positive, one or more coefficients of the filter may be decreased, while if the gradient is negative, one or more coefficients of the filter may be increased. The magnitude of the change to the one or more coefficients may be determined by the magnitude of the gradient.
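The adaptation described above can be sketched with a textbook least-mean-squares loop identifying a hypothetical secondary path; the four-tap response `s_true`, the filter length, and the step size are illustrative values, not taken from the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical secondary path S(z): a short FIR response standing in
# for the acoustic path from driver to error microphone.
s_true = np.array([0.6, 0.25, -0.1, 0.05])
n_taps = 8
s_hat = np.zeros(n_taps)               # coefficients of the estimate S^(z)
mu = 0.05                              # LMS step size

x = rng.standard_normal(5000)          # playback audio driving the path
d = np.convolve(x, s_true)[: len(x)]   # what the error microphone hears

buf = np.zeros(n_taps)                 # the N most recent input samples
for n in range(len(x)):
    buf = np.roll(buf, 1)
    buf[0] = x[n]
    y = s_hat @ buf                    # estimated microphone signal
    e = d[n] - y                       # error between actual and estimate
    # Gradient step: each coefficient moves opposite the error gradient,
    # scaled by the error and the corresponding input sample.
    s_hat += mu * e * buf

# After adaptation, s_hat approximates s_true (remaining taps near zero).
```

As the error shrinks, the coefficient updates shrink with it, so the filter settles once Ŝ(z) tracks S(z); a larger step size adapts faster but risks instability.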
The first and/or second devices 110a/110b generate (140) cancellation audio data, which may correspond to anti-noise, by subtracting the estimated audio data from the input audio data. A feedback controller generates (144) feedback audio data from the cancellation audio data. The first and/or second devices 110a/110b receive (146) second output audio data; the first and/or second devices 110a/110b output (148), using the low-frequency driver, the feedback audio data subtracted from the second output audio data, and output (150), using the high-frequency driver, the second output audio data.
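Steps 140 through 148 can be condensed into a single-sample sketch. The feedback controller is reduced here to a bare gain as a placeholder for the filtering a real controller would perform, and all numeric values are illustrative:

```python
import numpy as np

def anc_step(playback, mic, s_hat_buf, s_hat, feedback_gain=0.5):
    """One sample of the cancellation flow (a sketch of steps 140-148)."""
    # Estimated contribution of the playback audio at the microphone,
    # via the estimated secondary path Ŝ(z).
    estimated = s_hat @ s_hat_buf
    # Cancellation data (step 140): microphone input minus the estimate
    # leaves (approximately) only the noise.
    cancellation = mic - estimated
    # Feedback controller (step 144); a real controller would filter and
    # phase-shift rather than merely scale.
    feedback = feedback_gain * cancellation
    # Low-frequency driver gets playback minus feedback (step 148);
    # the high-frequency driver receives the playback unmodified.
    low_out = playback - feedback
    high_out = playback
    return low_out, high_out

# Illustrative values: playback sample 1.0, microphone sample 0.8,
# a two-tap Ŝ(z) buffer, and two-tap Ŝ(z) coefficients.
low_out, high_out = anc_step(1.0, 0.8, np.array([1.0, 0.0]), np.array([0.5, 0.0]))
```

The key asymmetry of the disclosure is visible in the last two assignments: only the low-frequency branch carries anti-noise, while the high-frequency branch passes playback audio through untouched.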
The devices 110a/110b may each include a loudspeaker 202a/202b. The loudspeaker 202a/202b may be any type of loudspeaker, such as an electrodynamic loudspeaker, electrostatic loudspeaker, dynamic loudspeaker, diaphragm loudspeaker, or piezoelectric loudspeaker. The loudspeaker 202a/202b may further include one or more drivers, such as balanced-armature drivers. The present disclosure is not limited to any particular type of loudspeaker 202a/202b or driver.
The devices 110a/110b may further each include one or more microphones, such as external microphones 204a/204b and/or internal microphones 205a/205b. The microphones 204a/204b and 205a/205b may be any type of microphone, such as a piezoelectric or MEMS microphone. The loudspeakers 202a/202b and microphones 204a/204b and 205a/205b may be mounted on, disposed on, or otherwise connected to frame elements 206a/206b. The devices 110a/110b may each further include inner-lobe inserts 208a/208b that may bring the loudspeakers 202a/202b closer to the eardrum of the user and/or block some ambient noise. The internal microphones 205a/205b may be disposed in or on the inner-lobe inserts 208a/208b or in or on the loudspeakers 202a/202b. The external microphones 204a/204b may be disposed in or on the frame elements 206a/206b.
One or more additional components may be disposed in or on the frame elements 206a/206b. One or more antennas 210a/210b may be used to transmit and/or receive wireless signals over the first connection 114a and/or second connection 114b; an I/O interface 212a/212b contains software and hardware to control the antennas 210a/210b and transmit signals to and from other components. A processor 214a/214b may be used to execute instructions in a memory 216a/216b; the memory 216a/216b may include volatile memory (e.g., random-access memory) and/or non-volatile memory or storage (e.g., flash memory). One or more sensors 218a/218b, such as accelerometers, gyroscopes, or any other such sensor may be used to sense physical properties related to the devices 110a/110b, such as orientation; this orientation may be used to determine whether either or both of the devices 110a/110b are currently disposed in an ear of the user (i.e., the “in-ear” status of each device).
In various embodiments, the anti-noise data may be generated by modifying the playback audio data 502 with the noise audio 516. The modification may include subtracting an estimation of the noise audio 516 from the playback audio data 502 or adding an estimation of anti-noise audio corresponding to the noise audio 516 to the playback audio data 502. The present disclosure is not limited to any particular method for modifying the playback audio data 502 with the noise audio 516.
In some embodiments, an error microphone 518 captures the high-frequency audio 506, the low-frequency audio 514, and the noise audio 516 to create captured audio data 520. The devices 110a/110b may subtract the captured audio data 520 directly from the playback audio data 502 to thereby cancel some or all of the noise audio 516. This direct subtraction, however, also subtracts some or all of the captured high-frequency audio 506 and low-frequency audio 514, thereby distorting playback of the playback audio data 502. In some embodiments, some or all of the playback audio data 502 may be amplified before the devices 110a/110b send the amplified playback audio data 502 for output by the drivers 504/508 to mitigate this distortion.
In various embodiments of the present disclosure, a transfer function Ŝ(z) 522 corresponding to the transfer function S(z) 534 is configured to generate estimated audio data 524; this estimated audio data 524 may correspond to the playback audio data 502 as it is received by the error microphone 518. In other words, the estimated audio data 524 corresponds to the playback audio data 502 as modified by the transfer function Ŝ(z) 522—but not including the noise audio 516. A second summing component 526 may then be used to subtract the estimated audio data 524 from the captured audio data 520 to thereby create estimated anti-noise data 528. The first summing component 510 may then subtract the estimated anti-noise data 528 from the playback audio data 502 for output by the first, low-frequency driver 508. In some embodiments, a feedback controller 530 may be used to modify the estimated anti-noise data 528 to create modified estimated anti-noise data 532 prior to sending it to the first summing component 510. The feedback controller 530 may, for example, delay or change the phase of the estimated anti-noise data 528 using, for example, one or more additional FIR filters. In some embodiments, the feedback controller 530 includes one or more crossover filters for isolating low-frequency sounds from the estimated anti-noise data 528.
In various embodiments, the transfer function Ŝ(z) 522 includes one or more FIR filters. An FIR filter may modify audio data by applying one or more coefficients—which may also be referred to herein as “parameters,” “variables,” or “taps”—to one or more samples of audio data; the number of coefficients may be represented by a length N of the filter. In other words, an FIR filter outputs a series of weighted averages of its N most recent input samples. By updating the coefficients, the adaptive controller may configure the FIR filter to be a low-pass filter, a band-pass filter, a high-pass filter, or may shape the input samples in any combination thereof or in accordance with any transfer function. The longer the filter, the more complicated and/or precise the transfer function may be. In some embodiments, the transfer function Ŝ(z) 522 may be configured to change the delay, but not the phase, of the playback audio data 502. This delay may correspond to a delay associated with the transfer function S(z) 534.
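The “weighted average of the N most recent samples” view of an FIR filter can be made concrete with a direct (unoptimized) implementation; the averaging and pure-delay coefficient sets below are illustrative:

```python
import numpy as np

def fir_filter(x, coeffs):
    """Apply an FIR filter: each output sample is a weighted sum of the
    N most recent input samples, weighted by the N coefficients (taps)."""
    n = len(coeffs)
    y = np.zeros(len(x))
    for i in range(len(x)):
        for k in range(n):
            if i - k >= 0:              # skip samples before the signal starts
                y[i] += coeffs[k] * x[i - k]
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])
# A 4-tap averaging filter acts as a crude low-pass...
smoothed = fir_filter(x, np.array([0.25, 0.25, 0.25, 0.25]))
# ...while a pure-delay filter [0, 0, 1] shifts the signal by two
# samples without reshaping it at all.
delayed = fir_filter(x, np.array([0.0, 0.0, 1.0]))
```

The two coefficient sets illustrate the flexibility noted above: depending solely on its taps, the same structure can smooth (low-pass), delay, or approximate any other transfer function, with longer filters permitting more precise shaping.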
An adaptive controller 532 may be used to change or otherwise update the coefficients of the transfer function Ŝ(z) 522 such that the transfer function Ŝ(z) 522 corresponds to the characteristics of the transfer function S(z) 534. In some embodiments, the adaptive controller 532 minimizes a difference between the noise audio 516 and the estimated anti-noise data 528 or minimizes a difference between the estimated audio data 524 and the playback audio data 502. In some embodiments, the adaptive controller 532 utilizes a least-mean-squares (LMS) algorithm to compute this error.
In some embodiments, default coefficients for the FIR filter may be stored in a computer memory. The first and/or second devices 110a/110b may load these default coefficients into the FIR filter before operating the adaptive controller 532 to dynamically update the FIR filter. This load may take place, for example, upon power-on, upon wake from sleep, periodically, or at any other time. These default coefficients may be determined by, for example, experimentation in a lab with simulated data, with real-world data, or with both. The default coefficients may be selected to represent a typical user and typical use cases. In some embodiments, two or more sets of default coefficients may be determined to represent two or more use cases; the use cases may correspond to different anatomies of users, different environmental conditions, or other such variables. The first and/or second devices 110a/110b may load these sets of coefficients and select a set based on having a minimum error, as defined above.
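Selecting among stored default coefficient sets by minimum error might be sketched as follows; the two candidate sets and the signals are hypothetical stand-ins for lab-derived defaults:

```python
import numpy as np

def select_default_set(candidate_sets, x, d):
    """Pick the stored default coefficient set whose FIR output best
    matches the microphone signal d for playback x (minimum squared error)."""
    errors = []
    for coeffs in candidate_sets:
        y = np.convolve(x, coeffs)[: len(x)]   # predicted microphone signal
        errors.append(np.mean((d - y) ** 2))   # mean-squared error per set
    best = int(np.argmin(errors))
    return best, errors

# Two hypothetical default sets, e.g. tuned for different ear anatomies.
set_a = np.array([0.6, 0.2])
set_b = np.array([0.3, 0.5])
rng = np.random.default_rng(1)
x = rng.standard_normal(200)              # playback audio
d = np.convolve(x, set_b)[: len(x)]       # environment happens to match set_b
best, errors = select_default_set([set_a, set_b], x, d)
```

Once the best-matching set is loaded, the adaptive controller starts from a near-correct filter rather than from zeros, shortening the time until cancellation is effective.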
In some embodiments, minimum and maximum values for the coefficients may be defined by experimentation, simulation, or other testing. These minimum and maximum values may correspond to the limits at which the FIR filter should operate. If the adaptive controller 532 attempts to change the coefficients outside of these limits, it may cap or peg the coefficients at the limits.
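Pegging coefficients at experimentally determined limits amounts to a per-tap clamp; the limit values below are hypothetical:

```python
import numpy as np

def clamp_coefficients(coeffs, c_min, c_max):
    """Keep adapted FIR coefficients within precomputed limits; values the
    adaptive controller pushes past a limit are pegged at that limit."""
    return np.clip(coeffs, c_min, c_max)

# Hypothetical per-tap bounds standing in for lab-determined limits.
c_min = np.array([-1.0, -0.5, -0.5])
c_max = np.array([1.0, 0.5, 0.5])
adapted = np.array([1.4, 0.2, -0.9])   # controller overshot taps 0 and 2
safe = clamp_coefficients(adapted, c_min, c_max)
```

Clamping after each update keeps a briefly divergent adaptation (e.g., during a burst of atypical noise) from driving the filter into a regime the hardware was never tested in.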
In some embodiments, one of the first and/or second devices 110a/110b may be unable to determine satisfactory coefficients. The first and/or second device 110a/110b may determine, for example, that the error is greater than a threshold for a certain period of time. In these embodiments, one of the first and/or second devices 110a/110b may send a request to the other for its coefficients, and the recipient of the request may determine and transmit its coefficients in response. Similarly, if one of the first and/or second devices 110a/110b determines that the error is greater than a threshold for a certain period of time, it may permanently or temporarily reduce the volume of the audio data being sent to the second, high-frequency driver to thereby reduce the amount of audio data output by the loudspeaker that is not being cancelled.
Computer instructions for operating each device 110a/110b and its various components may be executed by the respective device's controller(s)/processor(s) 214, using the memory 216 as temporary “working” storage at runtime. A device's computer instructions may be stored in a non-transitory manner in non-volatile memory 216, storage 608, or an external device(s). Alternatively, some or all of the executable instructions may be embedded in hardware or firmware on the respective device in addition to or instead of software.
Each device 110a/110b includes input/output device interfaces 212. A variety of components may be connected through the input/output device interfaces, as will be discussed further below. Additionally, each device 110a/110b may include an address/data bus 624 for conveying data among components of the respective device. Each component within a device 110a/110b may also be directly connected to other components in addition to (or instead of) being connected to other components across the bus 624.
For example, via the antenna 210, the input/output device interfaces 212 may connect to one or more networks 199 via a wireless local area network (WLAN) (such as Wi-Fi) radio, Bluetooth, and/or wireless network radio, such as a radio capable of communication with a wireless communication network such as a Long Term Evolution (LTE) network, WiMAX network, 3G network, etc. A wired connection such as Ethernet may also be supported. Through the network(s) 199, the speech processing system may be distributed across a networked environment.
Referring to the device 110a/110b/112 of
As illustrated in
The above aspects of the present disclosure are meant to be illustrative and were chosen to explain the principles and application of the disclosure; they are not intended to be exhaustive or to limit the disclosure. Many modifications and variations of the disclosed aspects may be apparent to those of skill in the art. Persons having ordinary skill in the field of computers, wearable devices, and speech processing will recognize that components and process steps described herein may be interchangeable with other components or steps, or combinations thereof, and still achieve the benefits and advantages of the present disclosure. Moreover, it will be apparent to one skilled in the art that the disclosure may be practiced without some or all of the specific details and steps disclosed herein. As the term is used herein, “component” may be interchanged with similar terms, such as “module” or “engine.”
Aspects of the disclosed system may be implemented as a computer method or as an article of manufacture, such as a memory device or non-transitory computer readable storage medium. The computer-readable storage medium may be readable by a computer and may comprise instructions for causing a computer or other device to perform processes described in the present disclosure. The computer-readable storage medium may be implemented by a volatile computer memory, non-volatile computer memory, hard drive, solid-state memory, flash drive, removable disk, and/or other media. In addition, components of system may be implemented in firmware and/or hardware, such as an acoustic front end (AFE), which comprises, among other things, analog and/or digital filters (e.g., filters configured as firmware to a digital signal processor (DSP)).
Conditional language used herein, such as, “can,” “could,” “might,” “may,” “e.g.,” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements, and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without other input or prompting, whether these features, elements, and/or steps are included or are to be performed in any particular embodiment. The terms “comprising,” “including,” “having,” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list.
Disjunctive language such as the phrase “at least one of X, Y, Z,” unless specifically stated otherwise, is understood with the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present. As used in this disclosure, the term “a” or “one” may include one or more items unless specifically stated otherwise. Further, the phrase “based on” is intended to mean “based at least in part on” unless specifically stated otherwise.
This application is a non-provisional of, and claims the benefit of priority of, U.S. Provisional Patent Application No. 62/665,288, filed May 1, 2018, and entitled “USING BANDWIDTH-LIMITED AUDIO DEVICES,” in the name of Ali Abdollahzadeh Milani, which is hereby incorporated by reference herein in its entirety.
Number | Name | Date | Kind |
---|---|---|---|
8804974 | Melanson | Aug 2014 | B1 |
20060251266 | Saunders | Nov 2006 | A1 |
Number | Date | Country | |
---|---|---|---|
62665288 | May 2018 | US |