This disclosure relates generally to in-ear audio devices.
Headphones are a pair of loudspeakers worn on or around a user's ears. Circumaural headphones use a band over the top of the user's head to hold the speakers in place over the user's ears. Another type of headphone, known as an earbud or earpiece, includes units that are worn within the pinna of the user's ear, close to the user's ear canal.
Both headphones and earbuds are becoming more common with increased use of personal electronic devices. For example, people connect headphones to their phones to play music, listen to podcasts, etc. As another example, people who experience hearing loss use ear-mounted devices to amplify environmental sounds. However, headphone devices are currently not designed for all-day wear, since their presence blocks outside sound from entering the ear. Thus, the user is required to remove the devices to hear conversations, safely cross streets, etc. Further, ear-mounted devices for those who experience hearing loss often fail to accurately reproduce environmental cues, making it difficult for wearers to localize reproduced sounds.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In some embodiments, an ear-mounted sound reproduction system is provided. The system comprises a housing, a plurality of microphones, a driver element, and a sound processing device. The housing has an internally directed portion and an externally directed portion. The plurality of microphones are mounted on the externally directed portion of the housing. The housing is shaped to position the plurality of microphones at least partially within a pinna of an ear. The driver element is mounted on the internally directed portion of the housing. The sound processing device includes logic that, in response to execution, causes the ear-mounted sound reproduction system to perform operations including receiving a set of signals, each signal of the set of signals received from a microphone of the plurality of microphones; for each signal of the set of signals, processing the signal using a filter associated with the microphone from which the signal was received to generate a separate filtered signal; combining the separate filtered signals to create a combined signal; and providing the combined signal to the driver element for emission.
In some embodiments, a computer-implemented method of optimizing output of a plurality of ear-mounted microphones is provided. A plurality of microphones of a device inserted into an ear receive input signals from a plurality of sound sources. For each microphone of the plurality of microphones, the input signals received by the microphone are processed using a separate filter to create separate processed signals. The separate processed signals are combined to create combined output signals. The combined output signals are compared to reference signals. The separate filters are adjusted to minimize differences between the combined output signals and the reference signals. The adjusted filters are stored for use by a controller of the device.
The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same become better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
In some embodiments of the present disclosure, an ear-mounted sound reproduction system is provided. The system includes an ear-mountable housing that sits within the pinna of the ear and occludes the ear canal. In some embodiments, the ear-mountable housing includes a plurality of external-facing microphones. Because the external-facing microphones may be situated within the pinna of the ear but outside of the ear canal, the microphones will experience some, but not all, of the three-dimensional acoustic effects of the pinna. What is desired is for sound reproduced by an internal-facing driver element of the housing to preserve three-dimensional localization cues that would be present at the eardrum in the absence of the housing, such that the housing is essentially transparent to the user.
As shown, the ear-mountable housing 304 is inserted such that the plurality of microphones 310 are located at least partially within a pinna 102 of the ear. For example, the externally directed portion of the ear-mountable housing 304 may be positioned outside of the ear canal 103 but inside the concha, behind the tragus/antitragus, or otherwise within a portion of anatomy of the pinna.
This is unlike a set of over-the-ear headphones with an externally mounted microphone array, at least because the loudspeaker for over-the-ear headphones is outside of the pinna (as are the microphones), and so such headphones constitute a closed system for which three-dimensional auditory cues can easily be reproduced without complex processing. In contrast, the microphones 310 receive some, but not all, of the three-dimensional acoustic effects imparted by the pinna 102. Accordingly, in order to cause the driver element 312 to accurately reproduce the three-dimensional acoustic effects that would be received at the eardrum 112 in the absence of the housing 304, filters should be determined such that the signals from the microphones 310 can be combined to accurately reproduce such effects. Once filters are determined that can provide transparency, further functionality, such as beamforming, may be provided as well.
In some embodiments, the ear-mountable housing 304 includes a plurality of microphones 310, a driver element 312, and an optional in-ear microphone 314. The ear-mountable housing 304 includes an internally directed portion and an externally directed portion. The externally directed portion and the internally directed portion together enclose a volume in which other components, including but not limited to at least one of a battery, a communication interface, and a processor, may be provided.
In some embodiments, the internally directed portion is shaped to fit within an ear canal of a user, and may be retained in the ear canal with a friction fit. In some embodiments, the internally directed portion may be custom-formed to the particular shape of the ear canal of a particular user. In some embodiments, the internally directed portion may completely occlude the ear canal. The driver element 312 and optional in-ear microphone 314 may be mounted at a distal end of the internally directed portion.
In some embodiments, the externally directed portion may include a surface on which the microphones 310 are mounted. In some embodiments, the externally directed portion may have a circular shape, with the microphones 310 distributed across the circular shape. In some embodiments, the externally directed portion may have a shape that is custom-formed to coincide with the anatomy of the pinna of the user. In some embodiments, the externally directed portion may include a planar surface, such that the microphones 310 are disposed in a single plane. In some embodiments, the externally directed portion may include a semi-spherical structure or some other shape upon which the microphones 310 are disposed, such that the microphones 310 are not disposed in a single plane. In some embodiments, when the ear-mountable housing 304 is positioned within the ear, the plane in which the microphones 310 are situated is angled toward the front of the head.
In some embodiments, the microphones of the plurality of microphones 310 may be any type of microphone with a suitable form factor, including but not limited to MEMS microphones. In some embodiments, the driver element 312 may be any type of high-definition loudspeaker capable of generating a full range of audible frequencies (e.g., from about 50 Hz to about 20 kHz). In some embodiments, the in-ear microphone 314 may also be any type of microphone with a suitable form factor, including but not limited to MEMS microphones. The in-ear microphone 314 may be optional because, in some embodiments, a separate microphone may instead be used to measure the performance of the driver element 312.
As stated above, the sound reproduction system 302 also includes a DSP device 306. In some embodiments, the DSP device 306 is configured to receive analog signals from the microphones 310 and to convert them into digital signals to be processed by the sound processing device 308. In some embodiments, the DSP device 306 may also be configured to receive digital signals from the sound processing device 308, to convert the digital signals into analog signals, and to provide the analog signals to the driver element 312 for reproduction. One non-limiting example of a device suitable for use as a DSP device 306 is an ADAU1467Z SigmaDSP® processor provided by Analog Devices, Inc.
As shown, the sound processing device 308 includes a signal recording engine 316, a filter determination engine 318, a signal reproduction engine 320, a recording data store 322, and a filter data store 324. In some embodiments, the signal recording engine 316 is configured to receive digital signals from the DSP device 306 and to store the received signals in the recording data store 322. The signal recording engine 316 may also store indications of a particular microphone 310 and/or sound source associated with a received signal. In some embodiments, the filter determination engine 318 is configured to determine filters that can be applied to signals received from the microphones 310 such that the processed signals may be combined to generate a combined signal that is as close as possible to matching a signal that would be received at the eardrum in the absence of the ear-mountable housing 304. The filter determination engine 318 may be configured to store the determined filters in the filter data store 324. In some embodiments, the signal reproduction engine 320 is configured to apply the filters to signals received from the DSP device 306, and to provide a combined processed signal to the DSP device 306 to be reproduced by the driver element 312.
In general, the term “engine” as used herein refers to logic embodied in hardware or software instructions, which can be written in a programming language, such as C, C++, COBOL, JAVA™, PHP, Perl, HTML, CSS, JavaScript, VBScript, ASPX, Microsoft .NET™ languages such as C#, application-specific languages such as Matlab, and/or the like. An engine may be compiled into executable programs or written in interpreted programming languages. Engines may be callable from other engines or from themselves. Generally, the engines described herein refer to logical modules that can be merged with other engines or applications, or can be divided into sub-engines. The engines can be stored in any type of computer-readable medium or computer storage device and be stored on and executed by one or more general-purpose computers, thus creating a special-purpose computer configured to provide the engine. Accordingly, the devices and systems illustrated herein include one or more computing devices configured to provide the illustrated engines.
In general, a “data store” as described herein may be provided by any suitable device configured to store data for access by a computing device. One example of a data store is a highly reliable, high-speed relational database management system (RDBMS) executing on one or more computing devices and accessible locally or over a high-speed network. However, any other suitable storage technique and/or device capable of quickly and reliably providing the stored data in response to queries may be used, such as a key-value store, an object database, and/or the like. The computing device providing the data store may be accessible locally instead of over a network, or may be provided as a cloud-based service. A data store may also include data stored in an organized manner on a computer-readable storage medium, as described further below. Another example of a data store is a file system or database management system that stores data in files (or records) on a computer readable medium such as flash memory, random access memory (RAM), hard disk drives, and/or the like. Separate data stores described herein may be combined into a single data store, and/or a single data store described herein may be separated into multiple data stores, without departing from the scope of the present disclosure.
As illustrated, the sound reproduction system 302 includes separate devices for the ear-mountable housing 304, the DSP device 306, and the sound processing device 308. In some embodiments, the functionality described as being provided by the sound processing device 308 may be provided by one or more application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or any other type of hardware with circuitry for implementing logic. In some embodiments, the functionality described as being provided by the sound processing device 308 may be embodied by instructions stored within a computer-readable medium, and may cause the sound reproduction system 302 to perform the functionality in response to executing the instructions. In some embodiments, the functionality of the sound processing device 308 may be provided by a MOTU soundcard and a computing device such as a laptop computing device, desktop computing device, server computing device, or cloud computing device running digital audio workstation (DAW) software such as Pro Tools, Studio One, Cubase, or MOTU Digital Performer. The DAW software may be enhanced with a virtual studio technology (VST) plugin to provide the engine functionality. Further numerical analysis conducted by the engines may be performed in mathematical analysis software such as MATLAB. In some embodiments, the functionality of the DSP device 306 may also be provided by software executed by the sound processing device 308, such as Max/MSP provided by Cycling '74, or Pure Data (Pd).
In some embodiments, functionality of the DSP device 306 may be incorporated into the ear-mountable housing 304 or the sound processing device 308. In some embodiments, all of the functionality may be located within the ear-mountable housing 304. In some embodiments, some of the functionality described as being provided by the sound processing device 308 may be provided instead within the ear-mountable housing 304. For example, a separate sound processing device 308 may provide the signal recording engine 316, filter determination engine 318, and recording data store 322 in order to determine the filters to be used, while the functionality of the filter data store 324 and signal reproduction engine 320 may be provided by the ear-mountable housing 304.
In some embodiments, a goal of the method 400 is to be able to combine the signals from the M microphones of the plurality of microphones 310 such that the frequency response of the combined signals matches a given target signal as closely as possible. The expression A(f, k, m) represents the complex-valued frequency response at a microphone m = 1, 2, . . . , M for a sound source at position k = 1, 2, . . . , K, at frequency f, and the expression T(f, k) represents a target frequency response for sound source k. The combination comprises filtering the microphone signals and adding together the filter outputs. The frequency response Y(f, k) of the overall output of the filtering and combination process can be written as follows:

Y(f, k) = Σ_{m=1}^{M} A(f, k, m) · W(f, m) = A_k^T · W
where W(f, m) is the frequency response of the mth filter being designed, A_k is an M-element column vector whose mth element is A(f, k, m), superscript T denotes matrix transpose, and W is an M-element column vector whose mth element is W(f, m). The design methods disclosed herein search for filters W(f, m) such that Y(f, k) matches T(f, k) under some matching criterion. The filtering and combination process can be performed in the frequency domain, by converting the W(f, m) filters to a set of M time-domain filters, or by using similar design techniques directly in the time domain. By minimizing the error in the combined signal for a plurality of sound sources, filters can be determined that provide maximum performance for the device 304 regardless of the direction of the incoming sound. As discussed further below, similar techniques that use other optimizations (such as beamforming or otherwise prioritizing some directions over others) may also be used.
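For illustration only, the combination above can be sketched in a few lines of NumPy; the array names, shapes, and random placeholder data below are assumptions made for the sketch, not part of the disclosure.

```python
import numpy as np

# Hypothetical shapes: F frequency bins, K source positions, M microphones.
F, K, M = 512, 8, 4
rng = np.random.default_rng(0)

# A[f, k, m] = measured response A(f, k, m); W[f, m] = filter response W(f, m).
A = rng.standard_normal((F, K, M)) + 1j * rng.standard_normal((F, K, M))
W = rng.standard_normal((F, M)) + 1j * rng.standard_normal((F, M))

# Y(f, k) = sum over m of A(f, k, m) * W(f, m), i.e., Y = A_k^T W per bin.
Y = np.einsum("fkm,fm->fk", A, W)
```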
At block 402 (FIG. 4A), a microphone is positioned at an eardrum location of an ear simulator 503, so that target signals can be recorded in the absence of the ear-mountable housing 304. Though particular test hardware is described herein, other arrangements capable of capturing responses at the eardrum location may be used. Returning to FIG. 4A, the method 400 proceeds to a for-loop start block 406, where each sound source of a plurality of sound sources is processed in turn. At block 408, the sound source generates a test signal, and at block 410, the microphone receives the test signal and transmits the received signal to the sound processing device 308.
At block 412, a signal recording engine 316 of the sound processing device 308 stores the received signal in a recording data store 322 as a target signal for the sound source. If further sound sources remain to be processed, then the method 400 proceeds from the for-loop end block 414 to the for-loop start block 406 to process the next sound source. Otherwise, if all of the sound sources have been processed, then the method 400 proceeds from the for-loop end block 414 to a continuation terminal (“terminal A”). In some embodiments, each sound source of the plurality of sound sources is processed separately so that the readings obtained from each sound source do not interfere with each other.
At block 416 (FIG. 4B), the ear-mountable housing 304 is inserted into the ear simulator 503 such that the plurality of microphones 310 are located at least partially within a pinna of the ear simulator 503. Returning to the method 400, the method proceeds to a for-loop start block 418, where each sound source of the plurality of sound sources is again processed in turn, and then to a for-loop start block 420, where each microphone 310 of the plurality of microphones 310 is processed for the sound source.
From the for-loop start block 420, the method 400 proceeds to block 422, where the sound source generates a test signal. The test signal is the same as the test signal generated at block 408. At block 424, the microphone 310 receives the test signal as affected by at least a portion of the ear simulator 503 and transmits the received signal to the sound processing device 308. In some embodiments, transmitting the received signal to the sound processing device 308 includes transmitting an analog signal from the microphone 310 to the DSP device 306, converting the analog signal to a digital signal, and transmitting the digital signal from the DSP device 306 to the sound processing device 308. At block 426, the signal recording engine 316 stores the received signal for the microphone 310 and the sound source in the recording data store 322.
If further microphones 310 remain to be processed for the sound source, then the method 400 proceeds from the for-loop end block 428 to the for-loop start block 420 to process the next microphone 310. Otherwise, if all of the microphones 310 have been processed, then the method 400 proceeds to the for-loop end block 430. If further sound sources remain to be processed, then the method 400 proceeds from the for-loop end block 430 to the for-loop start block 418 to process the next sound source. Otherwise, if all of the sound sources have been processed, then the method 400 proceeds to a continuation terminal (“terminal B”).
In FIG. 4C, the method 400 proceeds from terminal B to a for-loop start block 432, where the stored received signals are processed for each sound source of the plurality of sound sources, and then to a for-loop start block 434, where the stored received signal for each microphone 310 of the plurality of microphones is processed.
From for-loop start block 434, the method 400 proceeds to block 436, where a signal reproduction engine 320 of the sound processing device 308 processes the stored received signal using a separate filter for the microphone 310 to create a separate processed signal. In some embodiments, the separate filter is the filter to be applied to signals from a particular microphone 310 of the plurality of microphones. In some embodiments, the separate filter used for the first pass through block 436 for a particular microphone 310 may be a default filter which is adjusted later as discussed below.
If further microphones 310 remain to be processed, then the method 400 proceeds from the for-loop end block 438 to the for-loop start block 434 to process the stored received signal for the next microphone 310. Otherwise, if the stored received signals for all of the microphones 310 have been processed, then the method 400 proceeds from the for-loop end block 438 to block 440. At block 440, the signal reproduction engine 320 combines the separate processed signals to create a combined output signal for the sound source. At block 442, the signal reproduction engine 320 stores the combined output signal for the sound source in the recording data store 322.
The method 400 then proceeds to the for-loop end block 444. If further sound sources remain to be processed, then the method 400 proceeds from the for-loop end block 444 to the for-loop start block 432 to process the next sound source. Otherwise, if all of the sound sources have been processed, then the method 400 proceeds from the for-loop end block 444 to a continuation terminal (“terminal C”).
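In the frequency domain, the per-microphone filtering and combination of blocks 436 and 440 reduce to multiplying each stored microphone spectrum by its filter response and summing across microphones. A minimal sketch follows; the function and array names are hypothetical.

```python
import numpy as np

def combine_processed_signals(X, W):
    """X: (M, N) spectra of the stored received signals, one row per microphone.
    W: (M, N) frequency responses of the separate filters, one row per microphone.
    Returns the combined output spectrum (N,)."""
    # Filter each microphone signal, then add the filter outputs together.
    return np.sum(W * X, axis=0)
```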
At block 446 (FIG. 4D), a filter determination engine 318 of the sound processing device 308 compares the combined output signals to the stored target signals. In some embodiments, the comparison computes, for each frequency, a sum of squared differences over the K sound source positions:

C = Σ_{k=1}^{K} |T(f, k) − A_k^T · W|^2
This can also be expressed using vector notation as:
C = (T′ − W′·A′) · (T − A·W)

where T is a K-element column vector with kth element T(f, k), A is a K×M matrix whose kth row is A_k^T, and A′ is the complex-conjugate transpose of A.
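For one frequency bin, the criterion C can be evaluated directly from the vector form above; the following helper is a sketch with hypothetical names.

```python
import numpy as np

def squared_error(A_f, T_f, W_f):
    """C = (T' - W'A')(T - AW) for one frequency bin.
    A_f: (K, M) matrix whose kth row is A_k^T; T_f: (K,) targets; W_f: (M,) filters."""
    e = T_f - A_f @ W_f            # K-element error vector T - A W
    return np.real(np.vdot(e, e))  # e' e; np.vdot conjugates its first argument
```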
At decision block 448, a determination is made regarding whether the performance of the existing filters is adequate. If it is determined that the performance of the existing filters is not adequate, then the result of decision block 448 is NO. At block 450, the filter determination engine 318 adjusts the separate filters to minimize differences between the combined output signals and the target signals, and then returns to terminal B to process the stored received signals using the newly adjusted filters.
The illustrated iterative method may include various optimization techniques for minimizing the combined errors. In some embodiments, the method may be able to compute ideal filters directly without looping back to re-test the filters. In some embodiments, to find the W that minimizes the squared difference error criterion described above, the gradient may be taken with respect to W* and set equal to zero, which yields:
∇_{W*} C = −A′·T + A′·A·W = 0

And, finally,

W = R^{-1} · p

where R = A′·A and p = A′·T.
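A direct transcription of this closed-form solution for a single frequency bin might look like the following sketch (hypothetical names; in practice a small regularization term is often added to R before solving, though that detail is an assumption here, not part of the disclosure).

```python
import numpy as np

def least_squares_filters(A_f, T_f):
    """W = R^{-1} p with R = A'A and p = A'T, for one frequency bin."""
    R = A_f.conj().T @ A_f
    p = A_f.conj().T @ T_f
    return np.linalg.solve(R, p)  # solve R W = p rather than forming R^{-1}
```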
In some embodiments, variations on the squared error described above may be used. For example, in some embodiments, a K×K diagonal matrix Q may be used to give more importance to some source positions than others, in order to ensure that signals from those source positions are the most accurately reproduced in the combination of processed signals. With scalar value q_{kk} on the kth element of the diagonal, the resulting filter W will be more sensitive to positions k with larger values q_{kk} than to positions with smaller values. For such embodiments, the criterion becomes:

C = (T′ − W′·A′) · Q · (T − A·W)
yielding:
W = R_Q^{-1} · p_Q

where R_Q = A′·Q·A and p_Q = A′·Q·T.
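The weighted variant changes only how R and p are formed; below is a sketch with a hypothetical weight vector q holding the diagonal of Q.

```python
import numpy as np

def weighted_least_squares_filters(A_f, T_f, q):
    """W = R_Q^{-1} p_Q with R_Q = A'QA and p_Q = A'QT, where Q = diag(q)."""
    AQ = A_f.conj().T * q  # A'Q: scales column k of A' by q[k]
    return np.linalg.solve(AQ @ A_f, AQ @ T_f)
```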
In some embodiments, the criterion may use the squared difference, as discussed above, subject to constraining the filter to take on certain values for certain sound source positions. Let P be an M×N matrix whose N columns are the A_k vectors corresponding to the constrained positions, and let G be an N-element column vector holding the values to be taken on. These additional constraints can then be written P′·W = G. Using the method of Lagrange multipliers, the resulting W vector will be:

W = R^{-1}·A′·T + R^{-1}·P·(P′·R^{-1}·P)^{-1}·(G − P′·R^{-1}·A′·T)
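The Lagrange-multiplier solution can be transcribed as below; the helper and argument names are assumptions, and the code evaluates the closed form above term by term.

```python
import numpy as np

def constrained_least_squares_filters(A_f, T_f, P, G):
    """Minimize the squared difference subject to P' W = G."""
    R = A_f.conj().T @ A_f
    p = A_f.conj().T @ T_f
    Rinv_p = np.linalg.solve(R, p)                 # R^{-1} A'T
    Rinv_P = np.linalg.solve(R, P)                 # R^{-1} P
    # lam = (P' R^{-1} P)^{-1} (G - P' R^{-1} A'T)
    lam = np.linalg.solve(P.conj().T @ Rinv_P, G - P.conj().T @ Rinv_p)
    return Rinv_p + Rinv_P @ lam
```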
Other criteria can be met using the theory of convex optimization. For example, in some embodiments, convex optimization may be used to find the filters that minimize the squared difference as above whilst limiting the maximum squared difference to be less than or equal to some predetermined threshold value.
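One way to pose such a problem with an off-the-shelf convex solver is sketched below using CVXPY; the threshold eps and all names are assumptions made for illustration, not the disclosure's implementation.

```python
import cvxpy as cp

def capped_error_filters(A_f, T_f, eps):
    """Minimize the total squared difference while capping each position's
    error magnitude at eps (a second-order-cone program)."""
    W = cp.Variable(A_f.shape[1], complex=True)
    err = T_f - A_f @ W
    prob = cp.Problem(cp.Minimize(cp.sum_squares(err)), [cp.abs(err) <= eps])
    prob.solve()
    return W.value
```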
Returning to decision block 448, if it is determined that the performance of the existing filters is adequate, then the result of decision block 448 is YES. At block 452, the filter determination engine 318 stores the adjusted separate filters in a filter data store 324 of the sound processing device 308.
In some embodiments, the adjusted separate filters may then be used by the signal reproduction engine 320 to generate signals to be reproduced by the driver element 312. For example, a live signal may be received from a sound source by the microphones 310. Each of the microphones 310 provides its received version of the live signal to the signal reproduction engine 320 (via the DSP device 306). The signal reproduction engine 320 processes the received live signals with the adjusted separate filters for the microphones 310, combines the processed live signals, and provides the combined processed live signal to the driver element 312 (via the DSP device 306) for reproduction.
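One plausible runtime realization (an assumption, not the disclosure's implementation) converts the designed frequency responses to short time-domain FIR filters and applies them to the live microphone signals:

```python
import numpy as np

def to_fir(W_full, taps=128):
    """Convert per-microphone frequency responses on a full FFT grid (M, N)
    to truncated, windowed time-domain FIR filters (M, taps)."""
    h = np.fft.ifft(W_full, axis=1).real  # assumes near-causal responses
    return h[:, :taps] * np.hanning(taps)

def render(live, h):
    """live: (M, num_samples) microphone signals; h: (M, taps) FIR filters.
    Filter each microphone signal and sum to form the driver signal."""
    M, n = live.shape
    out = sum(np.convolve(live[m], h[m]) for m in range(M))
    return out[:n]
```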
The criteria described above are based on the frequency response as measured at a single device 304. In some embodiments, two devices (e.g., one in each ear of a listener) may be used. In such embodiments, another useful criterion would be related to preserving the ratio of the target responses at the two ears. With a left device and a right device, and the same set of filters applied separately to each array output, the ratio-based criterion at a given position k would be:

(A_{kL}^T · W) / (A_{kR}^T · W) = T_{kL} / T_{kR}

where subscripts L and R mean left and right, respectively, and T_{kL} and T_{kR} are the target responses for source position k. This can be rearranged to yield:

(A_{kL}^T · T_{kR} − A_{kR}^T · T_{kL}) · W = 0
The trivial solution W = 0 should be avoided. One technique for avoiding the trivial solution is to constrain the filters such that they yield a certain result for a given position. Without loss of generality, one can specify that the target responses be matched exactly when k = 0. Defining the M-element column vector z_k = A_{kL}·T_{kR} − A_{kR}·T_{kL}, so that the rearranged criterion reads z_k^T·W = 0, the sum of squares of the left-hand side over all positions k can be written as:

Σ_{k=1}^{K} |z_k^T · W|^2 = Σ_{k=1}^{K} W′ · z_k^* · z_k^T · W

and simplified to:

W′ · R_Z · W, with R_Z = Σ_{k=1}^{K} z_k^* · z_k^T
Stated succinctly, we wish to minimize:

W′ · R_Z · W

subject to:

A_{0L}^T · W = T_{0L}

and:

A_{0R}^T · W = T_{0R}
This formulation is the same as that of the linearly constrained minimum variance (LCMV) beamformer, with solution:
W = R_Z^{-1} · A_0 · (A_0′ · R_Z^{-1} · A_0)^{-1} · T_0

where:

A_0 = [A_{0L} A_{0R}] and T_0 = [T_{0L} T_{0R}]^T
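The binaural solution can be transcribed as follows; the function is a sketch with hypothetical names, uses conjugate transposes throughout, and assumes R_Z is well conditioned (diagonal loading could be added otherwise).

```python
import numpy as np

def binaural_filters(A_L, A_R, T_L, T_R, k0=0):
    """W = R_Z^{-1} A_0 (A_0' R_Z^{-1} A_0)^{-1} T_0 for one frequency bin.
    A_L, A_R: (K, M) per-ear matrices whose kth rows are A_kL^T and A_kR^T;
    T_L, T_R: (K,) per-ear target responses; k0: the constrained position."""
    Z = A_L * T_R[:, None] - A_R * T_L[:, None]   # kth row is z_k^T
    R_Z = Z.conj().T @ Z                          # sum over k of z_k* z_k^T
    A0 = np.stack([A_L[k0], A_R[k0]], axis=1)     # M x 2 constraint matrix
    T0 = np.array([T_L[k0], T_R[k0]])
    Rinv_A0 = np.linalg.solve(R_Z, A0)            # R_Z^{-1} A_0
    return Rinv_A0 @ np.linalg.solve(A0.conj().T @ Rinv_A0, T0)
```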
Further, target responses can be the raw responses as measured with the method 400 described above.
In some embodiments, multiple sets of filters may be determined, and a “best” filter may be chosen for a given condition at runtime. For example, in some embodiments, a first filter may be determined for optimal performance in reproducing speech, a second filter may be determined for optimal performance in reproducing music, a third filter may be determined for optimal performance in noisy environments, and a fourth filter may be determined for optimal performance in a predetermined direction. At runtime, a filter may be chosen by the user, or the choice may be made automatically based on a detected environmental condition. In some embodiments, the switch between filters at runtime may be performed smoothly, by morphing filter coefficients over time or by crossfading from audio generated using a first filter to audio generated using a second filter.
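For the crossfading approach, an equal-power blend between audio rendered with the old and new filter sets is one simple possibility; the fade length below is a hypothetical parameter.

```python
import numpy as np

def crossfade(y_old, y_new, fade_len):
    """Blend from audio rendered with the old filters to audio rendered
    with the new filters over fade_len samples (equal-power crossfade)."""
    t = np.linspace(0.0, np.pi / 2.0, fade_len)
    out = y_new.copy()
    out[:fade_len] = np.cos(t) * y_old[:fade_len] + np.sin(t) * y_new[:fade_len]
    return out
```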
While illustrative embodiments have been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention.
This application is a continuation of U.S. application Ser. No. 16/522,394, filed Jul. 25, 2019, the entire disclosure of which is hereby incorporated by reference herein for all purposes.