This disclosure relates generally to audio devices.
Headphones are a pair of loudspeakers worn on or around a user's ears. Circumaural headphones use a band over the top of the user's head to hold the speakers in place over the user's ears. Another type of headphone, known as an earbud or earpiece, includes individual monolithic units that plug into the user's ear canal.
Both headphones and earbuds are becoming more common with the increased use of personal electronic devices. For example, people connect headphones to their phones to play music, listen to podcasts, etc. However, headphone devices are currently not designed for all-day wear since their presence blocks outside noise from entering the ear. Thus, the user is required to remove the devices to hear conversations, safely cross streets, etc.
Non-limiting and non-exhaustive embodiments of the invention are described with reference to the following figures, wherein like reference numerals refer to like parts throughout the various views unless otherwise specified. Not all instances of an element are necessarily labeled so as not to clutter the drawings where appropriate. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles being described.
Embodiments of a system, apparatus, and method for a transparent sound device are described herein. In the following description, numerous specific details are set forth to provide a thorough understanding of the embodiments. One skilled in the relevant art will recognize, however, that the techniques described herein can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring certain aspects.
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Generally, ear-worn monitors are useful for delivering sounds to the human ear while on the go. Music, directions, digital assistants, and ambient sound modification are all features people want. Accordingly, it is desirable to be able to wear headphones all day in order to achieve a continuous enhanced audio experience. However, noise-canceling and ear-occluding devices need to be removed to accurately hear the surrounding world. Put another way, these devices do not allow for sound transparency, thus requiring individuals to constantly take their earphones on and off of their ears. Taking earphones on and off is inconvenient and frequently results in the user losing or misplacing the devices. Accordingly, active sound modification to achieve “sound transparency” is beneficial so the user does not need to remove the device from their ears.
However, one reason why active sound augmentation is difficult to achieve is that the head-related transfer function (HRTF)—a function that characterizes how an individual's ear receives a sound, which takes into consideration many variables such as the size and shape of the head, ears, ear canal, density of the head, size and shape of nasal and oral cavities—is difficult to measure, and is different for each person. Accordingly, a one-size-fits-all approach to active sound modification devices may not work well.
Here we present an apparatus, system, and methods for devices to perform highly accurate sound augmentation. Devices described in examples in accordance with the teachings of the present disclosure may include a plurality of microphones to receive external sounds (including both sounds generated by the user—chewing, sneezing, breathing, etc.—and sounds from outside the user—car horns, engine noise, etc.). The device may also have an application specific integrated circuit (ASIC) with a low-latency (e.g., analog) audio processing path and a digital control path to adjust how the low-latency signals are processed (e.g., filter parameters are changed digitally, while the filters themselves are applied to analog signals). The processed audio is then output from a speaker in or near the user's ear. It is appreciated that in some embodiments, by keeping the audio signals in the analog domain, processing time is kept to a minimum—an important metric in real-time audio processing. In order to account for each individual's unique HRTF, the digital control parameters are created and personalized using an algorithm (e.g., a machine learning algorithm like a neural network) that uses ground-truth information collected from many users to output the digital control parameters.
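By way of a non-limiting, hypothetical illustration only (the field names, values, and register interface below are assumptions introduced for illustration and do not describe the actual control file format), the digital control parameters might be organized as a simple mapping that the digital control path writes into the low-latency circuitry:

```python
# Hypothetical sketch: a personalized set of digital control parameters.
# Field names and values are illustrative assumptions, not the actual format.
control_file = {
    "user_id": "anonymized-0042",
    "version": 3,
    "parameters": {
        "input_gain_db": [2.0, 1.5, 0.0, -1.0],        # one weight per microphone
        "filter_cutoffs_hz": [250.0, 2000.0, 8000.0],  # biases for analog filter stages
        "output_gain_db": 4.5,
    },
}

def write_parameters_to_asic(params, write_register):
    """Push each digital control parameter to a (hypothetical) ASIC register
    interface; the audio itself stays in the analog low-latency path."""
    for name, value in params.items():
        write_register(name, value)

# Example usage with a stand-in register writer.
write_parameters_to_asic(control_file["parameters"], lambda n, v: print(f"{n} <- {v}"))
```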
The following disclosure will describe the embodiments discussed above, and other embodiments, as they relate to the figures.
As shown, the housing is shaped to hold in-ear device 200A in an ear of a user (e.g., by friction fitting into portions of the concha) and at least partially occludes the canal. An audio package (see infra
As shown, in-ear device 200A may be designed for extended wear (due to the soft polymer molding 201 that is custom made for each individual user). As stated, the housing may at least partially occlude the canal of the ear when it is positioned in the ear. This may cause the user to experience sounds in a manner similar to wearing ear plugs. Accordingly, it is desirable for the device to provide at least partial “sound transparency” to the user. Put another way, the device may receive sounds with the microphones (e.g., microphones 211) and re-emit the sounds to the user—after the sound augmentation process, described above, occurs—so that the user hears the sounds as if there was no device occluding his/her ear canal.
In addition to providing sound transparency, it is appreciated that the device herein may cancel sound, amplify select sounds, translate language, play music/audio, provide virtual assistant services (e.g., the headphones record a question, send the natural language data to the cloud for processing, and receive a natural language answer to the question), or the like. These other processes, where processing time matters less than real-time sound augmentation, may be performed with a general-purpose processor in the controller, or other ASICs in the controller, or sent to the cloud for remote processing. As stated, a second set of one or more of microphones 211 may be canal microphones (e.g., facing into the ear canal to receive external sound in the ear canal such as speech or other sounds generated by the user). The canal microphones may be used to receive the user's speech (e.g., when in-ear device 200D is used to make a phone call) and transmit the recorded sound data to an external device (e.g., a smartphone). Canal microphones may also be used for noise cancellation and sound transparency functionality to detect noises made by the user (e.g., chewing, breathing, or the like) and cancel these noises in the occluded (e.g., by in-ear device 200) ear canal. It is appreciated that user-generated noises can seem especially loud in an occluded canal, and accordingly, it may be desirable to use the noise cancellation technologies described herein to cancel these sounds.
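As a purely illustrative, hedged sketch of this kind of cancellation (the disclosure's low-latency path may perform the equivalent operation in analog circuitry; the least-mean-squares adaptive filter and synthetic signals below are assumptions introduced for illustration only), user-generated noise captured as a reference can be adaptively subtracted from the canal microphone signal:

```python
import numpy as np

# Hypothetical sketch of cancelling user-generated noise (e.g., chewing) picked
# up by a canal microphone, using a simple LMS adaptive filter on synthetic data.
rng = np.random.default_rng(0)
n = 4000
user_noise = rng.normal(size=n)                      # reference: user-generated noise
desired = 0.1 * rng.normal(size=n)                   # residual sound we want to keep
noise_in_canal = np.convolve(user_noise, [0.6, 0.3, 0.1])[:n]  # causal acoustic path
canal_mic = desired + noise_in_canal                 # what the canal microphone hears

taps, mu = 8, 0.01                                   # filter length and LMS step size
w = np.zeros(taps)
out = np.zeros(n)
for i in range(taps, n):
    x = user_noise[i - taps + 1:i + 1][::-1]         # most recent reference samples
    out[i] = canal_mic[i] - w @ x                    # subtract estimated user noise
    w += mu * out[i] * x                             # LMS weight update

print("noise power before:", round(float(np.mean(canal_mic**2)), 3),
      "after:", round(float(np.mean(out[taps:]**2)), 3))
```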
The device 200B depicted may perform all of the same functionality as described in connection with device 200A in
In the depicted embodiment, the digital control parameters (which may be in a control file) are stored in a memory 259 in the controller 247. As will be described in connection with
In the depicted embodiment, the low-latency audio processing path includes mapping a plurality of microphone inputs (e.g., from microphones 211 and 215) to one or more audio outputs (e.g., speakers 213 in audio package 217), and there are more microphone inputs than audio outputs. Accordingly, accurate mapping may be achieved by playing point sounds to individual users and recording the sound that reaches their eardrum. A machine learning algorithm may be used to map the microphone inputs to the speaker outputs to achieve a sound wave that interacts with the eardrum in the same way that the natural sound did, thus providing a mapping that is capable of achieving sound transparency.
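One minimal, hypothetical way to illustrate such a mapping (assuming, for illustration only, a simple linear mapping rather than the machine learning algorithm described above, and using synthetic calibration data) is to solve a least-squares problem that maps recorded microphone signals to the sound measured at the eardrum during calibration:

```python
import numpy as np

# Hypothetical sketch: learn a linear mapping from M microphone signals to one
# speaker output so the reproduced sound matches the eardrum measurement.
rng = np.random.default_rng(1)
num_mics, num_samples = 4, 10_000
mic_signals = rng.normal(size=(num_samples, num_mics))        # calibration recordings
true_weights = np.array([0.5, 0.2, -0.1, 0.3])                # unknown acoustic mixing
eardrum_measurement = mic_signals @ true_weights + 0.01 * rng.normal(size=num_samples)

# Least-squares fit: weights that best reproduce the eardrum measurement.
weights, *_ = np.linalg.lstsq(mic_signals, eardrum_measurement, rcond=None)
speaker_output = mic_signals @ weights                        # what the device would emit

print("learned weights:", np.round(weights, 3))
```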
As shown, communication circuitry 257 may communicate with a smart phone 277 or other portable electronic device, and/or one or more servers 271 and storage 275, which are part of the “cloud” 273. Data may be transmitted to the external devices from in-ear device 200; for example, recordings from microphones 229/231 may be sent to smart phone 277 and uploaded to the cloud. Conversely, data may be downloaded from one or more external devices; for example, music may be retrieved from smart phone 277 or directly from a Wi-Fi network (e.g., in the user's house). The smart phone 277 or other remote devices may be used to interact with, and control, in-ear device 200D manually (e.g., through a user interface like an app) or automatically (e.g., automatic data sync). In some embodiments, the one or more external devices depicted may be used to perform calculations that are processor intensive, and send the results back to the in-ear device 200C.
In the depicted embodiment, communication circuitry 257 (e.g., a wireless or wired transceiver) may also communicate with external device(s) (e.g., personal electronic device 277, or directly with a router to connect to servers 271 or the like) to receive an updated control file including second digital control parameters that are different from the digital control parameters. The second digital control parameters may include new or updated control parameters that may better serve the user (e.g., parameters that allow the user to hear better than the original parameters, or parameters generated after a software update). Put another way, the user may update control parameters iteratively, or switch control parameters for different users (since each user has a unique HRTF). Updates to the control file may be automatic, or the user may tweak their own control file using an app or the like. This may include the user capturing updated pictures of themselves (see e.g.,
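As a hedged sketch of how such an update might be applied (the file layout, version field, and parameter values below are hypothetical assumptions, not the actual update mechanism), the device or its companion app could accept a candidate control file only when it is newer than the one currently installed:

```python
import json

# Hypothetical sketch: apply an updated control file only if it is newer.
# File layout and field names are illustrative assumptions.
current = {"version": 3, "parameters": {"output_gain_db": 4.5}}

def apply_update(current_file, candidate_json):
    candidate = json.loads(candidate_json)
    if candidate.get("version", 0) > current_file["version"]:
        return candidate          # second digital control parameters replace the first
    return current_file           # keep the existing personalization

updated_json = json.dumps({"version": 4, "parameters": {"output_gain_db": 3.0}})
current = apply_update(current, updated_json)
print("active control file version:", current["version"])
```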
Image 301 shows the user taking an image of their head area. In the depicted embodiment this includes the user taking a panoramic-type photo (e.g., swiping the camera to the left as it captures many images of the user) with their personal electronic device (e.g., a smartphone, tablet, or the like). In some embodiments, this photo may include only 2D image data; however, in other embodiments the camera in the personal electronic device may be able to capture 3D image data (e.g., 2D image data plus depth data). In some embodiments, more complex methods of capturing an image of the user may be used (e.g., 2D imaging in conjunction with LIDAR or the like). The user may then be able to upload this image to the cloud with, for example, a “Custom Headphones” application or the like running on their phone.
Image 303 shows the cloud (e.g., one or more remote servers or processing apparatuses) receiving image data—which includes data describing at least part of a user's head (e.g., head size, head shape, ear shape, or ear location)—from the personal electronic device via a network (e.g., the internet or local area network). The image is then converted into a model of at least part of the user's head. Here, the model is a 3D point cloud, which may be derived from a 2D image (e.g., using triangulation, artificial intelligence techniques, or the like). 3D image data (e.g., from 3D cameras) may also be used to create a model with less processing. One of skill in the art will appreciate that the model described here can be any data derived from the image data.
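For the 2D-plus-depth case mentioned above, a minimal sketch of deriving a point cloud is shown below, assuming for illustration a simple pinhole camera model with made-up intrinsics; the actual conversion (e.g., triangulation from a panoramic sweep or artificial intelligence techniques) may be considerably more involved:

```python
import numpy as np

# Hypothetical sketch: back-project a depth image into a 3D point cloud
# using an assumed pinhole camera model. Intrinsics below are made up.
height, width = 480, 640
fx = fy = 500.0                     # assumed focal lengths in pixels
cx, cy = width / 2, height / 2      # assumed principal point

depth = np.full((height, width), 0.4)              # stand-in depth map (meters)
v, u = np.mgrid[0:height, 0:width]                 # pixel coordinates

x = (u - cx) * depth / fx
y = (v - cy) * depth / fy
point_cloud = np.stack([x, y, depth], axis=-1).reshape(-1, 3)

print("point cloud shape:", point_cloud.shape)     # (height*width, 3)
```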
Image 305 shows generating, using a processing apparatus, a control file corresponding to the model, where the digital control parameters in the file are derived from the model of the user's anatomy. The control file includes digital control parameters with weights to bias low-latency circuits in the low-latency audio processing path, and the low-latency audio processing path is included in a controller of the audio device. In the depicted embodiment, generating the control file includes using a deep neural network machine learning algorithm to generate the digital control parameters, where the model is included in the inputs to the algorithm and the digital control parameters are included in the outputs of the algorithm. Thus, the machine learning algorithm receives the model of the user's anatomy and outputs the digital control parameters for the control file.
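As a hedged, minimal illustration of the kind of network this step might use (a tiny untrained multilayer perceptron operating on made-up point cloud features; the architecture, feature extraction, and sizes are assumptions and are not specified by the disclosure), a model-derived feature vector could be mapped to a vector of digital control parameters as follows:

```python
import numpy as np

# Hypothetical sketch: a tiny untrained MLP mapping head-model features to
# digital control parameters. Sizes and features are illustrative only.
rng = np.random.default_rng(2)

def point_cloud_features(points):
    """Reduce a 3D point cloud of shape (N, 3) to a fixed-size feature vector."""
    return np.concatenate([points.mean(axis=0), points.std(axis=0)])

def mlp_forward(features, w1, b1, w2, b2):
    hidden = np.maximum(0.0, features @ w1 + b1)     # ReLU hidden layer
    return hidden @ w2 + b2                          # digital control parameters

points = rng.normal(size=(5000, 3))                  # stand-in head point cloud
feats = point_cloud_features(points)                 # 6 summary features

w1, b1 = rng.normal(size=(6, 16)) * 0.1, np.zeros(16)
w2, b2 = rng.normal(size=(16, 8)) * 0.1, np.zeros(8)

control_parameters = mlp_forward(feats, w1, b1, w2, b2)
print("digital control parameters:", np.round(control_parameters, 3))
```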
In some embodiments, the machine learning algorithm that outputs the digital control parameters may be trained using a plurality of head models (e.g., 3D point cloud data of anonymized heads) and ground-truth digital control parameters (e.g., the control parameters for the 3D point cloud head model data that produced the best sound). This training data may be created both by measuring actual people and inputting their metrics into a database (all actions performed with informed consent only), and by generating simulated data (e.g., using several measurements of head data and interpolating or extrapolating other head data metrics). For example, a person with a very large head could be measured, and a person with a very small head could be measured. This information may be used to interpolate ground-truth data for someone with a medium-sized head. It is appreciated that the plurality of head models and ground-truth digital control parameters may be located in a database coupled to communicate with the processing apparatus (e.g., one or more servers, a general purpose processor, graphics cards running the machine learning algorithms, or the like) to train the machine learning algorithm. In some embodiments, as more head scans are uploaded, the machine learning algorithm may further improve its accuracy to output digital control parameters that correspond to individual users.
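A simplified, hypothetical illustration of the interpolation described above (the head sizes and parameter values are made up; real ground truth would come from measurements collected with informed consent as described) might look like the following:

```python
import numpy as np

# Hypothetical sketch: interpolate ground-truth control parameters for a
# medium-sized head from measured small-head and large-head examples.
small_head_size_cm = 52.0
large_head_size_cm = 62.0
small_head_params = np.array([1.0, 0.4, -0.2])       # measured (illustrative values)
large_head_params = np.array([1.6, 0.7, 0.1])        # measured (illustrative values)

medium_head_size_cm = 57.0
t = (medium_head_size_cm - small_head_size_cm) / (large_head_size_cm - small_head_size_cm)
medium_head_params = (1 - t) * small_head_params + t * large_head_params

print("interpolated parameters for a medium head:", medium_head_params)
```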
Image 307 shows sending a control file, including the digital control parameters generated by the machine learning algorithm, to an in-ear device. It is appreciated that the file may pass through other intermediate devices before reaching the controller in the in-ear device.
Blocks 401-407 illustrate programming the audio device. Block 401 shows receiving image data including data describing at least part of a user's head. As described above, image data may be received from a camera disposed in a personal electronic device via a network, or from other devices.
Block 403 depicts converting the image data into a model of at least part of the user's head. In one embodiment, converting the image data into a model includes converting the image data into a three-dimensional point cloud.
Block 405 illustrates generating, using a processing apparatus, a control file corresponding to the model. As stated, the control file includes digital control parameters that bias low-latency circuits (e.g., by increasing or decreasing gain, etc.) in a low-latency audio processing path in a controller of the audio device. In some embodiments, generating the control file includes using an algorithm to generate the digital control parameters, where the model of the user's head is included in the inputs to the algorithm and the digital control parameters are included in the outputs of the algorithm. In one embodiment, the algorithm includes a deep neural network machine learning algorithm. However, in other embodiments the algorithm finds (e.g., using a root-mean-squared similarity method or the like) a head model in a database similar to the model of the user, and outputs the corresponding digital control parameters.
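A minimal, hypothetical sketch of that lookup alternative (the feature vectors, database contents, and parameter values below are made up for illustration) could compute a root-mean-squared distance between the user's head model and each stored model and return the closest stored model's parameters:

```python
import numpy as np

# Hypothetical sketch: find the most similar stored head model by
# root-mean-squared distance and reuse its digital control parameters.
rng = np.random.default_rng(3)
database_models = rng.normal(size=(100, 6))          # stored head-model feature vectors
database_params = rng.normal(size=(100, 8))          # their ground-truth parameters

user_model = rng.normal(size=6)                      # the new user's head-model features

rms_distance = np.sqrt(np.mean((database_models - user_model) ** 2, axis=1))
best = int(np.argmin(rms_distance))

print("closest stored model:", best)
print("reused digital control parameters:", np.round(database_params[best], 3))
```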
Block 407 shows sending the control file to the audio device via a network. This may include sending the control file to a smartphone over a wireless network and through a headphone cable to the in-ear devices. Alternatively, the in-ear devices may receive the control file directly over the internet through a wireless connection or the like.
Blocks 409-413 illustrate operating the device after the control file has been received. Block 409 depicts receiving external sound with a first set of one or more microphones to generate a low-latency sound signal, where the one or more microphones are coupled to the controller. This may occur after an initial install of the control file, or after an updated control file has been received.
Block 411 illustrates augmenting the low-latency sound signal by passing the low-latency sound signal through the low-latency audio processing path in the controller to produce an augmented sound signal. The digital control parameters include weights to bias the low-latency circuits in the low-latency audio processing path (e.g., by adjusting resistances in filters, controlling the gain in an amplifier, or the like), thereby augmenting the low-latency sound signal as it is passed through the low-latency audio processing path in a manner personalized or customized for the individual user.
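As a hedged, digital approximation of this biasing step (the disclosure applies the parameters to analog low-latency circuits; the one-pole low-pass filter and gain below are illustrative assumptions only), a personalized gain and filter coefficient taken from the digital control parameters could be applied to the incoming sound signal:

```python
import numpy as np

# Hypothetical sketch: apply personalized digital control parameters (a gain
# and a one-pole low-pass coefficient) to a low-latency sound signal.
params = {"gain": 1.4, "lowpass_alpha": 0.2}         # illustrative values

def augment(signal, gain, alpha):
    out = np.empty_like(signal)
    state = 0.0
    for i, sample in enumerate(signal):
        state = alpha * sample + (1.0 - alpha) * state   # one-pole low-pass filter
        out[i] = gain * state                            # personalized gain
    return out

rng = np.random.default_rng(4)
external_sound = rng.normal(size=1000)               # stand-in microphone signal
augmented_sound = augment(external_sound, params["gain"], params["lowpass_alpha"])
print("output sample:", round(float(augmented_sound[-1]), 4))
```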
Block 413 shows outputting, with an audio package, augmented sound based on the augmented sound signal. Using the techniques presented herein, in some embodiments, the augmented sound may provide at least partial sound transparency to the user. Other embodiments may provide for noise cancellation or reduction of the augmented sound signal.
The processes explained above are described in terms of computer software and hardware. The techniques described may constitute machine-executable instructions embodied within a tangible or non-transitory machine (e.g., computer) readable storage medium, that when executed by a machine will cause the machine to perform the operations described. Additionally, the processes may be embodied within hardware, such as an application specific integrated circuit (“ASIC”) or otherwise.
A tangible machine-readable storage medium includes any mechanism that provides (i.e., stores) information in a non-transitory form accessible by a machine (e.g., a computer, network device, personal digital assistant, manufacturing tool, any device with a set of one or more processors, etc.). For example, a machine-readable storage medium includes recordable/non-recordable media (e.g., read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, etc.).
The above description of illustrated embodiments of the invention, including what is described in the Abstract, is not intended to be exhaustive or to limit the invention to the precise forms disclosed. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize.
These modifications can be made to the invention in light of the above detailed description. The terms used in the following claims should not be construed to limit the invention to the specific embodiments disclosed in the specification. Rather, the scope of the invention is to be determined entirely by the following claims, which are to be construed in accordance with established doctrines of claim interpretation.