The present application relates generally to audio processing mechanisms with personalized frequency response filters and personalized head-related transfer function (HRTF) filters.
Equalizers are used for modifying the amplitude of sound frequencies based on a listener's preferences but are not capable of being manipulated to discriminate individual sounds. As understood herein, combining frequency amplitude adjustment (to compensate for hearing impairment) with customized HRTF can be used to deliver superior sound clarity and localization for individuals who seek to experience great sound immersion, and also can be used for individuals who have hearing aids or minor hearing impairments to distinguish sounds better without modifying the spectrum of audible frequencies.
Manufacturers of hearing aids traditionally have simply modified the amplitude of all the waveforms in a collection or spectrum of sounds. This is acceptable for generalized hearing loss, but it introduces noise as well, which can make it hard to distinguish sounds as the waveforms are amplified. A much larger group of people can hear well in a quiet environment, but have trouble listening to and understanding conversations in noisy environments. These people typically suffer from partial hearing loss or find it hard to distinguish foreground sounds from background sounds.
Present principles are directed to providing an audio processing mechanism including a unique Head Related Transfer Function (HRTF) filter and a personalized frequency response filter for a set of headphones for a specific listener by employing object-based audio technology to adjust sound characteristics that conform to the listener's hearing capability. The sound compensation algorithm may be derived by using a Discrete Fourier Transform (DFT) or a Fast Fourier Transform (FFT) method for binning sound frequencies (spectrum of waveforms) that are then reduced to sound field objects. Each object created is based on a set of frequencies often referred to as a pitch range or unique acoustical sensation to the listener's ear. Each DFT or FFT bin reflects or captures an audio frequency range with related characteristics, such as the timbre of a person's voice or the location of an instrument being played or directionality of a background sound such as a plane flying overhead. The sound object can then be adjusted in its base sound elements to allow for a headphone listener to more accurately hear or discern sounds with more clarity and less distortion. The personalized sound experience is thus customized to the way each listener hears sound. Each object is adjusted for a personal listener based on that user's ear structure, hearing capabilities and impairments. The total soundstage of objects for a song or movie or TV show can be acoustically blended together and directionally positioned based on a personal audio processing mechanism. The end result is a personalized sound experience by a set of headphones that takes into account the actual listener's hearing modalities.
The use of FFTs and DFTs to isolate bins of sound that represent discrete sound objects in nature is one preferred method for object-based audio. These bins can be analyzed and characterized based on their sound elements to identify their location, directionality, sinusoidal waveform and distribution. Vocals can be distinguished from instruments, just as static objects can be distinguished from moving objects. This characterization of sound can be determined by analyzing the DFT or FFT bins, which are digital slices of frequencies or waveforms in a time-based sampling. Customized HRTFs may also be employed that utilize frequency filters based upon the listener, who identifies direct impairments using a calibration model to derive the listener's sound profile.
Accordingly, in an aspect, a system includes at least a left audio channel input and a left channel impulse response (IR) filter configured for receiving audio data from the left channel audio input. The left channel IR filter has taps established at least in part by a frequency response profile for a left ear of a listener. A left channel head related transfer function (HRTF) filter includes taps established at least in part by a physical characteristic of the listener. The left channel HRTF is for receiving audio data from the left channel audio input and is in series with the left channel IR filter to send signals to or receive signals from the left channel IR filter. A left channel speaker is configured for receiving signals that have passed through the left channel IR filter and left channel HRTF filter for transducing the signals into sound.
At least a right audio channel input is also provided, and a right channel IR filter is configured for receiving audio data from the right channel audio input. The right channel IR filter has taps established at least in part by a frequency response profile for a right ear of the listener. A right channel HRTF filter includes taps established at least in part by a physical characteristic of the listener and is configured for receiving audio data from the right channel audio input. The right channel HRTF filter is in series with the right channel IR filter to send signals to or receive signals from the right channel IR filter. A right channel speaker is configured for receiving signals that have passed through the right channel IR filter and right channel HRTF filter for transducing the signals into sound.
The left channel IR filter may be a finite impulse response (FIR) filter or an infinite impulse response (IIR) filter. Likewise, the right channel IR filter may be a FIR or an IIR filter.
In some examples, the left channel HRTF filter sends signals to the left channel IR filter (i.e., is upstream of the left channel IR filter). In other examples, the left channel HRTF filter receives signals from the left channel IR filter (i.e., is downstream from the left channel IR filter). The right channel may be similarly configured.
In example embodiments, at least a left microphone and at least a right microphone are provided, and the left channel IR is configured for receiving signals from the left microphone while the right channel IR is configured for receiving signals from the right microphone. In this way, noise cancellation or noise augmentation may be effected.
In example embodiments, at least one computer medium that is not a transitory signal includes instructions executable by at least one processor to access at least a first set of HRTFs tailored to the listener, with each HRTF being associated with an orientation of the listener's head. The instructions are executable to identify an orientation of the listener's head, and to identify a first one of the first set of HRTFs based at least in part on the identification of the orientation of the listener's head. The instructions are further executable to convolute an audio stream using the first one of the first set of HRTFs to render an adjusted stream and play the adjusted stream on at least the left speaker. If desired, the instructions may be executable to concatenate the first one of the first set of HRTFs with a HRTF associated with a space to render a concatenated HRTF. The instructions can be executable to convolute the audio stream using the concatenated HRTF to render the adjusted stream, and to play the adjusted stream on at least the left speaker.
In another aspect, a method includes generating a left ear frequency response profile for a specific listener and generating a right ear frequency response profile for the specific listener. The method includes grouping frequencies in objects in each profile such that each object represents a group of respective frequencies. Further, the method includes establishing the taps of respective left and right impulse response (IR) filters using the objects, passing signals through the IR filters to render filtered signals, and playing the filtered signals on respective left and right speakers.
In another aspect, an assembly includes at least a left audio channel input and a left channel impulse response (IR) filter configured for receiving audio data from the left channel audio input. The left channel IR filter has taps established at least in part by an audio object-based frequency response profile for a left ear of a listener. A left channel speaker is configured for receiving signals that have passed through the left channel IR filter for transducing the signals into sound. At least a right audio channel input is also provided, and a right channel IR filter is configured for receiving audio data from the right channel audio input. The right channel IR filter has taps established at least in part by an audio object-based frequency response profile for a right ear of the listener. A right channel speaker is configured for receiving signals that have passed through the right channel IR filter for transducing the signals into sound.
The details of the present application, both as to its structure and operation, can be best understood in reference to the accompanying drawings, in which like reference numerals refer to like parts, and in which:
A personalized frequency response filter in combination with a personalized HRTF is provided herein.
In overview, HRTF calibration is rendered relatively mainstream by, in one embodiment, creating a HRTF calibration file using a pair of headphones that have special-purpose built-in microphones. The calibration file stores the FIR coefficients. Some of the microphones can be located inside the headphones, some inside the ears, and some outside the headphones. The headphones are connected to a sound source via the microphones. The sound source then generates key calibration sounds that are recorded by the microphones and stored digitally on a personal computer or other smart device. In some implementations the sound source material is generated by a particular sound system (2-channel or multi-channel) that exists outside the headphones. Internal (relative to the headphones) calibration signals may be used to aid the process as well.
Several different calibration files may be created. For example, a calibration file can be created for two-channel sound, another for more than two-channel sound (“multichannel sound”), and another to aid in up-rendering two-channel sound to multi-channel sound. With these different types of portable calibration files, an end user can implement his personalized HRTF on any audio processing to generate a particular three-dimensional (3D) sound experience that produces the sense on the part of the user that the sound is not emanating from, e.g., headphone speakers by the ears, but rather from sources such as speakers or an orchestra outside the headphones. This creates a 3-D sound experience and may include height and head tracking such that perceived sound sources remain in their pre-determined locations even when the head is moving around.
As mentioned above, the calibration file can include an FIR filter or filters that can be implemented on a digital signal processor (DSP) or general-purpose processor such as an ARM core. The complexity or number of taps needed to accurately model the user's HRTF may be determined by the application using the calibration files to filter sound on the user's playback device. The user may also be given the opportunity to select the number of taps, within a given range.
With these principles, an end user consumer can own his own pair of special headphones and applications and create the calibration files. The calibration files may be created on a system at a local retail outlet for a fee if desired or complimentary with a purchase, and then the consumer takes the file home.
Present principles may be extended to equipment, such as stereo playback on speakers, multi-channel playback, multi-channel playback created from stereo, or future equipment and setups.
This disclosure accordingly relates generally to computer ecosystems including aspects of multiple audio speaker ecosystems. A system herein may include server and client components, connected over a network such that data may be exchanged between the client and server components. The client components may include one or more computing devices that have audio speakers including audio speaker assemblies per se but also including speaker-bearing devices such as portable televisions (e.g. smart TVs, Internet-enabled TVs), portable computers such as laptops and tablet computers, and other mobile devices including smart phones and additional examples discussed below. These client devices may operate with a variety of operating environments. For example, some of the client computers may employ, as examples, operating systems from Microsoft, or a Unix operating system, or operating systems produced by Apple Computer or Google. These operating environments may be used to execute one or more browsing programs, such as a browser made by Microsoft or Google or Mozilla or other browser program that can access web applications hosted by the Internet servers discussed below.
Servers may include one or more processors executing instructions that configure the servers to receive and transmit data over a network such as the Internet. Or, a client and server can be connected over a local intranet or a virtual private network.
Information may be exchanged over a network between the clients and servers. To this end and for security, servers and/or clients can include firewalls, load balancers, temporary storages, and proxies, and other network infrastructure for reliability and security. One or more servers may form an apparatus that implements methods of providing a secure community such as an online social website to network members.
As used herein, instructions refer to computer-implemented steps for processing information in the system. Instructions can be implemented in software, firmware or hardware and include any type of programmed step undertaken by components of the system.
A processor may be any conventional general-purpose single- or multi-chip processor that can execute logic by means of various lines such as address lines, data lines, and control lines and registers and shift registers. A processor may be implemented by a digital signal processor (DSP), for example.
Software modules described by way of the flow charts and user interfaces herein can include various sub-routines, procedures, etc. Without limiting the disclosure, logic stated to be executed by a particular module can be redistributed to other software modules and/or combined together in a single module and/or made available in a shareable library.
Present principles described herein can be implemented as hardware, software, firmware, or combinations thereof; hence, illustrative components, blocks, modules, circuits, and steps are set forth in terms of their functionality.
Further to what has been alluded to above, logical blocks, modules, and circuits described below can be implemented or performed with a general-purpose processor, a digital signal processor (DSP), a field programmable gate array (FPGA) or other programmable logic device such as an application specific integrated circuit (ASIC), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A processor can be implemented by a controller or state machine or a combination of computing devices.
The functions and methods described below, when implemented in software, can be written in an appropriate language such as but not limited to C# or C++, and can be stored on or transmitted through a computer-readable storage medium such as a random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), compact disk read-only memory (CD-ROM) or other optical disk storage such as digital versatile disc (DVD), magnetic disk storage or other magnetic storage devices including removable thumb drives, etc. A connection may establish a computer-readable medium. Such connections can include, as examples, hard-wired cables including fiber optic and coaxial wires and digital subscriber line (DSL) and twisted pair wires.
Components included in one embodiment can be used in other embodiments in any appropriate combination. For example, any of the various components described herein and/or depicted in the Figures may be combined, interchanged or excluded from other embodiments.
“A system having at least one of A, B, and C” (likewise “a system having at least one of A, B, or C” and “a system having at least one of A, B, C”) includes systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.
Now specifically referring to
Accordingly, to undertake such principles the CE device 12 can be established by some or all of the components shown in
In addition to the foregoing, the CE device 12 may also include one or more input ports 22 such as, e.g., a USB port to physically connect (e.g. using a wired connection) to another CE device and/or a headphone 24 that can be worn by a person 26. The CE device 12 may further include one or more computer memories 28 such as disk-based or solid-state storage that are not transitory signals on which are stored files such as the below-described HRTF calibration files. The CE device 12 may receive, via the ports 22 or wireless links via the interface 18, signals from first microphones 30 in the earpiece of the headphones 24, second microphones 32 in the ears of the person 26, and third microphones 34 external to the headphones and person, although only the headphone microphones may be provided in some embodiments. The signals from the microphones 30, 32, 34 may be digitized by one or more analog to digital converters (ADC) 36, which may be implemented by the CE device 12 as shown or externally to the CE device.
As described further below, the signals from the microphones can be used to generate HRTF calibration files that are personalized to the person 26 wearing the calibration headphones. A HRTF calibration file typically includes at least one and more typically left ear and right ear FIR filters, each of which typically includes multiple taps, with each tap being associated with a respective coefficient. By convoluting an audio stream with a FIR filter, a modified audio stream is produced which is perceived by a listener to come not from, e.g., headphone speakers adjacent the ears of the listener but rather from relatively afar, as sound would come from an orchestra for example on a stage that the listener is in front of.
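The convolution of an audio stream with per-ear FIR taps can be sketched as follows. This is a minimal illustration only: the function name `apply_fir` and the three-tap coefficient lists are hypothetical placeholders, not values from any calibration file, and a real HRTF filter would have many more taps.

```python
def apply_fir(samples, taps):
    """Convolve an audio stream with FIR filter taps (direct form)."""
    out = [0.0] * (len(samples) + len(taps) - 1)
    for n, x in enumerate(samples):
        for k, coefficient in enumerate(taps):
            # Each tap contributes a delayed, scaled copy of the input.
            out[n + k] += x * coefficient
    return out


# Hypothetical left-ear and right-ear taps (assumed values for illustration).
left_taps = [0.5, 0.25, 0.125]
right_taps = [0.4, 0.3, 0.1]

mono_stream = [1.0, 0.0, 0.0, 2.0]
left_out = apply_fir(mono_stream, left_taps)
right_out = apply_fir(mono_stream, right_taps)
```

Each ear receives its own filtered copy of the stream, which is what lets the two outputs differ in the way a personalized HRTF requires.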
To enable end users to access their personalized HRTF files, the files, once generated, may be stored on a portable memory 38 and/or cloud storage 40 (typically separate devices from the CE device 12 in communication therewith, as indicated by the dashed line), with the person 26 being given the portable memory 38 or access to the cloud storage 40 so as to be able to load (as indicated by the dashed line) his personalized HRTF into a receiver such as a digital signal processor (DSP) 41 of playback device 42 of the end user. A playback device may include one or more additional processors such as a second digital signal processor (DSP) with digital to analog converters (DACs) 44 that digitize audio streams such as stereo audio or multi-channel (greater than two track) audio, convoluting the audio with the HRTF information on the memory 38 or downloaded from cloud storage. This may occur in one or more headphone amplifiers 46 which output audio to at least two speakers 48, which may be speakers of the headphones 24 that were used to generate the HRTF files from the test tones. U.S. Pat. No. 8,503,682, owned by the present assignee and incorporated herein by reference, describes a method for convoluting HRTF onto audio signals. Note that the second DSP can implement the FIR filters that are originally established by the DSP 20 of the CE device 12, which may be the same DSP used for playback or a different DSP as shown in the example of
In some implementations, HRTF files may be generated by applying a finite element method (FEM), finite difference method (FDM), finite volume method, and/or another numerical method, using 3D models to set boundary conditions.
In the example shown, the headphones 200 may include one or more wireless transceivers 206 communicating with one or more processors 208 accessing one or more computer storage media 210. The headphones 200 may also include one or more motion sensors communicating with the processor. In the example shown, the headphones 200 include at least one magnetometer 212, at least one accelerometer 214, and at least one gyroscope 216 to establish a nine-axis motion sensor that generates signals representing orientation of the head of the wearer of the headphones 200. U.S. Pat. Nos. 9,448,405 and 9,740,305, owned by the present assignee and incorporated herein by reference, describe a nine-axis orientation measuring system in a head-mounted apparatus.
While all nine axes may be used to determine a head orientation for purposes to be shortly disclosed, in some embodiments, recognizing that sound varies the most as a person moves his head in the horizontal plane, motion in the vertical dimension (and concomitant sensor therefor) may be eliminated for simplicity.
In the example of
In
Moving to block 702, if desired the user may select a virtual venue in which to simulate playing the audio track desired by the user, which is selected at block 704. Head orientation signals from the user's headphones or from another source (such as a camera imaging the user) may be received at block 706, and the corresponding FIR filter from the HRTF files is selected for the sensed orientation. When a virtual venue has been selected, at block 708 it is concatenated with the user-personalized FIR filter selected at block 706 corresponding to the user's head orientation, and then the concatenation is convoluted with the selected audio track and played.
Note that the logic at block 708 may not use all of the taps of the FIR filter selected at block 706. In some implementations the user may be enabled to select the number of taps to use, it being understood that the greater the number of taps, the better the fidelity but the more burdensome the processing. Or, the playback device 42 may be limited as to how many taps it can process, and therefore may automatically use only some, but not all, of the FIR taps. For example, if a FIR filter has 64 taps but the playback device can process only 32 taps, the playback device may select every other tap in the FIR filter to use, discarding the rest.
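The tap-reduction example above (64 taps reduced to 32 by keeping every other tap) might be sketched as follows. The function name `reduce_taps` and the stride rule for tap counts that do not divide evenly are assumptions made for illustration.

```python
def reduce_taps(taps, max_taps):
    """Keep evenly spaced taps so that at most max_taps remain."""
    if max_taps < 1:
        raise ValueError("max_taps must be at least 1")
    if len(taps) <= max_taps:
        return list(taps)
    # Ceiling division: the smallest stride that fits within the limit.
    stride = -(-len(taps) // max_taps)
    return list(taps[::stride])


# Stand-in coefficients: the tap index is used as the value so the
# selection pattern is easy to see.
full_filter = list(range(64))
reduced_filter = reduce_taps(full_filter, 32)
```

With this stride rule a device limited to, say, 48 taps would still receive only 32 of the 64, since a stride of 2 is the smallest whole-number stride that fits.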
As the user may from time to time turn his head, a new orientation is sensed, and a new FIR filter selected from the HRTF file at block 706. Note that if a user's head is at an orientation that itself is not exactly correlated with a FIR filter but that is between two orientations that are correlated with respective FIR filters, the FIR filter of the orientation closest to the actual orientation may be used. Or, the coefficients of each of “N” corresponding taps of the adjacent FIR filters may be averaged in a weighted manner and a new FIR filter generated on the fly with the averaged coefficients. For example, if the coefficient of the Nth tap of the filter associated with the orientation immediately to the left of the user's current orientation is “A”, the coefficient of the Nth tap of the filter associated with the orientation immediately to the right of the user's current orientation is “B”, and the user's current orientation is exactly midway between the filter orientations, then the coefficient of the Nth tap of a new FIR filter generated on the fly would be (A+B)/2. If the user's current orientation is 40% of the way from the “A” orientation and thus 60% of the way from the “B” orientation, the coefficient of the Nth tap of a new FIR filter generated on the fly would be (0.6A+0.4B).
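The weighted averaging of corresponding taps can be sketched as below. The names `interpolate_taps` and `taps_for_azimuth`, the use of azimuth in degrees as the orientation key, and the two-tap filters are all hypothetical; wrap-around at 360 degrees is not handled in this sketch.

```python
def interpolate_taps(taps_a, taps_b, fraction):
    """Blend two FIR filters tap by tap.

    fraction is how far the head is from orientation A toward
    orientation B: 0.0 yields filter A, 1.0 yields filter B.
    """
    if len(taps_a) != len(taps_b):
        raise ValueError("filters must have the same number of taps")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return [(1.0 - fraction) * a + fraction * b
            for a, b in zip(taps_a, taps_b)]


def taps_for_azimuth(filters_by_azimuth, azimuth):
    """Return taps for a sensed azimuth from filters measured at fixed azimuths."""
    angles = sorted(filters_by_azimuth)
    if azimuth <= angles[0]:
        return list(filters_by_azimuth[angles[0]])
    if azimuth >= angles[-1]:
        return list(filters_by_azimuth[angles[-1]])
    for low, high in zip(angles, angles[1:]):
        if low <= azimuth <= high:
            fraction = (azimuth - low) / (high - low)
            return interpolate_taps(filters_by_azimuth[low],
                                    filters_by_azimuth[high], fraction)


# Hypothetical filters measured at 0 and 30 degrees.
measured = {0.0: [1.0, 0.0], 30.0: [0.0, 1.0]}
# 12 degrees is 40% of the way from the first orientation to the second,
# so each new tap is 0.6*A + 0.4*B.
blended = taps_for_azimuth(measured, 12.0)
```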
If desired, the user may be given an option to select HRTF type, e.g., stereo, multi-channel, up-mix from stereo to multichannel, etc. using yet another drop-down list 810 or other selector device. In some embodiments the user may be presented with a tap selector 812 to input the number of FIR filter taps to use consistent with disclosure above.
The taps of a personalized frequency response filter for an audio listening device such as a set of headphones may then be configured for the specific listener at block 904 by employing object-based audio technology to adjust sound characteristics that conform to the listener's hearing capability. The sound compensation algorithm may be derived by using a Discrete Fourier Transform (DFT) or a Fast Fourier Transform (FFT) method for binning the sound frequencies (spectrum of waveforms) of the listener's profile that are then reduced to sound field objects. Each object created is based on a set of frequencies often referred to as a pitch range or unique acoustical sensation to the listener's ear. Each DFT or FFT bin reflects or captures an audio frequency range with related characteristics. An audio sound object can then be adjusted in its base sound elements by passing it through a filter the taps of which are established by the profile to allow for a headphone listener to more accurately hear or discern sounds with more clarity and less distortion.
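One possible way to turn a grouped frequency profile into filter taps is the frequency-sampling method, sketched below. This is an assumed technique chosen for illustration, not necessarily the one contemplated here: DFT bins are grouped into bands (standing in for the frequency objects), each band receives one compensation gain from the listener's profile, and an inverse DFT yields the taps. The function name `band_gains_to_taps`, the band edges, and the gains in decibels are hypothetical.

```python
import cmath
import math


def band_gains_to_taps(band_edges_hz, band_gains_db, sample_rate, num_taps):
    """Frequency-sampling FIR design from per-band gains.

    band_edges_hz[i] is the upper edge of band i and band_gains_db[i] is
    its gain. num_taps is assumed odd so the result is symmetric.
    """
    if num_taps % 2 == 0:
        raise ValueError("this sketch assumes an odd number of taps")
    magnitudes = []
    for k in range(num_taps):
        # Bins above the Nyquist frequency mirror the bins below it.
        freq = min(k, num_taps - k) * sample_rate / num_taps
        gain_db = band_gains_db[-1]
        for edge, band_gain in zip(band_edges_hz, band_gains_db):
            if freq < edge:
                gain_db = band_gain
                break
        magnitudes.append(10.0 ** (gain_db / 20.0))
    # Inverse DFT of a real, symmetric spectrum gives real, symmetric taps.
    taps = []
    for n in range(num_taps):
        total = sum(m * cmath.exp(2j * math.pi * k * n / num_taps)
                    for k, m in enumerate(magnitudes))
        taps.append(total.real / num_taps)
    # Rotate so the peak sits at the centre tap (causal, linear phase).
    half = num_taps // 2
    return taps[-half:] + taps[:-half]


# Hypothetical left-ear profile: flat below 1 kHz, +6 dB up to 4 kHz,
# +12 dB above that.
left_ear_taps = band_gains_to_taps([1000.0, 4000.0, 24000.0],
                                   [0.0, 6.0, 12.0], 48000, 31)
```

With 31 taps at 48 kHz each bin spans roughly 1.5 kHz, so a practical profile with narrow bands would need considerably more taps than this sketch uses.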
A left audio channel input feeds audio signals to be played as audio to a series of filters. Ambient noise from a microphone 1002 on a left earphone worn by the listener may be fed into the left audio channel for noise cancellation or ambient sound augmentation as desired. In the example shown, the input 1000 is fed to an impulse response (IR) filter 1004, the taps of which are established by the left ear hearing profile derived as described in reference to
In the example shown, after passing through the filter 1004, the signals representing audio pass through a left ear HRTF filter 1008 derived according to principles set forth above for the particular listener. In this way, signals exiting the HRTF filter 1008 reflect the listener's unique head structure. The signals are played on a left speaker 1010, e.g., the left speaker of headphones.
In other embodiments, instead of being downstream of the IR filter 1004, the HRTF filter 1008 may be upstream such that the output of the HRTF is fed as input to the IR filter 1004.
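For idealized linear, time-invariant FIR filters, the two series orderings produce the same output, because convolution is commutative and associative. The sketch below checks this numerically with hypothetical tap values; it does not model fixed-point rounding, adaptive filters, or any nonlinear stage, for which the order could matter.

```python
def convolve(signal, taps):
    """Direct-form convolution of a signal with FIR taps."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, x in enumerate(signal):
        for k, coefficient in enumerate(taps):
            out[n + k] += x * coefficient
    return out


audio = [1.0, -0.5, 0.25, 0.0, 0.75]
hearing_taps = [1.2, 0.3, -0.1]        # assumed hearing-profile IR filter taps
hrtf_taps = [0.8, 0.15, 0.05, -0.02]   # assumed HRTF filter taps

# IR filter upstream of the HRTF filter.
ir_then_hrtf = convolve(convolve(audio, hearing_taps), hrtf_taps)
# HRTF filter upstream of the IR filter.
hrtf_then_ir = convolve(convolve(audio, hrtf_taps), hearing_taps)

max_difference = max(abs(a - b) for a, b in zip(ir_then_hrtf, hrtf_then_ir))
```

Any difference between the two results comes only from floating-point rounding.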
The filters shown in
While the particular embodiments are herein shown and described in detail, it is to be understood that the subject matter which is encompassed by the present invention is limited only by the claims.
Number | Name | Date | Kind |
---|---|---|---|
7634092 | McGrath | Dec 2009 | B2 |
7720229 | Duraiswami et al. | May 2010 | B2 |
8503682 | Fukui et al. | Aug 2013 | B2 |
8520857 | Fukui et al. | Aug 2013 | B2 |
8787584 | Nyström et al. | Jul 2014 | B2 |
9118991 | Nystrom et al. | Aug 2015 | B2 |
9448405 | Yamamoto | Sep 2016 | B2 |
9740305 | Kabasawa et al. | Aug 2017 | B2 |
9930468 | Christoph | Mar 2018 | B2 |
20160044430 | McGrath | Feb 2016 | A1 |
20160269849 | Riggs et al. | Sep 2016 | A1 |
20170332186 | Riggs et al. | Nov 2017 | A1 |
Number | Date | Country |
---|---|---|
5285626 | Sep 2013 | JP |
2017092732 | May 2017 | JP |
Entry |
---|
“Hear an Entirely New Dimension of Sound”, OSSIC, Retrieved on Oct. 10, 2017 from https://www.ossic.com/3d-audio/. |
Henrik Moller, “Fundamentals of Binaural Technology”, Acoustics Laboratory, Aalborg University, Mar. 3, 1992, Aalborg, Denmark. |
Sylvia Sima, “HRTF Measurements and Filter Design for a Headphone-Based 3D-Audio System”, Faculty of Engineering and Computer Science, Department of Computer Science, University of Applied Sciences, Hamburg, Germany, Sep. 6, 2008. |
James R. Milne, Gregory Peter Carlsson, “Personalized End User Head-Related Transfer Function (HRTV) Finite Impulse Response (FIR) Filter”, file history of related U.S. Appl. No. 15/822,473, filed Nov. 27, 2017. |