Embodiments of the invention relate to a method and device for adaptive Head Related Transfer Function (HRTF) individualization.
Spatially accurate binaural virtualization of sound requires delivering to the listener audio that has been processed to contain the expected binaural cues that allow localizing sound sources at intended locations. Such cues, which include the ITD (interaural time difference), ILD (interaural level difference) and spectral cues, and which are represented by the head-related transfer function (HRTF), are highly individual, as they vary significantly between individuals. Any mismatch between the HRTF used to process the audio and the actual HRTF of the individual listener can lead to errors in localization and degradation in the spatial realism of the binaural reproduction. The aim of HRTF individualization is to reduce this mismatch. Standard methods for obtaining individualized HRTFs include acoustic measurements of the individual's HRTF in an anechoic or semi-anechoic environment, and accurate numerical solution of the Helmholtz equation over a grid covering a representation of the individual's head. Such HRTF individualization methods have the disadvantage of being time-consuming, costly (as they require special equipment and acoustic environments, or CPU- and RAM-intensive numerical solvers), and often impractical, as they require acoustical or morphological measurements involving the actual individual. The present invention allows individualizing an HRTF (or the subset of it required to virtualize a subset of spatial locations) without the above-mentioned disadvantages: the individual listener calibrates the binaural rendering system by using a single controller (e.g. a slider in a graphical user interface) to cycle through a composite and generic HRTF, constructed according to a prescription defined by the invented method, and by selecting the individual filter(s) that best virtualize the sound at the intended location(s).
Embodiments of the invention relate to a method and/or device for adaptive Head Related Transfer Function (HRTF) individualization. The Adaptive HRTF Individualizer (AHI) allows tailoring (individualizing) an HRTF for a listener through a calibration process that relies on the use of a single controller (e.g. a slider in a graphical user interface) that allows cycling through a specially pre-processed composite HRTF and selecting and storing the filter that best virtualizes sound at each desired spatial location. The selected filters are then used in any standard binaural rendering system (e.g. headphones, earphones, crosstalk-canceled speakers) to yield spatially accurate sound virtualization (e.g. the virtualization of the speakers of a surround sound system). The composite HRTF is constructed from an appropriately selected set of measured, calculated or synthesized HRTFs, which are deconstructed and processed in such a way as to retain a wide range of spectral cues and to enable smooth interpolation of the filters in the time domain prior to the judicious addition of interaural time difference (ITD) and interaural level difference (ILD). Extending this procedure from a single location to multiple locations enables a customized and accurate listening experience for multichannel content virtualized through headphones; it can be applied to a multitude of speaker locations to customize and render any surround sound content.
The processing in embodiments of the subject adaptive HRTF individualizer (AHI) can be constructed starting with measured, calculated, or synthesized HRTFs, from here on called the “original HRTFs”, that are pre-processed, and then further modified. The HRTFs used can be from a variety of sources including public databases such as: CIPIC HRTF Database (https://www.ece.ucdavis.edu/cipic/spatial-sound/hrtf-data/), IRCAM Listen HRTF Database (http://recherche.ircam.fr/equipes/salles/listen/), a private collection of HRTF sets, calculated HRTFs and/or synthesized HRTFs.
The HRTF datasets, independent of their source, can be processed in accordance with the subject AHI, as described herein.
Preferably, pre-processing can be performed on the HRTFs, depending on the collection method used to generate the HRTFs. Preferably, all filters used as the basis to construct parameterized HRTFs for use in embodiments of the AHI are processed for consistency, and specific ILD and ITD cues inherent within measured HRTFs are preferably removed while retaining a wide range of spectral cues. This enables a smooth interpolation procedure over portions of, or the entirety of, the HRTF datasets and the ability to reconstruct synthesized cues in a controlled manner.
One or more of the following individual steps can be utilized in pre-processing the original HRTF datasets, typically in the time domain:
In a specific embodiment of the subject AHI, the pre-processing can include all 6 of the above-described steps. In a specific embodiment, the pre-processing is accomplished in the time domain.
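The six pre-processing steps are not reproduced in this excerpt, but a minimal time-domain sketch of the general idea, stripping the ITD and ILD cues from a measured HRIR pair while retaining its spectral cues, might look as follows; the cepstral minimum-phase conversion, the 256-tap length, and the unit-energy normalization are illustrative assumptions rather than the enumerated steps:

```python
import numpy as np

def minimum_phase(h, n_fft=None):
    """Minimum-phase version of impulse response h via the real-cepstrum method."""
    n = len(h)
    if n_fft is None:
        n_fft = 4 * n  # zero-pad to reduce cepstral aliasing
    H = np.fft.fft(h, n_fft)
    cep = np.fft.ifft(np.log(np.maximum(np.abs(H), 1e-12))).real
    w = np.zeros(n_fft)              # causal-folding window
    w[0] = 1.0
    w[1:n_fft // 2] = 2.0
    if n_fft % 2 == 0:
        w[n_fft // 2] = 1.0
    h_min = np.fft.ifft(np.exp(np.fft.fft(w * cep))).real
    return h_min[:n]

def preprocess_hrir_pair(h_left, h_right, n_taps=256):
    """Remove ITD/ILD cues from a measured HRIR pair, keeping only spectral cues."""
    baseline = []
    for h in (np.asarray(h_left, float), np.asarray(h_right, float)):
        h_mp = minimum_phase(h)[:n_taps]      # drops the onset delay (ITD cue)
        h_mp /= np.sqrt(np.sum(h_mp ** 2))    # unit energy (drops the ILD cue)
        baseline.append(h_mp)
    return baseline
```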
Synthetization and Reconstruction of Cues
After pre-processing, which can optionally be performed at the time of measurement of the original HRTF filters, synthetization and reconstruction of cues can be accomplished. In a specific embodiment, once all filters are deconstructed and processed into a common baseline filter, an additional processing procedure interpolates through an HRTF space and introduces interaural time difference (ITD) and interaural level difference (ILD) in a linear fashion to enable smooth parameter mapping to an appropriate interface, such as a single slider. The ITD and ILD can be introduced using established equations and models that can optionally be optimized or corrected, e.g., based on the current state of research. Because the localization cues in HRTFs differ widely across the human population, they can be constrained to a narrower space to accommodate a specific market.
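As a hedged sketch of such a parameter mapping, the function below maps a single slider position in [0, 1] to one reconstructed filter pair: the two neighbouring baseline filters are linearly interpolated in the time domain, and an ITD and ILD are then re-introduced. It assumes a source on the listener's left (so the right ear is the contralateral, delayed and attenuated ear); the interface, parameter names, and per-step ITD/ILD tables are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def filter_for_slider(baseline_pairs, itds_s, ilds_db, slider, fs=48000):
    """Map one slider position in [0, 1] to an interpolated HRIR pair with ITD/ILD re-applied.

    baseline_pairs : ordered list of (h_left, h_right) delay-free, energy-normalized HRIRs
    itds_s, ilds_db: per-step ITD (seconds) and ILD (dB) to re-introduce
    """
    pos = slider * (len(baseline_pairs) - 1)
    i = int(np.floor(pos))
    j = min(i + 1, len(baseline_pairs) - 1)
    frac = pos - i

    # Linear weighted interpolation of the time-domain filters
    hl = (1 - frac) * baseline_pairs[i][0] + frac * baseline_pairs[j][0]
    hr = (1 - frac) * baseline_pairs[i][1] + frac * baseline_pairs[j][1]

    # Re-introduce ILD as a broadband attenuation of the contralateral (right) ear
    ild_db = (1 - frac) * ilds_db[i] + frac * ilds_db[j]
    hr = hr * 10 ** (-ild_db / 20)

    # Re-introduce ITD as an integer-sample delay of the contralateral (right) ear
    itd_s = (1 - frac) * itds_s[i] + frac * itds_s[j]
    delay = int(round(itd_s * fs))
    hr = np.concatenate([np.zeros(delay), hr])
    hl = np.concatenate([hl, np.zeros(delay)])  # keep both filters the same length
    return hl, hr
```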
An embodiment of the processing procedure can be implemented in accordance with one or more of the following:
A more detailed description of these features is as follows:
Filter Order
Prior to the filters being interpolated and reconstructed into a final dataset per localized source, they can be analyzed and ordered. This is done to ensure a smooth transition through the interpolation process that covers the entire "space" of HRTFs for the intended application. This ordering process can be done in a variety of ways, such as via one or more of the following:
The number of steps between point A and B in the set of reconstructed filters can vary depending on the datasets, the target number of filters, and the target smoothness of filter transitions.
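The ordering criteria are not enumerated in this excerpt; purely as an assumed example, one plausible criterion is a greedy nearest-neighbour ordering on log-magnitude spectral distance, so that adjacent filters in the reconstructed set differ as little as possible:

```python
import numpy as np

def order_filters_by_spectral_distance(filters, n_fft=512):
    """Greedy nearest-neighbour ordering so adjacent filters have similar spectra."""
    mags = [20 * np.log10(np.abs(np.fft.rfft(h, n_fft)) + 1e-12) for h in filters]
    remaining = list(range(len(filters)))
    order = [remaining.pop(0)]  # arbitrary seed: start from the first filter
    while remaining:
        last = mags[order[-1]]
        # next filter: the remaining one closest to the last in RMS spectral distance
        nxt = min(remaining, key=lambda k: np.sqrt(np.mean((mags[k] - last) ** 2)))
        remaining.remove(nxt)
        order.append(nxt)
    return [filters[k] for k in order]
```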
In an embodiment, the filters are aligned in the time domain through a process of determining the "delay" in a filter, such as the time adjustment described in minimum-phase conversion and/or output validation. This delay, which marks the start of the impulse, is calculated via standard thresholding and averaging methods.
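For illustration, the onset delay can be estimated by thresholding the magnitude of the impulse response relative to its peak and then removed so that every filter starts at time zero; the -20 dB threshold is an assumed value:

```python
import numpy as np

def onset_delay(h, threshold_db=-20.0):
    """First sample whose magnitude exceeds `threshold_db` relative to the peak."""
    mag = np.abs(h)
    return int(np.argmax(mag >= np.max(mag) * 10 ** (threshold_db / 20)))

def align(h):
    """Remove the estimated onset delay, padding with zeros to keep the length."""
    d = onset_delay(h)
    return np.concatenate([h[d:], np.zeros(d)])
```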
In an embodiment, the interpolation is a linear weighted interpolation over a number of steps, N.
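A minimal sketch of such a linear weighted interpolation between two aligned, equal-length filters over N steps:

```python
import numpy as np

def interpolate_filters(h_a, h_b, n_steps):
    """N filters fading linearly from h_a to h_b (both aligned and of equal length)."""
    return [(1 - w) * h_a + w * h_b for w in np.linspace(0.0, 1.0, n_steps)]
```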
ITD Reconstruction
Additional ITD models can be used to generate ITDs for point sources at any desired location, based on the azimuth and elevation of the point source to be virtualized.
Once ITDs are calculated for the location in question for a range of head sizes (this range can vary depending upon the target market sector or population), the ITDs are reconstructed and applied individually to each unique HRTF within the interpolated dataset. This can be done in a manner that allows for a smooth transition of ITDs before moving to the next interpolated filter.
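For illustration, Woodworth's spherical-head formula is one established ITD model; it depends only on the head radius, the azimuth and, in a common extension, the elevation of the virtual source. The head-radius range below is an assumed example of covering a range of head sizes, yielding one ITD per interpolated filter:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def woodworth_itd(azimuth_deg, elevation_deg, head_radius_m):
    """Woodworth spherical-head ITD, scaled by cos(elevation) as a common extension."""
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    return (head_radius_m / SPEED_OF_SOUND) * (np.sin(az) + az) * np.cos(el)

# Example: ITDs for a source at 30 degrees azimuth, 0 degrees elevation,
# over an assumed range of head radii (one ITD per interpolated filter).
radii_m = np.linspace(0.07, 0.10, 11)
itds_s = [woodworth_itd(30.0, 0.0, r) for r in radii_m]
```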
Individualization
The reconstructed filters, in their respective order for each location, can be cycled through by the user while listening to any test signal processed through the filters, with the goal of virtualizing the sound source at a desired location (e.g. the location of a certain speaker in a virtual 5.1 surround system). When the desired location is virtualized accurately or acceptably, the selected filter is stored. The process can be repeated for another desired location of virtualized sound, which may correspond to a different selected filter. The subset of filters selected and stored in this way is then loaded into a binaural rendering/playback system (e.g. headphones or speakers with crosstalk cancellation) and used to process (e.g. through convolution) the audio, enabling the listener to perceive sound sources at the intended spatial locations.
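A hedged sketch of this calibration loop under the assumptions above: each candidate filter pair renders a binaural preview of a test signal, the listener steps through the previews with a single control, and the filter judged to place the source at the desired location is stored per location. The `choose_index` callback is hypothetical and stands in for the slider/user interaction:

```python
import numpy as np
from scipy.signal import fftconvolve

def render_candidate(test_signal, hrir_pair):
    """Binaural preview: convolve a mono test signal with one candidate HRIR pair."""
    hl, hr = hrir_pair
    return np.stack([fftconvolve(test_signal, hl), fftconvolve(test_signal, hr)], axis=-1)

def calibrate(locations, candidates_per_location, test_signal, choose_index):
    """For each target location, store the candidate the listener judges most accurate."""
    selected = {}
    for loc in locations:
        candidates = candidates_per_location[loc]          # ordered, reconstructed filters
        previews = [render_candidate(test_signal, c) for c in candidates]
        selected[loc] = candidates[choose_index(loc, previews)]
    return selected  # to be loaded into the binaural renderer (per-channel convolution)
```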
Embodiment 1. A method for head related transfer function (HRTF) individualization, comprising:
Embodiment 2. The method according to embodiment 1,
Embodiment 3. The method according to any preceding embodiments,
Embodiment 4. The method according to any preceding embodiments,
Embodiment 5. The method according to embodiment 4,
Embodiment 6. The method according to any preceding embodiments,
Embodiment 7. The method according to embodiment 6,
Embodiment 8. The method according to embodiment 7, further comprising:
Embodiment 9. The method according to any preceding embodiments, further comprising:
Embodiment 10. The method according to embodiment 9, further comprising:
Embodiment 11. A device for head related transfer function (HRTF) individualization, comprising:
Embodiment 12. The device according to embodiment 11,
Embodiment 13. The device according to embodiment 12,
Embodiment 14. The device according to any of preceding embodiments 11-13,
Embodiment 15. The device according to embodiment 14,
Embodiment 16. The device according to any of preceding embodiments 11-15,
Embodiment 17. The device according to embodiment 16,
Embodiment 18. The device according to embodiment 17,
Embodiment 19. The device according to any of preceding embodiments 11-18,
Embodiment 20. One or more non-transitory computer-readable media having computer-readable instructions embodied thereon for performing a method for head related transfer function (HRTF) individualization,
Embodiment 21. A method of processing a set of N “original” HRTF filters, comprising:
Embodiment 22. The method according to embodiment 21,
Embodiment 23. The method according to embodiment 22,
Embodiment 24. The method according to any of embodiments 21-23,
Embodiment 25. The method according to embodiment 24,
Embodiment 26. The method according to embodiment 21,
Embodiment 27. The method according to embodiment 26,
Embodiment 28. The method according to embodiment 27, further comprising:
Aspects of the invention, such as obtaining the original HRTFs, processing such HRTFs, filtering audio signals through the processed HRTFs, and rendering the resulting audio through headphones or crosstalk-canceled speakers, based on such processed audio files, may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with a variety of computer-system configurations, including multiprocessor systems, microprocessor-based or programmable-consumer electronics, minicomputers, mainframe computers, and the like. Any number of computer-systems and computer networks can be used with the present invention.
Specific hardware devices, programming languages, components, processes, protocols, and numerous details including operating environments and the like are set forth to provide a thorough understanding of the present invention. However, an ordinarily skilled artisan would understand that the present invention may be practiced without these specific details. In other instances, structures, devices, and processes are shown in block-diagram form, rather than in detail, to avoid obscuring the description of the present invention. Computer systems, servers, workstations, and other machines may be connected to one another across a communication medium including, for example, a network or networks.
As one skilled in the art will appreciate, embodiments of the present invention may be embodied as, among other things: a method, system, or computer-program product. Accordingly, the embodiments may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. In an embodiment, the present invention takes the form of a computer-program product that includes computer-useable instructions embodied on one or more computer-readable media.
Computer-readable media include both volatile and nonvolatile media, transitory and non-transitory media, removable and nonremovable media, and contemplate media readable by a database, a switch, and various other network devices. By way of example, and not limitation, computer-readable media comprise media implemented in any method or technology for storing information. Examples of stored information include computer-useable instructions, data structures, program modules, and other data representations. Media examples include, but are not limited to, information-delivery media, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD), holographic media or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage, and other magnetic storage devices. These technologies can store data momentarily, temporarily, or permanently.
The invention may be practiced in distributed-computing environments where tasks are performed by remote-processing devices that are linked through a communications network. In a distributed-computing environment, program modules may be located in both local and remote computer-storage media including memory storage devices. The computer-useable instructions form an interface to allow a computer to react according to a source of input. The instructions cooperate with other code segments to initiate a variety of tasks in response to data received in conjunction with the source of the received data.
The present invention may be practiced in a network environment such as a communications network. Such networks are widely used to connect various types of network elements, such as routers, servers, gateways, and so forth. Further, the invention may be practiced in a multi-network environment having various, connected public and/or private networks.
Communication between network elements may be wireless or wireline (wired). As will be appreciated by those skilled in the art, communication networks may take several different forms and may use several different communication protocols. And the present invention is not limited by the forms and communication protocols described herein.
The examples and embodiments described herein are for illustrative purposes only, and various modifications or changes in light thereof will be apparent to persons skilled in the art and are included within the spirit and purview of this application. In addition, any elements or limitations of any invention or embodiment thereof disclosed herein can be combined with any and/or all other elements or limitations (individually or in any combination) or any other invention or embodiment thereof disclosed herein, and all such combinations are contemplated within the scope of the invention without limitation thereto.
All patents, patent applications, provisional applications, and publications referred to or cited herein (including those in the “References” section) are incorporated by reference in their entirety, including all figures and tables, to the extent they are not inconsistent with the explicit teachings of this specification.
This application claims the benefit of U.S. Provisional Application Ser. No. 63/340,141, filed May 10, 2022, which is incorporated herein by reference in its entirety.
References Cited

U.S. Patent Documents

Number | Name | Date | Kind
---|---|---|---
20170245082 | Boland | Aug 2017 | A1
20170325045 | Baek et al. | Nov 2017 | A1
20200021939 | Oland et al. | Jan 2020 | A1
20200186951 | Audfray et al. | Jun 2020 | A1
20220030373 | Mehta | Jan 2022 | A1

Foreign Patent Documents

Number | Date | Country
---|---|---
10-2019-0075807 | Jul 2019 | KR

Other Publications

Algazi, V. R. et al., "The CIPIC HRTF Database," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 99-102, 2001.
Warusfel, Olivier, "Listen HRTF Database," Jun. 9, 2002, URL: http://recherche.ircam.fr/equipes/salles/listen/, accessed Jun. 6, 2023.

Publication Data

Number | Date | Country
---|---|---
20230370800 A1 | Nov 2023 | US

Related U.S. Application Data

Number | Date | Country
---|---|---
63340141 | May 2022 | US