Limitations and disadvantages of conventional approaches to interpolate a head-related transfer function (HRTF) will become apparent to one of skill in the art, through comparison of such approaches with some aspects of the present method and system set forth in the remainder of this disclosure with reference to the drawings.
A system and method for interpolating a head-related transfer function is provided substantially as illustrated by and/or described in connection with at least one of the figures, as set forth more completely in the claims.
The perception of 3D sound can be obtained with headphones through the use of Head-Related Transfer Functions (HRTFs). An HRTF comprises a pair of filters (one for the left ear and one for the right ear) that, when applied to a particular sound, give the listener the impression that the sound is coming from a particular direction. To implement such systems, an HRTF dataset may be used. An HRTF dataset comprises a plurality of filter pairs, where each filter pair may correspond to a different direction.
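As a non-limiting illustration (not part of the claimed system), the following Python sketch applies an HRIR pair to a mono signal by direct convolution; the array names `x`, `hrir_left`, and `hrir_right` are assumptions for the example.

```python
# Illustrative sketch only: render a mono signal binaurally with one HRIR
# pair. `x`, `hrir_left`, and `hrir_right` are hypothetical NumPy arrays.
import numpy as np

def render_binaural(x, hrir_left, hrir_right):
    """Convolve the mono signal with the left- and right-ear impulse
    responses; the result is a stereo signal perceived as arriving from
    the direction the HRIR pair was measured for."""
    left = np.convolve(x, hrir_left)
    right = np.convolve(x, hrir_right)
    return np.stack([left, right], axis=-1)  # shape (num_samples, 2)
```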
This disclosure describes a system and method for HRTF interpolation when an HRTF dataset does not contain a particular direction. The disclosed HRTF interpolation uses a finite set of HRTFs from a dataset to obtain the HRTF of any possible direction and distance, even if that direction/distance does not exist in the dataset.
The method for HRTF interpolation may be performed in spherical coordinates, operating on the direction (e.g., azimuth and elevation) of the sound source and on the distance of the sound source.
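For concreteness, one possible conversion between the spherical coordinates used here (azimuth, elevation, distance) and Cartesian vectors is sketched below; the axis convention is an assumption, since conventions vary between HRTF datasets.

```python
# Illustrative coordinate conversion; the axis convention (azimuth in the
# horizontal plane, elevation measured up from it) is an assumption and
# may differ from the convention of a given HRTF dataset.
import numpy as np

def sph_to_cart(azimuth_deg, elevation_deg, distance=1.0):
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return distance * np.array([np.cos(el) * np.cos(az),   # x
                                np.cos(el) * np.sin(az),   # y
                                np.sin(el)])               # z
```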
An HRTF dataset of Head-Related Impulse Responses (HRIRs) may be generated for different azimuths and elevations.
The sphere mesh 201 is generated (for instance, using a convex-hull method) from the azimuths and elevations recorded in the dataset. The sphere mesh 201 comprises triangles (e.g., triangle section 205), and each vertex of each triangle corresponds to a position (azimuth and elevation) in the original HRTF dataset. To simulate a sound emitted by an audio source at an audio source location 203, the triangle section 205 containing that audio source position is identified.
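A minimal sketch of this mesh-generation step is shown below, assuming the dataset directions are available as azimuth/elevation pairs in degrees; `scipy.spatial.ConvexHull` is used here as one possible convex-hull implementation, and the function name is illustrative.

```python
# Sketch of the sphere-mesh step: project every measured direction onto
# the unit sphere and triangulate it with a convex hull.
import numpy as np
from scipy.spatial import ConvexHull

def build_sphere_mesh(azimuths_deg, elevations_deg):
    az = np.radians(np.asarray(azimuths_deg))
    el = np.radians(np.asarray(elevations_deg))
    points = np.column_stack([np.cos(el) * np.cos(az),   # x
                              np.cos(el) * np.sin(az),   # y
                              np.sin(el)])               # z
    hull = ConvexHull(points)
    # Each row of hull.simplices holds the three dataset indices that
    # form one triangle of the sphere mesh.
    return points, hull.simplices
```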
While this disclosure is illustrated with single point sources that intersect the mesh within a single triangle, larger sound sources (that overlap multiple triangles) may also be used. While this disclosure is illustrated with a sphere mesh comprising triangles, other sphere meshes (comprising, for example, quadrilaterals) may also be used.
A weight (W1, W2, and W3) is generated for each triangle vertex to locate the audio source location 203 within the triangle 205. These weights may be obtained using Vector-Base Amplitude Panning (VBAP) or similar methods.
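One way to compute such weights is sketched below: the source direction is expressed as a linear combination of the three vertex direction vectors, in the manner of Vector-Base Amplitude Panning. The function name and the use of a plain linear solve are illustrative assumptions.

```python
# Sketch of VBAP-style weight computation for one triangle of the mesh.
import numpy as np

def vbap_weights(source_dir, v1, v2, v3):
    """Solve source_dir = w1*v1 + w2*v2 + w3*v3 for the vertex weights.
    The source direction lies within this triangle exactly when all
    three solved weights are non-negative."""
    basis = np.column_stack([v1, v2, v3])  # vertex unit vectors as columns
    return np.linalg.solve(basis, source_dir)
```

In practice the weights of the containing triangle may then be normalized (for example, to sum to one) before being applied to the vertex HRIRs; the non-negativity of the solved weights also provides one way to identify which triangle contains the desired direction.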
Each HRIR in the dataset may have a different initial delay (i.e., the HRIRs may not be aligned with each other in the time domain). The proposed method therefore performs an initial alignment in the time domain.
Before alignment, the HRIRs 301a and 303a are converted to a higher sample rate (e.g., 1,000 times the original resolution, using a resampling algorithm). The time deviation (shift) between the HRIRs 301a and 303a is determined by correlation. The HRIRs 301a and 303a may then be aligned by padding zeros at the beginning or at the end, as necessary, or possibly by removing samples.
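A sketch of this alignment step is given below, using `scipy.signal.resample_poly` for the upsampling and a full cross-correlation to estimate the lag; the function name and the default upsampling factor are assumptions (the disclosure mentions factors as high as 1,000).

```python
# Sketch of time alignment between two HRIRs via upsampling + correlation.
import numpy as np
from scipy.signal import resample_poly

def align(reference, hrir, factor=8):
    """Upsample both responses, estimate the lag by cross-correlation,
    and shift `hrir` (zero-padding at the start, or dropping leading
    samples) so its onset coincides with `reference`."""
    ref_up = resample_poly(reference, factor, 1)
    h_up = resample_poly(hrir, factor, 1)
    corr = np.correlate(ref_up, h_up, mode="full")
    lag = int(np.argmax(corr)) - (len(h_up) - 1)  # delay to apply to `hrir`
    if lag >= 0:
        aligned = np.concatenate([np.zeros(lag), h_up])[:len(h_up)]
    else:
        aligned = np.concatenate([h_up[-lag:], np.zeros(-lag)])
    return aligned, lag
```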
The time alignment can be done either within each triangle (aligning the three vertices of that triangle) or globally (aligning all vertices with one another).
The weights (W1, W2, and W3) are applied to the HRIRs of each vertex to compute the interpolated HRIR at the desired audio source location 203.
The weights (W1, W2, and W3) are also used to determine the time shift of the obtained HRIR. As such, the final time shift is given by the weighted average of the time shifts computed during the alignment process.
After interpolation, the obtained HRIR is converted back to the original sample rate (e.g., using a resampling algorithm).
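The interpolation, time-shift, and downsampling steps can be sketched together as follows, assuming the three upsampled, time-aligned vertex HRIRs and the per-vertex shifts from the alignment step are available; all names are illustrative.

```python
# Sketch of the weighted interpolation, weighted time shift, and final
# return to the original sample rate.
import numpy as np
from scipy.signal import resample_poly

def interpolate_hrir(hrirs_up, lags, weights, factor=8):
    """Weighted sum of the aligned vertex HRIRs, followed by the
    weighted-average time shift and downsampling."""
    h = np.zeros(len(hrirs_up[0]))
    for w, h_up in zip(weights, hrirs_up):
        h += w * h_up
    shift = int(round(np.dot(weights, lags)))  # weighted-average shift
    if shift > 0:
        h = np.concatenate([np.zeros(shift), h[:-shift]])
    elif shift < 0:
        h = np.concatenate([h[-shift:], np.zeros(-shift)])
    return resample_poly(h, 1, factor)  # back to the original sample rate
```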
To interpolate distance from a sound source, a different azimuth (az) and/or elevation (el) is selected for each ear.
For processing, an example head width of 18 cm (9 cm to each side of the center) may be considered, although any head size may be substituted.
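The per-ear geometry can be sketched as follows, assuming a head-centered Cartesian frame with the ears on the y axis (the sign convention for left versus right is itself an assumption); each returned direction is then looked up on the HRTF sphere independently.

```python
# Sketch of the per-ear direction/distance computation used for distance
# interpolation; positions are in meters.
import numpy as np

def per_ear_directions(source_pos, half_head_width=0.09):
    """Return (unit direction, distance) from each ear to the source,
    with ears offset 9 cm to either side of the head center."""
    ears = (np.array([0.0,  half_head_width, 0.0]),   # left ear (assumed +y)
            np.array([0.0, -half_head_width, 0.0]))   # right ear
    out = []
    for ear in ears:
        v = np.asarray(source_pos, dtype=float) - ear
        d = np.linalg.norm(v)
        out.append((v / d, d))
    return out  # [(left_dir, left_dist), (right_dir, right_dist)]
```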
At 703, a triangulation is performed over an HRTF sphere.
At 705, each HRIR is upsampled to obtain a higher time resolution.
At 707, for each triangle, the HRIRs of all vertices are aligned in the time domain.
At 709, the system determines the position of the sound source relative to the left ear. For the left ear position: the system identifies which triangle contains the desired direction at 711 (for a point on the HRTF sphere that falls exactly on an edge or vertex shared by multiple triangle sections, only one triangle section needs to be considered), and the weight of each vertex of that triangle is calculated; an impulse response is obtained at 713 as a weighted combination of the HRIRs of all vertices, with the weights also used to apply a time offset to the obtained HRIR; and the HRIR is downsampled to the original sample rate at 715.
At 717, the system determines the position of the sound source relative to the right ear. For the right ear position: the system identifies which triangle contains the desired direction at 719 (for a point on the HRTF sphere that falls exactly on an edge or vertex shared by multiple triangle sections, only one triangle section needs to be considered), and the weight of each vertex of that triangle is calculated; an impulse response is obtained at 721 as a weighted combination of the HRIRs of all vertices, with the weights also used to apply a time offset to the obtained HRIR; and the HRIR is downsampled to the original sample rate at 723.
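The per-ear flow of steps 703 to 723 can be composed from the sketches above; the following driver is purely illustrative and assumes the helper functions and precomputed data (`points`, `triangles`, upsampled HRIRs, per-vertex lags) defined earlier.

```python
# Illustrative composition of the per-ear interpolation flow; relies on
# per_ear_directions, vbap_weights, and interpolate_hrir sketched above.
import numpy as np

def hrir_for_source(source_pos, points, triangles, hrirs_up, lags, factor=8):
    outputs = []
    for direction, _distance in per_ear_directions(source_pos):
        for tri in triangles:                      # triangle search (711/719)
            w = vbap_weights(direction, points[tri[0]],
                             points[tri[1]], points[tri[2]])
            if np.all(w >= -1e-9):                 # inside this triangle
                w = w / np.sum(w)                  # normalize the weights
                break
        h = interpolate_hrir([hrirs_up[i] for i in tri],   # steps 713/721
                             [lags[i] for i in tri], w, factor)
        outputs.append(h)                          # downsampled HRIR (715/723)
    return outputs  # [left-ear HRIR, right-ear HRIR]
```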
While the present system has been described with reference to certain implementations, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the present system. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present disclosure without departing from its scope. Therefore, it is intended that the present method and/or system not be limited to the particular implementations disclosed, but that the present system will include all implementations falling within the scope of the appended claims.
As utilized herein the terms “circuits” and “circuitry” refer to physical electronic components (i.e. hardware) and any software and/or firmware (“code”) which may configure the hardware, be executed by the hardware, and or otherwise be associated with the hardware. As used herein, for example, a particular processor and memory may comprise first “circuitry” when executing a first one or more lines of code and may comprise second “circuitry” when executing a second one or more lines of code. As utilized herein, “and/or” means any one or more of the items in the list joined by “and/or”. As an example, “x and/or y” means any element of the three-element set {(x), (y), (x, y)}. In other words, “x and/or y” means “one or both of x and y”. As another example, “x, y, and/or z” means any element of the seven-element set {(x), (y), (z), (x, y), (x, z), (y, z), (x, y, z)}. In other words, “x, y and/or z” means “one or more of x, y and z”. As utilized herein, the term “exemplary” means serving as a non-limiting example, instance, or illustration. As utilized herein, the terms “e.g.,” and “for example” set off lists of one or more non-limiting examples, instances, or illustrations. As utilized herein, circuitry is “operable” to perform a function whenever the circuitry comprises the necessary hardware and code (if any is necessary) to perform the function, regardless of whether performance of the function is disabled or not enabled (e.g., by a user-configurable setting, factory trim, etc.).
Relation | Number | Date | Country
---|---|---|---
Parent | 17474734 | Sep 2021 | US
Child | 18677171 | | US