This application relates to systems and methods for head and ear tracking, and more specifically, to reconstruction of interaural time difference using a head diameter.
Headrest audio systems, seat or chair audio systems, sound bars, vehicle audio systems, and other personal and/or near-field audio systems are gaining popularity. However, the sound experienced by a user of a personal and/or near-field audio system can vary significantly (e.g., by 3-6 dB or another value) when a listener moves their head, even very slightly. In the example of headrest audio systems, depending on how the user is positioned in a seat and how the headrest is adjusted, the sound experienced by one person using the audio system can also differ significantly from that experienced by another person. This level of sound pressure level (SPL) variability makes tuning audio systems difficult. Furthermore, when rendering spatial audio over headrest speakers, this variability causes features like crosstalk cancellation to fail.
One way of correcting audio for personal and/or near-field audio systems is the application of head-related transfer functions (HRTFs) to synthesize a binaural sound that seems to come from a particular point in space. Specifically, a pair of HRTFs (one for each ear of a listener) is applied to an audio signal to produce a desired sound localization. For example, various consumer entertainment systems have been designed to reproduce surround sound via stereo headphones or headrest audio systems using HRTFs. Some forms of HRTF processing have also been included in computer software to simulate surround sound playback from loudspeakers.
A significant problem with conventional HRTF-based sound localization schemes is that generic HRTFs commonly employed in consumer devices rely on an embedded interaural time difference (ITD) value that is unlikely to match the actual ITD of a specific listener. With an incorrect ITD value, an HRTF incorrectly transforms an audio signal for that specific listener. As a result, conventional HRTF-based sound localization schemes often cannot synthesize finer gradations in perceived directionality of a sound. Instead, the perceived location of a sound produced using such a scheme may be limited to either directly to the front of the listener or directly to the side of the listener, resulting in a low-quality listener experience. In theory, listener-specific information, such as head geometry values, can be provided to a personal and/or near-field audio system to improve sound localization produced for a particular listener by that system. However, in the context of commercial audio products, relying on each new listener to accurately measure and input head size and/or ear location is generally unworkable.
As the foregoing illustrates, what is needed in the art are improved techniques for sound localization of virtual sound sources produced by audio systems.
One embodiment of the present disclosure sets forth a method that includes receiving head geometry information for a user, determining a calculated interaural-time-delay (ITD) value for the user based on the head geometry information, generating a first modified head-related transfer function (HRTF) with the calculated ITD value and a second modified HRTF with the calculated ITD value, generating a first modified audio signal with the first modified HRTF and a second modified audio signal with the second modified HRTF, and transmitting the first modified audio signal and the second modified audio signal to one or more loudspeakers.
At least one technical advantage of the disclosed techniques relative to the prior art is that, with the disclosed techniques, sound localization of a virtual sound source produced by an HRTF-based sound localization scheme is improved for any listener. The improved sound localization provides a more three-dimensional audio listening experience to listeners for personal and/or near-field audio systems such as stereo headphones, headrest audio systems, seat/chair audio systems, sound bars, vehicle audio systems, and/or the like. These technical advantages represent one or more technological improvements over prior art approaches.
So that the manner in which the above recited features of the various embodiments can be understood in detail, a more particular description of the inventive concepts, briefly summarized above, may be had by reference to various embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of the inventive concepts and are therefore not to be considered limiting of scope in any way, and that there are other equally effective embodiments.
In the following description, numerous specific details are set forth to provide a more thorough understanding of the various embodiments. However, it will be apparent to one skilled in the art that the inventive concepts may be practiced without one or more of these specific details.
Memory 114 stores, without limitation, a head diameter estimator 120, a filter modification module 130, a binaural renderer 140, and a plurality of base head-related transfer functions (HRTFs) 150.
In various embodiments, computing device 110 is included in a vehicle system, a home theater system, a soundbar, stereo headphones, and/or the like. In some embodiments, computing device 110 is included in one or more devices, such as consumer products (e.g., portable speakers, gaming products, etc.), vehicles (e.g., the head unit of an automobile, truck, van, etc.), smart home devices (e.g., smart lighting systems, security systems, digital assistants, etc.), communications systems (e.g., conference call systems, video conferencing systems, speaker amplification systems, etc.), and the like. In various embodiments, computing device 110 is located in various environments including, without limitation, indoor environments (e.g., living room, conference room, conference hall, home office, etc.) and/or outdoor environments (e.g., patio, rooftop, garden, etc.). Computing device 110 is also able to provide audio signals (e.g., generated using binaural renderer 140) to loudspeakers 160 to generate a sound field that provides various audio effects.
Processing unit 112 can be any suitable processor, such as a central processing unit (CPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), and/or any other type of processing unit, or a combination of different processing units, such as a CPU configured to operate in conjunction with a GPU and/or a DSP. In general, processing unit 112 can be any technically feasible hardware unit capable of processing data and/or executing software applications.
Memory 114 can include a random-access memory (RAM) module, a flash memory unit, or any other type of memory unit or combination thereof. Processing unit 112 is configured to read data from and write data to the memory 114. In various embodiments, memory 114 includes non-volatile memory, such as optical drives, magnetic drives, flash drives, or other storage. In some embodiments, separate data stores, such as external data stores included in a network (“cloud storage”), can supplement memory 114. In some embodiments, memory 114 stores, without limitation, head diameter estimator 120, face detector 122, head orientation estimator 124, depth estimator 126, landmark-to-ear transformation module 128, filter modification module 130, binaural renderer 140 and HRTFs 150.
Loudspeakers 160 include various speakers for outputting audio to create the sound field or the various audio effects in the vicinity of the user. In some embodiments, loudspeakers 160 include two or more speakers located in a headrest of a seat such as a vehicle seat or a gaming chair, or another user-specific speaker set connected or positioned for use by a single user, such as a personal and/or near-field audio system. In some embodiments, loudspeakers 160 are associated with a speaker configuration stored in the memory 114. The speaker configuration indicates locations and/or orientations of loudspeakers 160 in a three-dimensional space and/or relative to one another and/or relative to a vehicle, a vehicle seat, a gaming chair, a location of imagers 172, and/or the like. In some embodiments, binaural renderer 140 can retrieve or otherwise identify the speaker configuration of loudspeakers 160.
Each loudspeaker 160 provides a sound output by reproducing a respective received audio signal. In some embodiments, loudspeakers 160 can be components of a wired or wireless speaker system, or any other device that generates a sound output. In some embodiments, loudspeakers 160 can be connected to output devices that additionally provide other forms of outputs, such as display devices that provide visual outputs. Each loudspeaker 160 of audio processing system 100 can be any technically feasible type of audio outputting device. For example, in some embodiments, each loudspeaker 160 includes one or more digital speakers that receive an audio signal in a digital form and convert the audio signal into air-pressure variations or sound energy via a transducing process.
Head geometry sensors 170 generate head geometry information 176 for a user of audio processing system 100. In some embodiments, head geometry sensors 170 include, without limitation, one or more imagers 172 and/or one or more accelerometers 174.
The one or more imagers 172 can include, without limitation, various types of cameras for capturing two-dimensional images of the user. In some embodiments, imagers 172 include a camera of a driver monitoring system (DMS) positioned within a vehicle or included in a sound bar, a web camera, and/or the like. In some embodiments, imagers 172 include only a single standard two-dimensional imager without stereo or depth capabilities, while in other embodiments, imagers 172 include multiple cameras, such as a stereo imaging system.
The one or more accelerometers 174 can provide position and/or orientation information associated with a head of a user that can facilitate determination of a diameter of a user head by head diameter estimator 120. For example, in some embodiments, one or more accelerometers 174 can be disposed within a stereo headphone system worn by the user to provide inertial and/or orientational information associated with the head of the user. In some embodiments, accelerometers 174 can include, without limitation, an inertial measurement unit (IMU) (e.g., a three-axis accelerometer, gyroscopic sensor, and/or magnetometer).
Head geometry information 176 includes, without limitation, information generated by head geometry sensors 170 that indicates the geometry of a head of a user. For example, in some embodiments, head geometry information 176 includes, without limitation, two-dimensional (2D) digital images of the head of the user, distance or range measurements associated with the head of the user, 3D contour information of the head of the user, and the like.
In operation, audio processing system 100 processes head geometry information 176 captured using one or more head geometry sensors 170 to estimate a head diameter for a user via head diameter estimator 120. The head diameter is provided to filter modification module 130 to calculate a more accurate interaural time difference (ITD) value for the user than that included in HRTFs 150 stored in memory 114. Filter modification module 130 can then generate modified HRTFs based on the calculated ITD value, and binaural renderer 140 applies the modified HRTFs to an audio signal to accurately synthesize a binaural sound or other spatial/positional audio effects for the user. Thus, using the camera-based system, head diameter estimator 120 provides binaural renderer 140 with the information usable to reconstruct a corrected ITD without any user interaction, thereby improving the localization of virtual sources produced by audio processing system 100.
Head diameter estimator 120 determines a head diameter for a user of audio processing system 100, such as a listener wearing stereo headphones, a passenger in a vehicle equipped with headrest speakers, a gamer using a gaming chair, and/or the like. Head diameter estimator 120 can determine the head diameter using any technically feasible approach, including computer vision, 3D mapping, stereoscopic imaging, and the like. In the embodiments described below, head diameter estimator 120 determines the head diameter using the outputs of face detector 122, head orientation estimator 124, depth estimator 126, and landmark-to-ear transformation module 128. For example, in some embodiments, based on user-specific ear locations determined by landmark-to-ear transformation module 128, head diameter estimator 120 can determine a head diameter for the user. In other embodiments, head diameter estimator 120 determines the head diameter using any other suitable approach.
Face detector 122 of head diameter estimator 120 includes a machine-learning model, a rule-based model, or another type of model that receives head geometry information 176 as input and generates 2D landmarks. For example, in some embodiments, face detector 122 generates 2D landmark coordinates based on one or more 2D images included in head geometry information 176. The 2D landmark coordinates are 2D locations for one or more anthropomorphic landmarks associated with the head of the user. Embodiments of various 2D landmarks and landmark coordinates are described below.
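Purely as an illustrative sketch, and not the implementation of face detector 122 (the disclosure does not name a specific model), an off-the-shelf facial-landmark model such as MediaPipe Face Mesh can produce the kind of 2D landmark coordinates described above. The helper name below is hypothetical.

```python
# Illustrative stand-in for face detector 122 using MediaPipe Face Mesh
# (an assumption; the disclosure leaves the landmark model unspecified).
import cv2
import mediapipe as mp

def detect_2d_landmarks(image_bgr):
    """Return (x, y) pixel coordinates of facial landmarks, or None if no face."""
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True,
                                         max_num_faces=1) as face_mesh:
        # MediaPipe expects RGB input; OpenCV loads images as BGR.
        results = face_mesh.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        return None
    height, width = image_bgr.shape[:2]
    # Landmarks are normalized to [0, 1]; scale them to pixel coordinates.
    return [(lm.x * width, lm.y * height)
            for lm in results.multi_face_landmarks[0].landmark]
```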
Head orientation estimator 124 generates a head orientation vector that indicates an orientation of the head of the user, for example based on the 2D landmark coordinates generated by face detector 122 and/or head geometry information 176.
Depth estimator 126 generates landmark depth estimates for respective ones and/or pairs of the 2D landmark coordinates 210 generated by face detector 122. A pair of 2D landmark coordinates can include a bridge-to-chin pair, a glabella-to-chin pair, a glabella-to-nasal-base pair, or another pair that is primarily vertical (e.g., having a greatest difference between the coordinates in a vertical dimension). A pair of 2D landmark coordinates can also include an eye-to-eye pair, a jaw-to-jaw pair, or another pair that is primarily horizontal (e.g., having a greatest difference between the coordinates in a horizontal dimension). However, any landmark pair can be used. Accuracy is increased for landmark pairs separated by a greater distance. As a result, the bridge-to-chin pair or the glabella-to-chin pair can provide greater accuracy in some embodiments. In some embodiments, depth estimator 126 uses one or more of the head orientation vector, the 2D landmark coordinates, and head geometry information 176 to generate the landmark depth estimates. In such embodiments, the landmark depth estimates can be considered a scaling factor that scales the 3D landmark coordinates of anthropomorphic landmarks 210. In such embodiments, depth estimator 126 generates the landmark depth estimates based on a focal length of the camera that captured certain images included in head geometry information 176, a physical distance between a pair of anthropomorphic landmarks, and the distance between the corresponding pair of two-dimensional landmark coordinates in a 2D image. In some embodiments, such distances in an image can be indicated in a number of pixels, and/or can be generated by multiplying the number of pixels by a physical width of each pixel. Alternatively, in some embodiments, depth estimator 126 generates landmark depth estimates for respective ones and/or pairs of the 2D landmark coordinates 210 based on 3D information included in head geometry information 176.
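As a concrete illustration of the computation described above, the following sketch (a minimal example, not the disclosed implementation) applies the pinhole-camera relation depth = focal length × physical separation / image separation to a vertical landmark pair. The average glabella-to-chin separation used as the physical distance is an assumed value.

```python
# Minimal pinhole-model depth estimate from one landmark pair. The assumed
# average glabella-to-chin separation is illustrative, not from the disclosure.
import numpy as np

ASSUMED_GLABELLA_TO_CHIN_M = 0.19  # assumed average physical separation, meters

def estimate_depth(focal_length_px: float,
                   landmark_a_px: np.ndarray,
                   landmark_b_px: np.ndarray,
                   physical_separation_m: float = ASSUMED_GLABELLA_TO_CHIN_M) -> float:
    """Camera-to-head distance via depth = f * physical_size / image_size."""
    image_separation_px = float(np.linalg.norm(landmark_a_px - landmark_b_px))
    return focal_length_px * physical_separation_m / image_separation_px
```

If the focal length is known only in millimeters, dividing it by the physical width of each pixel yields the focal length in pixels, consistent with the pixel-width conversion described above.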
Landmark-to-ear transformation module 128 generates user-specific ear locations based on the 3D landmark coordinates determined by depth estimator 126, by extracting 3D location information for the ears of the user directly from head geometry information 176, or by reconstructing a 3D model of the head of the user (for example via computer vision or other image processing).
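Because the disclosure leaves the transformation technique open, the following is a simplified sketch of one possibility: offset 3D landmarks near each ear by a fixed anthropometric vector rotated by the head orientation, then take the inter-ear distance as the head diameter. The landmark keys and offset value are hypothetical.

```python
# Simplified sketch of landmark-to-ear transformation and the head diameter
# computation; the offset constant and landmark keys are assumptions.
import numpy as np

ASSUMED_EAR_OFFSET_M = np.array([0.0, -0.02, -0.01])  # head-frame offset, meters

def ears_from_landmarks(landmarks_3d: dict, head_rotation: np.ndarray):
    """Estimate 3D ear positions from landmarks adjacent to each ear."""
    offset = head_rotation @ ASSUMED_EAR_OFFSET_M  # rotate into the camera frame
    left_ear = landmarks_3d["left_ear_region"] + offset
    right_ear = landmarks_3d["right_ear_region"] + offset
    return left_ear, right_ear

def head_diameter(left_ear: np.ndarray, right_ear: np.ndarray) -> float:
    """Head diameter taken as the straight-line inter-ear distance."""
    return float(np.linalg.norm(left_ear - right_ear))
```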
Each HRTF 150 is a direction-dependent filter that describes the acoustic filtering (modifications to a sound) by at least the head, torso, and outer ears (pinnae) of a user and enables audio processing system 100 to perform binaural reproduction of an audio signal. In particular, HRTFs 150 provide cues to the user for the localization and externalization of virtual sound sources presented via loudspeakers 160, thereby synthesizing a binaural sound that the user perceives to originate from a particular point in space. With a plurality of direction-specific HRTFs 150, a virtual sound source from an arbitrary direction can be presented to the user via so-called virtual auditory displays.
In operation, binaural pairs of HRTFs 150 are employed to enable the localization of a perceived sound source (for example, in the horizontal plane) via binaural renderer 140 and loudspeakers 160. Specifically, for a specific azimuthal direction, binaural renderer 140 employs a binaural pair of HRTFs that includes a first HRTF 150 for the left ear of the user and a second HRTF 150 for the right ear of the user. Thus, the first HRTF 150 approximates the filtering of a sound source before being perceived at the left ear of the user and the second HRTF 150 approximates the filtering of a sound source before being perceived at the right ear of the user. HRTFs 150 are well-known in the art and can be readily generated by one of skill in the art for a plurality of directions, for example in an anechoic chamber. HRTFs 150 are described in greater detail below.
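As a minimal sketch of how a binaural pair of HRTFs can be applied (the disclosure does not prescribe a particular filtering method), the time-domain head-related impulse responses (HRIRs) for the chosen azimuth can be convolved with a mono input signal:

```python
# Minimal binaural rendering with one HRTF pair, via FFT-based convolution.
# hrir_left/hrir_right are the time-domain impulse responses of the pair.
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono: np.ndarray,
                    hrir_left: np.ndarray,
                    hrir_right: np.ndarray) -> np.ndarray:
    """Filter a mono signal with a left/right HRIR pair; returns (N, 2) audio."""
    left = fftconvolve(mono, hrir_left, mode="full")
    right = fftconvolve(mono, hrir_right, mode="full")
    return np.stack([left, right], axis=-1)
```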
Sound impulse 330 follows a first sound propagation path 332 to microphone 312, which is on the left side of generic user head 310, and a second sound propagation path 334 to microphone 314, which is on the right side of generic user head 310. Because sound source 304 is positioned at an increment of azimuthal angle 306 that is not directly in front of or directly behind generic user head 310, first sound propagation path 332 is different than second sound propagation path 334. As a result, a time of arrival (TOA) of sound impulse 330 at first microphone 312 is different than the TOA of sound impulse 330 at second microphone 314. Thus, there is a non-zero ITD between the first HRTF 150 (generated for the left ear of the user) and the second HRTF 150 (generated for the right ear of the user). The ITD between the first HRTF 150 and the second HRTF 150 is described below.
As shown, first impulse response 410 includes a first TOA 412, and second impulse response 420 includes a second TOA 422 that occurs before TOA 412. This is because microphone 314 (which is used to generate second impulse response 420) is closer to sound source 304 than microphone 312 (which is used to generate first impulse response 410). As a result, there is an ITD 450 between first impulse response 410 and second impulse response 420. In general, there is a different value for ITD 450 for each direction of a sound source from the user. In addition, the value of ITD 450 is a function of head diameter 350 of generic user head 310.
It is noted that the ability of the binaural pair of HRTFs 150 that are associated with first impulse response 410 and second impulse response 420 to accurately synthesize a binaural sound for a particular user depends on various user-specific factors. One such factor is how closely the HRTF 150 associated with first impulse response 410 matches the filtering characteristics of the left ear of that particular user and how closely the HRTF 150 associated with second impulse response 420 matches the filtering characteristics of the right ear of that particular user. Another factor is how closely head diameter 350 matches the actual head diameter of that particular user. Because head diameter 350 is used to generate the binaural pair of HRTFs 150, and because head diameter 350 is unlikely to be identical to the head diameter of the particular user, the binaural pair of HRTFs 150 generally cannot be used to accurately synthesize a binaural sound for a given user. According to various embodiments, the binaural pair of HRTFs 150 are modified so that ITD 450 (which is based on generic user head 310) is replaced with a calculated ITD that is based on the user head diameter determined by head diameter estimator 120.
Filter modification module 130 determines a calculated ITD for the current user based on the user head diameter determined by head diameter estimator 120. In some embodiments, filter modification module 130 calculates the ITD by applying one or more head-geometry models that relate the head diameter of the user to an interaural time difference.
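The disclosure does not name a specific head-geometry model. One widely used candidate, shown below as a sketch only, is the Woodworth spherical-head approximation, ITD = (r/c)(θ + sin θ), where r is the head radius (half the head diameter), c is the speed of sound, and θ is the source azimuth (valid for azimuths up to about 90 degrees).

```python
# Sketch of the Woodworth spherical-head model as one possible head-geometry
# model; whether the system uses this particular model is an assumption.
import numpy as np

SPEED_OF_SOUND_M_S = 343.0  # speed of sound in air at roughly 20 degrees C

def woodworth_itd(head_diameter_m: float, azimuth_rad: float) -> float:
    """Calculated ITD in seconds for a spherical head of the given diameter."""
    radius_m = head_diameter_m / 2.0
    return (radius_m / SPEED_OF_SOUND_M_S) * (azimuth_rad + np.sin(azimuth_rad))
```

For example, a 0.15 m head diameter at 90 degrees azimuth gives roughly (0.075 / 343) × (π/2 + 1) ≈ 0.56 ms, in line with typical maximum human ITDs.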
In addition, filter modification module 130 generates a pair of modified HRTFs based on the generic HRTFs 150 that are indicated to be used for synthesizing binaural sound for the current user. For example, based on the calculated ITD for the current user, filter modification module 130 generates a first modified HRTF for the left ear of the user and a second modified HRTF for the right ear of the user. According to various embodiments, filter modification module 130 modifies the binaural pair of generic HRTFs 150 by removing the embedded ITD that exists between the HRTF 150 for the user left ear and the HRTF 150 for the user right ear, then further modifies the pair of generic HRTFs 150 so that the calculated ITD is present therebetween. Embodiments of the modification of the binaural pair of generic HRTFs 150 are described in greater detail below.
In one embodiment, filter modification module 130 generates the first modified HRTF by changing a first time-of-arrival value of the first HRTF 150 to a second time-of-arrival value, and generates the second modified HRTF by changing a third time-of-arrival value of the second HRTF 150 to a fourth time-of-arrival value, such that a difference between the second time-of-arrival value and the fourth time-of-arrival value equals the calculated ITD. In another embodiment, filter modification module 130 generates the first modified HRTF by changing the first time-of-arrival value of the first HRTF 150 to a second time-of-arrival value while retaining the third time-of-arrival value of the second HRTF 150 at a same value, such that a difference between the second time-of-arrival value and the third time-of-arrival value equals the calculated ITD. A simplified sketch of these modifications follows.
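The sketch below assumes the embedded TOAs are estimated from the HRIR onsets and that integer-sample shifts suffice (a practical system might use fractional delays); the function names, threshold, and sign convention are illustrative.

```python
# Simplified sketch of the HRTF modification step: estimate each HRIR's time
# of arrival from its onset, remove the embedded ITD by aligning the onsets,
# then re-insert the calculated ITD by delaying the far-ear response.
import numpy as np

def time_of_arrival(hrir: np.ndarray, threshold: float = 0.1) -> int:
    """Index of the first sample exceeding threshold * peak magnitude."""
    mag = np.abs(hrir)
    return int(np.argmax(mag >= threshold * mag.max()))

def shift(hrir: np.ndarray, samples: int) -> np.ndarray:
    """Delay (positive) or advance (negative) an HRIR by whole samples."""
    out = np.zeros_like(hrir)
    if samples >= 0:
        out[samples:] = hrir[: len(hrir) - samples]
    else:
        out[: len(hrir) + samples] = hrir[-samples:]
    return out

def apply_calculated_itd(hrir_left: np.ndarray, hrir_right: np.ndarray,
                         itd_s: float, fs: int):
    """Return a modified HRIR pair whose onset difference equals itd_s."""
    toa_l, toa_r = time_of_arrival(hrir_left), time_of_arrival(hrir_right)
    # Remove the embedded ITD by aligning both onsets to the earlier one.
    align = min(toa_l, toa_r)
    hrir_left = shift(hrir_left, align - toa_l)
    hrir_right = shift(hrir_right, align - toa_r)
    # Re-insert the calculated ITD by delaying the far ear only, which
    # retains the near-ear TOA, as in the second embodiment above.
    itd_samples = int(round(abs(itd_s) * fs))
    if itd_s >= 0:
        hrir_left = shift(hrir_left, itd_samples)   # source on the right
    else:
        hrir_right = shift(hrir_right, itd_samples)  # source on the left
    return hrir_left, hrir_right
```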
Binaural renderer 140 applies the first modified HRTF and the second modified HRTF to an audio input signal to generate a first modified audio signal and a second modified audio signal, thereby synthesizing a binaural sound or other spatial/positional audio effects for the user. Binaural renderer 140 then transmits the modified audio signals to loudspeakers 160.
As shown, a method 700 begins at step 702, where audio processing system 100 collects head geometry information for a particular user of audio processing system 100. For example, in an embodiment in which audio processing system 100 is implemented as a stereo headphone system, when the user dons the stereo headphone system, head geometry information is collected via one or more accelerometers 174, size-setting indicators, and/or pressure sensors included in the stereo headphone system and/or sensors (e.g., one or more imagers 172) external to the stereo headphone system. In another example, in embodiments in which audio processing system 100 is implemented as a headrest audio system, when the user occupies a seat associated with the headrest audio system, one or more head geometry sensors 170 (e.g., driver monitoring system cameras) collect certain head geometry information, for example by capturing 2D image data and/or 3D contour data for the head of the user.
At step 704, head diameter estimator 120 determines the head diameter of the user based on the head geometry information collected in step 702. In some embodiments, head diameter estimator 120 determines the head diameter based on a 3D position of each ear of the user. For example, in some embodiments head diameter estimator 120 uses face detector 122, head orientation estimator 124, depth estimator 126, and/or landmark-to-ear transformation module 128 to process 2D images of the head of the user to determine the position of each ear of the user. In other embodiments, head diameter estimator 120 determines the position of each ear of the user using computer vision and/or 3D contour information to reconstruct a 3D position of each ear of the user. Additionally or alternatively, in some embodiments, head diameter estimator 120 determines the head diameter based on an orientation of the head of the user, for example as determined by head orientation estimator 124. Additionally or alternatively, in some embodiments, head diameter estimator 120 determines the head diameter based on one or more anthropomorphic landmarks on the head of the user, for example as determined by landmark-to-ear transformation module 128.
At step 706, filter modification module 130 determines a calculated ITD based on the user head diameter determined in step 704. In some embodiments, filter modification module 130 calculates an ITD for the head of the user based on one or more head-geometry models and the head diameter of the user.
At step 708, filter modification module 130 generates a pair of modified HRTFs based on the generic HRTFs 150 that are indicated to be used for synthesizing binaural sound for the current user. For example, in one embodiment, filter modification module 130 generates a first modified HRTF and a second modified HRTF using the calculated ITD from step 706. In some embodiments, filter modification module 130 generates the first modified HRTF by changing a first time-of-arrival value of a first HRTF 150 to a second time-of-arrival value, and generates the second modified HRTF by retaining a third time-of-arrival value of a second HRTF 150 at a same value, as described above.
At step 710, binaural renderer 140 generates a first modified audio signal for a first loudspeaker 160 and a second modified audio signal for a second loudspeaker 160 based on an audio input signal. For example, in some embodiments, the first modified audio signal is associated with a left ear of the current user and the second modified audio signal is associated with a right ear of the current user. In such embodiments, binaural renderer 140 generates the first modified audio signal with the first modified HRTF generated in step 708, which is associated with the left ear of the user. Similarly, binaural renderer 140 generates the second modified audio signal with the second modified HRTF generated in step 708, which is associated with the right ear of the user.
At step 712, binaural renderer 140 transmits the first modified audio signal to the first loudspeaker 160 and the second modified audio signal to the second loudspeaker 160. Method 700 then returns to step 702, where head geometry information is again collected by audio processing system 100.
In sum, techniques are disclosed for producing user-specific sound localization in an audio processing system. In some embodiments, various head geometry sensors are employed to estimate a diameter of a user head in real time. Based on the estimated diameter of the user head, a calculated ITD is determined and used to modify a binaural pair of HRTFs to be more accurately user-specific and thereby more accurately localize a virtual sound source perceived by the user. The modified binaural pair of HRTFs is then used to filter an audio signal in order to generate a spatialized sound field.
At least one technical advantage of the disclosed techniques relative to the prior art is that, with the disclosed techniques, sound localization of a virtual sound source produced by an HRTF-based sound localization scheme is improved for any listener. The improved sound localization provides a more three-dimensional audio listening experience to listeners for personal and/or near-field audio systems such as stereo headphones, headrest audio systems, seat/chair audio systems, sound bars, vehicle audio systems, and/or the like. This technical advantage represents one or more technological improvements over prior art approaches.
Aspects of the disclosure are also described according to the following clauses.
1. In some embodiments, a computer-implemented method includes: receiving head geometry information for a user; determining a calculated interaural-time-delay (ITD) value for the user based on the head geometry information; generating a first modified head-related transfer function (HRTF) with the calculated ITD value and a second modified HRTF with the calculated ITD value; generating a first modified audio signal with the first modified HRTF and a second modified audio signal with the second modified HRTF; and transmitting the first modified audio signal and the second modified audio signal to one or more loudspeakers for output.
2. The computer-implemented method of clause 1, wherein generating the first modified HRTF with the calculated ITD value comprises changing a first time-of-arrival value of a first HRTF to a second time-of-arrival value and generating the second modified HRTF with the calculated ITD value comprises changing a third time-of-arrival value of a second HRTF to a fourth time-of-arrival value based on the calculated ITD value.
3. The computer-implemented method of clauses 1 or 2, wherein a difference between the second time-of-arrival value and the fourth time-of-arrival value equals the calculated ITD value.
4. The computer-implemented method of any of clauses 1-3, wherein generating the first modified HRTF with the calculated ITD value comprises changing a first time-of-arrival value of a first HRTF to a second time-of-arrival value and generating the second modified HRTF with the calculated ITD value comprises retaining a third time-of-arrival value of a second HRTF at a same value.
5. The computer-implemented method of any of clauses 1-4, wherein a difference between the second time-of-arrival value and the third time-of-arrival value equals the calculated ITD value.
6. The computer-implemented method of any of clauses 1-5, further comprising determining a head diameter for the user based on the head geometry information.
7. The computer-implemented method of any of clauses 1-6, wherein determining the calculated ITD value for the user based on the head geometry information comprises determining the calculated ITD value for the user based on the head diameter.
8. The computer-implemented method of any of clauses 1-7, wherein determining the head diameter for the user based on the head geometry information comprises determining a three-dimensional position of each ear of the user.
9. The computer-implemented method of any of clauses 1-8, wherein determining the head diameter for the user based on the head geometry information comprises determining an orientation of a head of the user.
10. The computer-implemented method of any of clauses 1-9, wherein determining the head diameter for the user based on the head geometry information comprises identifying one or more anthropomorphic landmarks on a head of the user.
11. The computer-implemented method of any of clauses 1-10, wherein receiving the head geometry information for the user comprises at least one of acquiring one or more images of the user or receiving accelerometer information associated with movement of a head of the user.
12. In some embodiments, one or more non-transitory computer-readable media storing instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of: receiving head geometry information for a user; determining a calculated interaural-time-delay (ITD) value for the user based on the head geometry information; generating a first modified head-related transfer function (HRTF) with the calculated ITD value and a second modified HRTF with the calculated ITD value; generating a first modified audio signal with the first modified HRTF and a second modified audio signal with the second modified HRTF; and transmitting the first modified audio signal and the second modified audio signal to one or more loudspeakers for output.
13. The one or more non-transitory computer-readable media of clause 12, wherein generating the first modified HRTF with the calculated ITD value comprises changing a first time-of-arrival value of a first HRTF to a second time-of-arrival value and generating the second modified HRTF with the calculated ITD value comprises changing a third time-of-arrival value of a second HRTF to a fourth time-of-arrival value based on the calculated ITD value.
14. The one or more non-transitory computer-readable media of clauses 12 or 13, wherein a difference between the second time-of-arrival value and the fourth time-of-arrival value equals the calculated ITD value.
15. The one or more non-transitory computer-readable media of any of clauses 12-14, wherein generating the first modified HRTF with the calculated ITD value comprises changing a first time-of-arrival value of a first HRTF to a second time-of-arrival value and generating the second modified HRTF with the calculated ITD value comprises retaining a third time-of-arrival value of a second HRTF at a same value.
16. The one or more non-transitory computer-readable media of any of clauses 12-15, wherein a difference between the second time-of-arrival value and the third time-of-arrival value equals the calculated ITD value.
17. The one or more non-transitory computer-readable media of any of clauses 12-16, further comprising determining a head diameter for the user based on the head geometry information.
18. The one or more non-transitory computer-readable media of any of clauses 12-17, wherein determining the calculated ITD value for the user based on the head geometry information comprises determining the calculated ITD value for the user based on the head diameter.
19. The one or more non-transitory computer-readable media of any of clauses 12-18, wherein receiving the head geometry information for the user comprises at least one of acquiring one or more images of the user or receiving accelerometer information associated with movement of a head of the user.
20. In some embodiments, a system includes: one or more loudspeakers; one or more head geometry sensors; a memory storing instructions; and one or more processors that, when executing the instructions, are configured to perform the steps of: receiving head geometry information for a user; determining a calculated interaural-time-delay (ITD) value for the user based on the head geometry information; generating a first modified head-related transfer function (HRTF) with the calculated ITD value and a second modified HRTF with the calculated ITD value; generating a first modified audio signal with the first modified HRTF and a second modified audio signal with the second modified HRTF; and transmitting the first modified audio signal and the second modified audio signal to the one or more loudspeakers for output.
21. The system of clause 20, wherein the one or more head geometry sensors comprise a camera.
22. The system of clause 20 or 21, wherein the one or more head geometry sensors comprise at least one of an accelerometer, an inertial measurement unit, a gyroscopic sensor, or a magnetometer.
The descriptions of the various embodiments have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.
Aspects of the present embodiments may be embodied as a system, method, or computer program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “module” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such processors may be, without limitation, general purpose processors, special-purpose processors, application-specific processors, or field-programmable processors or gate arrays.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
While the preceding is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
This application claims the benefit of U.S. Provisional patent application titled, "Reconstruction of Interaural Time Difference Using A Head Diameter Determined By A Camera-Based System," filed on Jan. 3, 2024, and having Ser. No. 63/617,139. The subject matter of this related application is hereby incorporated herein by reference.