The present disclosure relates to binaural audio, and in particular, to adjustment of a pre-rendered binaural audio signal according to movement of a listener's head.
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Binaural audio generally refers to audio that is recorded, or played back, in such a way that accounts for the natural ear spacing and head shadow of the ears and head of a listener. The listener thus perceives the sounds to originate in one or more spatial locations. Binaural audio may be recorded by using two microphones placed at the two ear locations of a dummy head. Binaural audio may be played back using headphones. Binaural audio may be rendered from audio that was recorded non-binaurally by using a head-related transfer function (HRTF) or a binaural room impulse response (BRIR). Binaural audio generally includes a left signal (to be output by the left headphone), and a right signal (to be output by the right headphone). Binaural audio differs from stereo in that stereo audio may involve loudspeaker crosstalk between the loudspeakers.
Head tracking (or headtracking) generally refers to tracking the orientation of a user's head to adjust the input to, or output of, a system. For audio, headtracking refers to changing an audio signal according to the head orientation of a listener.
Binaural audio and headtracking may be combined as follows. First, a sensor generates headtracking data that corresponds to the orientation of the listener's head. Second, the audio system uses the headtracking data to generate a binaural audio signal from channel-based or object-based audio. Third, the audio system sends the binaural audio signal to the listener's headphones for playback. The process then continues, with the headtracking data being used to generate the binaural audio signal.
In contrast to channel-based or object-based audio, pre-rendered binaural audio does not account for the orientation of the listener's head. Instead, pre-rendered binaural audio uses a default orientation according to the rendering. Thus, there is a need to apply headtracking to pre-rendered binaural audio.
According to an embodiment, a method modifies a binaural signal using headtracking information. The method includes receiving, by a headset, a binaural audio signal, where the binaural audio signal includes a first signal and a second signal. The method further includes generating, by a sensor, headtracking data, where the headtracking data relates to an orientation of the headset. The method further includes calculating, by a processor, a delay based on the headtracking data, a first filter response based on the headtracking data, and a second filter response based on the headtracking data. The method further includes applying the delay to one of the first signal and the second signal, based on the headtracking data, to generate a delayed signal, where an other of the first signal and the second signal is an undelayed signal. The method further includes applying the first filter response to the delayed signal to generate a modified delayed signal. The method further includes applying the second filter response to the undelayed signal to generate a modified undelayed signal. The method further includes outputting, by a first speaker of the headset according to the headtracking data, the modified delayed signal. The method further includes outputting, by a second speaker of the headset according to the headtracking data, the modified undelayed signal.
The headtracking data may correspond to an azimuthal orientation, where the azimuthal orientation is one of a leftward orientation and a rightward orientation.
When the first signal is a left signal and the second signal is a right signal, the delayed signal may correspond to the left signal, the undelayed signal may be the right signal, the first speaker may be a left speaker, and the second speaker may be a right speaker. Alternatively, the delayed signal may correspond to the right signal, the undelayed signal may be the left signal, the first speaker may be a right speaker, and the second speaker may be a left speaker.
The sensor and the processor may be components of the headset. The sensor may be one of an accelerometer, a gyroscope, a magnetometer, an infrared sensor, a camera, and a radio-frequency link.
The method may further include mixing the first signal and the second signal, based on the headtracking data, before applying the delay, before applying the first filter response, and before applying the second filter response.
When the headtracking data is current headtracking data that relates to a current orientation of the headset, the delay is a current delay, the first filter response is a current first filter response, the second filter response is a current second filter response, the delayed signal is a current delayed signal, and the undelayed signal is a current undelayed signal, the method may further include storing previous headtracking data, where the previous headtracking data corresponds to the current headtracking data at a previous time. The method may further include calculating, by the processor, a previous delay based on the previous headtracking data, a previous first filter response based on the previous headtracking data, and a previous second filter response based on the previous headtracking data. The method may further include applying the previous delay to one of the first signal and the second signal, based on the previous headtracking data, to generate a previous delayed signal, where an other of the first signal and the second signal is a previous undelayed signal. The method may further include applying the previous first filter response to the previous delayed signal to generate a modified previous delayed signal. The method may further include applying the previous second filter response to the previous undelayed signal to generate a modified previous undelayed signal. The method may further include cross-fading the modified delayed signal and the modified previous delayed signal, where the first speaker outputs the modified delayed signal and the modified previous delayed signal having been cross-faded. The method may further include cross-fading the modified undelayed signal and the modified previous undelayed signal, where the second speaker outputs the modified undelayed signal and the modified previous undelayed signal having been cross-faded.
The headtracking data may correspond to an elevational orientation, where the elevational orientation is one of an upward orientation and a downward orientation.
The headtracking data may correspond to an azimuthal orientation and an elevational orientation.
The method may further include calculating, by the processor, an elevation filter based on the headtracking data. The method may further include applying the elevation filter to the modified delayed signal prior to outputting the modified delayed signal. The method may further include applying the elevation filter to the modified undelayed signal prior to outputting the modified undelayed signal.
Calculating the elevation filter may include accessing a plurality of generalized pinna related impulse responses based on the headtracking data. Calculating the elevation filter may further include determining a ratio between a current elevational orientation of a first selected one of the plurality of generalized pinna related impulse responses and a forward elevational orientation of a second selected one of the plurality of generalized pinna related impulse responses.
According to an embodiment, an apparatus modifies a binaural signal using headtracking information. The apparatus includes a processor, a memory, a sensor, a first speaker, a second speaker, and a headset. The headset is adapted to position the first speaker nearby a first ear of a listener and to position the second speaker nearby a second ear of the listener. The processor is configured to control the apparatus to execute processing that includes receiving, by the headset, a binaural audio signal, where the binaural audio signal includes a first signal and a second signal. The processing further includes generating, by the sensor, headtracking data, where the headtracking data relates to an orientation of the headset. The processing further includes calculating, by the processor, a delay based on the headtracking data, a first filter response based on the headtracking data, and a second filter response based on the headtracking data. The processing further includes applying the delay to one of the first signal and the second signal, based on the headtracking data, to generate a delayed signal, where an other of the first signal and the second signal is an undelayed signal. The processing further includes applying the first filter response to the delayed signal to generate a modified delayed signal. The processing further includes applying the second filter response to the undelayed signal to generate a modified undelayed signal. The processing further includes outputting, by the first speaker of the headset according to the headtracking data, the modified delayed signal. The processing further includes outputting, by the second speaker of the headset according to the headtracking data, the modified undelayed signal. The processor may be further configured to perform one or more of the other method steps described above.
According to an embodiment, a non-transitory computer readable medium stores a computer program for controlling a device to modify a binaural signal using headtracking information. The device may include a processor, a memory, a sensor, a first speaker, a second speaker, and a headset. The computer program when executed by the processor may perform one or more of the method steps described above.
According to an embodiment, a method modifies a binaural signal using headtracking information. The method includes receiving, by a headset, a binaural audio signal. The method further includes upmixing the binaural audio signal into a four-channel binaural signal, where the four-channel binaural signal includes a front binaural signal and a rear binaural signal. The method further includes generating, by a sensor, headtracking data, where the headtracking data relates to an orientation of the headset. The method further includes applying the headtracking data to the front binaural signal to generate a modified front binaural signal. The method further includes applying an inverse of the headtracking data to the rear binaural signal to generate a modified rear binaural signal. The method further includes combining the modified front binaural signal and the modified rear binaural signal to generate a combined binaural signal. The method further includes outputting, by at least two speakers of the headset, the combined binaural signal.
According to an embodiment, a method modifies a parametric binaural signal using headtracking information. The method includes generating, by a sensor, headtracking data, where the headtracking data relates to an orientation of a headset. The method further includes receiving an encoded stereo signal, where the encoded stereo signal includes a stereo signal and presentation transformation information, and where the presentation transformation information relates the stereo signal to a binaural signal. The method further includes decoding the encoded stereo signal to generate the stereo signal and the presentation transformation information. The method further includes performing presentation transformation on the stereo signal using the presentation transformation information to generate the binaural signal and acoustic environment simulation input information. The method further includes performing acoustic environment simulation on the acoustic environment simulation input information to generate acoustic environment simulation output information. The method further includes combining the binaural signal and the acoustic environment simulation output information to generate a combined signal. The method further includes modifying the combined signal using the headtracking data to generate an output binaural signal. The method further includes outputting, by at least two speakers of the headset, the output binaural signal.
According to an embodiment, a method modifies a parametric binaural signal using headtracking information. The method includes generating, by a sensor, headtracking data, where the headtracking data relates to an orientation of a headset. The method further includes receiving an encoded stereo signal, where the encoded stereo signal includes a stereo signal and presentation transformation information, and where the presentation transformation information relates the stereo signal to a binaural signal. The method further includes decoding the encoded stereo signal to generate the stereo signal and the presentation transformation information. The method further includes performing presentation transformation on the stereo signal using the presentation transformation information to generate the binaural signal and acoustic environment simulation input information. The method further includes performing acoustic environment simulation on the acoustic environment simulation input information to generate acoustic environment simulation output information. The method further includes modifying the binaural signal using the headtracking data to generate an output binaural signal. The method further includes combining the output binaural signal and the acoustic environment simulation output information to generate a combined signal. The method further includes outputting, by at least two speakers of the headset, the combined signal.
According to an embodiment, a method modifies a parametric binaural signal using headtracking information. The method includes generating, by a sensor, headtracking data, where the headtracking data relates to an orientation of a headset. The method further includes receiving an encoded stereo signal, where the encoded stereo signal includes a stereo signal and presentation transformation information, and where the presentation transformation information relates the stereo signal to a binaural signal. The method further includes decoding the encoded stereo signal to generate the stereo signal and the presentation transformation information. The method further includes performing presentation transformation on the stereo signal using the presentation transformation information and the headtracking data to generate a headtracked binaural signal, where the headtracked binaural signal corresponds to the binaural signal having been matrixed. The method further includes performing presentation transformation on the stereo signal using the presentation transformation information to generate acoustic environment simulation input information. The method further includes performing acoustic environment simulation on the acoustic environment simulation input information to generate acoustic environment simulation output information. The method further includes combining the headtracked binaural signal and the acoustic environment simulation output information to generate a combined signal. The method further includes outputting, by at least two speakers of the headset, the combined signal.
According to an embodiment, a method modifies a parametric binaural signal using headtracking information. The method includes generating, by a sensor, headtracking data, where the headtracking data relates to an orientation of a headset. The method further includes receiving an encoded stereo signal, where the encoded stereo signal includes a stereo signal and presentation transformation information, where the presentation transformation information relates the stereo signal to a binaural signal. The method further includes decoding the encoded stereo signal to generate the stereo signal and the presentation transformation information. The method further includes performing presentation transformation on the stereo signal using the presentation transformation information to generate the binaural signal. The method further includes modifying the binaural signal using the headtracking data to generate an output binaural signal. The method further includes outputting, by at least two speakers of the headset, the output binaural signal.
According to an embodiment, an apparatus modifies a parametric binaural signal using headtracking information. The apparatus includes a processor, a memory, a sensor, at least two speakers, and a headset. The headset is adapted to position the at least two speakers nearby ears of a listener. The processor is configured to control the apparatus to execute processing that includes generating, by the sensor, headtracking data, wherein the headtracking data relates to an orientation of the headset. The processing further includes receiving an encoded stereo signal, where the encoded stereo signal includes a stereo signal and presentation transformation information, and where the presentation transformation information relates the stereo signal to a binaural signal. The processing further includes decoding the encoded stereo signal to generate the stereo signal and the presentation transformation information. The processing further includes performing presentation transformation on the stereo signal using the presentation transformation information to generate the binaural signal. The processing further includes modifying the binaural signal using the headtracking data to generate an output binaural signal. The processing further includes outputting, by the at least two speakers of the headset, the output binaural signal. The processor may be further configured to perform one or more of the other method steps described above.
The following detailed description and accompanying drawings provide a further understanding of the nature and advantages of various implementations.
Described herein are techniques for using headtracking with pre-rendered binaural audio. In the following description, for purposes of explanation, numerous examples and specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be evident, however, to one skilled in the art that the present disclosure as defined by the claims may include some or all of the features in these examples alone or in combination with other features described below, and may further include modifications and equivalents of the features and concepts described herein.
In the following description, various methods, processes and procedures are detailed. Although particular steps may be described in gerund form, such wording also indicates the state of being in that form. For example, “storing data in a memory” may indicate at least the following: that the data currently becomes stored in the memory (e.g., the memory did not previously store the data); that the data currently exists in the memory (e.g., the data was previously stored in the memory); etc. Such a situation will be specifically pointed out when not clear from the context. Although particular steps may be described in a certain order, such order is mainly for convenience and clarity. A particular step may be repeated more than once, may occur before or after other steps (even if those steps are otherwise described in another order), and may occur in parallel with other steps. A second step is required to follow a first step only when the first step must be completed before the second step is begun. Such a situation will be specifically pointed out when not clear from the context.
In this document, the terms “and”, “or” and “and/or” are used. Such terms are to be read as having an inclusive meaning. For example, “A and B” may mean at least the following: “both A and B”, “at least both A and B”. As another example, “A or B” may mean at least the following: “at least A”, “at least B”, “both A and B”, “at least both A and B”. As another example, “A and/or B” may mean at least the following: “A and B”, “A or B”. When an exclusive-or is intended, such will be specifically noted (e.g., “either A or B”, “at most one of A and B”).
This document uses the terms “audio”, “audio signal” and “audio data”. In general, these terms are used interchangeably. When specificity is desired, the term “audio” is used to refer to the input captured by a microphone, or the output generated by a loudspeaker. The term “audio data” is used to refer to data that represents audio, e.g. as processed by an analog to digital converter (ADC), as stored in a memory, or as communicated via a data signal. The term “audio signal” is used to refer to audio transmitted in analog or digital electronic form.
This document uses the terms “headphones” and “headset”. In general, these terms are used interchangeably. When specificity is desired, the term “headphones” is used to refer to the speakers, and the term “headset” is used to refer to both the speakers and the additional components such as the headband, housing, etc. The term “headset” may also be used to refer to a device with a display or screen such as a head-mounted display.
Without Headtracking
The pre-rendered binaural audio signal includes a left signal that is provided to the left speaker of the headphones 104, and a right signal that is provided to the right speaker of the headphones 104. By changing the parameters of the left signal and the right signal, the listener's perception of the location of the sound may be changed. For example, the sound may be perceived to be to the left of the listener 102, to the right, behind, closer, further away, etc. The sound may also be perceived to be positioned in three-dimensional space, e.g., above or below the listener 102, in addition to its perceived position in the horizontal plane.
Similarly to
Headtracking
Head tracking may be used to perform real-time binaural audio processing in response to a listener's head movements. Using one or more sensors, such as accelerometers, gyroscopes, and magnetometers, along with a sensor-fusion algorithm, a binaural processing algorithm can be driven with stable yaw, pitch, and roll values representing the current rotation of a listener's head. Typical binaural processing uses head-related transfer functions (HRTFs), which are a function of azimuth and elevation. By inverting the current head rotation parameters, head-tracked binaural processing can give the perception of a physically consistent sound source with respect to a listener's head rotation.
In the use case where binaural audio is pre-rendered, it is typically too late to apply headtracking. The pre-rendered binaural is usually rendered for the head facing directly “forward”, as shown in
The present disclosure describes a system and method to adjust the pre-rendered binaural signal so that headtracking is still possible. The process is derived from a model of the head that allows for an adjustment of the pre-rendered binaural cues so that headtracking is facilitated.
Normally when headtracking is used for binaural rendering, the headphones are able to track the head rotation and the incoming audio is rendered on the fly, and is constantly adjusted based on the head rotation. In the case of pre-rendered binaural, we can still track the head motion, and use concepts from the Duplex Theory of Localization to adjust for the head motion. These concepts include interaural time delay (ITD) and interaural level difference (ILD).
An example is as follows. Assume the sound is to be perceived directly in front, as in
Further sections describe a system and method of applying headtracking to a pre-rendered binaural audio signal.
The binaural audio signal 410 may be received via a wired connection. Alternatively, the binaural audio signal 410 may be received wirelessly (e.g., via an IEEE 802.15.1 standard signal such as a Bluetooth™ signal, an IEEE 802.11 standard signal such as a Wi-Fi™ signal, etc.).
Alternatively, the electronics 500 may be located in another location, such as in another device (e.g., a computer, not shown), or on another part of the headset 400, such as in the right speaker 404, on the headband 406, etc.
The processor 502 generally controls the operation of the electronics 500. The processor 502 also applies headtracking to a pre-rendered binaural audio signal, as further detailed below. The processor 502 may execute one or more computer programs as part of its operation.
The memory 504 generally stores data operated on by the electronics 500. For example, the memory 504 may store one or more computer programs executed by the processor 502. The memory may store the pre-rendered binaural audio signal as it is received by the electronics 500 (e.g., as data samples), the left signal and right signal to be sent to the left and right speakers (see 402 and 404 in
The input interface 506 generally receives an audio signal (e.g., the left and right components L and R of the pre-rendered binaural audio signal). The output interface 508 generally outputs the left and right audio signals L′ and R′ to the left and right speakers (e.g., 402 and 404 in
The sensor 512 generally generates headtracking data 620. The headtracking data 620 relates to an orientation of the sensor 512 (or more generally, to the orientation of the electronics 500 or the headset 400 of
Alternatively, the sensor 512 may be a component of a device other than the electronics 500 or the headset 400 of
In general, the calculation block 602 generates a delay and filter parameters based on the headtracking data 620, provides the delay to the delay blocks 604 and 606, and provides the filter parameters to the filter blocks 608 and 610. The filter parameters may be calculated according to the Brown-Duda model, and the delay values may be calculated according to the Woodworth approximation. The delay and the filter parameters may be calculated as follows.
The delay D corresponds to the ITD as discussed above. The delay D may be calculated using Equation 1:
D=(r/c)·(arcsin(cos φ·sin θ)+cos φ·sin θ) (1)
In Equation 1, θ is the azimuth angle (e.g., in a horizontal plane, the head turned left or right, as shown in
For φ=0 (e.g., the horizontal plane), Equation 1 may be simplified to Equation 2:
D=(r/c)·(θ+sin θ) 0≤θ≤π/2 (2)
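Equations 1 and 2 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the head radius r = 0.0875 m, and the speed of sound c = 343 m/s are assumptions chosen for the example and are not values stated in this disclosure. Angles are in radians. For φ=0 and 0≤θ≤π/2, arcsin(sin θ)=θ, which is why Equation 1 reduces to Equation 2.

```python
import math

# Assumed example constants (not taken from the disclosure).
HEAD_RADIUS_M = 0.0875      # r, assumed head radius in meters
SPEED_OF_SOUND_M_S = 343.0  # c, assumed speed of sound in meters per second


def itd_seconds(theta, phi=0.0, r=HEAD_RADIUS_M, c=SPEED_OF_SOUND_M_S):
    """Equation 1: D = (r/c) * (arcsin(cos(phi)*sin(theta)) + cos(phi)*sin(theta))."""
    x = math.cos(phi) * math.sin(theta)
    return (r / c) * (math.asin(x) + x)


def itd_horizontal_seconds(theta, r=HEAD_RADIUS_M, c=SPEED_OF_SOUND_M_S):
    """Equation 2: D = (r/c) * (theta + sin(theta)), valid for 0 <= theta <= pi/2."""
    if not 0.0 <= theta <= math.pi / 2:
        raise ValueError("Equation 2 is defined for 0 <= theta <= pi/2")
    return (r / c) * (theta + math.sin(theta))


def itd_samples(theta, fs=44100.0, phi=0.0):
    """Delay expressed in (possibly fractional) samples at sample rate fs."""
    return itd_seconds(theta, phi) * fs
```

With these assumed constants, a head turned fully to one side (θ=π/2, φ=0) gives a delay of roughly 0.66 ms. The result is generally a fractional number of samples, so a practical implementation would need either rounding or a fractional-delay filter.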
The filter models may be derived as follows. In the continuous domain, the filter takes the form of Equations 3-5:
The bilinear transform may be used to convert to the discrete domain, as shown in Equation 6:
Now, redefine β from Equation 5 as in Equation 7:
In Equations 6-7, fs is the sample rate of the pre-rendered binaural audio signal. For example, 44.1 kHz is a common sample rate for digital audio signals.
Equation 8 then follows:
For two ears (the “near” ear, turned toward the perceived sound location, and the “far” ear, turned away from the perceived sound location), Equations 9-10 result:
In Equations 9-10, Hipsi is the transfer function of the filter for the “near” ear (referred to as the ipsilateral filter), Hcontra is the transfer function for the filter for the “far” ear (referred to as the contralateral filter), the subscript i is associated with the ipsilateral components, and the subscript c is associated with the contralateral components.
The components of Equations 9-10 are as given in Equations 11-18:
a0=ai0=ac0=β+2 (11)
a1=ai1=ac1=β−2 (12)
bi0=β+2αi(θ) (13)
bi1=β−2αi(θ) (14)
bc0=β+2αc(θ) (15)
bc1=β−2αc(θ) (16)
αi(θ)=1+cos(θ−90°)=1+sin(θ) (17)
αc(θ)=1+cos(θ+90°)=1−sin(θ) (18)
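The coefficient definitions in Equations 11-18 can be exercised with a short Python sketch. Several points here are assumptions made for illustration: the filter is taken to be the first-order ratio H(z) = (b0 + b1·z⁻¹)/(a0 + a1·z⁻¹), which is the form these coefficient names suggest; the function names are hypothetical; and the example value of β (computed as 2c/(r·fs) with an assumed head radius and speed of sound) follows the usual head-shadow model rather than a value stated in this text.

```python
import math


def shadow_filter_coeffs(theta_deg, beta):
    """Return (a, b_ipsi, b_contra) per Equations 11-18 for head angle theta in degrees."""
    s = math.sin(math.radians(theta_deg))
    alpha_i = 1.0 + s  # Equation 17
    alpha_c = 1.0 - s  # Equation 18
    a = (beta + 2.0, beta - 2.0)                                      # Equations 11-12
    b_ipsi = (beta + 2.0 * alpha_i, beta - 2.0 * alpha_i)             # Equations 13-14
    b_contra = (beta + 2.0 * alpha_c, beta - 2.0 * alpha_c)           # Equations 15-16
    return a, b_ipsi, b_contra


def first_order_filter(x, b, a):
    """Assumed difference equation: a0*y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]."""
    x_prev = 0.0
    y_prev = 0.0
    y = []
    for xn in x:
        yn = (b[0] * xn + b[1] * x_prev - a[1] * y_prev) / a[0]
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


# Assumed example value of beta: 2c/(r*fs) with c = 343 m/s, r = 0.0875 m, fs = 44100 Hz.
EXAMPLE_BETA = 2.0 * 343.0 / (0.0875 * 44100.0)
```

Under this assumed form, the gain at DC is (b0+b1)/(a0+a1)=1 for both ears, while the gain at the Nyquist frequency is (b0−b1)/(a0−a1)=α. At θ=0 both filters pass the signal unchanged; at θ=90° the ipsilateral filter boosts high frequencies by a factor of 2 and the contralateral filter attenuates them toward zero, which is consistent with a head-shadow (ILD) cue.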
Based on the head angle, the delay and filters are applied to the system 600 of
In
In
The system 900 operates on blocks of samples of the left input signal 622 and the right input signal 624. The delay and channel filters are then applied on a per block basis. A block size of 256 samples may be used in an embodiment. The size of the block may be adjusted as desired.
The head angle processor (preprocessor) 902 generally performs processing of the headtracking data 620 from the headtracking sensor (e.g., 512 in
The head angle θ ranges between −180 and +180 degrees, and the virtual head angle ranges between 0 and 90 degrees, so the head angle processor 902 may calculate the virtual head angle θ as follows. If the absolute value of the head angle is less than or equal to 90 degrees, then the virtual head angle is the absolute value of the head angle; else the virtual head angle is 180 minus the absolute value of the head angle.
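The folding rule just described can be written as a small Python function. The function name is a hypothetical label for this sketch; angles are in degrees.

```python
def virtual_head_angle(head_angle_deg):
    """Fold a head angle in [-180, +180] degrees into a virtual head angle in [0, 90]."""
    if not -180.0 <= head_angle_deg <= 180.0:
        raise ValueError("head angle must lie between -180 and +180 degrees")
    magnitude = abs(head_angle_deg)
    if magnitude <= 90.0:
        return magnitude
    return 180.0 - magnitude
```

For example, head angles of 30, −30, and 150 degrees all map to a virtual head angle of 30 degrees; the sign and the front/back information are handled separately by the channel designation and channel mixing.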
The decision to designate the left or right channels as ipsilateral and contralateral is a function of the head angle θ. If the head angle is equal to or greater than zero (e.g., a leftward orientation), the left input is the contralateral input, and the right input is the ipsilateral input. If the head angle is less than zero (e.g., a rightward orientation), the left input is the ipsilateral input, and the right input is the contralateral input.
The delay is applied relatively between the left and right binaural channels. The contralateral channel is always delayed relative to the ipsilateral channel. Therefore if the head angle is greater than zero (e.g., looking left), the left channel is delayed relative to the right. If the head angle is less than zero (e.g., looking right), the right channel is delayed relative to the left. If the head angle is zero, no ITD correction is performed. In some embodiments, both channels may be delayed, with the amount of relative delay dependent on the headtracking data. In these embodiments, the labels “delayed” and “undelayed” may be interpreted as “more delayed” and “less delayed”.
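The channel designation and relative delay described above can be sketched as follows. This is a simplified illustration, not the disclosed implementation: the function names are hypothetical, the delay is rounded to whole samples, and each block is zero-padded at its start rather than carrying samples over from the previous block as a streaming system would.

```python
def designate_channels(head_angle_deg):
    """Return (ipsilateral, contralateral) channel names for the given head angle."""
    if head_angle_deg >= 0:
        # Zero or leftward orientation: right input is ipsilateral, left is contralateral.
        return ("right", "left")
    # Rightward orientation: left input is ipsilateral, right is contralateral.
    return ("left", "right")


def apply_relative_delay(left, right, head_angle_deg, delay_samples):
    """Delay the contralateral channel by a whole number of samples within one block."""
    left = list(left)
    right = list(right)
    if head_angle_deg == 0 or delay_samples <= 0:
        return left, right  # no ITD correction at zero head angle
    pad = [0.0] * delay_samples
    _, contralateral = designate_channels(head_angle_deg)
    if contralateral == "left":
        left = (pad + left)[: len(left)]
    else:
        right = (pad + right)[: len(right)]
    return left, right
```

With a head angle of +30 degrees and a 2-sample delay, the left block [1, 2, 3, 4] becomes [0, 0, 1, 2] while the right block is unchanged; with −30 degrees the roles are swapped.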
The current orientation processor 910 generally calculates the delay (Equation 2) and the filter responses (Equations 9-10) for the current head orientation, based on the headtracking data 620 as processed by the head angle processor 902. The current orientation processor 910 includes a memory 911, a processor 912, channel mixers 913a and 913b, delays 914a and 914b, and filters 915a and 915b. The memory 911 stores the current head orientation. The processor 912 calculates the parameters for the channel mixers 913a and 913b, the delays 914a and 914b, and the filters 915a and 915b.
The channel mixers 913a and 913b selectively mix part of the left input signal 622 with the right input signal 624 and vice versa, based on the head angle θ. This mixing process handles channel inversion for the cases of θ>90 and θ<−90, which allows the equations to work smoothly across a full 360 degrees of head angles. The channel mixers 913a and 913b implement a dynamic matrix mixer, where the coefficients are a function of θ. The 2×2 mixing matrix coefficients M are defined in TABLE 1:
The delays 914a and 914b generally apply the delay (see Equation 2) calculated by the processor 912. For example, when the headtracking data 620 indicates a leftward orientation (e.g., as in
The filters 915a and 915b generally apply the filters (see Equations 9-10) calculated by the processor 912. For example, when the headtracking data 620 indicates a leftward orientation (e.g., as in
The previous orientation processor 920 generally calculates the delay (Equation 2) and the filter responses (Equations 9-10) for the previous head orientation, based on the headtracking data 620 as processed by the head angle processor 902. The previous orientation processor 920 includes a memory 921, a processor 922, channel mixers 923a and 923b, delays 924a and 924b, and filters 925a and 925b. The memory 921 stores the previous head orientation. The remainder of the components operate in a similar manner to the similar components of the current orientation processor 910, but operate on the previous head angle (instead of the current head angle).
The delay 930 delays by the block size (e.g., 256 samples), then stores the current head orientation (from the memory 911) in the memory 921 as the previous head orientation. As discussed above, the system 900 operates on blocks of samples of the pre-rendered binaural audio signal. When the head angle θ changes, the system 900 computes the equations twice: once for the previous head angle by the previous orientation processor 920, and once for the current head angle by the current orientation processor 910. The current orientation processor 910 outputs a current left intermediate output 952a and a current right intermediate output 954a. The previous orientation processor 920 outputs a previous left intermediate output 952b and a previous right intermediate output 954b.
The left cross-fade 942 and right cross-fade 944 generally perform cross-fading on the intermediate outputs from the current orientation processor 910 and the previous orientation processor 920. The left cross-fade 942 performs cross-fading of the current left intermediate output 952a and the previous left intermediate output 952b to generate the output left signal 632. The right cross-fade 944 performs cross-fading of the current right intermediate output 954a and the previous right intermediate output 954b to generate the output right signal 634. The left cross-fade 942 and right cross-fade 944 may be implemented with linear cross-faders.
In general, the left cross-fade 942 and right cross-fade 944 enable the system 900 to avoid clicks in the audio when the head angle changes. In alternative embodiments, the left cross-fade 942 and right cross-fade 944 may be replaced with circuits to limit the slew rate of the changes in the delay and filter coefficients.
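As a non-limiting illustration, a linear cross-fade over one block may be sketched as follows. The function name and the choice of a ramp that runs from 0 to 1 across exactly one block are assumptions made for the sketch; the description above states only that linear cross-faders may be used.

```python
def linear_crossfade(previous_block, current_block):
    """Fade from previous_block to current_block across one block of samples.

    Both arguments are equal-length lists of samples, e.g. the previous and
    current intermediate outputs for one channel.
    """
    if len(previous_block) != len(current_block):
        raise ValueError("blocks must have the same length")
    n = len(current_block)
    out = []
    for i in range(n):
        # Weight ramps linearly from 0 (all previous) to 1 (all current).
        w = i / (n - 1) if n > 1 else 1.0
        out.append((1.0 - w) * previous_block[i] + w * current_block[i])
    return out
```

Because the block starts fully on the output computed for the previous head angle and ends fully on the output computed for the current head angle, the output is continuous across block boundaries, which is how a cross-fade of this kind avoids clicks.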
At 1102, a binaural audio signal is received. The binaural audio signal includes a first signal and a second signal. A headset may receive the binaural audio signal. For example, the headset 400 (see
At 1104, headtracking data is generated. A sensor may generate the headtracking data. The headtracking data relates to an orientation of the headset. For example, the sensor 512 (see
At 1106, a delay is calculated based on the headtracking data, a first filter response is calculated based on the headtracking data, and a second filter response is calculated based on the headtracking data. A processor may calculate the delay, the first filter response, and the second filter response. For example, the processor 502 (see
At 1108, the delay is applied to one of the first signal and the second signal, based on the headtracking data, to generate a delayed signal. The other of the first signal and the second signal is an undelayed signal. For example, in
At 1110, the first filter response is applied to the delayed signal to generate a modified delayed signal. For example, in
At 1112, the second filter response is applied to the undelayed signal to generate a modified undelayed signal. For example, in
At 1114, the modified delayed signal is output by a first speaker of the headset according to the headtracking data. For example, when the input left signal 622 is delayed (see
At 1116, the modified undelayed signal is output by a second speaker of the headset according to the headtracking data. For example, when the input right signal 624 is undelayed (see
For ease of description, the examples for steps 1102-1116 have been described with reference to the system 600 of
In steps 1118-1130 (see
At 1118, previous headtracking data is stored. The previous headtracking data corresponds to the current headtracking data at a previous time. For example, the memory 921 (see
At 1120, a previous delay is calculated based on the previous headtracking data, a previous first filter response is calculated based on the previous headtracking data, and a previous second filter response is calculated based on the previous headtracking data. For example, the previous orientation processor 920 (see
At 1122, the previous delay is applied to one of the first signal and the second signal, based on the previous headtracking data, to generate a previous delayed signal. The other of the first signal and the second signal is a previous undelayed signal. For example, the previous orientation processor 920 (see
At 1124, the previous first filter response is applied to the previous delayed signal to generate a modified previous delayed signal. For example, the previous orientation processor 920 (see
At 1126, the previous second filter response is applied to the previous undelayed signal to generate a modified previous undelayed signal. For example, the previous orientation processor 920 (see
At 1128, the modified delayed signal and the modified previous delayed signal are cross-faded. The first speaker outputs the modified delayed signal and the modified previous delayed signal having been cross-faded (instead of outputting just the modified delayed signal, as in 1114). For example, when the input left signal 622 is delayed, the left cross-fade 942 (see
At 1130, the modified undelayed signal and the modified previous undelayed signal are cross-faded. The second speaker outputs the modified undelayed signal and the modified previous undelayed signal having been cross-faded (instead of outputting just the modified undelayed signal, as in 1116). For example, when the input left signal 622 is not delayed, the left cross-fade 942 (see
The method 1100 may include additional steps or substeps, e.g. to implement other of the features discussed above regarding
The pinna (outer ear) is responsible for directional cues relating to elevation. To simulate the effects of elevation, the filters 1216a, 1216b, 1226a and 1226b incorporate the ratio of an average pinna response when looking directly ahead to the response when the head is elevationally tilted. The filters 1216a, 1216b, 1226a and 1226b implement filter responses that change dynamically based on the elevation angle relative to the listener's head. If the listener is looking straight ahead, the ratio is 1:1 and no filtering is going on. This gives the benefit of no coloration of the sound when the head is pointed in the default direction (straight ahead). As the listener's head moves away from straight ahead, a larger change in the ratio occurs.
The processors 1212 and 1222 calculate the parameters for the filters 1216a, 1216b, 1226a and 1226b, similarly to the processors 912 and 922 of
To simulate the effects of headtracking for elevation, the filters 1216a, 1216b, 1226a and 1226b are used to mimic the difference between looking forward (or straight ahead) and looking up or down. These are derived by first doing a weighted average over multiple subjects, with anthropometric outliers removed, to obtain a generalized pinna related impulse response (PRIR) for a variety of directions. For example, generalized PRIRs may be obtained for straight ahead (e.g., 0 degrees elevation), looking upward at 45 degrees (e.g., −45 degrees elevation), and looking directly downward (e.g., +90 degrees elevation). According to various embodiments, the generalized PRIRs may be obtained for each degree (e.g., 136 PRIRs from +90 to −45 degrees), or for every five degrees (e.g., 28 PRIRs from +90 to −45 degrees), or for every ten degrees (e.g., 14 PRIRs from +90 to −45 degrees), etc. These generalized PRIRs may be stored in a memory of the system 1200 (e.g., in the memory 504 as implemented by the electronics 500). The system 1200 may interpolate between the stored generalized PRIRs, as desired, to accommodate elevations other than those of the stored generalized PRIRs. (As the just-noticeable difference (JND) for localization is about one degree, interpolation to resolutions finer than one degree may be avoided.)
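As a non-limiting illustration, interpolation between stored generalized PRIRs may be sketched as follows. The dictionary layout, the function name, and the use of tap-by-tap linear interpolation between the two nearest stored elevations are assumptions made for the sketch; the description above does not specify the interpolation method.

```python
import bisect

def interpolate_prir(stored, elevation_deg):
    """Return an impulse response for elevation_deg.

    stored: dict mapping elevation in degrees -> list of impulse-response taps
            (all lists the same length). Hypothetical storage layout.
    """
    elevations = sorted(stored)
    # Clamp to the stored range rather than extrapolating.
    if elevation_deg <= elevations[0]:
        return list(stored[elevations[0]])
    if elevation_deg >= elevations[-1]:
        return list(stored[elevations[-1]])
    hi = bisect.bisect_right(elevations, elevation_deg)
    lo_e, hi_e = elevations[hi - 1], elevations[hi]
    w = (elevation_deg - lo_e) / (hi_e - lo_e)
    # Linear blend of the two neighbouring responses, tap by tap.
    return [(1.0 - w) * a + w * b for a, b in zip(stored[lo_e], stored[hi_e])]
```

For example, with responses stored every ten degrees, a request for 5 degrees elevation would return the midpoint of the 0 degree and 10 degree responses.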
Let P(θ, φ, f) be the generalized pinna related transfer function in the frequency domain, where θ is the azimuth angle and φ is the elevation angle. The ratio of the PRIR of the current orientation of the listener to the forward PRIR is given by Equation 19:
Pr(θ,φ,f)=P(θ,φ,f)/P(θ,0,f) (19)
In Equation 19, Pr(θ, φ, f) represents the ratio of the two PRIRs at any given frequency f, and 0 degrees is the elevation angle when looking forward or straight ahead.
These ratios are computed for any given “look” angle and applied to both left and right channels as the listener moves her head up and down. If the listener is looking straight ahead, the ratio is 1:1 and no net filtering is going on. This gives the benefit of no coloration of the sound when the head is pointed in the default direction (forward or straight ahead). As the listener's head moves away from straight ahead, a larger change in the ratio occurs. The net effect is that the default direction pinna cue is removed and the “look” angle pinna cue is inserted.
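As a non-limiting illustration, the ratio of Equation 19 may be evaluated per frequency bin as sketched below. The function name, the list-of-complex-values representation, and the small guard against division by a near-zero forward response are assumptions made for the sketch.

```python
def pinna_ratio(p_look, p_forward, eps=1e-12):
    """Per-bin ratio P(theta, phi, f) / P(theta, 0, f).

    p_look:    complex responses at the current "look" elevation, one per bin.
    p_forward: complex responses at 0 degrees elevation, one per bin.
    eps:       hypothetical guard; bins with a near-zero forward response
               are passed through unfiltered (ratio of 1).
    """
    return [a / b if abs(b) > eps else 1.0 for a, b in zip(p_look, p_forward)]
```

When the look response equals the forward response, every bin evaluates to 1, which corresponds to the case of no net filtering when the listener looks straight ahead.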
The system 1200 may implement a method similar to the method 1100 (see
Four-Channel Audio
Headtracking may also be used with four-channel audio, as further detailed below with reference to
The upmixer 1310 generally receives the input binaural signal 1350 and upmixes it to generate a 4-channel binaural signal that includes a front binaural signal 1312 (that includes left and right channels) and a rear binaural signal 1314 (that includes left and right channels). In general, the front binaural signal 1312 includes the direct components (e.g., not including reverb components), and the rear binaural signal 1314 includes the diffuse components (e.g., the reverb components). The upmixer 1310 may generate the front binaural signal 1312 and the rear binaural signal 1314 in various ways, including using metadata and using a signal model.
Regarding the metadata, the input binaural signal 1350 may be a pre-rendered signal (e.g., similar to the binaural audio signal 410 of
Regarding the signal model, the upmixer 1310 may generate the 4-channel binaural signal using a signal model that allows for a single steered (e.g., direct) signal between the inputs LT and RT with a diffuse signal in each input signal. The signal model is represented by Equations 20-25 for input LT and RT respectively. For simplicity, the time, frequency and complex signal notations have been omitted.
LT=GLs+dL (20)
RT=GRs+dR (21)
From Equation 20, LT is constructed from a gain GL multiplied by the steered signal s plus a diffuse signal dL. RT is similarly constructed as shown in Equation 21. It is further assumed that the power of the steered signal is S² as shown in Equation 22. The cross-correlations between s, dL, and dR are all zero as shown in Equation 23, and the power in the left diffuse signal (dL) is equal to the power in the right diffuse signal (dR), both of which are equal to D² as shown in Equation 24. With these assumptions, the covariance matrix between the input signals LT and RT is given by Equation 25.
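As a sketch only, the assumptions stated above may be written out as follows, where S² and D² denote the steered and diffuse powers. The expectation notation E[·], the complex conjugate notation, and the treatment of the gains GL and GR as real-valued are assumptions made for the sketch and are not taken from the equations referenced above.

```latex
\begin{aligned}
E[s\,s^{*}] &= S^{2}, \\
E[s\,d_L^{*}] = E[s\,d_R^{*}] = E[d_L\,d_R^{*}] &= 0, \\
E[d_L\,d_L^{*}] = E[d_R\,d_R^{*}] &= D^{2}, \\
\begin{bmatrix} E[L_T L_T^{*}] & E[L_T R_T^{*}] \\ E[R_T L_T^{*}] & E[R_T R_T^{*}] \end{bmatrix}
&= \begin{bmatrix} G_L^{2} S^{2} + D^{2} & G_L G_R S^{2} \\ G_L G_R S^{2} & G_R^{2} S^{2} + D^{2} \end{bmatrix}
\end{aligned}
```

The last line follows by substituting LT=GLs+dL and RT=GRs+dR and dropping every cross term that the second line sets to zero.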
In order to separate out the steered signals from LT and RT, a 2×2 signal dependent separation matrix is calculated using the least squares method as shown in Equation 26. The solution to the least squares equation is given by Equation 27. The separated steered signal s (e.g., the front binaural signal 1312) is therefore estimated by Equation 28. The diffuse signals dL and dR may then be calculated according to Equations 20-21 to give the combined diffuse signal d (e.g., the rear binaural signal 1314).
The derivation of the signal dependent separation matrix W for time block m in processing band b with respect to signal statistic estimations X, Y and T is given by Equation 29.
The three measured signal statistics (X, Y and T) with respect to the assumed signal model are given by Equations 30 through 32. The result of substituting Equations 30, 31 and 32 into Equation 29 is an estimate of the least squares solution given by Equation 33.
The front headtracking system 1320 generally receives the front binaural signal 1312 and generates a modified front binaural signal 1322 using the headtracking data 620. The front headtracking system 1320 may be implemented by the system 900 (see
The rear headtracking system 1330 generally receives the rear binaural signal 1314 and generates a modified rear binaural signal 1324 using an inverse of the headtracking data 620. The details of the rear headtracking system 1330 are shown in
The remixer 1340 generally combines the modified front binaural signal 1322 and the modified rear binaural signal 1324 to generate the output binaural signal 1360. For example, the output binaural signal 1360 includes left and right channels, where the left channel is a combination of the respective left channels of the modified front binaural signal 1322 and the modified rear binaural signal 1324, and the right channel is a combination of the respective right channels thereof. The output binaural signal 1360 may then be output by speakers (e.g., by the headset 400 of
At 1602, a binaural audio signal is received. A headset may receive the binaural audio signal. For example, the headset 400 (see
At 1604, the binaural audio signal is upmixed into a four-channel binaural signal. The four-channel binaural signal includes a front binaural signal and a rear binaural signal. For example, the upmixer 1310 (see
At 1606, headtracking data is generated. The headtracking data relates to an orientation of the headset. A sensor may generate the headtracking data. For example, the sensor 512 (see
At 1608, the headtracking data is applied to the front binaural signal to generate a modified front binaural signal. For example, the front headtracking system 1320 (see
At 1610, an inverse of the headtracking data is applied to the rear binaural signal to generate a modified rear binaural signal. For example, the rear headtracking system 1330 (see
At 1612, the modified front binaural signal and the modified rear binaural signal are combined to generate a combined binaural signal. For example, the remixer 1340 (see FIG. 13) may combine the modified front binaural signal 1322 and the modified rear binaural signal 1324 to generate the output binaural signal 1360.
At 1614, the combined binaural signal is output. For example, speakers 402 and 404 (see
The method 1600 may include further steps or substeps, e.g. to implement other of the features discussed above regarding
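As a non-limiting illustration, the flow of steps 1604 through 1612 may be sketched as follows. The function name and the two caller-supplied callables (`upmix` and `headtrack`) are hypothetical stand-ins for the upmixer and the headtracking systems. Modeling the inverse of the headtracking data as a negated head angle, and modeling the combination as a sample-wise sum, are assumptions made for the sketch.

```python
def four_channel_headtrack(binaural, head_angle_deg, upmix, headtrack):
    """Upmix, headtrack front and rear separately, then remix.

    binaural:  (left, right) tuple of equal-length sample lists.
    upmix:     callable returning (front, rear), each a (left, right) tuple.
    headtrack: callable taking ((left, right), angle_deg) and returning
               a modified (left, right) tuple.
    """
    front, rear = upmix(binaural)
    mod_front = headtrack(front, head_angle_deg)
    # Assumption: the inverse of the headtracking data is the negated angle.
    mod_rear = headtrack(rear, -head_angle_deg)
    # Assumption: the remix is a sample-wise sum of front and rear channels.
    left = [f + r for f, r in zip(mod_front[0], mod_rear[0])]
    right = [f + r for f, r in zip(mod_front[1], mod_rear[1])]
    return left, right
```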
Parametric Binaural
Headtracking may also be used when decoding binaural audio using a parametric binaural presentation, as further detailed below with reference to
The encoder 1710 generally transforms audio content 1712 using head-related transfer functions (HRTFs) 1714 to generate an encoded signal 1716. The audio content 1712 may be channel based or object based. The encoder 1710 includes an analysis block 1720, a speaker renderer 1722, an anechoic binaural renderer 1724, an acoustic environment simulation input matrix 1726, a presentation transformation parameter estimation block 1728, and an encoder block 1730.
The analysis block 1720 generates an analyzed signal 1732 by performing time-to-frequency analysis on the audio content 1712. The analysis block 1720 may also perform framing. The analysis block 1720 may implement a hybrid complex quadrature mirror filter (HCQMF).
The speaker renderer 1722 generates a loudspeaker signal 1734 (LoRo, where “L” and “R” indicate left and right components) from the analyzed signal 1732. The speaker renderer 1722 may perform matrixing or convolution.
The anechoic binaural renderer 1724 generates an anechoic binaural signal 1736 (LaRa) from the analyzed signal 1732 using the HRTFs 1714. In general, the anechoic binaural renderer 1724 convolves the input channels or objects of the analyzed signal 1732 with the HRTFs 1714 in order to simulate the acoustical pathway from an object position to both ears. The HRTFs may vary as a function of time if object-based audio is provided as input, based on positional metadata associated with one or more object-based audio inputs.
The acoustic environment simulation input matrix 1726 generates acoustic environment simulation input information 1738 (ASin) from the analyzed signal 1732. The acoustic environment simulation input information 1738 corresponds to a signal intended as input for an artificial acoustical environment simulation algorithm.
The presentation transformation parameter estimation block 1728 generates presentation transformation parameters 1740 (W) that relate the anechoic binaural signal LaRa 1736 and the acoustic environment simulation input information ASin 1738 to the loudspeaker signal LoRo 1734. The presentation transformation parameters 1740 may also be referred to as presentation transformation information or parameters.
The encoder block 1730 generates the encoded signal 1716 using the loudspeaker signal LoRo 1734 and the presentation transformation parameters W 1740.
The decoder 1750 generally decodes the encoded signal 1716 into a decoded signal 1756. The decoder 1750 includes a decoder block 1760, a presentation transformation block 1762, an acoustic environment simulator 1764, and a mixer 1766.
The decoder block 1760 decodes the encoded signal 1716 to generate the presentation transformation parameters W 1740 and the loudspeaker signal LoRo 1734. The presentation transformation block 1762 transforms the loudspeaker signal LoRo 1734 using the presentation transformation parameters W 1740, in order to generate the anechoic binaural signal LaRa 1736 and the acoustic environment simulation input information ASin 1738. The presentation transformation process may include matrixing operations, convolution operations, or both. The acoustic environment simulator 1764 performs acoustic environment simulation using the acoustic environment simulation input information ASin 1738 to generate acoustic environment simulation output information ASout 1768 that models the artificial acoustical environment. There are many existing algorithms and methods to simulate an acoustical environment, which include convolution with a room impulse response, or synthetic reverberation algorithms such as feedback-delay networks (FDNs). The mixer 1766 mixes the anechoic binaural signal LaRa 1736 and the acoustic environment simulation output information ASout 1768 to generate the decoded signal 1756.
The synthesis block 1780 performs frequency-to-time synthesis (e.g., HCQMF synthesis) on the decoded signal 1756 to generate a binaural signal 1782. The headset 1790 includes left and right speakers that output respective left and right components of the binaural signal 1782.
As discussed above, the system 1700 operates in a transform (frequency) or filterbank domain, using (for example) HCQMF, discrete Fourier transform (DFT), modified discrete cosine transform (MDCT), etc.
In this manner, the decoder 1750 generates the anechoic binaural signal (LaRa 1736) by means of the presentation transformation block 1762 and mixes it with a “rendered at the time of listening” acoustic environment simulation output signal (ASout 1768). This mix (the decoded signal 1756) is then presented to the listener via the headset 1790.
Headtracking may be added to the decoder 1750 according to various options, as described with reference to
The presentation transformation block 1810 receives the loudspeaker signal LoRo 1734 and the presentation transformation parameters W 1740, and generates the left anechoic signal La 1842, the right anechoic signal Ra 1844, and the acoustic environment simulation input information ASin 1738. The presentation transformation block 1810 may implement signal matrixing and convolution in a manner similar to the presentation transformation block 1762 (see
The headtracking processor 1820 processes the left anechoic signal La 1842 and the right anechoic signal Ra 1844 using the headtracking data 620 to generate the headtracked left anechoic signal LaTr 1852 and the headtracked right anechoic signal RaTr 1854.
The acoustic environment simulator 1830 processes the acoustic environment simulation input information ASin 1738 using the headtracking data 620 to generate the headtracked acoustic environment simulation output information ASoutTr 1856.
The mixer 1840 mixes the headtracked left anechoic signal LaTr 1852, the headtracked right anechoic signal RaTr 1854, and the headtracked acoustic environment simulation output information ASoutTr 1856 to generate the headtracked left binaural signal LbTr 1862 and the headtracked right binaural signal RbTr 1864.
The headset 400 (see
The headtracking processor 1920 processes the acoustic environment simulation output information ASout 1768 using the headtracking data 620 to generate the headtracked acoustic environment simulation output information ASoutTr 1856.
As compared to
The presentation transformation block 1810, acoustic environment simulator 1764, and headset 400 operate as described above regarding
The mixer 2040 mixes the left anechoic signal La 1842, the right anechoic signal Ra 1844, and the acoustic environment simulation output information ASout 1768 to generate a left binaural signal 2042 (Lb) and a right binaural signal 2044 (Rb).
The headtracking processor 2050 applies the headtracking data 620 to the left binaural signal Lb 2042 and the right binaural signal Rb 2044 to generate the headtracked left binaural signal LbTr 1862 and the headtracked right binaural signal RbTr 1864.
As compared to
In general, the calculation block 2110 generates a delay and filter parameters based on the headtracking data 620, provides a left delay D(L) 2111 to the left delay block 2122, provides a right delay D(R) 2112 to the right delay block 2132, provides the left filter parameters H(L) 2113 to the left filter block 2124, and provides the right filter parameters H(R) 2114 to the right filter block 2134.
As discussed above regarding
In a frequency-domain representation, a delay may be approximated by a phase shift for each frequency band, and a filter may be approximated by a scalar in each frequency band. The calculation block 2210 and the matrixing block 2220 then implement these approximations. Specifically, the calculation block 2210 generates an input matrix 2212 for each frequency band. The input matrix MHead 2212 may be a 2×2, complex-valued input-output matrix. The matrixing block 2220 applies the input matrix 2212, for each frequency band, to the input left signal L 2140 and the input right signal R 2150 (after processing by the respective left analysis block 2120 and right analysis block 2130), to generate the inputs to the respective left synthesis block 2126 and right synthesis block 2136. The magnitude and phase parameters of the matrix may be obtained by sampling the phase and magnitude of the delay and filter operations given in
More specifically, if the delays D(L) 2111 and D(R) 2112 (see
with
m11(f)=exp(−2πjfD(L))H(L,z=exp(2πjf)) (35)
m22(f)=exp(−2πjfD(R))H(R,z=exp(2πjf)) (36)
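As a non-limiting illustration, a diagonal matrix entry of the form shown above may be evaluated as sketched below. Each entry combines one channel's delay, as a phase shift, with that channel's filter response. The function names are hypothetical, and the sketch assumes that f is a normalized frequency in cycles per sample, that the delay is expressed in samples, and that the filter is an FIR filter given by its taps.

```python
import cmath

def fir_response(taps, f):
    """Evaluate H(z) = sum_n taps[n] * z**(-n) at z = exp(2*pi*j*f)."""
    z = cmath.exp(2j * cmath.pi * f)
    return sum(h * z ** (-n) for n, h in enumerate(taps))

def diagonal_entry(delay_samples, taps, f):
    """Delay phase term exp(-2*pi*j*f*D) multiplied by the filter response."""
    phase_term = cmath.exp(-2j * cmath.pi * f * delay_samples)
    return phase_term * fir_response(taps, f)
```

In a filterbank implementation, f would typically be taken at the center frequency of each band, so that one complex value per band is obtained for each channel.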
If the headtracking data changes over time, the calculation block 2210 may re-calculate a new matrix for each frequency band, and subsequently change the matrix (implemented by the matrixing block 2220) to the newly obtained matrix in each band. For improved quality, the calculation block 2210 may use interpolation when generating the input matrix 2212 for the new matrix, to ensure a smooth transition from one set of matrix coefficients to the next. The calculation block 2210 may apply the interpolation to the real and imaginary parts of the matrix independently, or may operate on the magnitude and phase of the matrix coefficients.
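As a non-limiting illustration, the two interpolation options for a single complex matrix coefficient may be sketched as follows. The function names are hypothetical, and the choices to interpolate the phase along the shortest arc and to fall back to real and imaginary interpolation when a coefficient is zero are assumptions made for the sketch.

```python
import cmath

def interp_rect(old, new, w):
    """Interpolate real and imaginary parts independently (0 <= w <= 1)."""
    return (1.0 - w) * old + w * new

def interp_polar(old, new, w):
    """Interpolate magnitude and phase separately (0 <= w <= 1)."""
    if old == 0 or new == 0:
        # Phase is undefined for a zero coefficient; fall back.
        return interp_rect(old, new, w)
    mag = (1.0 - w) * abs(old) + w * abs(new)
    # Phase difference in (-pi, pi], i.e. the shortest arc from old to new.
    dphi = cmath.phase(new / old)
    return cmath.rect(mag, cmath.phase(old) + w * dphi)
```

The two options behave differently when the phase changes substantially. For example, halfway between 1 and j, the real and imaginary interpolation yields a coefficient of magnitude of about 0.71, whereas the magnitude and phase interpolation keeps the magnitude at 1.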
The system 2200 does not necessarily include channel mixing, since there are no cross terms between the left and right signals (see also the system 2100 of
Regarding the components mentioned before: Briefly, the decoder block 1760 generates a frequency-domain representation of the loudspeaker presentation (the loudspeaker signal LoRo 1734) and parameter data (the presentation transformation parameters W 1740). The presentation transformation block 1762 uses the presentation transformation parameters W 1740 to transform the loudspeaker signal LoRo 1734 into an anechoic binaural presentation (the anechoic binaural signal LaRa 1736) and the acoustic environment simulation input information ASin 1738 by means of a matrixing operation per frequency band. The acoustic environment simulator 1764 performs acoustic environment simulation using the acoustic environment simulation input information ASin 1738 to generate the acoustic environment simulation output information ASout 1768. The mixer 1766 mixes the anechoic binaural signal LaRa 1736 and the acoustic environment simulation output information ASout 1768 to generate the decoded signal 1756. The mixer 1766 may be similar to the mixer 2040 (see
The preprocessor 2302 generally performs processing of the headtracking data 620 from the headtracking sensor (e.g., 512 in
The calculation block 2304 generally operates on the preprocessed headtracking data from the preprocessor 2302 to generate the input matrix for the matrixing block 2306. The calculation block 2304 may be similar to the calculation block 2210 (see
The matrixing block 2306 generally applies the input matrix from the calculation block 2304 to each frequency band of the decoded signal 1756 to generate the input to the synthesis block 2308. The matrixing block 2306 may be similar to the matrixing block 2220 (see
The synthesis block 2308 generally performs frequency-to-time synthesis (e.g., HCQMF synthesis) on the decoded signal 1756, as modified by the matrixing block 2306, to generate a binaural signal 2320. The synthesis block 2308 may be implemented as two synthesis blocks, similar to the left synthesis block 2126 and the right synthesis block 2136 (see
Regarding the components mentioned before: Briefly, the decoder block 1760 generates a frequency-domain representation of the loudspeaker presentation (the loudspeaker signal LoRo 1734) and presentation transformation parameter data (the presentation transformation parameters W 1740). The presentation transformation block 1762 uses the presentation transformation parameters W 1740 to transform the loudspeaker signal LoRo 1734 into an anechoic binaural presentation (the anechoic binaural signal LaRa 1736) and the acoustic environment simulation input information ASin 1738 by means of a matrixing operation per frequency band.
The preprocessor 2402 generally performs processing of the headtracking data 620 from the headtracking sensor (e.g., 512 in
The calculation block 2404 generally operates on the preprocessed headtracking data 2420 from the preprocessor 2402 to generate the input matrix for the matrixing block 2406. The calculation block 2404 may be similar to the calculation block 2210 (see
The matrixing block 2406 generally applies the input matrix from the calculation block 2404 to each frequency band of the anechoic binaural signal LaRa 1736 to generate a headtracked anechoic binaural signal 2416 for the mixer 2410. (Compare the matrixing block 2406 to the headtracking processor 1820 (see
The acoustic environment simulator 2408 generally performs acoustic environment simulation using the acoustic environment simulation input information ASin 1738 to generate the acoustic environment simulation output information ASout 1768. The acoustic environment simulator 2408 may be similar to the acoustic environment simulator 1764 (see
The mixer 2410 generally mixes the acoustic environment simulation output information ASout 1768 and the headtracked anechoic binaural signal 2416 to generate a combined headtracked signal to the synthesis block 2308. The mixer 2410 may be similar to the mixer 1766 (see
The synthesis block 2308 operates in a manner similar to that discussed above regarding
The presentation transformation block 2562 combines the operations of the presentation transformation block 1762 and the matrixing block 2406 (see
Mcombined=MheadMtrans (38)
The headtracking matrix Mhead will be equal to an identity matrix if no headtracking is supported, or when no positional changes of the head with respect to a reference position or orientation are detected. In the above example, the acoustic environment simulation input signal is not taken into account.
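As a non-limiting illustration, the combination of Equation 38 for one frequency band may be sketched as follows. The function name and the representation of each matrix as a nested list of two rows by two columns are assumptions made for the sketch.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists (entries may be complex)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]

# Hypothetical per-band matrices, for illustration only.
m_head_identity = [[1, 0], [0, 1]]           # no head movement detected
m_trans = [[0.5 + 0.5j, 0.1], [0.2j, 1.0]]   # example transformation matrix

# Mcombined = Mhead * Mtrans; with an identity Mhead this reduces to Mtrans.
m_combined = matmul2(m_head_identity, m_trans)
```

The order of the product matters in general, since the transformation is applied to the stereo signal first and the headtracking matrix is applied to the result.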
The synthesis block 2308 operates in a manner similar to that discussed above regarding
At 2602, headtracking data is generated. The headtracking data relates to an orientation of a headset. A sensor may generate the headtracking data. For example, the headset 400 (see
At 2604, an encoded stereo signal is received. The encoded stereo signal may correspond to the parametric binaural signal. The encoded stereo signal includes a stereo signal and presentation transformation information. The presentation transformation information relates the stereo signal to a binaural signal. For example, the system 2300 (see
At 2606, the encoded stereo signal is decoded to generate the stereo signal and the presentation transformation information. For example, the decoder block 1760 (see
At 2608, presentation transformation is performed on the stereo signal using the presentation transformation information to generate the binaural signal and acoustic environment simulation input information. For example, the presentation transformation block 1762 (see
At 2610, acoustic environment simulation is performed on the acoustic environment simulation input information to generate acoustic environment simulation output information. For example, the acoustic environment simulator 1764 (see
At 2612, the binaural signal and the acoustic environment simulation output information are combined to generate a combined signal. For example, the mixer 1766 (see
At 2614, the combined signal is modified using the headtracking data to generate an output binaural signal. For example, the matrixing block 2306 (see
At 2616, the output binaural signal is output. The output binaural signal may be output by at least two speakers. For example, the headset 400 (see
The method 2600 may include further steps or substeps, e.g. to implement other of the features discussed above regarding
At 2702, headtracking data is generated. The headtracking data relates to an orientation of a headset. A sensor may generate the headtracking data. For example, the headset 400 (see
At 2704, an encoded stereo signal is received. The encoded stereo signal may correspond to the parametric binaural signal. The encoded stereo signal includes a stereo signal and presentation transformation information. The presentation transformation information relates the stereo signal to a binaural signal. For example, the system 2400 (see
At 2706, the encoded stereo signal is decoded to generate the stereo signal and the presentation transformation information. For example, the decoder block 1760 (see
At 2708, presentation transformation is performed on the stereo signal using the presentation transformation information to generate the binaural signal and acoustic environment simulation input information. For example, the presentation transformation block 1762 (see
At 2710, acoustic environment simulation is performed on the acoustic environment simulation input information to generate acoustic environment simulation output information. For example, the acoustic environment simulator 2408 (see
Optionally, the acoustic environment simulation output information ASout 1768 is modified according to the headtracking data. For example, the preprocessor 2402 (see
At 2712, the binaural signal is modified using the headtracking data to generate an output binaural signal. For example, the matrixing block 2406 (see
At 2714, the output binaural signal and the acoustic environment simulation output information are combined to generate a combined signal. For example, the mixer 2410 (see
At 2716, the combined signal is output. The combined signal may be output by at least two speakers. For example, the headset 400 (see
The method 2700 may include further steps or substeps, e.g. to implement other of the features discussed above regarding
At 2802, headtracking data is generated. The headtracking data relates to an orientation of a headset. A sensor may generate the headtracking data. For example, the headset 400 (see
At 2804, an encoded stereo signal is received. The encoded stereo signal may correspond to the parametric binaural signal. The encoded stereo signal includes a stereo signal and presentation transformation information. The presentation transformation information relates the stereo signal to a binaural signal. For example, the system 2500 (see
At 2806, the encoded stereo signal is decoded to generate the stereo signal and the presentation transformation information. For example, the decoder block 1760 (see
At 2808, presentation transformation is performed on the stereo signal using the presentation transformation information and the headtracking data to generate a headtracked binaural signal. The headtracked binaural signal corresponds to the binaural signal having been matrixed. For example, the presentation transformation block 2562 (see
At 2810, presentation transformation is performed on the stereo signal using the presentation transformation information to generate acoustic environment simulation input information. For example, the presentation transformation block 2562 (see
At 2812, acoustic environment simulation is performed on the acoustic environment simulation input information to generate acoustic environment simulation output information. For example, the acoustic environment simulator 2408 (see
Optionally, the acoustic environment simulation output information ASout 1768 is modified according to the headtracking data. For example, the preprocessor 2402 (see
At 2814, the headtracked binaural signal and the acoustic environment simulation output information are combined to generate a combined signal. For example, the mixer 2410 (see
At 2816, the combined signal is output. The combined signal may be output by at least two speakers. For example, the headset 400 (see
The method 2800 may include further steps or substeps, e.g., to implement other features discussed above regarding
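Step 2808 states that the headtracked binaural signal corresponds to the binaural signal having been matrixed. One way to read this, assuming both the presentation transformation and the headtracking adjustment can be written as 2x2 matrices acting on each (left, right) sample pair, is that the two matrices are multiplied once and the product is applied in a single pass. The Python sketch below illustrates only that algebraic point; the matrix values, the `headtracking_matrix` placeholder, and all function names are hypothetical, and steps 2810 through 2814 are not shown.

```python
import math
from typing import List, Tuple

Frame = Tuple[float, float]      # one (left, right) sample pair
Matrix2 = Tuple[Frame, Frame]    # 2x2 matrix stored as two rows


def apply_matrix(m: Matrix2, frames: List[Frame]) -> List[Frame]:
    """Multiply every (left, right) frame by a 2x2 matrix."""
    return [(m[0][0] * left + m[0][1] * right,
             m[1][0] * left + m[1][1] * right) for left, right in frames]


def matmul2(a: Matrix2, b: Matrix2) -> Matrix2:
    """Return the 2x2 matrix product a @ b."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def headtracking_matrix(yaw_rad: float) -> Matrix2:
    """Placeholder only: identity at yaw 0, a plane rotation otherwise."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return ((c, s), (-s, c))


def headtracked_transform(stereo: List[Frame], to_binaural: Matrix2,
                          yaw_rad: float) -> List[Frame]:
    """Step 2808 stand-in: one matrix pass yields the headtracked signal."""
    combined = matmul2(headtracking_matrix(yaw_rad), to_binaural)
    return apply_matrix(combined, stereo)
```

Under these assumptions the single pass gives the same samples, up to floating-point rounding, as transforming first and matrixing afterwards, while touching each sample once instead of twice.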
At 2902, headtracking data is generated. The headtracking data relates to an orientation of a headset. A sensor may generate the headtracking data. For example, the headset 400 (see
At 2904, an encoded stereo signal is received. The encoded stereo signal may correspond to the parametric binaural signal. The encoded stereo signal includes a stereo signal and presentation transformation information. The presentation transformation information relates the stereo signal to a binaural signal. For example, the system 2300 (see
At 2906, the encoded stereo signal is decoded to generate the stereo signal and the presentation transformation information. For example, the decoder block 1760 (see
At 2908, presentation transformation is performed on the stereo signal using the presentation transformation information to generate the binaural signal. For example, the presentation transformation block 1762 (see
At 2910, the binaural signal is modified using the headtracking data to generate an output binaural signal. For example, the matrixing block 2306 (see
At 2912, the output binaural signal is output. The output binaural signal may be output by at least two speakers. For example, the headset 400 (see
Note that as compared to the method 2600 (see
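Steps 2902 through 2912 involve no acoustic environment simulation, so the method 2900 reduces to a presentation transformation followed by the headtracking modification. The Python sketch below shows that shorter chain, assuming (this is not stated in the steps above) that audio arrives in already-decoded blocks and that the headtracking data is re-read once per block. The `read_yaw` callback, the block structure, and the plane-rotation placeholder matrix are all hypothetical.

```python
import math
from typing import Callable, Iterable, Iterator, List, Tuple

Frame = Tuple[float, float]      # one (left, right) sample pair
Matrix2 = Tuple[Frame, Frame]    # 2x2 matrix stored as two rows


def apply_matrix(m: Matrix2, frames: List[Frame]) -> List[Frame]:
    """Multiply every (left, right) frame by a 2x2 matrix."""
    return [(m[0][0] * left + m[0][1] * right,
             m[1][0] * left + m[1][1] * right) for left, right in frames]


def headtracking_matrix(yaw_rad: float) -> Matrix2:
    """Placeholder only: identity at yaw 0, a plane rotation otherwise."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return ((c, s), (-s, c))


def method_2900(blocks: Iterable[List[Frame]], to_binaural: Matrix2,
                read_yaw: Callable[[], float]) -> Iterator[List[Frame]]:
    """Yield one output binaural block per decoded stereo block."""
    for stereo in blocks:                        # 2904/2906 stand-in
        yaw = read_yaw()                         # 2902: latest sensor value
        binaural = apply_matrix(to_binaural, stereo)            # 2908
        yield apply_matrix(headtracking_matrix(yaw), binaural)  # 2910, 2912
```

Re-reading the yaw per block means a head movement affects the next block rather than the whole stream, which is the usual motivation for block-wise processing; a real system would also need to smooth the matrix between blocks to avoid audible discontinuities.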
Implementation Details
An embodiment may be implemented in hardware, executable modules stored on a computer readable medium, or a combination of both (e.g., programmable logic arrays). Unless otherwise specified, the steps executed by embodiments need not inherently be related to any particular computer or other apparatus, although they may be in certain embodiments. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus (e.g., integrated circuits) to perform the required method steps. Thus, embodiments may be implemented in one or more computer programs executing on one or more programmable computer systems each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices, in known fashion.
Each such computer program is preferably stored on or downloaded to a storage medium or device (e.g., solid state memory or media, or magnetic or optical media) readable by a general- or special-purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer system to perform the procedures described herein. The inventive system may also be considered to be implemented as a non-transitory computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein. (Software per se and intangible or transitory signals are excluded to the extent that they are unpatentable subject matter.)
The above description illustrates various embodiments of the present invention along with examples of how aspects of the present invention may be implemented. The above examples and embodiments should not be deemed to be the only embodiments, and are presented to illustrate the flexibility and advantages of the present invention as defined by the following claims. Based on the above disclosure and the following claims, other arrangements, embodiments, implementations and equivalents will be evident to those skilled in the art and may be employed without departing from the spirit and scope of the invention as defined by the claims.
Number | Date | Country | Kind |
---|---|---|---|
16175495 | Jun 2016 | EP | regional |
This application claims priority from U.S. App. No. 62/352,685, filed Jun. 21, 2016; European Patent App. No. 16175495.7, filed Jun. 21, 2016; and U.S. Patent App. No. 62/405,677, filed Oct. 7, 2016, all of which are hereby incorporated by reference in their entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2017/038372 | 6/20/2017 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2017/223110 | 12/28/2017 | WO | A |
Number | Date | Country | |
---|---|---|---|
20190327575 A1 | Oct 2019 | US |
Number | Date | Country | |
---|---|---|---|
62405677 | Oct 2016 | US | |
62352685 | Jun 2016 | US |