The present invention relates to a method and apparatus for providing subject specific digital audio data and a subject specific digital audio profile. In particular, but not exclusively, the present invention relates to the near-field acoustic measurement of Head Related Transfer Functions (HRTFs) of a subject (e.g. a person, a dummy mannequin, or an anthropomorphic model) to provide a binaural Ambisonics profile for that subject.
It can be said that the physical characteristics of a person affect how they perceive sound. Example physical characteristics include, but are not limited to, the size, shape, and composition of the person's torso, head, facial features, and ears. Consequently, when creating or recreating an audio experience for a person, one may wish to account for their physical characteristics to make the experience more immersive and realistic for that person.
HRTFs quantify the cumulative effect of such physical characteristics on the perception of sound arriving from a given point in space relative to a listener, across a band of frequencies. By convolving an audio signal with an HRTF, the signal can be transformed to behave as though it has been modified by a person's relevant physical characteristics. For example, after an appropriate mathematical operation involving an HRTF, or a set of HRTFs, associated with the same given point, a sound can be transformed to a form that, if played back over a pair of earphones or headphones, would sound to a listener as though the origin of the sound was that given point.
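By way of example only, the convolution operation described above may be sketched in a few lines of Python. This is a non-limiting illustration rather than part of any claimed method; the HRIR arrays below are random placeholders standing in for measured data.

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 48000  # sample rate in Hz

# Stand-in impulse responses: a measured HRTF is usually stored in the
# time domain as a pair of Head Related Impulse Responses (HRIRs), one
# per ear, for a given source direction. Random data is used here purely
# as a placeholder for measured HRIRs.
rng = np.random.default_rng(0)
hrir_left = rng.standard_normal(256) * np.hanning(256)
hrir_right = rng.standard_normal(256) * np.hanning(256)

# A mono input signal (one second of a 440 Hz tone).
t = np.arange(fs) / fs
mono = np.sin(2 * np.pi * 440 * t)

# Convolving the mono signal with the left and right HRIRs yields a
# two-channel signal that, over headphones, is perceived as arriving
# from the direction associated with that HRIR pair.
binaural = np.stack([
    fftconvolve(mono, hrir_left),
    fftconvolve(mono, hrir_right),
], axis=0)
```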
One subject area in which HRTFs are utilised is the field of binaural audio. Binaural audio involves simulating a three-dimensional soundfield directly at the ears of a listener.
The HRTFs used in research and industry are what may be referred to as ‘non-individual’, i.e. they are HRTFs determined from a dummy mannequin or anthropomorphic model constructed to represent some average of the relevant physical characteristics of a population. This one-size-fits-all approach can lead to an unsatisfactory audio experience for a listener and is often unsuitable for applications where a high degree of immersion and/or accuracy of sound localisation is required.
‘Personal’ or ‘individual’ HRTFs can be determined from the output data of a microphone, located on or within the ear of a person, responsive to impulses of given frequencies, or a signal representative thereof, transmitted by a loudspeaker element at a predetermined location. Creation of such personal HRTFs conventionally requires loudspeaker arrays supported at a considerable distance from a subject. This makes such arrays costly to provide and inconvenient for a user/subject to access.
Despite the disadvantages of non-individual HRTFs, they remain in use at least in part due to the above-mentioned impracticalities and cost-prohibitive nature of existing solutions for providing HRTF acoustic measurements of an individual. For example, these existing solutions for providing subject specific HRTFs require a large (and expensive) loudspeaker array and the measurement process is uncomfortable for the individual being measured. One reason that a large loudspeaker array is used to take acoustic measurements of an individual is due to the complications that arise when trying to measure HRTFs in the near-field as a result of the effects of distance on the properties of a propagating sound wave.
It is an aim of the present invention to at least partly mitigate one or more of the above-mentioned problems.
It is an aim of certain embodiments of the present invention to provide apparatus and a method for taking HRTF near-field acoustic measurements of a specific subject and for providing subject specific audio data.
It is an aim of certain embodiments of the present invention to provide apparatus and a method for taking HRTF near-field acoustic measurements of a specific subject with greater proximity to the subject than conventional solutions allow.
It is an aim of certain embodiments of the present invention to provide apparatus for taking HRTF acoustic measurements of a subject that is cheaper and more convenient to transport and construct and of a smaller physical footprint than conventional solutions allow.
It is an aim of certain embodiments of the present invention to provide apparatus and a method for providing a subject specific binaural Ambisonic renderer determined from acoustic measurements taken in a near-field regime.
It is an aim of certain embodiments of the present invention to provide apparatus and a method for providing a personal binaural Ambisonic renderer determined from acoustic measurements taken in a near-field regime.
It is an aim of certain embodiments of the present invention to provide apparatus and a method for providing a binaural Ambisonic renderer determined from acoustic measurements of a subject taken in a near-field regime.
It is an aim of certain embodiments of the present invention to provide apparatus and a method for providing HRTFs exhibiting the distance-related characteristics of far-field HRTF data from near-field HRTF data.
It is an aim of certain embodiments of the present invention to provide apparatus and a method for providing HRTFs.
It is an aim of certain embodiments of the present invention to provide a subject audio data profile determined from acoustic measurements taken in the near-field regime.
It is an aim of certain embodiments of the present invention to provide a personal audio data profile determined from acoustic measurements taken in the near-field regime.
It is an aim of certain embodiments of the present invention to provide a subject audio data profile for enabling more immersive and realistic binaural audio experiences.
According to a first aspect of the present invention there is provided apparatus for providing subject specific digital audio data, comprising:
Aptly the subject specific digital audio data comprises data that represents a superposition of sound, from the plurality of effective point sources of the loudspeaker elements, at the aural cavity responsive to at least one physical characteristic of the subject.
Aptly each subject specific audio data output comprises a digital or analogue representation of a physical reverberation of an active element of the respective microphone element responsive to a superposition of sound, including sound from the plurality of effective point sources of the loudspeaker elements, at the active element.
Aptly said distance is selected to provide a wave front of sound from any one of the loudspeaker elements, at each aural cavity, that is not effectively planar.
Aptly said distance is selected to provide a near field sound wave provided by a superposition of sound, including sound from the plurality of effective point sources of the loudspeaker elements, at each aural cavity.
Aptly the superposition of sound from the loudspeaker elements at each aural cavity is sufficiently complex that subsequent processing of the subject specific audio data output requires at least one Ambisonic processing step.
Aptly each aural cavity of a subject comprises a sound receiving orifice opening into a channel; and supporting flesh or flesh imitating material surrounding the orifice and the channel.
Aptly each subject comprises at least one physical characteristic responsive to a shape and size of the orifice and the channel and/or a density, surface texture and/or layering of the supporting flesh or flesh imitating material.
Aptly the imaginary surface comprises a hemisphere or a portion of a hemisphere or a cylinder or a portion of a cylinder or a combined surface that includes a full or partial hemisphere portion and a full or partial cylindrical portion.
Aptly the subject is a person, or a dummy mannequin, or an anthropomorphic model.
Aptly the apparatus further comprises an alignment system for aligning the subject with respect to a predetermined location determined by the predetermined spatial relationship.
Aptly the alignment system comprises at least one visual display.
Aptly the alignment system comprises at least one video camera device.
Aptly the alignment system comprises at least one laser.
Aptly the visual display is responsive to at least one video camera device and/or at least one laser.
Aptly a position of at least one of the plurality of loudspeaker elements is adjustable responsive to a determined height of the subject.
Aptly the apparatus further comprises at least one linear actuator for adjusting a position of at least one of the plurality of loudspeaker elements responsive to a determined height of the subject.
Aptly the apparatus further comprises at least one panel or body of sound-dampening material proximate to the support.
Aptly at least a first group of the loudspeaker elements is connected to a further group of the loudspeaker elements via a hinged connection that allows the first group to be selectively located with respect to the further group.
Aptly the plurality of loudspeaker elements is free-standing.
Aptly the loudspeaker elements are supported via a support and the support comprises a modular rig.
Aptly the loudspeaker elements are supported via a support and the support is portable.
Aptly each said respective audio signal input is representative of an impulsive input.
Aptly the subject specific digital audio data comprises an analogue-to-digital conversion of a respective subject specific analogue audio data output.
Aptly the subject specific digital audio data comprises binaural subject specific digital audio data.
Aptly the subject specific digital audio data comprises data representative of at least one Head Related Transfer Function (HRTF).
Aptly the processed subject specific audio data comprises data representative of at least one near-field Head Related Transfer Function (HRTF).
Aptly the processed subject audio data comprises data representative of at least one near-field compensated (NFC) Head Related Transfer Function (HRTF).
Aptly the subject specific digital audio data comprises data representative of at least one synthesised far-field Head Related Transfer Function (HRTF).
Aptly the subject specific digital audio data comprises a binaural Ambisonic renderer.
Aptly the binaural Ambisonic renderer is a personal binaural Ambisonic renderer.
Aptly the apparatus further comprises a control interface for receiving user input.
Aptly the predetermined spatial relationship is a spatial relationship predetermined from a regular 2-dimensional shape or a regular 3-dimensional shape.
Aptly the predetermined spatial relationship is determined from a Lebedev grid distribution.
According to a second aspect of the present invention there is provided a method for determining subject specific digital audio data, comprising:
Aptly the method further comprises providing the subject specific digital audio data as data that represents a superposition of sound at the aural cavity responsive to at least one physical characteristic of the subject.
Aptly the method further comprises providing the subject specific audio data output as a digital or analogue representation of a physical reverberation of an active element of a respective microphone element responsive to a superposition of sound at the active element.
Aptly the method further comprises locating a subject that comprises a person or a dummy mannequin or an anthropomorphic model in a spatial region that is at least partially contained by an imaginary surface in which an effective point source of each loudspeaker element lies.
Aptly the method further comprises prior to or subsequent to locating the subject in the spatial region, adjusting a height of at least one loudspeaker element with respect to a floor surface via which the subject is located.
Aptly the method further comprises providing respective audio signal inputs to each loudspeaker element as an impulse signal or a signal representative of an impulse.
Aptly the method further comprises converting the subject specific audio data output via an analogue-digital conversion step thereby providing the subject specific digital audio data.
Aptly the method further comprises providing at least one near field compensated (NFC) Head Related Transfer Function (HRTF) via application of a near field compensation audio processing step to the subject specific audio data output.
Aptly the method further comprises modifying at least one NFC HRTF and providing at least one synthesised far-field HRTF.
Aptly the method further comprises formatting a suitable collection of HRTFs and providing a subject specific binaural Ambisonic renderer.
Aptly the predetermined spatial relationship is a spatial relationship predetermined from a regular 2-dimensional shape or a regular 3-dimensional shape.
Aptly the predetermined spatial relationship is determined from a Lebedev grid distribution.
According to a third aspect of the present invention there is provided a subject specific digital audio profile, determined from at least one analogue audio data output provided by at least one microphone element located on or within at least one aural cavity of a subject, that comprises a subject specific Ambisonics renderer that modifies digital audio input data according to at least one physical characteristic of a subject and provides personalised audio data output responsive thereto, wherein:
Aptly the subject is a person, or a dummy mannequin, or an anthropomorphic model.
Aptly each said respective audio signal input is representative of an impulsive input.
Aptly the subject digital audio data comprises an analogue-to-digital conversion of the respective subject analogue audio data.
Aptly the subject audio data comprises binaural subject digital audio data.
Aptly the subject digital audio data comprises data representative of at least one Head Related Transfer Function (HRTF).
Aptly the subject digital audio data comprises data representative of at least one near-field Head Related Transfer Function (HRTF).
Aptly the subject digital audio data comprises data representative of at least one near-field compensated (NFC) Head Related Transfer Function (HRTF).
Aptly the subject digital audio data comprises at least one synthesised far-field Head Related Transfer Function (HRTF).
Aptly the predetermined spatial relationship is a spatial relationship predetermined from a regular 2-dimensional shape or a regular 3-dimensional shape.
Aptly the predetermined spatial relationship is determined from a Lebedev grid distribution.
Certain embodiments of the present invention provide acoustic measurements of a subject and subject digital audio data at a lower cost and/or with greater convenience than existing solutions.
Certain embodiments of the present invention provide apparatus, for providing acoustic measurements of a subject and subject audio data, that occupies a smaller footprint and/or less physical space than existing solutions.
Certain embodiments of the present invention provide a method that provides subject digital audio data determined from acoustic measurements of a subject taken in greater proximity to the subject than existing solutions allow.
Certain embodiments of the present invention provide HRTF data exhibiting the distance-related characteristics of far-field HRTF data from near-field HRTF data.
Certain embodiments of the present invention provide a subject specific digital audio profile for enabling more immersive and realistic binaural audio experiences.
Certain embodiments of the present invention provide a personal digital audio profile for enabling more immersive and realistic binaural audio experiences.
Certain embodiments of the present invention provide a personal audio filter that affects the sound localisation characteristics of a sound according to the physical characteristics of a person.
Certain embodiments of the present invention provide a subject specific audio filter that affects the sound localisation characteristics of a sound according to the physical characteristics of the subject.
Certain embodiments of the present invention provide a loudspeaker array, arranged according to a regular or approximately regular grid distribution, that is height-adjustable and locatable proximate to a subject.
Embodiments of the present invention will now be described hereinafter, by way of example only, with reference to the accompanying drawings in which:
In the drawings like reference numerals refer to like parts.
Linear actuators 140 can adjust the height of the support structure 120 to suit a person 150 standing (or, if appropriate, sitting) inside the acoustic chamber 100. A first portion 170a and second portion 170b of the support structure 120 are each connected to the remainder of the support structure 120 via hinges, allowing the first and second portions 170a, 170b to swing outwards so that a person 150 can walk into the acoustic chamber 100.
A display 180 comprises a part of a self-alignment system that gives feedback to the person 150 so that the person 150 can align themselves at a predetermined reference point in the acoustic chamber 100. The self-alignment system further comprises at least one video camera that provides a video feed to the display 180; the feed can be overlaid with visual instructions on the display 180 that tell the person 150 how to adjust themselves within the chamber. Optionally, the self-alignment system further comprises at least one laser which measures the distance of a respective location of the person 150 from the laser.
At least one ear 190 of the person 150 is located within the acoustic chamber 100. Depending on the particular set of acoustic measurements that are desired, the combination of the signals transmitted by the loudspeakers 110 can generate a sweet spot centred in proximity to the centre of the head of the person 150, a sweet spot centred in proximity to the orifice of one ear 190, or two sweet spots each centred respectively in proximity to the opening of each of two ears of the person 150. Ear-locatable microphones are located on or within at least one ear 190. The ear-locatable microphones record sound transmitted by the loudspeakers 110 after the sound has been affected (e.g. via reflection, diffraction, and refraction) by the physical characteristics of the person 150. Example physical characteristics include the size, shape, and composition of the body, torso, head, facial features, and ears of the person 150. Optionally, ‘composition’ may refer to the density and/or surface texture and/or layering of flesh or flesh imitating material. The acoustic chamber 100 is of a size such that when the person 150 is aligned at the centre of the acoustic chamber 100, the loudspeakers 110 mounted to the support structure 120 are at a sufficiently close distance to the person 150 that the wave fronts of sound waves transmitted by the loudspeakers 110 are effectively non-planar. Such a distance may be referred to as ‘near field’. In the ‘near field’ of a subject, small changes in the distance of the subject to a source are perceptually relevant. Aptly, the near field represents a region of space close to the head of a subject/listener in which the wave front curvature of a sound wave is perceptually significant.
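By way of example only, the non-planarity of a near-field wavefront can be illustrated with simple geometry. The sketch below is illustrative only; the head width and source distances are assumed values, not parameters prescribed by the invention.

```python
import math

def wavefront_sagitta(r: float, aperture: float) -> float:
    """Depth (sagitta) of a spherical wavefront of radius r across a
    chord of the given aperture; zero for a perfectly planar wavefront."""
    return r - math.sqrt(r**2 - (aperture / 2) ** 2)

head_width = 0.18  # illustrative head width in metres (an assumption)
for r in (0.5, 1.2, 1.5, 10.0):
    s_mm = wavefront_sagitta(r, head_width) * 1000
    print(f"source at {r:4.1f} m: wavefront bows {s_mm:.2f} mm across the head")

# The curvature shrinks roughly as 1/r: at 10 m the wavefront is close
# to planar across the head, while at distances near or below ~1.5 m
# the bowing is an order of magnitude larger.
```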
It will be understood that instead of a person 150, a dummy mannequin or anthropomorphic model can be located in the acoustic chamber 100 and microphones can be located on or within at least one artificial ear/aural cavity. An ear or an artificial ear is an example of an aural cavity.
It will be understood that the acoustic chamber 100 shown in
It will be understood that the acoustic chamber 100 has an associated imaginary surface that the size and shape of the support structure 120 resembles. For example, the acoustic chamber as illustrated in
In
It will be understood that sound-dampening material, such as acoustic foam, can be mounted to the outside of the acoustic chamber 100 and/or between the beams of the support structure 120 and/or positioned externally to at least partially surround the acoustic chamber 100. By mounting acoustic foam to, or in proximity to, the acoustic chamber 100, external noise can be reduced, increasing the quality of acoustic measurements determined using the acoustic chamber 100.
Aptly, the acoustic chamber 100 provides apparatus for providing subject specific digital audio data. The acoustic chamber includes a plurality of loudspeaker elements 200, each of which is responsive to at least one respective audio signal input and is supported in a predetermined spatial relationship in which respective locations of an effective point source of each loudspeaker element 200 all lie in an imaginary surface that at least partially contains a spatial region where a subject 150 comprising at least one aural cavity 190 is locatable. At least one microphone element is locatable on or within an aural cavity 190 of the subject 150, for providing a respective subject specific audio data output responsive to at least one physical characteristic of the subject and an audio signal output from at least one of the loudspeaker elements 200. An audio processing element can be included for processing the subject specific audio data output and providing subject specific digital audio data for said subject 150, responsive thereto. A distance between each respective location and each aural cavity 190 is less than about 1.5 metres.
Aptly, a distance between each respective location and each aural cavity 190 is about 1.5 metres. Aptly, a distance between each respective location and each aural cavity 190 is less than about 1.45 metres; or less than about 1.4 metres; or less than about 1.35 metres; or less than about 1.3 metres; or less than about 1.25 metres; or less than about 1.2 metres; or less than about 1.15 metres; or less than about 1.1 metres; or less than about 1.05 metres; or less than about 1 metre; or less than about 0.95 metres; or less than about 0.9 metres; or is any value selected from these ranges; or any sub-range constructed from the values contained within any of these ranges. Aptly, each respective location is within the near-field of each aural cavity 190. Aptly, at least one respective location is within the near-field of each aural cavity 190. Aptly, at least two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, or sixteen respective locations are within the near-field of each aural cavity 190. Aptly, the acoustic chamber is adjustable to a height of 2 metres. Aptly, the acoustic chamber is adjustable to a height of less than 2 metres. Aptly, the acoustic chamber is adjustable to a height of up to 2 metres. Aptly, the acoustic chamber is adjustable to a height of, or above, 1 metre. Aptly, the acoustic chamber is adjustable to a height up to 1.5 metres; or up to 1.55 metres; or up to 1.6 metres; or up to 1.65 metres; or up to 1.7 metres; or up to 1.75 metres; or up to 1.8 metres; or up to 1.85 metres; or up to 1.9 metres; or up to 1.95 metres.
In step 410, the subject is aligned relative to a reference point within the acoustic measurement chamber. The reference point is determined by the predetermined relationship according to which the acoustic measurement chamber is arranged. Optionally, the reference point is at a known location relative to a predicted sweet spot that may be generated by the loudspeakers 110 of an acoustic chamber 100. At least one aural cavity (and optionally two) is located so that it is contained within an imaginary surface that contains the multiple loudspeaker effective point sources.
The alignment step 410 may involve manual assistance and/or a self-alignment system. The self-alignment system may comprise at least one display connected to at least one video camera device. Optionally, the self-alignment system comprises at least one laser. Each laser can provide measurements of the distance of a part of the subject to the respective laser. The at least one display may display real-time video footage of the subject to the subject or to an external observer. The video camera devices and the displays may also be connected to a processing unit that overlays guidance on the real-time footage, so that the subject or an external observer can more easily see the location of the head of the subject relative to the reference point. Adjusting the height of the acoustic chamber relative to the subject and aligning the subject relative to a reference point in the acoustic chamber can improve the accuracy of the acoustic measurements, and therefore the quality of the products of the audio processing of the acoustic measurements. A processing unit is a computing device capable of processing the video feeds of at least one video camera device and providing output to a display that shows real-time data to a subject or an external observer indicating a current position of the subject relative to the reference point. Optionally, a processing unit is a desktop computer, laptop computer, tablet, smartphone, server, or cloud computer. Optionally, the processing unit is capable of receiving data input, from at least one laser, that includes the distance of a part of the subject relative to the respective laser and providing output to a display responsive to the data input to aid the subject in the alignment process.
In step 420, at least one microphone element is placed on or within at least one ear or artificial ear or aural cavity of the subject.
In step 430, a first predetermined audio signal is played back through at least one of the loudspeaker elements. The predetermined audio signal may be an impulse of a particular frequency or a sinusoidal sweep of multiple frequencies that is inclusive thereof. A sinusoidal sweep of frequencies is an audio signal comprising a sinusoidal wave that progressively increases in frequency at a predetermined rate across a predetermined range of frequencies. Responsive to the predetermined audio signal and the physical characteristics of the subject, an audio signal (i.e. the HRTF associated with the first loudspeaker at a given location) is captured by the at least one microphone and is recorded, in a digital data form, to a memory unit. This step is then repeated for as many impulse (or signal representative thereof) and loudspeaker element (of a particular location) combinations as desired. If the predetermined audio signal is a sinusoidal sweep of multiple frequencies, there may be a further step wherein a deconvolution technique is applied to the captured audio signal to determine an impulse-equivalent response of audio stimuli to the physical characteristics of the subject. A sinusoidal sweep may also be referred to as a sine sweep. Aptly, the deconvolution technique comprises a deconvolution step whereby the recorded signal is convolved with an inverted copy of the sine sweep in order to effectively simulate an impulsive stimulus.
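By way of example only, the sine-sweep measurement and deconvolution described above may be sketched as follows. The sweep parameters are illustrative assumptions; the exponential-sweep/inverse-filter construction is a well-known measurement technique and is not asserted to be the exact procedure of the claimed method.

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 48000                        # sample rate, Hz (illustrative)
f1, f2, T = 20.0, 20000.0, 5.0    # sweep range and duration (illustrative)

t = np.arange(int(T * fs)) / fs
R = np.log(f2 / f1)

# Exponential sine sweep: frequency rises from f1 to f2 over T seconds.
sweep = np.sin(2 * np.pi * f1 * T / R * (np.exp(t / T * R) - 1.0))

# Inverse filter: time-reversed sweep with a decaying amplitude envelope
# that compensates the pink spectrum of the exponential sweep.
inverse = sweep[::-1] * np.exp(-t / T * R)

# 'recorded' stands for the microphone capture of the sweep after it has
# been shaped by the subject's physical characteristics. Here the sweep
# itself is used as a stand-in, so the result approximates an impulse.
recorded = sweep
impulse_response = fftconvolve(recorded, inverse)
impulse_response /= np.max(np.abs(impulse_response))  # normalise
```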
As illustrated in
The Ambisonic audio file provides a surround-sound format that allows for the reproduction of a soundfield via an arbitrary loudspeaker layout, so long as the layout comprises a sufficient number of loudspeakers and, for a given number of loudspeakers, the loudspeakers are suitably arranged so that their signals appropriately interfere at a desired listening location. Via the steps in accordance with the present invention, a soundfield is decomposed into a component form based on the special mathematical functions known as ‘spherical harmonics’. By representing a soundfield in this way, certain transformations of the soundfield, such as rotational transformations, can be computed efficiently due to the natural mathematical symmetries of spherical harmonics.
For a given order of an Ambisonic format, it is the components of the decomposed soundfield that are decoded to generate the signals that are sent to each loudspeaker in a respective loudspeaker layout. The ‘order’ of an Ambisonics format is determined from the number of components into which a soundfield is decomposed.
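By way of example only, a full three-dimensional Ambisonics format of order N carries (N+1)² components, so a first-order format carries four. The sketch below is a non-limiting illustration using the traditional first-order B-format convention (an assumption, not a format mandated by the invention); it encodes a mono source and applies the kind of inexpensive soundfield rotation noted above.

```python
import numpy as np

def encode_first_order(signal: np.ndarray, az: float, el: float) -> np.ndarray:
    """Encode a mono signal into traditional first-order B-format
    (W, X, Y, Z). A first-order 3D format has (1 + 1)**2 = 4 components."""
    w = signal / np.sqrt(2.0)                # omnidirectional component
    x = signal * np.cos(az) * np.cos(el)     # front-back figure of eight
    y = signal * np.sin(az) * np.cos(el)     # left-right figure of eight
    z = signal * np.sin(el)                  # up-down figure of eight
    return np.stack([w, x, y, z])

fs = 48000
sig = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)       # 1 s, 440 Hz tone
bformat = encode_first_order(sig, az=np.pi / 2, el=0.0)  # source to one side

# A rotation of the whole soundfield about the vertical axis is just a
# 2x2 rotation of the X and Y components (sign convention illustrative):
# one reason spherical-harmonic representations suit head tracking.
rot = np.deg2rad(30.0)
x_new = np.cos(rot) * bformat[1] - np.sin(rot) * bformat[2]
y_new = np.sin(rot) * bformat[1] + np.cos(rot) * bformat[2]
bformat[1], bformat[2] = x_new, y_new
```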
Certain embodiments of the present invention provide a subject specific binaural Ambisonics renderer determined from near-field acoustic measurements of the subject. One advantage of the provided subject specific (e.g. a personal) Ambisonics renderer is that it provides a listener/user with lower-latency audio processing and higher accuracy of sound localisation than conventional solutions. One area in which this is useful is when the head of the listener/user is being tracked in space and the head movements (e.g. rotational head movements) affect the sounds that the listener/user hears. This is useful in the context of professional computer gaming (which may also be referred to as ‘eSports’), for example, as a player who can more precisely and more quickly locate the source of an in-game sound has an advantage over their competitors.
In this context, the group delay of an audio signal is the time delay introduced, for each component frequency of the audio signal, during the reproduction of the audio signal as sound.
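By way of example only, group delay may be computed numerically as the negative derivative of a system's phase response with respect to angular frequency. A minimal sketch using SciPy follows; the example filter is arbitrary and illustrative only.

```python
from scipy.signal import butter, group_delay

# An arbitrary 4th-order low-pass filter standing in for an audio system.
b, a = butter(4, 0.2)

# group_delay returns, for each frequency point, -d(phase)/d(omega):
# the delay, in samples, experienced by that component frequency.
w, gd = group_delay((b, a))
print(f"group delay near DC: {gd[0]:.2f} samples")
```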
In
In step 1020, a Low-Pass Filter (LPF) effect is applied to each set of intermediate HRTFs that attenuates the amplitude of frequencies in the HRTF signal above the crossover frequency, producing a first set of time-aligned HRTFs for each ear.
In step 1030, a second copy of the head-centred HRTFs is time-aligned by introducing a fixed time delay for all frequencies, producing a second set of intermediate HRTFs for each ear, where the time delay is calculated according to the location of each ear relative to the loudspeakers.
In step 1040, a High-Pass Filter (HPF) effect is applied to the intermediate HRTFs that attenuates the amplitude of frequencies in the HRTF signal below the crossover frequency, producing a second set of time-aligned HRTFs for each ear.
In step 1050, the first and second sets of time-aligned HRTFs are combined for each ear, respectively, producing what is referred to as ‘hybrid HRTFs’ for each ear. These hybrid HRTFs for each ear can also be packaged together into a single set of stereophonic hybrid HRTFs. Optionally, the first and second set of time-aligned HRTFs are combined via a linear phase crossover filter effect. It will be understood that alternative methods could be used to combine these two sets of HRTFs.
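By way of example only, one way of realising such a crossover combination may be sketched as below. The linear-phase FIR filters, crossover frequency, and filter lengths are illustrative assumptions, and the HRTF arrays are random placeholders for measured data.

```python
import numpy as np
from scipy.signal import firwin, fftconvolve

fs = 48000
crossover_hz = 1500.0   # illustrative crossover frequency (an assumption)
numtaps = 257           # odd length: linear-phase type-I FIR

# Linear-phase low-pass / high-pass pair about the crossover frequency.
lpf = firwin(numtaps, crossover_hz, fs=fs)
hpf = firwin(numtaps, crossover_hz, fs=fs, pass_zero=False)

def combine_hybrid(set_low: np.ndarray, set_high: np.ndarray) -> np.ndarray:
    """Low-pass the first HRTF set, high-pass the second, and sum them,
    yielding one 'hybrid' HRTF per (direction, ear). Inputs have shape
    (directions, ears, taps)."""
    low = np.apply_along_axis(lambda h: fftconvolve(h, lpf), -1, set_low)
    high = np.apply_along_axis(lambda h: fftconvolve(h, hpf), -1, set_high)
    return low + high

rng = np.random.default_rng(1)
first_set = rng.standard_normal((50, 2, 512))   # placeholder HRTF set
second_set = rng.standard_normal((50, 2, 512))  # placeholder HRTF set
hybrid = combine_hybrid(first_set, second_set)
```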
In
In step 1110, an incremental time delay, for example according to the curve 730, is introduced to a first copy of the head-centred HRTFs up to the crossover frequency 720, producing a first set of intermediate HRTFs for each ear. The time delay introduced is negligible for frequencies below the first-increment frequency 750. In general, the time delay introduced can be determined individually for each HRTF because each HRTF corresponds to a particular location relative to the subject measured.
In step 1120, a Low-Pass Filter (LPF) effect is applied to the intermediate HRTFs for each ear that attenuates the amplitude of frequencies in the HRTF signal above the crossover frequency, producing a first set of time-aligned HRTFs for each ear.
In step 1130, BiRADIAL HRTFs for each ear of a subject are obtained, for example according to the method steps as shown in
In step 1150, the truncated BiRADIAL HRTFs and time-aligned HRTFs are combined for each ear, respectively, producing another example of hybrid HRTFs for each ear. These hybrid HRTFs for each ear can also be packaged together to form stereophonic hybrid HRTFs. Optionally, the two sets are combined via a linear phase crossover filter effect. It will be understood that alternative methods could be used to combine these two sets of HRTFs.
In step 1210, the near-field time-aligned HRTFs or near-field BiRADIAL HRTFs are distance-compensated and encoded into a spherical harmonic format.
Conventionally, certain Ambisonics techniques are based on the assumption of plane wave theory: mathematically encoding a source into spherical harmonics assumes that the source has a planar wavefront. In accordance with the present invention, for acoustic measurements taken in the near-field, which thus involve sound waves having a non-planar wavefront, near-field compensation (NFC) steps are applied so that the HRTFs are suitable for use in an Ambisonics renderer.
The Ambisonic components, $\beta_{mn}^{\sigma}$, of a plane wave signal, $s$, of incidence $(\varphi, \vartheta)$ may be defined:

$\beta_{mn}^{\sigma} = s \cdot Y_{mn}^{\sigma}(\varphi, \vartheta)$  (1)

For a (radial) point source of position $(\varphi, \vartheta, r_s)$ it is helpful to consider the near-field effect filter, $\Gamma_m$, such that:

$\beta_{mn}^{\sigma} = s \cdot \Gamma_m(r_s) \cdot Y_{mn}^{\sigma}(\varphi, \vartheta)$  (2)

$\Gamma_m(r_s) = k \cdot d_{\mathrm{ref}} \cdot h_m^{-}(k r_s) \cdot j^{-(m+1)}$  (3)

where $k$ is the wavenumber, $d_{\mathrm{ref}}$ is a reference distance, $h_m^{-}$ is the divergent spherical Hankel function of order $m$, and $j$ is the imaginary unit.

Equation 2 can be simplified into the following form:

$\beta_{mn}^{\sigma} = s \cdot F_m \cdot Y_{mn}^{\sigma}(\varphi, \vartheta)$  (4)

where $F_m$ are the degree dependent transfer functions which model the near-field effect of a signal originating from the point $(\varphi, \vartheta, r_s)$ having been measured from the origin. The filters apply a phase shift and bass-boost to sources as they approach the origin and have a greater effect on higher order components. The near-field properties of the original source and the reproduction loudspeaker are considered when applying NFC.
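By way of example only, the near-field effect filter of Equation (3) may be evaluated numerically as below. The sketch assumes the divergent spherical Hankel function can be formed from SciPy's spherical Bessel functions as h_m = j_m − i·y_m (a sign/time-convention assumption), and the source and reference distances are illustrative values.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def gamma_m(m: int, freq_hz: float, r_s: float, d_ref: float,
            c: float = 343.0) -> complex:
    """Near-field effect filter of Equation (3):
    Gamma_m(r_s) = k * d_ref * h_m(k * r_s) * j**-(m + 1),
    where k is the wavenumber, h_m the divergent spherical Hankel
    function of order m, and j the imaginary unit. The convention
    h_m = j_m - i*y_m is an assumption of this sketch."""
    k = 2 * np.pi * freq_hz / c
    kr = k * r_s
    h_m = spherical_jn(m, kr) - 1j * spherical_yn(m, kr)
    return k * d_ref * h_m * (1j ** -(m + 1))

# Consistent with the text above, the filter magnitude grows with the
# order m for a low-frequency source at a fixed near-field distance.
for m in range(4):
    g = gamma_m(m, freq_hz=100.0, r_s=1.2, d_ref=1.2)
    print(f"order {m}: |Gamma_m| = {abs(g):.3f}")
```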
In step 1220, mathematical functions representing an audio impulse source are encoded into a spherical harmonic format for a set of frequencies and are convolved with the HRTFs provided via step 1210. Interaural Time Differences (ITDs) are determined for each HRIR (Head Related Impulse Response) from the position of the subject, of whom/which the acoustic measurements were taken, relative to the loudspeakers and the predetermined spatial relationship according to which the loudspeakers are arranged.
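By way of example only, one well-known geometric model for estimating ITDs is Woodworth's spherical-head formula; it is offered as an illustration only and is not asserted to be the model used in step 1220. The head radius below is an assumed average.

```python
import numpy as np

def woodworth_itd(azimuth_rad: float, head_radius: float = 0.0875,
                  c: float = 343.0) -> float:
    """Woodworth's far-field ITD estimate for a spherical head:
    ITD = (a / c) * (theta + sin(theta)). The 8.75 cm head radius is an
    assumed average, not a value taken from the measurement method."""
    return (head_radius / c) * (azimuth_rad + np.sin(azimuth_rad))

# A source 90 degrees to one side arrives at the far ear roughly
# 0.65 ms after the near ear.
print(f"ITD at 90 degrees: {woodworth_itd(np.pi / 2) * 1e6:.0f} microseconds")
```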
In step 1230, after introducing time delays, synthesised far-field (time-aligned or BiRADIAL) HRTFs are derived. Optionally, the synthesised far-field HRTFs are derived in a spherical harmonic format. The synthesised far-field HRTFs might also be referred to as far-field-equivalent HRTFs.
Aptly, near-field (time-aligned or BiRADIAL) HRTFs may be encoded into spherical harmonic format in the form of a binaural Ambisonic renderer and distance compensated.
Aptly, impulse input sources may also be encoded into spherical harmonic format. These may be convolved with the encoded time-aligned or BiRADIAL HRTFs (that form part of a binaural renderer) to produce synthesised far-field time-aligned or BiRADIAL HRTFs. However, time-aligned or BiRADIAL HRTFs can occasionally be limited in their use because they may not reproduce ITDs at low frequencies. Therefore, a time delay can be reintroduced at this point. This results in head-centred synthesised far-field HRTFs. These synthesised HRTFs may then be used in an Ambisonic renderer or indeed converted to hybrid HRTFs at this point for improved reproduction accuracy.
It will be understood that synthesised far-field hybrid HRTFs may be determined in accordance with the present invention. Synthesised far-field hybrid HRTFs may be determined from near-field hybrid HRTFs that may be encoded into a spherical harmonic format and distance compensated. Impulse input sources, which may also be encoded into a spherical harmonic format, may be convolved with the near-field hybrid HRTFs to produce synthesised far-field hybrid HRTFs.
In step 1310, near-field hybrid (BiRADIAL or time-aligned) HRTFs or synthesised far-field hybrid HRTFs are determined for the specific subject.
In step 1320, where appropriate, the HRTFs provided via step 1310 are distance-compensated. The HRTFs are then integrated into a subject specific Ambisonics renderer. A subject specific Ambisonics renderer might also be referred to as a subject specific Ambisonics decoder or a subject specific Ambisonics profile.
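By way of example only, the core operation of a binaural Ambisonics renderer may be sketched as a convolve-and-sum over spherical-harmonic components. The per-component binaural filters below are random placeholders standing in for the subject specific filters produced via step 1320.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(ambi: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Render an Ambisonic signal set to two ears.

    ambi:    (components, samples) spherical-harmonic signals.
    filters: (components, 2, taps) per-component binaural filters; in
             the method above these would embody the subject's HRTFs.
    Returns a (2, samples + taps - 1) binaural signal.
    """
    n_comp, n_samp = ambi.shape
    taps = filters.shape[-1]
    out = np.zeros((2, n_samp + taps - 1))
    for c in range(n_comp):
        for ear in range(2):
            out[ear] += fftconvolve(ambi[c], filters[c, ear])
    return out

rng = np.random.default_rng(2)
ambi = rng.standard_normal((4, 48000))       # first-order example input
filters = rng.standard_normal((4, 2, 256))   # placeholder renderer filters
binaural = render_binaural(ambi, filters)
```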
In step 1330, the subject specific Ambisonics renderer is then provided to the user in an appropriate file format via an appropriate means, for example via electronic file transfer, email, cloud computer access, or providing headphones with the subject specific renderer inbuilt/on board.
In step 1340, the subject specific Ambisonics renderer can then be integrated into software, such as a music player, video player, web-browser, operating system, video game, video game engine, and the like, or (if appropriate) an application programming interface (API) thereof, executed on a computer, smart phone, cloud server, and the like to provide a subject specific binaural audio experience for the subject.
The steps shown in
The steps shown in
The steps shown in
The steps shown in
The steps shown in
Throughout the description and claims of this specification, the words “comprise” and “contain” and variations of them mean “including but not limited to” and they are not intended to (and do not) exclude other moieties, additives, components, integers or steps. Throughout the description and claims of this specification, the singular encompasses the plural unless the context otherwise requires. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.
Features, integers, characteristics or groups described in conjunction with a particular aspect, embodiment or example of the invention are to be understood to be applicable to any other aspect, embodiment or example described herein unless incompatible therewith. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of the features and/or steps are mutually exclusive. The invention is not restricted to any details of any foregoing embodiments. The invention extends to any novel one, or novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed.
The reader's attention is directed to all papers and documents which are filed concurrently with or previous to this specification in connection with this application and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference.
Number | Date | Country | Kind
---|---|---|---
1918010.8 | Dec 2019 | GB | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/GB2020/053043 | 11/27/2020 | WO |