The disclosure relates to methods, devices, and systems for using sound externalization over headphones to augment reality.
All examples and features mentioned below can be combined in any technically possible way.
The present disclosure describes various systems and methods for sound source virtualization. When listening to audio content over near-field speaker systems, such as headphones, particularly stereo headphones, many listeners perceive the sound as coming from “inside their head”. Sound virtualization refers to the process of making sounds that are rendered over such systems sound as though they are coming from the surrounding environment, i.e., the sounds are “external” to the listener. This may be referred to herein as headphone externalization or sound externalization, and the terms ‘externalization’ and ‘virtualization’ may be used interchangeably herein. Alternately stated, the sounds may be perceived by the listener as coming from a virtual source. A combination of head tracking and the use of head related transfer functions (HRTFs) can be used to give the listener cues that help them perceive the sound as though it were coming from “outside their head”. The present disclosure recognizes that additional cues, consistent with the listener's surroundings, may be added to significantly improve the perceived externalization. More specifically, a radiation pattern of a virtual sound source and reflections from acoustically reflective surfaces, such as the walls, ceiling, floor, and other objects in the room, may be synthetically generated and appropriately filtered with corresponding HRTFs that are consistent with the direction of arrival of the direct and reflected audio signals.
In one example, a method for virtualizing sound from a speaker assembly proximate to a user is provided, the method including: receiving an audio signal associated with a first virtual sound source; receiving a virtual sound source location of the first virtual sound source; receiving a virtual sound source orientation of the first virtual sound source; adjusting the audio signal based at least in part on a radiation pattern characteristic of the first virtual sound source; adjusting the audio signal based at least in part on a head related transfer function (HRTF); and providing the adjusted audio signal at an output, the output adjusted audio signal to be provided to the speaker assembly for conversion into acoustic energy delivered to at least one of the user's ears.
In one aspect, the method further includes adjusting the audio signal based at least in part on an acoustically reflective characteristic of an acoustically reflective surface in proximity to the first virtual sound source.
In one aspect, the acoustically reflective characteristic is frequency dependent.
In one aspect, the radiation pattern characteristic includes a directional characteristic.
In one aspect, the radiation pattern characteristic includes a reflected directional characteristic based at least in part on a mirror sound source location selected based at least in part on the first virtual sound source location and a location of the acoustically reflective surface.
In one example, a method for virtualizing sound from a speaker assembly proximate a user is provided, the method including: associating a first virtual sound source with a first physical location in an environment in which the user is located; identifying one or more acoustically reflective surfaces at a second physical location in the environment; and simulating either a first direct sound from the first virtual sound source or a first primary reflected sound from the first virtual sound source off of a first reflective surface of the one or more reflective surfaces within the environment, wherein the simulated first direct sound or the simulated first primary reflected sound from the first virtual sound source includes a first directional characteristic.
In one aspect, the first directional characteristic is frequency dependent.
In one aspect, the step of simulating the first direct sound from the first virtual sound source or simulating the first primary reflected sound off of the first reflective surface of the one or more reflective surfaces further includes: generating a first left Head Related Transfer Function (HRTF) and a first right HRTF, arranged to simulate the first direct sound to the left ear of the user and right ear of the user, respectively, or to simulate the first primary reflected sound to the left ear of the user and the right ear of the user, respectively.
In one aspect, the method further includes simulating a first secondary reflected sound off of a second reflective surface of the one or more reflective surfaces.
In one aspect, the step of simulating the first secondary reflected sound off of the second reflective surface of the one or more reflective surfaces includes: generating a second left Head Related Transfer Function (HRTF) and a second right HRTF, arranged to simulate the first secondary reflected sound to the left ear of the user and right ear of the user, respectively.
In one aspect, the method further includes: associating a second virtual sound source with a third physical location in the environment; and simulating either a second direct sound from the second virtual sound source or a second primary reflected sound from the second virtual sound source off of the first reflective surface of the one or more reflective surfaces within the environment, wherein the simulated second direct sound or the simulated second primary reflected sound from the second virtual sound source includes a second directional characteristic.
In one aspect, the step of simulating the second direct sound from the second virtual sound source or simulating the second primary reflected sound off of the first reflective surface of the one or more reflective surfaces further includes: generating a third left Head Related Transfer Function (HRTF) and a third right HRTF, arranged to simulate the second direct sound to the left ear of the user and right ear of the user, respectively, or to simulate the second primary reflected sound to the left ear of the user and the right ear of the user, respectively.
In one aspect, the method further includes simulating a second secondary reflected sound off of the second reflective surface of the one or more reflective surfaces.
In one aspect, the step of simulating the second secondary reflected sound off of the second reflective surface of the one or more reflective surfaces includes: generating a fourth left Head Related Transfer Function (HRTF) and a fourth right HRTF, arranged to simulate the second secondary reflected sound to the left ear of the user and right ear of the user, respectively.
In one example, a binaural sound virtualization system is provided, the binaural sound virtualization system including: a memory; a processor coupled to the memory, the processor configured to receive an audio signal, receive location information about a virtual sound source, receive orientation information about the virtual sound source, and process the audio signal into a left signal and a right signal, each of the left signal and the right signal configured to cause a user to perceive the audio signal as virtually coming from the virtual sound source located and oriented in accord with the location information and the orientation information, upon acoustically rendering the left signal to the user's left ear and the right signal to the user's right ear; and an output coupled to the processor and configured to provide the left signal and the right signal to an audio rendering device.
In one aspect, the processing of the audio signal to cause a user to perceive the audio signal as virtually coming from the virtual sound source located and oriented in accord with the location information and the orientation information includes applying a radiation pattern associated with the orientation information.
In one aspect, the radiation pattern associated with the orientation information is reflected off one or more acoustically reflective surfaces, wherein the one or more acoustically reflective surfaces are selected from: a wall, a floor, or a ceiling within the environment.
In one aspect, the binaural sound virtualization system further includes a display configured to display an avatar representing the virtual sound source, wherein the display is arranged on a smartphone or other mobile computing device.
In one aspect, the binaural sound virtualization system further includes a motion tracker configured to collect data related to an orientation of the user.
In one example, a binaural sound virtualization system is provided, the binaural sound virtualization system including: an input to receive an audio signal; a first output to provide a first output signal to be acoustically rendered to a user's left ear; a second output to provide a second output signal to be acoustically rendered to a user's right ear; and a processor coupled to the input, the first output, and the second output, the processor configured to receive the audio signal and adjust the audio signal to generate each of the first output signal and the second output signal to virtualize the audio signal to be perceived as coming from a virtual sound source, the processor further configured to account for a radiation pattern of the virtual sound source in adjusting the audio signal to generate each of the first output signal and the second output signal.
These and other aspects of the various embodiments will be apparent from and elucidated with reference to the aspect(s) described hereinafter.
The present disclosure describes various systems and methods for sound source virtualization to cause a perceived source location of sound to be from a location external to a set of speakers proximate a user's ears. By controlling audio signals delivered to each of a user's left and right ears, such as by headphones, a source location of the sound may be virtualized to be perceived to come from elsewhere within an acoustic space, such as a room, vehicle cabin, etc. When listening to audio over headphones, particularly stereo (binaural) headphones, many listeners perceive the sound as coming from “inside their head”. Headphone externalization refers to the process of making sounds that are rendered over headphones sound as though they are coming from the surrounding environment, i.e., the sounds are “external” to the listener. The combination of head tracking and the use of head related transfer functions (HRTFs) can be used to give the listener some cues that help them perceive the sound as though it were coming from “outside their head”. The present disclosure recognizes that additional cues may be added that are consistent with the listener's surroundings to significantly improve the perceived externalization. More specifically, reflections off of acoustically reflective surfaces such as the walls, ceiling, floor, and other objects in the room may be synthetically generated and appropriately filtered with the corresponding HRTF that is consistent with respect to the reflection's direction of arrival. Similarly, a virtual direct acoustic signal may be generated from a source audio signal by filtering with an HRTF corresponding to the direction of arrival of the direct signal. In various examples, each of the direct and reflected signals may be adjusted to account for a radiation pattern associated with a virtual sound source, e.g., accounting for a virtual orientation of the virtual sound source.
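By way of illustration only, the following minimal sketch shows how such a rendering chain might be organized: each path (direct or reflected) is delayed according to its length, scaled by a per-path gain (e.g., reflecting the radiation pattern and any absorption), and convolved with a left and right head related impulse response matching its direction of arrival before being summed into a binaural output. The per-path data structure, helper names, and use of NumPy/SciPy are assumptions for the sketch, not the disclosed implementation.

```python
# Minimal sketch, not the disclosed implementation: sum a direct path and
# synthesized reflections, each delayed, scaled, and HRTF-filtered per ear.
import numpy as np
from scipy.signal import fftconvolve

SPEED_OF_SOUND = 343.0  # m/s (approximate, room temperature)

def render_binaural(source_signal, paths, sample_rate):
    """paths: list of dicts with 'hrir_left'/'hrir_right' (impulse responses),
    'distance' in meters, and 'gain' (e.g., radiation pattern x absorption)."""
    n = len(source_signal)
    left, right = np.zeros(n), np.zeros(n)
    for path in paths:
        delay = int(round(path["distance"] / SPEED_OF_SOUND * sample_rate))
        if delay >= n:
            continue  # path arrives after the end of this block
        delayed = np.zeros(n)
        delayed[delay:] = source_signal[: n - delay] * path["gain"]
        left += fftconvolve(delayed, path["hrir_left"])[:n]
        right += fftconvolve(delayed, path["hrir_right"])[:n]
    return left, right
```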
To this end, the location and orientation of the listener in the room or environment, the location and orientation of the virtual sound source, and the location of any acoustically reflective surfaces (typically walls, ceiling, floor, etc.) are determined. This information can be ascertained, for example, via active scanning or monitoring by one or more sensors, cameras, or other means. In some examples, this information may be obtained by manually measuring an area and creating a corresponding digital map, or model, of the environment. By dynamically and continually updating sensor information about the environment, a system can be created that virtualizes sound no matter where the listener is and even as the listener moves around the environment.
In various examples, the concepts disclosed herein may be extended to multiple virtual sound sources, e.g., to a virtual stereo pair of speakers or a virtual multi-channel sound system, such as a surround sound system, as will be discussed below in detail.
The term “head related transfer function” or acronym “HRTF” is intended to be used broadly herein to reflect any manner of calculating, determining, or approximating head related transfer functions. For example, a head related transfer function as referred to herein may be generated or selected specific to each user, e.g., taking into account that user's unique physiology (e.g., size and shape of the head, ears, nasal cavity, oral cavity, etc.). Alternatively, a generalized head related transfer function may be generated or selected that is applied to all users, or a plurality of generalized head related transfer functions may be generated that are applied to subsets of users (e.g., based on certain physiological characteristics that are at least loosely indicative of that user's unique head related transfer function, such as age, gender, head size, ear size, or other parameters). In one embodiment, certain aspects of the head related transfer function may be accurately determined, while other aspects are roughly approximated (e.g., the inter-aural delays are accurately determined, while the magnitude response is only coarsely determined). In various examples, a number of HRTFs may be stored, e.g., in a memory, and selected for use relative to a determined angle of arrival of a virtual acoustic signal.
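As a hedged illustration of selecting a stored HRTF relative to a determined angle of arrival, the following sketch chooses the stored measurement direction closest to a computed azimuth and elevation; the storage layout (a dictionary keyed by angles in degrees) is an assumed convenience, not a required format.

```python
# Illustrative sketch only: pick the stored HRTF pair whose measurement
# direction is nearest to the computed direction of arrival.
import math

def nearest_hrtf(hrtf_bank, azimuth_deg, elevation_deg):
    """hrtf_bank: {(az_deg, el_deg): (hrir_left, hrir_right)}."""
    def angular_distance(key):
        az, el = key
        # Wrap the azimuth difference into [-180, 180] before comparing.
        d_az = (az - azimuth_deg + 180.0) % 360.0 - 180.0
        d_el = el - elevation_deg
        return math.hypot(d_az, d_el)
    best_key = min(hrtf_bank, key=angular_distance)
    return hrtf_bank[best_key]
```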
The term “headphone” as used herein is intended to mean any sound producing device that is configured to provide acoustic energy to each of a user's left and right ears, and to provide some isolation or control over what arrives at each ear without being heard at the opposing ear. Such devices often fit around, on, in, or proximate to a user's ears in order to radiate acoustic energy into the user's ear canal. Headphones may be referred to as earphones, earpieces, earbuds, or ear cups, and can be wired or wireless. Headphones may be integrated into another wearable device, such as a headset, helmet, hat, hood, smart glasses or clothing, etc. The term “headphone” as used herein is also intended to include other form factors capable of providing binaural acoustic energy, such as headrest speakers in an automobile or other vehicle. Further examples include neck-worn devices, eyewear, or other structures that hook around the ear or are otherwise configured to be positioned proximate a user's ears. Accordingly, various examples may include open-ear forms as well as over-ear or around-ear forms. A headphone may include an acoustic driver to transduce audio signals to acoustic energy. The acoustic driver may be housed in an ear cup or earbud, or may be open-ear, or may be associated with other structures as described, such as a headrest. A headphone may be a single stand-alone unit or one of a pair of headphones, such as one headphone for each ear.
The term “augmented reality” or acronym “AR” as used herein is intended to include systems in which a user may encounter, with one or more of their senses (e.g., using their sense of sound, sight, touch, etc.), elements from the physical, real-world environment around the user that have been combined, overlaid, or otherwise augmented with one or more computer-generated elements that are perceivable to the user using the same or different sensory modalities (e.g., sound, sight, haptic feedback, etc.). The term “virtual” as used herein refers to this type of computer-generated augmentation that is produced by the systems and methods disclosed herein. In this way, a “virtual sound source” as referred to herein corresponds to a physical location in the real-world environment surrounding a user which is treated as a location from which sound radiates, but at which no sound is actually produced by an object. In other words, the systems and methods disclosed herein may simulate a virtual sound source as if it were a real object producing a sound at the corresponding location in the real world. In contrast, the term “real”, such as “real object”, refers to things, e.g., objects, which actually exist as physical manifestations in the real-world area or environment surrounding the user.
It is to be appreciated that the location associated with the virtual sound source 14 may be an empty space in the real-world environment, or occupied by a real object. In either event, the system 10 simulates a sound signal that is perceived by the user as originating from the designated location without that sound being actually produced by the environment or an object at that location. If a visual augmentation device is used as part of the system 10 (such as via a head mounted display or the display of a smartphone or other mobile computing device, as discussed in more detail below), a virtual avatar or other visual cue may additionally be displayed to indicate the location associated with the virtual sound source 14.
The system 10 also determines a location of the headphone assembly 12, the user wearing the headphone assembly 12, and one or more real objects in the environment surrounding the user, such as a wall 16a or 16b.
In one embodiment, the location of the user is determined as the location of the headphone assembly 12, or vice versa, since the user is presumably wearing the headphone assembly 12. Accordingly it should be appreciated that “location of the user” as used herein can be determined by directly identifying the location of the user, or indirectly by determining the location of some other device expected to be held, carried, or worn by the user, such as the headphone assembly 12, or a smartphone or other mobile device associated with the user.
It is to be appreciated that the locations of the user, the headphone assembly 12, the virtual sound source 14, and/or the object 16 may be determined as a global or absolute position relative to some global coordinate system, and/or as relative positions based on distances between and orientations of these entities with respect to each other. In one embodiment, relative distances and/or orientations between the entities are calculated from a global coordinate system by calculating the difference between each location. As discussed above, this position information may be ascertained via one or more sensors, cameras, or other means, or from a digital map or model of the environment.
Regardless of how the various locations are determined, the locations can be used to determine a distance X1 corresponding to a direct sound path 18 from the virtual sound source 14 (namely, the physical location associated with the virtual sound source 14) to the headphone assembly 12. The system 10 also uses the determined locations to further determine one or more reflected sound paths, generally referred to with the reference numeral 20. Namely, the reflected sound paths 20 (e.g., the reflected sound paths 20a and 20b discussed below) represent sound from the virtual sound source 14 that arrives at the headphone assembly 12 after reflecting off of one or more of the acoustically reflective objects 16 in the environment.
As the reflected sound paths 20 represent reflected sound, each reflected sound path 20 is simulated to include a reflection from the corresponding reflective object 16. For example, the reflection may be indicated at or by a reflection point, generally referred to with the reference numeral 22, on an acoustically reflective object in the environment. It is noted that sound from real sound sources reflects off the surface of objects, not just a single point, as illustrated and described with respect to the reflection points 22. However, since the sound produced by the virtual sound source 14 is simulated, the reflections of the sound off of the objects 16 are also simulated. For this reason, the reflection points 22 may be utilized if convenient, e.g., to simplify calculations and/or representations related to the reflected sound paths 20 or other characteristics of the reflected sound. Thus, with respect to the examples described herein, each reflected sound path 20 may be treated as reflecting from a single corresponding reflection point 22 on the relevant object 16.
It is to be appreciated that the reflected sound paths 20a and 20b are provided as examples, and that any number of reflected sound paths, off of any number of acoustically reflective objects in the environment, may be determined and simulated.
Each reflected sound path 20a and 20b includes a first segment having a distance X2 and a second segment having a distance X3 (thus, the sum of distances X2a and X3a defining a total length of the first sound path 20a and the sum of the distances X2b and X3b defining a total length of the second sound path 20b). It is to be appreciated that each of the reflected sound paths 20 can be analogized as a copy (generally referred to herein with the reference numeral 24) of the virtual sound source 14 mirrored (reflected) with respect to the object 16 causing that reflection. For example, each mirrored copy 24 is located on the opposite side of the corresponding object 16, at the same distance from that object as the virtual sound source 14, such that the total length of the reflected sound path (e.g., the sum of the distances X2 and X3) equals the straight-line distance from the mirrored copy 24 to the headphone assembly 12.
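A brief sketch of this mirroring analogy (often called an image source construction) is shown below, under the assumption that the reflective object is modeled as a plane defined by a point and a normal vector; the function names are illustrative and do not correspond to the reference numerals above.

```python
# Sketch of the mirrored-copy ("image source") idea: reflect the virtual source
# across a planar surface and measure the reflected path via the mirrored point.
import numpy as np

def mirror_source(source_pos, wall_point, wall_normal):
    """Reflect the virtual source position across the wall plane."""
    n = np.asarray(wall_normal, dtype=float)
    n = n / np.linalg.norm(n)
    p = np.asarray(source_pos, dtype=float)
    d = np.dot(p - np.asarray(wall_point, dtype=float), n)
    return p - 2.0 * d * n

def reflected_path_length(source_pos, listener_pos, wall_point, wall_normal):
    """Total length (e.g., X2 + X3) of the reflected path equals the straight-line
    distance from the mirrored copy of the source to the listener."""
    image = mirror_source(source_pos, wall_point, wall_normal)
    return float(np.linalg.norm(np.asarray(listener_pos, dtype=float) - image))
```

For instance, for a wall coinciding with the plane x = 0, a source at (1, 2, 1) mirrors to (−1, 2, 1), and the total reflected path length to any listener position equals the straight-line distance from that mirrored point to the listener.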
Although the description above discusses simulation of omni-directional sound produced by a virtual sound source 14 within an environment, it should be appreciated that, as discussed below, the virtual sound source 14 may additionally or alternatively be simulated with a radiation pattern having a directional characteristic, which may also be frequency dependent.
One embodiment of the system 10 is shown in more detail in the figures and described below. In this embodiment, the headphone assembly 12 includes speakers 26 (e.g., a left speaker 26L and a right speaker 26R) for producing sound at each of the user's ears, and a signal processing circuit 28 arranged to process the audio signals provided to the speakers 26.
The sound signal processed by the signal processing circuit 28 may be generated by a controller 30. The controller 30 includes a processor 32, a memory 34, and/or a communication module 36. The processor 32 may take any suitable form, such as a microcontroller, plural microcontrollers, circuitry, a single processor, or plural processors configured to execute software instructions. The memory 34 may take any suitable form or forms, including a volatile memory, such as random access memory (RAM), or non-volatile memory such as read only memory (ROM), flash memory, a hard disk drive (HDD), a solid state drive (SSD), or other data storage media. The memory 34 may be used by the processor 32 for the temporary storage of data during its operation. Data and software, such as the algorithms or software necessary to analyze the data collected by the sensors of the system 10, an operating system, firmware, or other application, may be installed in the memory 34. The communication module 36 is arranged to enable wired or wireless signal communication between the controller 30 and each of the other components of the system 10, particularly if the components of the system 10 are implemented as one or more remote devices separate from the headphone assembly 12. The communication module 36 may be or include any module, device, or means capable of transmitting a wired or wireless signal, such as but not limited to Wi-Fi (e.g., IEEE 802.11), Bluetooth, cellular, optical, magnetic, Ethernet, fiber optic, or other technologies.
The controller 30 is configured to perform or assist in performing any of the determinations, selections, or calculations discussed herein. For example, the controller 30 may determine the locations of the user, the headphone assembly 12, the virtual sound source 14, and the objects 16, the direct and reflected sound paths 18 and 20 and their corresponding distances and reflection points 22, and the HRTFs used to generate the output signals, as discussed in more detail below.
To collect position data usable by the controller 30 to calculate the aforementioned locations, orientations, and/or distances, the system 10 may include a localizer 38. The localizer 38 includes any sensor, device, component, or technology capable of obtaining, collecting, or generating position data with respect to the location of the user, the headphone assembly 12, and the object 16, or the relative positions of these entities with respect to each other. The localizer 38 may include a rangefinder, proximity sensor, depth sensor, imaging sensor, camera, or other device. The localizer 38 may be embedded in the headphone assembly 12, or included by a remote device separate from the headphone assembly 12, such as incorporated in a mobile device, a television, an audio/video system, or other devices. For example, the localizer 38 may detect the reflective objects 16 by way of a transmitter and receiver configured to generate a signal and measure a response reflected off nearby objects, e.g., ultrasonic, infrared, or other signal. In one embodiment, the localizer 38 includes a camera, and the controller 30 includes an artificial neural network, deep learning engine, or other machine learning algorithm trained to detect the object 16 from the image data captured by the camera. In one embodiment, the localizer 38 includes a global positioning system (GPS) antenna or transponder, e.g., embedded in the headphone assembly 12 or otherwise carried by the user (such as in a smartphone). In one embodiment, the controller 30 only selects objects to be reflective objects 16 if they are within some threshold distance of the virtual sound source 14.
The system 10 may additionally include a motion tracker 40 that is configured to track motion of the user, particularly the orientation of the user. In other words, since the HRTFs characterizing the sound received by the user at each ear are at least partially defined by the orientation of the user's ears with respect to the direction of arrival of sound, the motion tracker 40 may be used to track the location and direction in which the user's head (and/or the headphone assembly 12) is facing in order to better approximate the HRTFs for that user (and/or the speakers 26). In various examples, motion of the user may be directly tracked by various sensors, while in other examples motion of the user may be indirectly tracked by monitoring motion of the headphone assembly 12, or some other device held, carried, or worn by the user.
The motion tracker 40 may include sensors embedded into or integrated with the headphone assembly 12 to track motion of the headphone assembly 12, such as a proximity sensor utilizing ultrasonic, infrared, or other technologies, cameras, accelerometers, gyroscopes, etc. In one embodiment, the motion tracker 40 includes a nine-axis inertial motion sensor. In one embodiment, the motion tracker 40 includes at least one sensor external to the headphone assembly 12. For example, the motion tracker 40 may include a depth sensor, imaging sensor, and/or camera system for tracking one or more elements of the user and/or the headphone assembly 12. Such sensors and systems may be included in a remote device, such as a mobile device, a television, an audio/video system, or other systems. In one embodiment, the motion tracker 40 includes a tag or marker (embedded in the headphone assembly 12 and/or otherwise carried or worn by the user) that is tracked by one or more external cameras, e.g., as is commonly used to perform motion capture, or mo-cap, in the film and videogame industries.
It is to be appreciated that both the localizer 38 and the motion tracker 40 are arranged to track, monitor, or detect the relative positions of the user, the headphone assembly 12, the virtual sound source 14, and/or the object 16. In other words, the data or information collected and/or generated by the localizer 38 and the motion tracker 40 can be used, e.g., by the controller 30, to determine whether relative motion has occurred between any of these entities. Accordingly, positions of each of these entities can change, and the system 10 is capable of reacting in essentially real-time. For example, as the user walks about an environment, the localizer 38 can continuously recalculate the distance between the user and the object 16, while the motion tracker 40 monitors the relative orientation of the user's head (and/or the headphone assembly 12). As another example, the controller 30 can change the location of the virtual sound source 14 at will, and the data collected by the localizer 38 and/or the motion tracker 40 can be used to generate direct sound paths, reflected sound paths, and HRTFs from the new location of the virtual sound source 14. In some examples, the localizer 38 and the motion tracker 40 may be the same component.
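As a simplified, assumed example of how tracked orientation can feed HRTF selection, the following sketch converts a world-frame direction of arrival into the listener's head frame using only a yaw angle; real trackers typically report full three-axis orientation, and the axis conventions here are assumptions.

```python
# Hedged example: yaw-only conversion of a world-frame direction of arrival
# into the listener's head frame, so an HRTF can be chosen for the relative
# direction as the head rotates.
import math

def arrival_azimuth_in_head_frame(listener_pos, source_pos, head_yaw_deg):
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    world_azimuth = math.degrees(math.atan2(dy, dx))  # 0 deg along the +x axis
    relative = world_azimuth - head_yaw_deg           # rotate into the head frame
    return (relative + 180.0) % 360.0 - 180.0         # wrap to [-180, 180)
```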
Systems that may be useful in some embodiments for creating the localizer 38 and/or the motion tracker 40 include the system marketed by Microsoft under the name HoloLens, or the systems marketed by Google as ARCore, or the systems marketed by Apple as ARKit. Each of these systems may utilize a combination of one or more cameras in order to detect objects, such as people, walls, etc. In particular, some such systems include a visual camera coupled with an infrared camera to determine depth or distance, which may be utilized in both rangefinding and motion tracking applications. Those of ordinary skill in the art will readily recognize other systems that may be utilized.
The system 10 may include an acoustic characteristic detector 42 for collecting or generating data relevant to at least one acoustic parameter or characteristic of the object 16 or the environment in which the user and the object 16 are located. In one embodiment, the detector 42 is arranged with or as a sensor to collect data related to the reverberation time and/or acoustic decay characteristics of the environment in which the user is located. For example, the detector 42 may produce (e.g., “ping”) a specified sound signal (e.g., outside of the range of human hearing if desired) and measure the reflected response (e.g., with a microphone). In one embodiment, an absorption coefficient is calculated from the reverberation time or other characteristics of the environment as a whole, and applied to the object 16 as an approximation. If the sound signal is specifically directed or aimed at the object 16, then the differences between the original signal and the initially received reflections can be used to calculate an absorption coefficient of the object 16.
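One possible way, among others, to derive an approximate average absorption coefficient from a measured reverberation time is Sabine's relation RT60 ≈ 0.161·V/(S·α); the sketch below assumes the room volume and surface area are available (e.g., from a model of the environment) and is offered only as an example of such a calculation, not necessarily the one performed by the detector 42.

```python
# Assumed, illustrative calculation: average absorption from RT60 via Sabine's
# equation RT60 = 0.161 * V / (S * alpha), with metric units (m^3, m^2, s).
def average_absorption_from_rt60(rt60_seconds, room_volume_m3, surface_area_m2):
    alpha = 0.161 * room_volume_m3 / (surface_area_m2 * rt60_seconds)
    return min(alpha, 1.0)  # clamp: a coefficient above 1.0 is not physical
```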
It is to be appreciated that the components of the system 10 as shown and described herein may be combined or distributed in any suitable manner, e.g., integrated entirely within the headphone assembly 12, or divided between the headphone assembly 12 and one or more remote devices, such as a smartphone or other mobile computing device in communication with the headphone assembly 12.
While methods of operating the system 10 can be appreciated in view of the above description, an example method 50 is described below as a series of steps. In initial steps of the method 50 (e.g., steps 52 and 54), the location of the virtual sound source 14 is set or received, and the locations and orientations of the user and/or the headphone assembly 12 are determined, e.g., using the localizer 38 and/or the motion tracker 40 as discussed above.
At step 56, any number of acoustically reflective real objects (e.g., the object 16) in the environment surrounding the user are identified or detected. For example, step 56 may be achieved via use of the localizer 38 scanning or probing the environment with one or more of its sensors. The controller 30 may be configured with algorithms or functions that result in the controller 30 not selecting any object that is less than a threshold size, e.g., in detectable surface area or in one or more detected dimensions. As another example, the localizer 38 may include a camera and the controller 30 may include an artificial neural network or other deep learning mechanism that is trained with image-based object recognition capabilities, such that the controller 30 is configured to select only objects that it recognizes. In one embodiment, step 56 includes the localizer 38 downloading or generating a map or other data representative of the environment in which the user is located. For example, the localizer 38 may be configured to retrieve GPS or map data from an internet or other database. As one specific example, the Maps product by Google identifies three-dimensional data for various objects, such as buildings, trees, etc., which may be retrieved by the localizer 38 and utilized to set the boundaries used to identify or define the objects 16.
At step 58, paths for reflected sound (e.g., the reflected sound paths 20) and points on the acoustically reflective real objects from which the sound is reflected (e.g., the reflection points 22) are determined (e.g., by the controller 30 utilizing the position data collected by the localizer 38). Step 58 may include creating copies of the virtual sound source that are mirrored with respect to the acoustically reflective objects (e.g., the mirrored copies 24). At step 60, the distance of the reflected sound path, and/or one or more segments comprising the reflected sound path are determined. For example, the reflected sound path may include a distance between the user (or other proxy) and the mirrored copies generated in step 58. The reflected sound path may additionally or alternatively include multiple segments such as a first segment from the virtual sound source to the reflection point (e.g., the distance X2) and/or a second segment from the reflection point to the user (e.g., the distance X3).
At step 62, HRTFs are generated or selected (e.g., via the controller 30) for sound to be received at each of the user's ears (e.g., via the speakers 26L and/or 26R) and each direct or reflected path to simulate sound originating from the virtual sound source 14, and as reflected from the acoustically reflective object at each of the reflection points. Step 62 may include analyzing data collected by one or more sensors of a motion tracker (e.g., the motion tracker 40) in order to determine an orientation of the user. An orientation of the virtual sound source 14 can be virtually set by the controller 30 to be utilized in calculating a directional impact of the radiation pattern of the virtual sound source.
At step 64, one or more “direct” output signals are generated (e.g., by the signal processing circuit 28) to represent the sound directly coming to the user from the virtual sound source. The output signals are generated by processing the desired sound signal (representing the sound being virtually emitted by the virtual sound source, e.g., as generated by the controller 30) according to the HRTFs generated in step 62 and any other desired signal processing. The number of output signals generated in step 64 can be equal to the number of speakers, i.e., with one output signal for each of the speakers 26, with each of the output signals processed by a different one of the HRTFs that corresponds to the intended speaker.
At step 66, one or more “reflected” output signals are generated similarly to step 64 but representing the sound coming to the user as reflected at each of the reflection points. In addition to applying the HRTFs generated in step 62, step 66 may include applying an absorption coefficient (e.g., generated by the controller 30 using the data gathered by the detector 42) for the real objects reflecting the virtual sound, or generally for the environment in which the user and the acoustically reflective objects are located. Additionally, since the reflected path of sound is expected to be longer than the direct sound path, step 66 may include delaying the reflected output signals by an amount of time equal to the difference between the length of the reflected path and the length of the direct path, divided by the speed of sound. Typically, the human brain interprets any reflected sounds received by the ears within approximately 40 ms of the original to be essentially included as part of the original sound. In this way, step 66 may include not outputting the reflected output signal if the calculated delay is greater than about 40 ms. However, in other embodiments, output signals having greater than a 40 ms delay are specifically included in order to induce an echo effect, which may be particularly useful in simulating the sound characteristics for large open rooms, such as arenas or theatres, and thus advantageous for improving sound externalization in this type of environment.
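The timing and gating described in this step can be summarized in the following sketch; the speed-of-sound value, the square-root mapping of the absorption coefficient to an amplitude gain, and the option to keep late reflections for an echo effect are illustrative assumptions.

```python
# Sketch of reflected-path delay and gating; values and names are illustrative.
SPEED_OF_SOUND = 343.0    # m/s
ECHO_THRESHOLD_S = 0.040  # ~40 ms perceptual integration window noted above

def reflection_delay_and_gain(direct_len_m, reflected_len_m, absorption_coeff,
                              keep_echoes=False):
    delay_s = (reflected_len_m - direct_len_m) / SPEED_OF_SOUND
    if delay_s > ECHO_THRESHOLD_S and not keep_echoes:
        return None  # suppress late reflections (or keep them for an echo effect)
    gain = (1.0 - absorption_coeff) ** 0.5  # amplitude after one reflection (energy 1 - a)
    return delay_s, gain
```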
At step 68, one or more speakers (e.g., the speakers 26) produce sound in accordance with the output signals generated in steps 64 and 66. The output signals intended for each speaker can be summed (e.g., by the signal processing circuit 28) before being sent to the respective ones of the speakers 26. As discussed above, due to the application of the HRTFs, particularly with respect to the HRTFs from the reflection points (e.g., the reflection points 22) on the acoustically reflective objects (e.g., the walls 16a and 16b, or other instances of the object 16), the externalization of the sound received by the user is significantly improved. By use of the HRTFs from the reflection points, the sound produced by the headphone assembly 12 includes acoustic characteristics specific to the actual environment in which the user is located, which the user's brain “expects” to hear if sound were created by a real object at the location of the virtual sound source 14. Advantageously, the simulation and synthetic or artificial insertion of the first early reflections of the virtual sound source from real objects in the environment (e.g., via the reflection paths 20) helps to convince the user's brain that the sound is occurring “external” to the user, at the location associated with the virtual sound source 14.
Step 68 proceeds to step 70 at which relative movement between the user, the virtual sound source, and/or the acoustically reflective objects is tracked. In response to any such relative movement, the method 50 can return to any of the previous steps 52, 54, 56, or 58 in order to recalculate any of the corresponding locations, orientations, distances, HRTFs, or output signals. Additionally, each of the steps previous to step 68 can immediately proceed to step 70. In other words, the system 10 can be arranged in accordance with the method 50 to be constantly, e.g., in real-time, monitoring the real-world environment about the user, the acoustically reflective objects in the environment, and/or the user's location in the real-world environment in order to update the sound produced by the speakers. In this way, the system 10 is able to dynamically change the output signal as the user moves (e.g., as determined from the data collected by the localizer 38), the user rotates their head (e.g., as determined from the data collected by the motion tracker 40), the virtual sound source 14 is moved (e.g., by the controller 30 setting a new location for the virtual sound source 14), the object 16 moves (e.g., a nearby car forming the object 16 drives away or a window or door in a wall forming the object 16 is opened), etc.
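Structurally, this real-time behavior resembles the loop sketched below; every component and method name is an assumed placeholder for whatever localizer, motion tracker, controller, and rendering implementations are actually used.

```python
# Architectural sketch (assumed interfaces): re-derive geometry and HRTFs
# whenever relative movement is detected, then keep rendering audio.
def run_virtualization_loop(localizer, motion_tracker, controller, renderer):
    while controller.active():
        scene = localizer.scan()                # user, source, reflective objects
        head = motion_tracker.orientation()     # head yaw/pitch/roll
        if controller.scene_changed(scene, head):
            paths = controller.compute_sound_paths(scene)  # direct + reflections
            hrtfs = controller.select_hrtfs(paths, head)   # per path, per ear
            renderer.update(paths, hrtfs)
        renderer.render_next_block()
```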
It is to be appreciated that other sensory modalities can be augmented in addition to the acoustic augmentation achieved in accordance with the above-described embodiments. One example includes visual augmented reality elements, although touch (haptic feedback), smell, taste, or other senses may be augmented if desired. To this end, the system 10 may include or communicate with a display arranged to visually augment the environment, as discussed below.
In one embodiment, the system 10 includes a display 74, e.g., a display of a smartphone or other mobile computing device, or a head mounted display, that presents a view of the real-world environment surrounding the user, including the acoustically reflective objects in the environment (e.g., the object 16).
In addition to the acoustically reflective objects in the environment, the display 74 also shows a virtual avatar 76a to visually represent the virtual sound source 14. The virtual avatar 76a in this example takes the form of a loudspeaker, but any other imaginable shape or three-dimensional virtual construct may be utilized. The virtual avatar 76a can be created by the system 10 (e.g., via the controller 30′) to represent the virtual sound source 14 and create a visual cue to the user regarding the location of the virtual sound source 14. For example, if the loudspeaker avatar 76a is displayed at a particular location in the environment, the audio signal is virtualized such that the user perceives the sound as originating from that location.
It is to be appreciated that virtual avatars can take any shape or form.
As a result of the calculated HRTFs, the user would perceive the sound produced by the speakers 26 of the headphone assembly 12 as if it were coming from the indicated location of the virtual avatar. As the user moves about the room, orients their head to examine the virtual avatar, or as the controller moves the location of the virtual avatar and/or animates the virtual avatar, e.g., to move it (and the corresponding virtual sound source) about the room, the system 10 can be configured to react, e.g., in real-time, as discussed above with respect to step 70 of the method 50, to recalculate the HRTFs so that the sound is continually perceived as coming from the particular location indicated by the virtual avatar and associated with the virtual sound source 14. It is also to be appreciated that a virtual avatar does not need to be utilized. For example, the virtual sound source 14 may be set to any location in the environment, such as an empty spot on the floor 16e, or to correspond to the location of a physical object, such as the pedestal 16f itself (without the avatar 76a). Those of ordinary skill in the art will recognize additional virtual elements and other sensory augmentations that can be utilized with the system 10. It should be appreciated that various examples may not include any visual display components and/or may not be augmented in modalities other than audio.
The concepts described above may also be implemented in a sound externalization system 210 that renders sound to a user via a headphone assembly 212. In such an implementation, the sound externalization system 210 may simulate, at the headphone assembly 212, a direct sound path from a virtual sound source to the user, a primary reflected sound path off of a first reflective surface within the environment, and a secondary reflected sound path off of a second reflective surface, each simulated using a respective left HRTF and right HRTF for the user's left and right ears.
Although illustrated and described above using only one direct sound path, one primary reflected sound path, and one secondary reflected sound path, it should be appreciated that sound externalization system 210 may be arranged to simulate, using respective left and right HRTFs, any number of direct, primary reflected, secondary reflected, tertiary, or n-ary reflected sound paths at headphone assembly 212. For example, with respect to primary reflected sound paths, sound externalization system 210 may utilize at least one primary reflected sound path from each wall, ceiling, or floor within the environment, resulting in at least six primary reflected sound paths for each virtual sound source within the environment, e.g., if the environment is a rectangular room. Additionally, in such a room, for each primary reflected sound path, there could be five additional secondary reflected sound paths, i.e., where each primary reflected sound is further reflected off of the remaining five surfaces in a second order reflection, such that there may be thirty secondary reflected sound paths in a typical room. This means that for each virtual sound source within the environment, there could be at least thirty-seven simulated sound paths (one direct sound path, six primary reflected sound paths, and thirty secondary reflected sound paths), each being simulated using respective left and right HRTFs. In various examples, additional reflected sound paths may account for reflections off any number of walls and/or objects and may include any number of multiply reflected sound paths, e.g., 3rd order, 4th order, etc.
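The path count above can be verified with a short enumeration over the six surfaces of a rectangular room, in which each second-order path is an ordered pair of distinct surfaces; this is bookkeeping only, matching the counts stated above.

```python
# Counting sketch: 6 first-order reflections, 30 second-order reflections, and
# 1 direct path gives 37 simulated paths per virtual source in a shoebox room.
from itertools import permutations

SURFACES = ["left", "right", "front", "back", "floor", "ceiling"]

first_order = [(s,) for s in SURFACES]
second_order = list(permutations(SURFACES, 2))  # ordered pairs, no repeats

assert len(first_order) == 6
assert len(second_order) == 30
total_paths = 1 + len(first_order) + len(second_order)  # 37, including the direct path
```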
In a physical system, each speaker may generate sound waves with a directional component, i.e., where the acoustic radiation pattern is not angularly uniform around the speaker, e.g., 360 degrees around the speaker in a primary plane of the speaker (in two dimensions), and likewise not spherically uniform around the entire 4π steradians of solid angle about the speaker (in three dimensions). Thus, it is desirable, in a virtual sound system, to simulate the sound generated in each of the simulated sound paths described above such that each virtual sound source can produce sound with a directional characteristic or component. Additionally, physical speaker systems typically generate acoustic radiation patterns that are dependent on frequency; for example, a loudspeaker may radiate low frequencies nearly omni-directionally while radiating higher frequencies in a progressively more directional pattern.
The terms “directional characteristic” or “directional characteristics” as used herein are intended to mean a portion of the acoustic sound radiation pattern that is shaped or directed in a particular direction, i.e., in the direction of a user or other target, with respect to the 360 degree (in a primary plane) or the fully three-dimensional acoustic radiation patterns produced by a virtual speaker. For example, in each acoustic radiation pattern, the directional characteristic may correspond to the lobe or portion of the pattern that is oriented toward the user for a given orientation of the virtual sound source.
For the direct sound path, the radiation pattern and resulting directional characteristic may be applied based on the orientation of the virtual sound source relative to the direction from the virtual sound source to the user. For each reflected sound path, a reflected directional characteristic may be applied based on the orientation of the corresponding mirrored copy of the virtual sound source relative to the user, i.e., based on a mirror sound source location selected based on the virtual sound source location and the location of the acoustically reflective surface.
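As a hedged illustration of applying a frequency-dependent directional characteristic per path, the sketch below uses a simple two-band model: low frequencies pass essentially unchanged (approximately omni-directional), while high frequencies are attenuated as the evaluation direction moves off the source's axis. The cardioid-style weighting and the 800 Hz crossover are assumptions, not the radiation pattern of any particular source. For a reflected path, the off-axis angle would be evaluated at the corresponding mirrored copy of the source, consistent with the reflected directional characteristic described above.

```python
# Illustrative two-band directivity model; crossover and weighting are assumed.
import numpy as np
from scipy.signal import butter, sosfilt

def apply_directivity(signal, off_axis_angle_rad, sample_rate, crossover_hz=800.0):
    sos_lo = butter(4, crossover_hz, btype="lowpass", fs=sample_rate, output="sos")
    sos_hi = butter(4, crossover_hz, btype="highpass", fs=sample_rate, output="sos")
    low = sosfilt(sos_lo, signal)
    high = sosfilt(sos_hi, signal)
    # Cardioid-style gain: 1.0 on-axis, decreasing toward the rear of the source.
    high_gain = 0.5 * (1.0 + np.cos(off_axis_angle_rad))
    return low + high_gain * high
```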
All definitions, as defined and used herein, should be understood to control over dictionary definitions, definitions in documents incorporated by reference, and/or ordinary meanings of the defined terms.
The indefinite articles “a” and “an,” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.”
The phrase “and/or,” as used herein in the specification and in the claims, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with “and/or” should be construed in the same fashion, i.e., “one or more” of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified.
As used herein in the specification and in the claims, “or” should be understood to have the same meaning as “and/or” as defined above. For example, when separating items in a list, “or” or “and/or” shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the claims, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. In general, the term “or” as used herein shall only be interpreted as indicating exclusive alternatives (i.e. “one or the other but not both”) when preceded by terms of exclusivity, such as “either,” “one of” “only one of,” or “exactly one of.”
As used herein in the specification and in the claims, the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified.
It should also be understood that, unless clearly indicated to the contrary, in any methods claimed herein that include more than one step or act, the order of the steps or acts of the method is not necessarily limited to the order in which the steps or acts of the method are recited.
In the claims, as well as in the specification above, all transitional phrases such as “comprising,” “including,” “carrying,” “having,” “containing,” “involving,” “holding,” “composed of,” and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of” shall be closed or semi-closed transitional phrases, respectively.
The above-described examples of the described subject matter can be implemented in any of numerous ways. For example, some aspects may be implemented using hardware, software or a combination thereof. When any aspect is implemented at least in part in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single device or computer or distributed among multiple devices/computers.
The present disclosure may be implemented as a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some examples, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to examples of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
The computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various examples of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the Figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
Other implementations are within the scope of the following claims and other claims to which the applicant may be entitled.
While various examples have been described and illustrated herein, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the examples described herein. More generally, those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the teachings is/are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific examples described herein. It is, therefore, to be understood that the foregoing examples are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, examples may be practiced otherwise than as specifically described and claimed. Examples of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the scope of the present disclosure.
The present application is a Continuation-in-Part of U.S. patent application Ser. No. 15/945,449 filed on Apr. 4, 2018, which application is herein incorporated by reference in its entirety.