A user of a virtual reality media player device (e.g., a virtual reality headset, a mobile device, a game console, a computer, etc.) may experience virtual reality worlds by way of an immersive rendering, by the media player device, of video the user would see and audio the user would hear if the user were actually present in the virtual reality world. In some examples, such virtual reality worlds may be completely computer-generated (e.g., imaginary worlds, virtualized worlds inspired by real-world places, etc.). In other examples, certain virtual reality worlds experienced by a user may be generated based on camera-captured video of a real-world scene, microphone-captured audio from the real-world scene, and so forth.
To maximize the enjoyment of the user experiencing a particular virtual reality world, it may be desirable for the user to have freedom to move through a virtual reality space within the virtual reality world (e.g., to move to any place the user wishes within the virtual reality space). Providing camera-captured video data and microphone-captured audio data for every location within a virtual reality space based on a real-world scene may present a challenge, however, because cameras and microphones cannot practically be placed at every location within a capture zone of a real-world scene. Currently, audio data provided in connection with such a virtual environment fails to provide some of the immersive qualities of the video data. For example, audio data may not be customized to specific locations within a virtual reality space, or may represent sound that does not indicate to the user the direction from which the sound originates. Such deficiencies in the audio data may detract from the immersiveness of the virtual reality world experienced by the user.
The accompanying drawings illustrate various embodiments and are a part of the specification. The illustrated embodiments are merely examples and do not limit the scope of the disclosure. Throughout the drawings, identical or similar reference numbers designate identical or similar elements.
Systems and methods for simulating microphone capture within a capture zone of a real-world scene are described herein. For example, as will be described in more detail below, certain implementations of a microphone capture simulation system may access a captured set of audio signals from a plurality of directional microphones disposed at a plurality of locations on a perimeter of a capture zone of a real-world scene. The captured set of audio signals may be captured by the plurality of directional microphones. In some examples, the microphone capture simulation system may access the captured set of audio signals directly (e.g., using a plurality of directional microphones integrated within the microphone capture simulation system), by receiving them from the respective directional microphones that capture the signals, by downloading or otherwise accessing them from a storage facility where the signals are stored, or in any other way as may serve a particular implementation.
The microphone capture simulation system may also identify a particular location within the capture zone. For instance, a user may be experiencing (e.g., using a media player device) a virtual reality space that is based on the capture zone of the real-world scene, and the identified location within the capture zone may correspond to a virtual location at which the user is virtually located within the virtual reality space. In some examples, the microphone capture simulation system may dynamically identify the particular location as the user is experiencing the virtual reality space and the location is continuously changing (e.g., as the user is moving around within the virtual reality space).
Based on the captured set of audio signals that has been accessed and the location that has been identified, the microphone capture simulation system may generate a simulated set of audio signals representative of a simulation of a full-sphere multi-capsule microphone capture at the location. For example, the full-sphere multi-capsule microphone capture represented by the simulated set of audio signals may simulate an A-format signal that would be captured by a multi-capsule microphone (e.g., a full-sphere multi-capsule microphone such as an Ambisonic microphone) if the multi-capsule microphone were located at the identified location.
The microphone capture simulation system may process the simulated set of audio signals to form a renderable set of audio signals. The renderable set of audio signals may be configured to be rendered (e.g., by a media player device used by the user) to simulate full-sphere sound for the virtual location while the user is virtually located at the virtual location within the virtual reality space. For example, the renderable set of audio signals may take the form of a B-format signal (e.g., a filtered and/or decoded B-format signal into which other sounds have optionally been added). When decoded and rendered (e.g., converted for a particular speaker configuration and played back or otherwise presented to a user by way of the particular speaker configuration), a B-format signal may be manipulated so as to replicate not only a sound that has been captured, but also a direction from which the sound originated. In other words, as will be described in more detail below, B-format signals may include sound and directionality information such that they may be rendered to provide full-sphere sound (e.g., three-dimensional (“3D”) surround sound) to a listener. In this case, a B-format signal formed by processing the simulated set of audio signals (e.g., the A-format signal) described above may be configured to be rendered as full-sphere sound customized to the virtual location of the user and indicative of respective 3D directions from which different sounds originate.
In the same or other exemplary implementations, a microphone capture simulation system may perform operations for simulating microphone capture within a capture zone of a real-world scene in real time to dynamically and continuously update the microphone capture simulation as a user moves from one point to another within the virtual reality space. As used herein, operations are performed “in real time” when performed immediately and without undue delay. Thus, because operations cannot be performed instantaneously, it will be understood that a certain amount of delay (e.g., from a few milliseconds up to a few seconds) will necessarily accompany any real-time operation. However, if operations are performed immediately such that, for example, an updated microphone capture simulation for a particular location to which a user has moved is provided to the user before the user moves to yet another location (albeit up to a few seconds delayed), such operations will be considered to be performed in real time.
In certain real-time implementations, for example, a microphone capture simulation system may access, in real time from a plurality of directional microphones disposed at a plurality of locations on a perimeter of a capture zone of a real-world scene, a captured set of audio signals captured in real time by the plurality of directional microphones. The microphone capture simulation system may identify, in real time, a first location within the capture zone. The first location may correspond to a first virtual location at which a user is virtually located within a virtual reality space (e.g., a virtual reality space based on the capture zone of the real-world scene) being experienced by the user at a first moment in time. In real time and based on the captured set of audio signals and the first location, the microphone capture simulation system may generate a simulated set of audio signals representative of a simulation of a full-sphere multi-capsule microphone capture at the first location and at the first moment in time.
At a second moment in time subsequent to the first moment in time, the microphone capture simulation system may, in real time, identify a second location within the capture zone. For instance, the second location may correspond to a second virtual location at which the user is virtually located within the virtual reality space at the second moment in time. Based on the captured set of audio signals and the second location, the microphone capture simulation system may update, in real time, the simulated set of audio signals to be representative of a simulation of a full-sphere multi-capsule microphone capture at the second location and at the second moment in time.
The microphone capture simulation system may further process, in real time, the simulated set of audio signals to form a renderable set of audio signals. For example, the renderable set of audio signals may be configured to be rendered (e.g., by a media player device used by the user) to simulate full-sphere sound for the first virtual location at the first moment in time and to simulate full-sphere sound for the second virtual location at the second moment in time. Accordingly, as the user moves from one virtual location to another within the virtual reality space (e.g., from the first virtual location to the second virtual location), the microphone capture simulation system may facilitate providing the user with continuously updated audio data representative of full-sphere sound for every virtual location to which the user moves.
Methods and systems for simulating microphone capture within a capture zone of a real-world scene may provide various benefits to providers and users of virtual reality content. As described above, virtual reality technology may allow users to look around in any direction (e.g., up, down, left, right, forward, backward) and, in certain examples, to also move around freely to various parts of a virtual reality space. As such, when audio data (e.g., a renderable set of audio signals) generated in accordance with methods and systems described herein is rendered for a user, the audio data may enhance the realism and immersiveness of the virtual reality world as compared to audio data that is not customized to provide full-sphere sound from the user's current virtual location and/or that does not take directionality into account.
Additionally, methods and systems described herein may make the benefits of full-sphere sound possible for virtual reality spaces based on real-world scenes (e.g., camera-captured and microphone-captured real-world scenes) without requiring actual multi-capsule microphones (e.g., full-sphere multi-capsule microphones) to be positioned at locations within the capture zone of the real-world scene. Because a multi-capsule microphone capture may be simulated based on captured signals from a plurality of directional microphones disposed on a perimeter of the capture zone, in some examples no microphone needs to be disposed within the capture zone at all. This may be particularly beneficial for capture zones in which it is not possible or convenient to place microphones (e.g., due to potential interference with events happening within the capture zones). For the same reason, in certain examples there also may be no need for relatively complex multi-capsule microphones (e.g., full-sphere multi-capsule microphones) to be used to capture full-sphere sound for a capture zone. As a result, high-quality, full-sphere sound may be provided for real-world-scene-based virtual reality spaces using microphone setups having simpler and fewer microphones disposed at more convenient locations than might be possible using conventional techniques.
Various embodiments will now be described in more detail with reference to the figures. The disclosed systems and methods may provide one or more of the benefits mentioned above and/or various additional and/or alternative benefits that will be made apparent herein.
Signal access facility 102 may include any hardware and/or software (e.g., including microphones, audio interfaces, network interfaces, computing devices, software running on or implementing any of these devices or interfaces, etc.) that may be configured to capture, receive, download, and/or otherwise access audio signals for processing by signal processing facility 104. For example, signal access facility 102 may access a captured set of audio signals captured by a plurality of directional microphones disposed at a plurality of locations on a perimeter of a capture zone of a real-world scene (e.g., cardioid microphones or the like whose directional polar pattern is pointed inward toward the capture zone, as will be illustrated below).
Signal access facility 102 may access the captured set of audio signals from the plurality of directional microphones in any suitable manner. For instance, in certain implementations, signal access facility 102 may include one or more directional microphones such that accessing the captured set of audio signals from these microphones may be performed by using these integrated directional microphones to directly capture the signals. In the same or other implementations, some or all of the audio signals accessed by signal access facility 102 may be captured by directional microphones that are external to system 100 and under the direction of signal access facility 102 or of another system. For instance, signal access facility 102 may receive audio signals directly from directional microphones external to, but communicatively coupled with, system 100, and/or from another system, device, or storage facility that is coupled with the microphones and provides the audio signals to system 100 in real time or after the audio signals have been recorded, preprocessed, and/or stored. Regardless of how system 100 is configured with respect to the plurality of directional microphones and/or any other external equipment, systems, or storage used in the audio signal capture process, system 100 may be said, as used herein, to access an audio signal from the plurality of directional microphones if system 100 has gained access to audio signals that the plurality of directional microphones captured.
Signal processing facility 104 may include one or more physical computing devices (e.g., the same hardware and/or software components included within signal access facility 102 and/or components separate from those of signal access facility 102) that perform various signal processing operations for simulating microphone capture within a capture zone of a real-world scene. For example, signal processing facility 104 may perform operations associated with identifying a location within the capture zone of the real-world scene, generating a simulated set of audio signals associated with the identified location, and/or processing the simulated set of audio signals to form a renderable set of audio signals for rendering by a media player device.
More specifically, signal processing facility 104 may be configured to identify (e.g., dynamically identify while a user is experiencing and moving around within a virtual reality space) a location within the capture zone that corresponds to a virtual location at which a user is virtually located within a virtual reality space being experienced by the user. For example, if the virtual reality space is based on the capture zone of the real-world scene, the identified location in the capture zone may be the location that corresponds to the current virtual location of the user in the virtual reality space. As such, signal processing facility 104 may include or have access to a communication interface by way of which the current virtual location of the user (e.g., which may be tracked by a media player device the user is using to experience the virtual reality space) may be received from the media player device being used by the user. In some examples, signal processing facility 104 may continuously receive updated information regarding the virtual location as the user experiences the virtual reality space and the media player device tracks the changing virtual location of the user within the virtual reality space.
Signal processing facility 104 may further be configured to generate a simulated set of audio signals representative of a simulation of the audio signals that a full-sphere multi-capsule microphone (e.g., an Ambisonic microphone such as a SOUNDFIELD microphone or another microphone capable of capturing 3D surround sound using multiple microphone capsules) would capture at the identified location. The simulated set of audio signals may be generated based on the captured set of audio signals and the identified location in any suitable way, as will be described in more detail below. Once the simulated set of audio signals is generated, signal processing facility 104 may also process the simulated set of audio signals in various ways that will also be described in more detail below. For example, signal processing facility 104 may process the simulated set of audio signals to form a renderable set of audio signals configured to be rendered (e.g., by the media player device used by the user) to simulate full-sphere sound for the virtual location while the user is virtually located at the virtual location within the virtual reality space.
As described previously, in certain examples, the operations performed by signal access facility 102 and signal processing facility 104 may each be performed in real time as the user is experiencing the virtual reality space to allow the user to continuously enjoy full-sphere surround sound customized to his or her current virtual location within the virtual reality space.
Storage facility 106 may include signal data 108 and/or any other data received, generated, managed, maintained, used, and/or transmitted by facilities 102 and 104. Signal data 108 may include data associated with the audio signals such as the captured set of audio signals accessed by signal access facility 102, the simulated set of audio signals generated by signal processing facility 104, the renderable set of audio signals formed based on the simulated set of audio signals, and/or any other signals (e.g., intermediary signals) or data used to implement methods and systems described herein as may serve a particular implementation.
To illustrate system 100 in operation,
As further illustrated by configuration 200, system 100 may be included within a virtual reality provider system 206 that is communicatively coupled with audio capture system 204 as well as with a network 208. Virtual reality provider system 206 (and system 100, as a subsystem thereof) may exchange and communicate data, by way of network 208, with a media player device 210 associated with a user 212.
Virtual reality provider system 206 may be responsible for capturing, accessing, generating, distributing, and/or otherwise providing and curating virtual reality media content for one or more media player devices such as media player device 210. As such, virtual reality provider system 206 may capture virtual reality data representative of image data (e.g., video) and audio data (e.g., a renderable set of audio signals simulating full-sphere sound for a particular virtual location), and may combine this data into a form that may be distributed and used by media player devices such as media player device 210 to provide virtual reality experiences for users such as user 212.
Virtual reality data may be distributed using any suitable communication technologies included in network 208, which may include a provider-specific wired or wireless network (e.g., a cable or satellite carrier network or a mobile telephone network), the Internet, a wide area network, a content delivery network, and/or any other suitable network or networks. Data may flow between virtual reality provider system 206 and one or more media player devices such as media player device 210 using any communication technologies, devices, media, and protocols as may serve a particular implementation.
As described above, system 100 may operate within a configuration such as configuration 200 to simulate microphone capture for arbitrary locations (e.g., locations where no physical microphone is disposed) within a capture zone of a real-world scene. To illustrate the relationship between these virtual locations and this capture zone of this real-world scene,
Capture zone 302 may be included (e.g., along with other capture zones adjacent to or separate from capture zone 302) within a real-world scene. As such, capture zone 302 may be associated with any real-world scenery, real-world location, real-world event (e.g., live event, etc.), or other subject existing in the real world (as opposed to existing only in a virtual world) that may be captured by various types of capture devices (e.g., color video cameras, depth capture devices, microphones, etc.) to be replicated in virtual reality content. Capture zone 302 may refer to a particular area within a real-world scene defined by placement of the capture devices being used to capture visual and/or audio data of the real-world scene. For example, if a real-world scene is associated with a basketball venue such as a professional basketball stadium where a professional basketball game is taking place, capture zone 302 may be the actual basketball court where the players are playing, or a portion of the basketball court defined by a plurality of microphones or other capture devices.
To capture sound within capture zone 302,
As shown, directional microphones 316 are disposed at each corner of capture zone 302, which is depicted as a quadrilateral shape (e.g., a square or a rectangle). In the example of
In certain examples, each microphone 316 may be a single-capsule microphone including only a single capsule for capturing a single (i.e., monophonic) audio signal. In other examples, one or more of microphones 316 may include multiple capsules used to capture directional signals (e.g., using beamforming techniques or the like). However, even if none of microphones 316 are implemented as a full-sphere multi-capsule microphone such as an Ambisonic microphone or the like, the captured set of audio signals captured by microphones 316 may be used to generate a simulated set of audio signals representative of a microphone capture of a full-sphere multi-capsule microphone disposed at a particular location within capture zone 302.
In certain examples, each directional microphone 316 may be implemented by a discrete physical microphone. In other examples, however, exclusive use of discrete physical microphones to implement each directional microphone 316 may be impractical or undesirable. For instance, if capture zone 302 is implemented as a relatively large physical space such as, for example, an entire football field, a directional microphone 316 disposed at one corner of capture zone 302 (e.g., microphone 316-1) may not be well-equipped to capture sound originating near other corners of capture zone 302 (e.g., such as the opposite corner near microphone 316-4). In such examples, or other examples in which discrete physical microphones may not be well equipped to capture sound in at least certain areas of capture zone 302, one or more of directional microphones 316 may be implemented as a uniform linear array (“ULA”) microphone.
As used herein, a “ULA microphone” may refer to a virtual microphone that is composed of a plurality of microphones disposed at different locations (i.e., as opposed to a physical microphone disposed at one particular location) that are combined and processed together to form audio signals not captured by any particular physical microphone in the uniform linear array. For example, respective audio signals from the plurality of microphones composing a ULA microphone may be processed together so as to generate a single audio signal (e.g., a directional audio signal) representative of what the ULA microphone captures. In some examples, a plurality of microphones composing a ULA microphone implementing one of directional microphones 316 may include a plurality of omnidirectional microphones disposed at different locations with respect to capture zone 302. Even though each of these omnidirectional microphones may capture an omnidirectional audio signal, when processed together in a suitable way (e.g., using beamforming techniques), these omnidirectional signals may be used to generate a directional signal to be used in the captured set of audio signals captured by directional microphones 316.
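As one concrete illustration of how several omnidirectional capsules could be combined into a single steered signal, the sketch below implements a frequency-domain delay-and-sum beamformer. It is only a minimal sketch of one common beamforming technique; the disclosure does not specify which beamforming method is used, and the function name, array shapes, and speed-of-sound constant are assumptions.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, steer_dir, sample_rate, c=343.0):
    """Combine omnidirectional capture signals into one steered ("ULA") signal
    via delay-and-sum beamforming toward the unit vector steer_dir.

    signals:       (num_mics, num_samples) time-domain recordings
    mic_positions: (num_mics, 3) capsule positions in meters
    """
    signals = np.asarray(signals, dtype=float)
    mic_positions = np.asarray(mic_positions, dtype=float)
    num_mics, num_samples = signals.shape

    spectra = np.fft.rfft(signals, axis=1)
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / sample_rate)

    # A plane wave arriving from steer_dir reaches capsules closer to the
    # source earlier; delaying each capsule by its relative advance aligns
    # the wavefront so the steered direction sums coherently.
    advances = mic_positions @ np.asarray(steer_dir, dtype=float) / c   # seconds
    alignment = np.exp(-2j * np.pi * np.outer(advances, freqs))

    return np.fft.irfft((spectra * alignment).mean(axis=0), n=num_samples)
```

A production implementation would typically process short, windowed frames rather than whole recordings at once, but the steering principle is the same.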
In some examples, audio signals captured by particular physical microphones may be employed as audio signals in their own right, as well as combined with other audio signals to generate ULA audio signals. For example, an audio signal captured by microphone 316-1 may be included in a captured set of audio signals provided to system 100 while also contributing (e.g., along with audio signals captured by microphones 316-2 and 316-3) to a ULA audio signal for directional microphone 316-4, which may be implemented, at least for certain sounds near directional microphone 316-1, as a ULA microphone that is composed of the three discrete physical microphones implementing directional microphones 316-1 through 316-3.
By implementing one or more of directional microphones 316 as ULA microphones, it may be possible for a virtual reality media provider to scale capture zone 302 to be a larger size than might be practically possible relying on only discrete physical microphones. For instance, in some examples, a real-world scene of a relatively large size (e.g., the size of a city) and that includes one or more capture zones such as capture zone 302 may be served by a large array of microphones distributed in various locations within the real-world scene. This array of microphones may be combined in different ways to form different ULA microphones as may serve a particular implementation.
As illustrated in
While
As described above, system 100 may provide various benefits by performing various operations from within a configuration (e.g., configuration 200) to simulate full-sphere microphone capture for one or more arbitrary locations within a capture zone of a real-world scene (e.g., locations 308 within capture zone 302). Examples of some of these operations that system 100 may perform will now be described in more detail.
While
As illustrated, certain operations depicted in dataflow 400 may be performed in the time domain (e.g., performed using signals represented as varying amplitudes with respect to time). Other operations may be performed in the frequency domain (e.g., performed using signals represented as varying magnitudes and phases with respect to different frequency ranges). Still other operations may be performed to transform or convert signals between the time domain and the frequency domain. While operations in
In like manner, dataflow 400 illustrates a line between operations performed on a server-side (e.g., a provider side of a distribution network such as network 208) by system 100 or another component of a virtual reality provider system such as virtual reality provider system 206, and operations performed on a client-side (e.g., a user side of the distribution network) by a media player device such as media player device 210. In the example of
Each of operations 402 through 426 will now be described in more detail with reference to
Time-domain signal access operation 402 may include capturing data or otherwise accessing captured data representative of a captured set of audio signals. The captured set of audio signals may each be captured in the time domain and may be analog or digital signals as may serve a particular implementation. Accessing the captured set of audio signals for time-domain signal access operation 402 may be performed in any of the ways described herein.
Plane wave decomposition operation 404 may include any form of plane wave decomposition of the captured set of audio signals as may serve a particular implementation. While sound captured within a capture zone may not literally constitute ideal plane waves, it may be convenient mathematically to apply signal processing to audio signals that have been decomposed into estimated plane wave constituents. In other words, rather than performing signal processing on the captured set of audio signals in the time domain, it may be mathematically convenient to perform the signal processing in the frequency domain. To this end, plane wave decomposition operation 404 may include transforming each of the audio signals in the captured set of audio signals into a respective frequency-domain audio signal by way of a suitable frequency-domain transform technique such as a fast Fourier transform (“FFT”) technique or the like. Once converted, plane wave decomposition operation 404 may further involve converting complex values included within each of the respective frequency-domain audio signals from a Cartesian form to a polar form. In polar form, magnitudes of each complex value may represent a magnitude of a particular frequency component (e.g., a particular plane wave constituent of the audio signal) while angles of each value may represent a phase of the particular frequency component.
To illustrate,
Magnitude component 504 includes values representative of respective plane wave magnitudes at each frequency in a number of discrete frequencies or frequency ranges (also referred to as “frequency bins”) provided by the frequency-domain transform technique (e.g., the FFT technique). Similarly, phase component 506 includes values representative of respective plane wave phases at each frequency in the frequencies provided by the frequency-domain transform technique. For example, as shown, a lowest frequency bin provided by the frequency-domain transform technique may represent a plane wave having a magnitude of “3” and a phase of “7,” a second lowest frequency bin may represent a plane wave having a magnitude of “4” and a phase of “8,” and so forth. It will be understood that the single digit values illustrated in
System 100 may perform plane wave decomposition operation 404 to generate magnitude component 504 and phase component 506 of the polar-form frequency-domain audio signal in any suitable way. For example, system 100 may employ an overlap-add technique to facilitate real-time conversion of audio signals from the time domain to the frequency domain. The overlap-add technique may be performed by system 100 prior to the frequency-domain transform technique to avoid introducing undesirable clicking or other artifacts into a final renderable set of audio signals that is to be generated and provided to the media player device for playback to the user.
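A minimal sketch of this decomposition step might look like the following, operating on one windowed frame of one captured signal at a time; the Hann window and the 50% overlap it implies are common choices for an overlap-add scheme but are assumptions rather than details taken from the description above.

```python
import numpy as np

def plane_wave_decompose(frame, window=None):
    """Transform one time-domain frame into polar-form frequency bins, i.e.,
    a plane-wave magnitude and phase per frequency bin (the magnitude
    component and phase component described above)."""
    frame = np.asarray(frame, dtype=float)
    if window is None:
        window = np.hanning(len(frame))          # pairs with 50% overlap-add
    spectrum = np.fft.rfft(frame * window)       # Cartesian (complex) bins
    magnitude = np.abs(spectrum)                 # magnitude component
    phase = np.angle(spectrum)                   # phase component
    return magnitude, phase
```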
Returning to
Specifically, after system 100 generates a set of frequency-domain audio signals (e.g., such as the one illustrated in
In the example illustrated in
As shown, the distance between microphone 316-1 and location 308-1 may not happen to be an exact multiple of wavelengths 604. As a result, sounds arriving at microphone 316-1 with phase 602 may be expected to arrive at location 308-1 with a different phase such as a projected phase 606.
It will be understood that projected phase 606 may represent an estimation of a phase to be expected at location 308-1 because the geometry of the sound source with respect to microphone 316-1 and location 308-1 may also need to be taken into account to determine an exact phase to be expected at location 308-1 based on the phase measured at microphone 316-1. For instance, as mentioned above, in examples where location 308-1 is in the near field with respect to one or more sound sources generating the sounds from which plane wave 600 originates, projected phase 606 may be an accurate estimation of the phase to be expected at location 308-1. As such, the detail of where the sound sources are located may be ignored and projected phase 606 may be used to accurately simulate the phase that would be captured at location 308-1.
However, in other examples such as where location 308-1 is in the far field with respect to the one or more sound sources, it may be desirable to take the location of the one or more sound sources into account to improve the projected phase approximation for location 308-1. For example, along with identifying the location corresponding to the virtual location at which the user is virtually located, system 100 may further identify within the capture zone one or more locations of one or more sound sources at which sound represented within the captured set of audio signals originates. Accordingly, the generating of the simulated set of audio signals representative of the simulation of the full-sphere multi-capsule microphone capture may be further based on the identified one or more locations of the one or more sound sources. The identified one or more locations of the one or more sound sources may be used to generate the simulated set of audio signals in any suitable manner. In some examples, the projected phase approximation may be improved iteratively in situations where multiple sound sources exist at different locations.
Regardless of whether one or more positions of the one or more sound sources are taken into account, projected phase 606 may be determined and simulated based on wavelength 604 and based on the distance between microphone 316-1 and location 308-1, as shown. System 100 may determine and track the distance between the location of the user (e.g., location 308-1 in this example) and each directional microphone in the plurality of directional microphones (e.g., including microphone 316-1 in this example) in any manner as may serve a particular implementation. For example, a known distance from a virtual location of the user (e.g., virtual location 310-1) to a particular corner of virtual reality space 304 in the virtual realm may have a known constant relationship with an actual distance between a corresponding location (e.g., location 308-1) and a corresponding corner of capture zone 302 (e.g., where microphone 316-1 is located).
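For instance, under the assumption of a single constant scale factor between virtual units and real-world meters, the distance used by the compensation operations could be computed as in the sketch below; the scale factor and coordinates shown are made-up examples, not values from the description.

```python
import numpy as np

METERS_PER_VIRTUAL_UNIT = 0.5                        # assumed calibration constant
mic_corner_real = np.array([0.0, 0.0, 0.0])          # e.g., where microphone 316-1 sits
user_location_virtual = np.array([12.0, 7.0, 0.0])   # tracked by the media player device

# Map the tracked virtual location into capture-zone coordinates and measure
# its distance to the microphone, for use in phase and magnitude compensation.
user_location_real = user_location_virtual * METERS_PER_VIRTUAL_UNIT
distance_to_mic = float(np.linalg.norm(user_location_real - mic_corner_real))
```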
Thus, once the distance between microphone 316-1 and location 308-1 and wavelength 604 have been determined, a phase shift between phase 602 and phase 606 may be calculated as a wavelength-normalized product of 2π and a length 608 defined as the remainder of the distance divided by wavelength 604 (i.e., determined by performing a modulo operation (“%”) on the distance and the wavelength). In other words, if the distance between microphone 316-1 and location 308-1 is represented by “d” and wavelength 604 is represented by “λ”, a phase shift “Δθ” between phase 602 and phase 606 may be represented mathematically by Equation 1:

Δθ = 2π · ((d % λ) / λ)   (Equation 1)
Accordingly, phase compensation operation 406 may determine projected phase 606 associated with location 308-1 by subtracting phase 602 from the phase shift (Δθ) calculated using Equation 1. As described above, phase compensation operation 406 may involve performing this calculation for each frequency bin included in each frequency-domain audio signal.
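As a sketch of this per-bin calculation (assuming bin center frequencies in hertz, a distance in meters, and a nominal speed of sound; the final wrapping to (−π, π] is a convenience, not something the description requires):

```python
import numpy as np

def project_phase(measured_phase, freqs, distance, c=343.0):
    """Project the phase captured at a perimeter microphone onto the identified
    location using Equation 1: delta_theta = 2*pi*((d % lambda) / lambda)."""
    freqs = np.asarray(freqs, dtype=float)
    wavelengths = np.where(freqs > 0.0, c / np.maximum(freqs, 1e-12), np.inf)
    delta_theta = 2.0 * np.pi * (np.mod(distance, wavelengths) / wavelengths)

    # The description subtracts the measured phase from the shift; the later
    # phase inversion operation (412) accounts for the resulting sign.
    projected = delta_theta - np.asarray(measured_phase, dtype=float)
    return np.mod(projected + np.pi, 2.0 * np.pi) - np.pi   # wrap to (-pi, pi]
```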
Returning to
Specifically, after system 100 generates the set of frequency-domain audio signals (e.g., such as the one illustrated in
Sound intensity is known to fall off in accordance with the inverse-square law, or, in other words, to be inversely proportional to the square of the distance from the sound source. Accordingly, as shown in
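Because the remainder of this passage is abbreviated above, the following is only a sketch of the inverse-square relationship itself, assuming the distances from a given sound source to the microphone and to the identified location are both known (e.g., from the sound-source localization discussed earlier); how the description actually parameterizes the compensation is not restated here.

```python
def project_magnitude(measured_magnitude, dist_source_to_mic, dist_source_to_location):
    """Scale a per-bin magnitude from the microphone to the identified location.

    Intensity falls off as 1/d**2 (inverse-square law), so pressure amplitude,
    and hence the per-bin magnitude, falls off as 1/d.
    """
    return measured_magnitude * (dist_source_to_mic / dist_source_to_location)
```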
Returning to
Back in the time domain, the simulated set of audio signals transformed by signal reconstruction operation 410 may essentially represent a simulation of an A-format signal that would be captured by a full-sphere multi-capsule microphone (e.g., a first order or higher order Ambisonic microphone) at the location within the capture zone. However, because the phase and magnitude compensations are projected from inward-looking directional microphones 316 rather than, for instance, outward-looking directional capsules of an actual full-sphere multi-capsule microphone, the phase of each of the time-domain audio signals may be inverted. To remedy this issue, phase inversion operation 412 may be performed to invert the simulated audio signals.
Additionally, time alignment operation 414 may be performed on each of these signals based on the respective distance of each microphone 316 from the identified location 308. Directional microphones 316 distributed around capture zone 302 may each capture sounds with slightly different timings than would the respective capsules of the full-sphere multi-capsule microphone being simulated at the identified location 308. Accordingly, time alignment operation 414 may introduce different delays into each of the audio signals in the simulated set of audio signals to simulate each signal being captured simultaneously at a coincident point at the identified location 308.
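A whole-sample sketch of such an alignment is shown below; it assumes the appropriate delay for each simulated signal is proportional to its microphone's distance from the identified location, and a production implementation would likely use fractional delays for better precision.

```python
import numpy as np

def time_align(signals, mic_to_location_dists, sample_rate, c=343.0):
    """Delay each simulated signal so that all signals appear to be captured
    coincidently at the identified location (whole-sample approximation)."""
    signals = np.asarray(signals, dtype=float)
    dists = np.asarray(mic_to_location_dists, dtype=float)

    # Signals associated with nearer microphones are delayed more, so every
    # signal lines up with the one associated with the farthest microphone.
    delays = np.round((dists.max() - dists) / c * sample_rate).astype(int)

    aligned = np.zeros_like(signals)
    for i, (sig, n) in enumerate(zip(signals, delays)):
        aligned[i, n:] = sig[: len(sig) - n]   # shift right by n samples
    return aligned
```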
At this point, the simulated set of audio signals generated by signal reconstruction operation 410 and modified by operations 412 and 414 may represent a simulated set of audio signals representative of a simulation of a full-sphere multi-capsule microphone capture at the identified location 308. For example, the simulated set of audio signals may represent the simulation of the A-format signal that would be captured by the full-sphere multi-capsule microphone at the location 308 within the capture zone. However, for this A-format signal to be used (e.g., rendered for a user as part of a virtual reality experience), the A-format signal may be converted into a renderable set of audio signals such as a B-format signal. In other words, in certain examples, a simulated set of audio signals representative of a simulation of the full-sphere multi-capsule microphone capture may collectively constitute an A-format signal representative of the full-sphere multi-capsule microphone capture, while a renderable set of audio signals may collectively constitute a B-format signal configured to be rendered to simulate the full-sphere sound for the virtual location.
To illustrate,
In particular,
As shown in
As mentioned above, an A-format signal may include sufficient information to implement 3D surround sound, but it may be desirable to convert the A-format signal from a format that may be specific to a particular microphone configuration to a more universal format that facilitates the decoding of the full-sphere 3D sound into renderable audio signals to be played back by specific speakers (e.g., a renderable stereo signal, a renderable surround sound signal such as a 5.1 surround sound signal, etc.). This may be accomplished by converting the A-format signal to a B-format signal. Referring back to
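A minimal sketch of the first-order conversion is shown below. The capsule labels (left-front-up, right-front-down, left-back-down, right-back-up) and the unnormalized sums follow the commonly published tetrahedral A-to-B conversion; they are assumptions rather than details from the description, and a real converter would also apply per-capsule calibration and equalization filtering.

```python
import numpy as np

def a_to_b_format(lfu, rfd, lbd, rbu):
    """Convert four tetrahedral A-format capsule signals into B-format W/X/Y/Z."""
    w = lfu + rfd + lbd + rbu    # omnidirectional pressure component
    x = lfu + rfd - lbd - rbu    # front/back figure-of-eight
    y = lfu - rfd + lbd - rbu    # left/right figure-of-eight
    z = lfu - rfd - lbd + rbu    # up/down figure-of-eight
    return np.array([w, x, y, z])
```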
To illustrate aspects of the B-format signal generated by operation 416,
B-format signals such as B-format signal 904 may be advantageous in applications where sound directionality matters, such as in virtual reality media content or other surround sound applications. This is because the audio coordinate system to which the audio signals are aligned (e.g., coordinate system 806) may be oriented to associate with (e.g., align with, tie to, etc.) a video coordinate system to which visual aspects of a virtual world (e.g., a virtual reality world) are aligned. As such, a B-format signal may be decoded and rendered for a particular user so that sounds seem to originate from the directions the user would expect them to come from based on what he or she sees. Even as the user turns around within the virtual world, thereby realigning himself or herself with respect to the video and audio coordinate systems, the sound directionality may properly shift and rotate around the user just as the video content shifts to show new parts of the virtual world the user is looking at.
In the example of
In this way, the higher-order Ambisonic microphone may provide an increased level of directional resolution, precision, and accuracy for the location-confined B-format signal that is derived. It will be understood that above the first-order (i.e., four-capsule tetrahedral) full-sphere multi-capsule microphone 800 illustrated in
Returning to
Additionally, the processing of the simulated set of audio signals to form the renderable set of audio signals may include mixing one or more additional audio signals 422 together with the renderable set of audio signals (e.g., the post-filtered B-format signal). For example, additional audio signal mixing operation 420 may be performed by combining additional audio signals 422 into the B-format signal. Additional audio signals 422 may be representative of sound that is not captured by the plurality of directional microphones disposed at the plurality of locations on the perimeter of the capture zone of the real-world scene (e.g., directional microphones 316). For instance, additional audio signals 422 may include voice-over content, announcer or narration content, social chat content (e.g., from other users experiencing the same virtual reality space at the same time), Foley content or other sound effects, and so forth.
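One common way to mix such additional content into a B-format signal (not necessarily the specific approach contemplated above) is to encode each mono additional signal into B-format at a chosen direction and then sum it into the main signal; the FuMa-style scaling of W below is an assumption about the convention in use.

```python
import numpy as np

def encode_mono_to_b_format(mono, azimuth, elevation):
    """Pan a mono additional signal (e.g., narration) into first-order B-format
    at the given direction so it can be summed into the main B-format signal."""
    w = mono / np.sqrt(2.0)                            # FuMa-style -3 dB on W
    x = mono * np.cos(azimuth) * np.cos(elevation)
    y = mono * np.sin(azimuth) * np.cos(elevation)
    z = mono * np.sin(elevation)
    return np.array([w, x, y, z])

# Example usage: mix narration arriving from straight ahead, slightly above.
# mixed = b_format + encode_mono_to_b_format(narration, azimuth=0.0, elevation=0.3)
```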
Once the B-format signal has been filtered and mixed with other suitable sounds in operations 418 and 420, dataflow 400 shows that the B-format signal may be decoded in signal decoding operation 424. Specifically, system 100 may decode the B-format signal to a particular speaker configuration associated with the media player device upon which the B-format signal is to be rendered. The B-format signal may be decoded to any suitable speaker configuration such as a stereo configuration, a surround sound configuration (e.g., a 5.1 configuration, etc.), or the like.
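As a sketch of one simple decoding approach, a virtual first-order microphone can be pointed toward each loudspeaker direction; the sqrt(2) factor assumes the traditional FuMa convention in which W is recorded 3 dB down, and the stereo example angles are illustrative only rather than a prescribed speaker configuration.

```python
import numpy as np

def decode_virtual_mic(b_format, azimuth, elevation, pattern=0.5):
    """Render one output channel from B-format (W, X, Y, Z) as a virtual
    first-order microphone; pattern=0.5 yields a cardioid response."""
    w, x, y, z = b_format
    dx = np.cos(azimuth) * np.cos(elevation)
    dy = np.sin(azimuth) * np.cos(elevation)
    dz = np.sin(elevation)
    return pattern * np.sqrt(2.0) * w + (1.0 - pattern) * (x * dx + y * dy + z * dz)

# Simple stereo decode: two virtual cardioids facing left and right.
# left  = decode_virtual_mic(b_format, np.deg2rad(+90), 0.0)
# right = decode_virtual_mic(b_format, np.deg2rad(-90), 0.0)
```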
Finally, once the B-format signal has been processed in any of the ways described above or any other suitable manner, the B-format signal may be considered a renderable set of audio signals that is configured to be rendered by a media player device such as media player device 210. Accordingly, the renderable set of audio signals may be provided (e.g., by way of network 208) to the media player device and rendered (i.e., played back, presented, etc.) for the user as part of a dynamic and immersive virtual reality experience. This is illustrated in dataflow 400 by signal rendering operation 426.
In operation 1002, a microphone capture simulation system may access a captured set of audio signals. For example, the captured set of audio signals may be captured by a plurality of directional microphones disposed at a plurality of locations on a perimeter of a capture zone of a real-world scene, and the microphone capture simulation system may access the captured set of audio signals from the plurality of directional microphones. Operation 1002 may be performed in any of the ways described herein.
In operation 1004, the microphone capture simulation system may identify a location within the capture zone. For example, the location may correspond to a virtual location at which a user is virtually located within a virtual reality space that is being experienced by the user and is based on the capture zone of the real-world scene. Operation 1004 may be performed in any of the ways described herein.
In operation 1006, the microphone capture simulation system may generate a simulated set of audio signals representative of a simulation of a full-sphere multi-capsule microphone capture at the location at which the user is virtually located. For example, the microphone capture simulation system may generate the simulated set of audio signals based on the captured set of audio signals accessed in operation 1002 and the location identified in operation 1004. Operation 1006 may be performed in any of the ways described herein.
In operation 1008, the microphone capture simulation system may process the simulated set of audio signals to form a renderable set of audio signals. For instance, the renderable set of audio signals may be configured to be rendered by a media player device used by the user. In some examples, when rendered by the media player device, the renderable set of audio signals may simulate full-sphere sound for the virtual location identified in operation 1004 while the user is virtually located at the virtual location within the virtual reality space. Operation 1008 may be performed in any of the ways described herein.
In operation 1102, a microphone capture simulation system may access a captured set of audio signals. The captured set of audio signals may be captured in real time by a plurality of directional microphones disposed at a plurality of locations on a perimeter of a capture zone of a real-world scene. In some examples, the microphone capture simulation system may access the captured set of audio signals in real time from the plurality of directional microphones. Operation 1102 may be performed in any of the ways described herein.
In operation 1104, the microphone capture simulation system may identify a first location within the capture zone. The first location may correspond to a first virtual location at which a user is virtually located within a virtual reality space that is being experienced by the user at a first moment in time and that is based on the capture zone of the real-world scene. In some examples, the microphone capture simulation system may dynamically identify the first location in real time. Operation 1104 may be performed in any of the ways described herein.
In operation 1106, the microphone capture simulation system may generate a simulated set of audio signals. The simulated set of audio signals may be representative of a simulation of a full-sphere multi-capsule microphone capture at the first location at the first moment in time. In some examples, the microphone capture simulation system may generate the simulated set of audio signals in real time based on the captured set of audio signals accessed in operation 1102 and the first location identified in operation 1104. Operation 1106 may be performed in any of the ways described herein.
In operation 1108, the microphone capture simulation system may identify a second location within the capture zone. The second location may correspond to a second virtual location at which the user is virtually located within the virtual reality space at a second moment in time subsequent to the first moment in time. In some examples, the microphone capture simulation system may dynamically identify the second location in real time. Operation 1108 may be performed in any of the ways described herein.
In operation 1110, the microphone capture simulation system may update the simulated set of audio signals. For instance, the microphone capture simulation system may update the simulated set of audio signals to be representative of a simulation of a full-sphere multi-capsule microphone capture at the second location at the second moment in time. In some examples, the microphone capture simulation system may update the simulated set of audio signals in real time based on the captured set of audio signals accessed in operation 1102 and the second location identified in operation 1108. Operation 1110 may be performed in any of the ways described herein.
In operation 1112, the microphone capture simulation system may process the simulated set of audio signals to form a renderable set of audio signals. For example, the renderable set of audio signals may be configured to be rendered by a media player device used by the user. When rendered by the media player device, the renderable set of audio signals may simulate full-sphere sound for the first virtual location at the first moment in time and for the second virtual location at the second moment in time. In some examples, the microphone capture simulation system may process the simulated set of audio signals to form the renderable set of audio signals in real time. Operation 1112 may be performed in any of the ways described herein.
In certain embodiments, one or more of the systems, components, and/or processes described herein may be implemented and/or performed by one or more appropriately configured computing devices. To this end, one or more of the systems and/or components described above may include or be implemented by any computer hardware and/or computer-implemented instructions (e.g., software) embodied on at least one non-transitory computer-readable medium configured to perform one or more of the processes described herein. In particular, system components may be implemented on one physical computing device or may be implemented on more than one physical computing device. Accordingly, system components may include any number of computing devices, and may employ any of a number of computer operating systems.
In certain embodiments, one or more of the processes described herein may be implemented at least in part as instructions embodied in a non-transitory computer-readable medium and executable by one or more computing devices. In general, a processor (e.g., a microprocessor) receives instructions from a non-transitory computer-readable medium (e.g., a memory, etc.) and executes those instructions, thereby performing one or more processes, including one or more of the processes described herein. Such instructions may be stored and/or transmitted using any of a variety of known computer-readable media.
A computer-readable medium (also referred to as a processor-readable medium) includes any non-transitory medium that participates in providing data (e.g., instructions) that may be read by a computer (e.g., by a processor of a computer). Such a medium may take many forms, including, but not limited to, non-volatile media and/or volatile media. Non-volatile media may include, for example, optical or magnetic disks and other persistent memory. Volatile media may include, for example, dynamic random access memory (“DRAM”), which typically constitutes a main memory. Common forms of computer-readable media include, for example, a disk, hard disk, magnetic tape, any other magnetic medium, a compact disc read-only memory (“CD-ROM”), a digital video disc (“DVD”), any other optical medium, random access memory (“RAM”), programmable read-only memory (“PROM”), electrically erasable programmable read-only memory (“EEPROM”), FLASH-EEPROM, any other memory chip or cartridge, or any other tangible medium from which a computer can read.
Communication interface 1202 may be configured to communicate with one or more computing devices. Examples of communication interface 1202 include, without limitation, a wired network interface (such as a network interface card), a wireless network interface (such as a wireless network interface card), a modem, an audio/video connection, and any other suitable interface.
Processor 1204 generally represents any type or form of processing unit capable of processing data or interpreting, executing, and/or directing execution of one or more of the instructions, processes, and/or operations described herein. Processor 1204 may direct execution of operations in accordance with one or more applications 1212 or other computer-executable instructions such as may be stored in storage device 1206 or another computer-readable medium.
Storage device 1206 may include one or more data storage media, devices, or configurations and may employ any type, form, and combination of data storage media and/or device. For example, storage device 1206 may include, but is not limited to, a hard drive, network drive, flash drive, magnetic disc, optical disc, RAM, dynamic RAM, other non-volatile and/or volatile data storage units, or a combination or sub-combination thereof. Electronic data, including data described herein, may be temporarily and/or permanently stored in storage device 1206. For example, data representative of one or more executable applications 1212 configured to direct processor 1204 to perform any of the operations described herein may be stored within storage device 1206. In some examples, data may be arranged in one or more databases residing within storage device 1206.
I/O module 1208 may include one or more I/O modules configured to receive user input and provide user output. One or more I/O modules may be used to receive input for a single virtual reality experience. I/O module 1208 may include any hardware, firmware, software, or combination thereof supportive of input and output capabilities. For example, I/O module 1208 may include hardware and/or software for capturing user input, including, but not limited to, a keyboard or keypad, a touchscreen component (e.g., touchscreen display), a receiver (e.g., an RF or infrared receiver), motion sensors, and/or one or more input buttons.
I/O module 1208 may include one or more devices for presenting output to a user, including, but not limited to, a graphics engine, a display (e.g., a display screen), one or more output drivers (e.g., display drivers), one or more audio speakers, and one or more audio drivers. In certain embodiments, I/O module 1208 is configured to provide graphical data to a display for presentation to a user. The graphical data may be representative of one or more graphical user interfaces and/or any other graphical content as may serve a particular implementation.
In some examples, any of the facilities described herein may be implemented by or within one or more components of computing device 1200. For example, one or more applications 1212 residing within storage device 1206 may be configured to direct processor 1204 to perform one or more processes or functions associated with facilities 102 or 104 of system 100. Likewise, storage facility 106 of system 100 may be implemented by or within storage device 1206.
To the extent the aforementioned embodiments collect, store, and/or employ personal information provided by individuals, it should be understood that such information shall be used in accordance with all applicable laws concerning protection of personal information. Additionally, the collection, storage, and use of such information may be subject to consent of the individual to such activity, for example, through well known “opt-in” or “opt-out” processes as may be appropriate for the situation and type of information. Storage and use of personal information may be in an appropriately secure manner reflective of the type of information, for example, through various encryption and anonymization techniques for particularly sensitive information.
In the preceding description, various exemplary embodiments have been described with reference to the accompanying drawings. It will, however, be evident that various modifications and changes may be made thereto, and additional embodiments may be implemented, without departing from the scope of the invention as set forth in the claims that follow. For example, certain features of one embodiment described herein may be combined with or substituted for features of another embodiment described herein. The description and drawings are accordingly to be regarded in an illustrative rather than a restrictive sense.