The present disclosure relates to the field of audio processing in virtual environments, and more particularly to the creation and handling of audio groups in a virtual environment.
Virtual environments generally aim to present digital content in a format that simulates realistic audio and visual cues. In most cases, this realistic depiction is desirable to create a more immersive experience for a user. However, there are certain scenarios where altering sensory stimuli away from a realistic presentation creates a preferable experience.
One such scenario arises in the context of conversations in a typical multi-user, virtual environment. In this example scenario, users are presented as avatars occupying a three-dimensional volume; however, when the avatars transmit audio they act as point sound sources within the environment. When a user is trying to identify or focus on audio from a primary source (e.g., an individual or group in the environment), audio from secondary sound sources around the user can interfere with the user's ability to clearly discern audio from the primary source. Sounds coming from all directions in the environment can be both distracting and aggravating when trying to focus on a single source, diminishing the quality of the user experience.
In a real-world analogue, such as at a party with multiple audio sources (people, music, television, etc.), listeners are able to focus on a primary conversation of interest in the midst of secondary audio streams present all around them. This so-called “cocktail party effect” relies on an individual's ability to tune out extraneous noise with the assistance of sophisticated binaural localization techniques, and individual signal-to-noise ratio optimizations at the ears impacted by the listener's specific body geometry, including the head, torso, and ears. These criteria form part of an individualized head-related transfer function (HRTF), which can differ greatly from person to person. In addition to listener-based audio cues, lip reading, body language, familiarity, and context also help one's brain to isolate particular speech in a noisy environment.
While virtual environments continue to improve, the level of detail in both the acoustic qualities of the synthesized sound and the subtleties of the visual presentation makes it difficult for a user to rely on real-world means of isolating speech in the presence of a distracting environment. Therefore, a sound processing technique is needed to enhance the intelligibility of audio from a primary source in noisy virtual environments.
The various systems and methods disclosed herein provide for an enhanced virtual experience within a virtual environment. In some implementations, the systems and methods disclosed herein provide an enhanced audio experience for users navigating avatars within a virtual environment. In some implementations, a system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination thereof installed on the system that, in operation, causes the system to perform virtual environment enhancements. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the enhancement actions. One general aspect includes a method of audio processing in a multi-user virtual environment having a plurality of audio sources. The method of audio processing includes: determining a group status of a user; receiving an audio object at the user; classifying the received audio object as a primary audio object or a secondary audio object based upon the determined group status of the user; and processing the received audio object at a first sound processor if classified as a primary audio object and at a second sound processor if classified as a secondary audio object, where the first sound processor applies a first set of filters to audio objects processed therethrough, and the second sound processor applies a second set of filters to audio objects processed therethrough that is different from said first set of filters. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
Implementations may include one or more of the following features. The method where the group status of a user can be either grouped with a distinct group or ungrouped, and the step of classifying classifies all received audio objects as primary audio objects when the group status is ungrouped. The group status of a user can be either grouped with a distinct group or ungrouped, and when the group status of the user is grouped, the step of classifying classifies received audio objects as primary audio objects if the audio objects come from audio sources that are members of the distinct group of the user, and classifies received audio objects as secondary audio objects if the audio objects do not come from audio sources that are members of the distinct group of the user. The step of determining may further include the steps of: identifying an audio source within a focus area of a visual scene of the virtual environment presented to the user; evaluating a distance between a user's avatar within the virtual environment and the audio source within the virtual environment; grouping with the audio source into a distinct group; and setting a group status of the user to grouped and associated with the distinct group. The step of identifying includes calculating a dot product of a facing vector directed toward a facing direction of a user's avatar, and a source vector in a direction from the user's avatar to the audio source. The method may include the step of maintaining a grouped status by: calculating a common focal point at a geometric center of a plurality of members of a distinct group and a group perimeter encircling the plurality of members of the distinct group, and verifying that the user's avatar is facing either the geometric center of the distinct group or a member of the distinct group. Verifying includes calculating a dot product of a facing vector directed toward a facing direction of the user's avatar, and a source vector in a direction from the user's avatar to the geometric center of the distinct group and each member of the distinct group. Implementations of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.
In some aspects, the techniques described herein relate to a system for processing audio in a virtual environment, the system including a virtual reality apparatus having a control unit, a sensory processing unit, and a non-transitory storage unit having instructions stored thereon that, when executed by the control unit and the sensory processing unit, cause the control unit and the sensory processing unit to perform at least the following: determining a group status of a user; receiving an audio object at the user; classifying the received audio object as a primary audio object or a secondary audio object based upon the determined group status of the user; and processing the received audio object at a first sound processor if classified as a primary audio object and at a second sound processor if classified as a secondary audio object, where the first sound processor applies a first set of filters to audio objects processed therethrough, and the second sound processor applies a second set of filters to audio objects processed therethrough that is different from said first set of filters. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
In yet some additional aspects, the techniques described herein relate to a method of processing audio in a virtual environment, which may include the steps of: defining a group status of a user; receiving group status and audio output data of an audio object coming from an audio source, where the audio output data includes default audio output parameters; comparing object class data of the audio object with object class data of the user; calculating, at a first sound processor with a first filter, filtered audio output parameters of the audio output data of audio objects having the same object class data as the user; calculating, at a second sound processor with a second filter, filtered audio output parameters of the audio output data of audio objects not having the same object class data as the user; and transmitting the audio output data with filtered audio output parameters to an audio output device. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
Implementations may include one or more of the following features. The method where the step of defining includes the steps of: identifying whether an audio source is within a focus area of the user, determining whether the user is within a join distance of an audio source within the focus area of the user, joining with the audio source into a primary group, and altering the group status of the user to include a grouped status associated with the primary group. The method where the default audio output parameters include a default volume level, and the second filter produces filtered audio output parameters having a lowered volume level.
As described above and set forth in greater detail below, systems in accordance with aspects of the present disclosure provide a specialized computing device integrating non-generic hardware and software that improve upon the existing technology of human-computer interfaces in a virtual environment by providing unconventional functions, operations, and audio processing for generating interactive display and audio experience outputs in the virtual environment. The features of the system provide a practical implementation that improves the operation of the computing systems for their specialized purpose of providing audio processing in virtual environments, and more particularly the creation and handling of audio groups in a virtual environment. In some implementations, the use of directional vectors and dot product vector processing reduces the computational demands for audio processing thereby creating an enhanced audio experience for avatars interacting within the virtual environment.
The disclosure herein provides various implementations of virtual audio processing systems and methods from which those skilled in the art shall appreciate various novel approaches and features developed by the inventors. These various novel approaches and features, as they may appear herein, may be used individually, or in combination with each other, as desired.
In particular, the implementations described, and references in the specification to “one implementation”, “an implementation”, “an example implementation”, etc., indicate that the implementation(s) described may include a particular feature, structure, or characteristic, but every implementation may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same implementation. Further, when a particular feature, structure, or characteristic is described in connection with an implementation, persons skilled in the art may implement such feature, structure, or characteristic in connection with other implementations whether or not explicitly described.
With respect to the various implementations described herein, the term “virtual environment” is meant to encompass any fully or partially simulated environment presented to a user, including virtual reality with its fully artificial environment, augmented reality presenting a real environment overlaid with virtual objects, and mixed reality presenting interactive virtual objects within a real environment.
Simulated environments typically involve digital presentation of a visual scene from a head mounted display, but the scene may also be projected onto a surface. The visual presentation can provide the same image to both eyes or different images to each eye to create a stereoscopic view, for example. Similarly, audio can be presented to a user through speakers, whether integrated into a head mounted display, through separate headphones, or through speakers within the real environment. In order to enhance the realism of the virtual environment, audio is typically presented to a user over multiple channels that are mixed to reflect the location of virtual objects within the environment. Alternatively, audio can be presented via a mono signal for location-agnostic audio sources, such as an informational narration of virtual scenes. Increasingly, tactile feedback is also presented to a user through haptic devices in hand-held controllers, in a head mounted display, or external to the user. Tactile feedback can be presented in a position-based manner, much like audio. Specifically, tactile feedback may include vibration of one or more controllers, vibration of one or more actuators in a head mounted display, and movement of air from fans or ultrasonic devices, to name a few.
The following descriptions of the exemplary implementations are primarily in reference to a virtual reality environment for the sake of simplification, but do not limit the implementations to specific features of the exemplary virtual environment or of the user's hardware or software used to access such virtual environment.
In some implementations a virtual environment includes virtual audio/visual object sets, such as a multimedia entertainment system 120 that includes a virtual display 121 presenting a static or video image, left speaker 122 and right speaker 123 associated with respective left sound object 162 and right sound object 163, a subwoofer 125 that emanates tactile feedback 170, and a video capture device 127 that can record a scene of the virtual environment from the perspective of the device 127.
In some implementations, a virtual microphone 140 can be included in the environment 100 that captures an audio object from a nearby avatar and transmits it to an associated virtual loudspeaker 132 capable of reproducing that sound object therefrom. The reproduced sound from microphone 140 can include some transformation of the captured audio, such as increased volume, different volume decay characteristics, pitch/frequency shifting, distortion, voice changing, language translation, etc.
The environment 100 can be provided with one or more non-user audio sources, such as system messages, background music, or characters that are pre-loaded with information, music, movement paths, artificial intelligence routines, etc. These non-user audio sources can be associated with avatars or virtual objects having spatial audio that mimics real-world acoustic behavior or altered acoustic behavior (such as a limited reception distance or an enhanced volume decay), or they can be presented as “voice of god” type system messages or background audio (represented as audio object 160) that are received by each listener with the same characteristics (volume, reverb, tone, etc.) at every point in the virtual environment. The environment 100 also includes avatars that represent users of the virtual environment 100. Avatars are able to navigate within the environment in a continuous manner, such as walking, or by space-jumping to a selected location within the environment.
Users can transmit and receive audio through their avatars. For example, in the environment 100, avatar 150A is listening to audio (represented as an audio object 165) from avatar 150B. Behind avatar 150A, avatar 150C is watching (on display 121), listening to (through speakers 122 and 123), and feeling (through subwoofer 125) sensory output of the entertainment system 120. Next to avatar 150A is avatar 150D, engaged in conversation (audio object 166) with avatar 150E. Avatar 150G is looking at avatar 150F singing (via audio object 164) into a virtual microphone 140 that functions to broadcast the audio object 164 through the loudspeaker 132 as audio object 161. Each of the audio, visual, and haptic objects within the environment is processed as a point source for purposes of determining positions of audio objects and visual objects.
A visual scene presented to a user can be updated without input from the user, such as in a virtual tour or virtual ride (e.g., a rollercoaster), or by receiving movement input from the user to manipulate the user's position and orientation within the virtual environment. For example, in some implementations a virtual environment system may track a user's movements within a real environment through motion capture systems, including inertial systems with sensors such as gyroscopes, magnetometers, and accelerometers attached at one or more positions on a user's body, or optical systems using cameras or time-of-flight sensors that track markers on or features of the user in relation to another position (the environment, other markers, the camera location, etc.). The cameras or time-of-flight sensors can be located at one or more positions within the real environment, or attached to a user's body, such as being integrated into a head mounted display to track movement through the real environment or motion of the user's hands and fingers to determine grasp and orientation. Alternatively, or in combination, the user can utilize one or more input devices such as controllers 230R and 230L having finger actuated controls 232R and 232L, respectively (e.g., buttons, joysticks, trackpads, etc.), to move a user's avatar through a virtual environment. Controllers 230R and 230L can also include motion tracking sensors to track a user's arm or body movements that can be reflected on their avatar in the virtual environment.
As the visual scene is updated, the audio scene must also be updated to enhance the realism and immersion of the virtual environment. The computation required to render a synchronized audio-visual presentation can be done fully on board the head mounted display 210, or can utilize external devices such as a mobile device 250 or computing device 240. The mobile device 250 and computing device 240 can provide data links 252 and 242, respectively, through a wireless connection such as Bluetooth or wi-fi, hard-wired through cabling, or a combination of wireless and wired technologies. Computing device 240 can be a local computing device, a hosted server accessed over a network, or both.
The presentation of visual and audio scenes depends on system or user settings regarding a user's field of view of the virtual environment.
Referring back to
In some implementations, loudspeaker 132 exhibits a different volume decay curve than avatars 150B, 150D, and 150F, such as having a louder reference volume and covering more area than the avatars. In some implementations the audio transmitted by avatar 150F and received at microphone 140 is transmitted from the location of the loudspeaker 132, which amplifies the volume of the audio, and changes the volume decay of the audio, such as exhibiting the symmetrical volume decay of
In some implementations, speakers 122 and 123 of the entertainment system 120 (shown in
In some implementations, reverberation of audio signals off physical objects and boundaries can be additive to the respective signals, thereby amplifying the sounds. While such a transformation would potentially be more realistic and immersive, reverberation shall not be explored as a modifying factor for the purposes of this illustrative example of some implementations described herein.
For audio sources having volume decays that are radially symmetric, such as loudspeaker 132 and avatars 150B, 150D and 150F, the direction an audio source is facing (532f, 550Bf, 550Df and 550Ff, respectively) does not affect the perceived volume of a listener at a set radius from the audio source regardless of the angular direction of the listener with respect to a facing direction. In some implementations, volume can remain constant in a radial direction, but other features such as tone or reverberation can be altered to correspond to the facing direction of an audio source.
As shown in
In some implementations, a system and method may be employed to filter the volume of audio within the virtual environment so that audio of interest to a user at a particular time and place (the “primary audio”) will be heard more clearly than audio from secondary audio sources of lesser interest to the user (the “secondary audio”).
Mathematical tools may be employed to filter the volume of the audio transmitted from each audio source such that the quality, including volume and discernibility, of the primary audio is high enough to overcome the audio masking effect of secondary audio. In other words, filters can be applied to increase the intelligibility of a primary audio source “signal” in relation to the “noise” of secondary audio sources acting to interfere with the primary source. This is done by increasing the quality of the primary audio, decreasing the quality of the secondary audio, or some combination thereof.
In some implementations, increasing the intelligibility of the primary audio source includes keeping the quality of the primary audio unchanged, while the interfering attributes of the secondary audio are reduced or otherwise modified to make the secondary audio less distracting. In some implementations this relies on reducing the volume of secondary audio.
In some implementations, in addition to the distances referred to in reference to
In some implementations, audio coming from sources in front of a user's avatar is prioritized over audio coming from behind, as users are likely to face audio sources of interest. An evaluation of a dot product between a vector defined by the facing direction of an avatar (which will be referred to herein as the “facing vector”) and a vector defined by the direction of an audio source (which will be referred to herein as the “source vector”) can be used, in some implementations, to selectively filter or selectively alter the volume or other characteristic of the audio at a user's avatar in the virtual environment.
The dot product of two vectors is the sum of the products of corresponding components of each vector, written for two-dimensional vectors a and b as a·b = (ax × bx) + (ay × by). The result of this algebraic operation is a scalar value that decreases as the angle between the facing vector 550Af and a source vector increases. In some implementations, this value is used to lower the volume of audio sources proportionate to the angle between the facing direction and the audio source.
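As a minimal sketch of this computation (assuming two-dimensional positions and hypothetical helper names; the disclosure itself does not prescribe a particular implementation), the dot product of the normalized facing and source vectors yields the scalar described above:

```python
import math

def normalize(v):
    """Scale a 2D vector to unit length (assumes a non-zero vector)."""
    length = math.hypot(v[0], v[1])
    return (v[0] / length, v[1] / length)

def dot(a, b):
    """Sum of the products of corresponding components: a.b = (ax * bx) + (ay * by)."""
    return a[0] * b[0] + a[1] * b[1]

# Example: an avatar facing along +x with an audio source 45 degrees off to one side.
facing = normalize((1.0, 0.0))
source_vector = normalize((1.0, 1.0))
P = dot(facing, source_vector)  # about 0.707; approaches -1 as the source moves behind the avatar
```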
For the scenario of
Referring back to
In some implementations, the value of a dot product, represented by a value P, can be used by directly multiplying the dot product result by a respective original audio volume level Vo at the user's avatar 150A to determine the modified volume Vm, represented by the equation Vm = P × Vo.
In this scenario, for negative numbers (i.e., for audio coming from a directional source greater than 90 degrees from the facing direction), the modified volume Vm will be zero. As can be appreciated, in one implementation, a rapid computational assessment of the dot product result between a facing vector and audio source vectors can be used to alter or attenuate (e.g., filter) the audio perceived by the avatar as emanating from audio sources at angles greater than 90 degrees from the facing direction.
Alternatively, in some implementations, a dot product can be used to modify a volume through a mathematical function dependent on the value of the dot product. For example, in some implementations, the value of a modified volume Vm of an original volume Vo can be represented using the value P of a dot product as Vm = Vo(1 + P)/(3 − P), resulting in Vm = Vo when the facing vector is in the same direction as the source vector (P = 1), Vm = Vo/3 when the facing vector is perpendicular to the source vector (P = 0), and Vm = 0 when the facing vector is opposite the source vector (P = −1).
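A brief sketch of both volume mappings follows, continuing the assumptions above (P is the dot product of unit facing and source vectors; the function names are illustrative rather than taken from the disclosure):

```python
def volume_direct(P, v_original):
    """Direct application of the dot product, Vm = P * Vo, clamped so that sources
    more than 90 degrees off the facing direction (P < 0) are silent."""
    return max(0.0, P) * v_original

def volume_smoothed(P, v_original):
    """Alternative mapping Vm = Vo * (1 + P) / (3 - P): full volume when facing the
    source (P = 1), one-third volume when perpendicular (P = 0), silent behind (P = -1)."""
    return v_original * (1.0 + P) / (3.0 - P)
```

Either mapping leaves a source directly ahead of the avatar unchanged while attenuating a source directly behind the avatar to zero.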
In some implementations the dot product can be used in an equation that shifts the volume decay curve away from a listening position along a distance axis (illustrated in
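The disclosure does not specify the shape of the decay curve, but as one hedged sketch, assuming a simple inverse-distance decay, shifting the curve along the distance axis can be implemented by evaluating the decay at an increased effective distance derived from the dot product (the decay law, the mapping, and the max_shift value are all illustrative assumptions):

```python
def decayed_volume(v_reference, dist, shift=0.0):
    """Illustrative inverse-distance decay evaluated at an effectively increased distance.
    Shifting the curve away from the listener by `shift` lowers the perceived volume at
    every listening distance without changing the curve's shape."""
    return v_reference / (1.0 + max(0.0, dist + shift))

def dot_product_shift(P, max_shift=5.0):
    """One possible mapping from the dot product P (in [-1, 1]) to a curve shift:
    no shift when facing the source (P = 1), maximum shift directly behind (P = -1)."""
    return max_shift * (1.0 - P) / 2.0
```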
The virtual environment server 840 receives updates from VR apparatus 810 through its networking unit 842. The updates are processed by control unit 844, and storage unit 846 is updated with pertinent information regarding VR apparatus 810. Virtual environment server 840 likewise communicates through network 850 with multiple other VR apparatuses 860a, 860b, and up to 860n, which may contain the same or similar elements as VR apparatus 810 and may access the same or a different virtual environment as that provided to VR apparatus 810. In some implementations, processing and data storage are provided by cloud-based third-party service providers 870, which can offload processing and storage from the virtual environment server 840 and VR apparatuses 810/860n. Any reference to processing or storage on any device, including the VR apparatuses or servers described herein, should be interpreted to include cloud-based processing and storage performed by third parties to deliver a useful product to the devices.
In some implementations, control unit 814 of VR apparatus 810 includes elements equivalent to those of the control unit 940 of virtual environment server 900.
This modified audio output data is sent to a sound scene renderer 1023 to build a sound scene that matches the location of audio sources around the user's avatar with applicable visual objects within the virtual environment. The sound scene is then processed into separate audio streams by the multichannel mixer 1024 and sent through respective audio channels to first and second sound outputs 1071 and 1072 of audio I/O unit 1070 of sensory I/O unit 1050. Sound input 1076, such as one or more microphones, of audio I/O unit 1070 of sensory I/O unit 1050 captures audio from a user and sends that audio data, via sound input processor 1026 of sound processing unit 1020, to user database 964 of storage unit 960 of virtual environment server 900.
The location of visual objects that correspond with sound objects in the virtual environment is determined by visual processing unit 1040 using information from a user position/orientation processor 1046 as well as data stored in storage unit 816 of VR apparatus 810 or in storage unit 960 of virtual environment server 900. The positional data is requested by the visual processing unit 1040 (and the sound processing unit 1020) and is retrieved and delivered by the visual environment processor 1042, in some implementations further relying on control unit 814 of the virtual reality apparatus 810 to coordinate the retrieval and routing of the data. Visual processing unit 1040 uses this positional data to update the visual characteristics of the virtual environment, including the position and orientation of a user's avatar in the virtual environment. This information is then rendered for consumption by visual scene renderer 1044, which provides a visual scene viewable on visual scene output 1092 of visual I/O unit 1090 according to settings such as field of view, graphics resolution, etc. User position/orientation input 1096, such as controllers, head mounted display motion sensors, camera data, etc., is then provided to the visual processing unit 1040 to update the visual environment processor 1042, as well as storage unit 816 of VR apparatus 810 and user database 964 of storage unit 960 of virtual environment server 900.
An example implementation applying the concepts and features described hereinabove will now be described. In some implementations of a virtual environment, a user's primary audio of interest may arise from a single audio source or a group of audio sources (e.g., avatars) engaging in a conversation. With groups of three or more avatars, it is common for an avatar to be facing somewhere other than directly at the avatar currently contributing to a group conversation. In order to avoid attenuating the volume of a group member who may be outside of a user's visible scene, audio from members of a group is excluded from classification as a secondary audio source.
At block 1120, the status of the user's membership in a group is retrieved. If the group status of the user is Ungrouped, the received audio is considered primary audio and is sent through a first sound processor 1021 of sound processing unit 1020. First sound processor 1021 applies a first set of filters to the audio. If a Grouped status is present for the user, the group status of the retrieved audio is compared to the user's group status, and if they are both Grouped in the same distinct group, the audio is considered primary audio and is sent through the first sound processor 1021. In some implementations, first sound processor 1021 can process the primary audio “normally” for a group, such as through realistic spatial audio filters for directional audio. If the retrieved audio is not grouped in the same distinct group as the user, the audio is considered secondary audio and is sent through second sound processor 1022. In some implementations, second sound processor 1022 processes the secondary audio “abnormally” for those outside of a group, such as by filtering audio to have lower than normal volume.
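A minimal sketch of this classification and routing decision follows, assuming simple dictionary records and hypothetical processor callables (the block numbering and processor roles track the description above, but the data shapes are illustrative):

```python
def classify_audio_object(user_group, source_group):
    """Classify a received audio object as 'primary' or 'secondary' based on the
    listener's group status (None represents an Ungrouped listener)."""
    if user_group is None:
        return "primary"               # ungrouped listeners treat all sources as primary
    if source_group == user_group:
        return "primary"               # source belongs to the listener's distinct group
    return "secondary"

def route_audio_object(audio_object, user_group, first_processor, second_processor):
    """Send primary audio through the first sound processor ("normal" spatial filters)
    and secondary audio through the second ("abnormal", e.g., attenuating filters)."""
    if classify_audio_object(user_group, audio_object.get("group")) == "primary":
        return first_processor(audio_object)
    return second_processor(audio_object)
```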
Once the audio is filtered, its filtered form then proceeds through the blocks of rendering 1130 (e.g., associating processed audio with physical locations), mixing 1140 (e.g., where audio from multiple sources may be combined and apportioned to multiple audio channels), and delivery 1150 (e.g., delivering the mixed audio to audio outputs 1071 and 1072 of the sensory I/O unit 1050). Once delivery of this audio is complete, the method 1100 restarts and processes the next frame of audio information.
At block 1220, the sensory processing unit 1010 of VR apparatus 810 uses data retrieved from the virtual environment server 840 and stored in storage unit 816 to process the new environment in terms of the location of the visual and sound objects within the environment. Visual processing unit 1040 applies visual parameters such as lighting, texture mapping, etc., for each visual object, and first and second sound processors 1021 and 1022 apply audio filters, adjust global volumes, etc. At block 1230, the updated virtual environment data and user avatar position/orientation data are used by the sensory processing unit 1010 to identify how the virtual environment is presented to the user, where the field of view is rendered with processed visual objects by visual scene renderer 1044, and audio objects are rendered with respect to the user avatar's position and orientation, that is, mapped to the updated locations of visual objects in the virtual environment both within and outside of the field of view, at the sound scene renderer 1023.
At block 1240 the multichannel mixer 1024 of the sound processing unit 1020 produces a single audio stream to be delivered for each of the multiple channels using the position/orientation of the user's avatar in relation to each of the rendered audio sources. For example, if the multichannel mixer receives one audio signal originating from an audio source to the left of the avatar, one from the right, and one directly in front of the avatar, the mixer 1024 may, in some implementations, mix the three signals into two channels, a left channel and a right channel, with audio from the left audio source presented at a higher volume than the right audio source in the left channel, the right audio source higher than the left audio source in the right channel, and the front audio source being split evenly between the left and right channels. Also at this block 1240, the visual scene created by visual scene renderer 1044 may be rendered into two or more visual channels as further described below.
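As one hedged sketch of such a mix, a simple constant-power pan between two channels could be used; the panning law, data shapes, and function name here are assumptions rather than anything prescribed by the disclosure:

```python
import math

def mix_stereo(sources, listener_pos, listener_facing):
    """Fold point audio sources into left/right channels using a constant-power pan
    based on each source's direction relative to the listener's (unit) facing vector.
    `sources` is a list of ((x, y), volume) tuples."""
    left = right = 0.0
    for (sx, sy), volume in sources:
        dx, dy = sx - listener_pos[0], sy - listener_pos[1]
        length = math.hypot(dx, dy) or 1.0
        dx, dy = dx / length, dy / length
        # Signed "rightness" of the source: +1 fully to the right of facing, -1 fully left.
        rightness = dx * listener_facing[1] - dy * listener_facing[0]
        pan = (rightness + 1.0) / 2.0              # 0 = hard left, 1 = hard right
        left += volume * math.cos(pan * math.pi / 2)
        right += volume * math.sin(pan * math.pi / 2)
    return left, right

# A source directly ahead of the listener pans to the center and is split between channels.
```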
At block 1250, the audio in the mixed sound scene is then diverted by the multichannel mixer 1024 to different audio channels of the audio I/O of the sensory I/O unit 1050, and the visual output is sent to the visual scene output 1092 of the visual I/O unit 1090, which may be a single or multiple video channels sent to one or more screens, projectors, etc. Once the user is presented with a new virtual environment scene, user input is collected by the sensory I/O unit at block 1260 for further processing back at block 1210.
When analyzing grouping at block 1120 of method 1100, the user's group membership status determines to which sound processor the audio objects are routed.
When the answer to query 1320 is no, the process looks to user actions to determine group membership. At query 1325, the system 800 determines whether the user has focused on an audio source for a certain threshold amount of time, Tt. For example, in some implementations the visual scene renderer 1044 of visual processing unit 1040 determines what visual objects are to be rendered into a visual scene. A list of these items can be stored in VR apparatus 810 storage unit 816 and user database 964. This list can keep count of the number of frames (i.e., intervals between successive scene rendering operations, such as at step 1230 of process 1200) that visual objects remain within the visual scene, or within a subset of the visual scene, such as a narrower focus area of the field of view represented by the visual scene. One way to determine the location of an object within a visual scene is to perform a dot product of the user's avatar facing vector and the source vector of an audio source within the visual scene.
In addition to the algebraic definition of a dot product discussed above, the dot product can also be presented in relation to the following geometric definition, a·b = ∥a∥ ∥b∥ cos θ, where ∥a∥ and ∥b∥ are the magnitudes of the vectors (which for unit vectors is 1), and θ is the angle between the facing vector and the source vector. In some implementations, the facing vector of a user's avatar will align with the center of its field of view (such as those seen in
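Because a·b = cos θ for unit vectors, testing whether an audio source lies within a focus area reduces to comparing one dot product against a precomputed cosine threshold. A minimal sketch follows; the 30-degree half-angle is a hypothetical focus-area size, not a value taken from the disclosure:

```python
import math

def in_focus_area(facing, source_vector, focus_half_angle_deg=30.0):
    """Return True when the unit source vector lies within the focus area, i.e. the
    angle between it and the unit facing vector is at most the focus half-angle."""
    threshold = math.cos(math.radians(focus_half_angle_deg))
    return facing[0] * source_vector[0] + facing[1] * source_vector[1] >= threshold
```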
When the user alters the avatar position or orientation such that an audio source is no longer within a focus area of the visual scene, or if the audio source leaves the scene, the counter for that source is reset. If no audio source within a focus area has exceeded the threshold time Tt at block 1325, the process returns to query 1310. If at block 1325 the system 800 determines that the frame counter for an audio source within the predetermined focus area has exceeded the threshold time Tt, the method 1300 proceeds to block 1330.
At query 1330 the system 800 determines whether an audio source that has remained within a focus area for time Tt is within a threshold distance Dt from the user's avatar. This information is calculated in some implementations by the visual scene renderer 1044 of visual processing unit 1040 (which also sends that information to the sound scene renderer 1023 to determine a volume level at the user's avatar of the various audio sources within a sound scene). If the user's avatar is too far away (beyond threshold distance Dt) from the audio source, the process will reset back to query 1310. If the avatar is within the threshold distance Dt, a group is formed, the user's profile stored in the user database 964 is updated to “Grouped” and associated with the qualifying audio source, and the process ends at block 1340.
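Pulling queries 1325 and 1330 together, a hedged sketch of the join test might look like the following. The frame-counter storage, the threshold values Tt and Dt, and the dictionary record shapes are illustrative assumptions; the focus-area check mirrors the dot-product threshold sketched above:

```python
import math

def distance(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(b[0] - a[0], b[1] - a[1])

def try_join_group(user, source, frame_counters, fps,
                   focus_half_angle_deg=30.0, Tt=3.0, Dt=2.0):
    """Count frames a source stays inside the user's focus area; once the focused time
    exceeds Tt seconds and the source is within distance Dt, group user and source."""
    d = distance(user["position"], source["position"]) or 1.0
    src_dir = ((source["position"][0] - user["position"][0]) / d,
               (source["position"][1] - user["position"][1]) / d)
    cos_threshold = math.cos(math.radians(focus_half_angle_deg))
    in_focus = user["facing"][0] * src_dir[0] + user["facing"][1] * src_dir[1] >= cos_threshold

    # Reset the counter whenever the source leaves the focus area (query 1325).
    frame_counters[source["id"]] = frame_counters.get(source["id"], 0) + 1 if in_focus else 0

    focused_long_enough = frame_counters[source["id"]] / fps >= Tt
    close_enough = distance(user["position"], source["position"]) <= Dt   # query 1330
    if focused_long_enough and close_enough:
        group_id = source.get("group") or "group-" + str(source["id"])
        user["group"] = source["group"] = group_id
        user["status"] = source["status"] = "Grouped"                     # block 1340
        return True
    return False
```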
In some implementations, the audio received at avatar 1412 from avatar 1420 may be modified by application of a dot product's scalar value as discussed previously. For example, because avatar 1420 is directly behind avatar 1412, the dot product of avatar 1412's facing vector and the source vector pointing to avatar 1420 would be the largest negative number possible (−1 for unit vectors). As such, under some implementations, the volume of the audio from avatar 1420 would be modified to have a zero volume value for direct application of the dot product (negative values rendered with zero volume). In some implementations, the dot product merely reduces audio for negative numbers. Either way, application of the dot product to the audio from avatar 1420 would result in reduced interference therefrom.
Because avatar 1410 is in the same group as avatar 1412, avatar 1410's audio is routed through the first sound processor 1021, which, in some implementations, presents audio to the user at a default volume. In some implementations, the diminished interference comes in the form of shifting the original volume decay curve 1444 (dashed line) of the audio of avatar 1420 in direction 1470 by distance x to the position of volume decay curve 1448 (dotted line), as shown in graph 1440B of
In
In
As seen in
In
In
In
In
In
In
If at query 1815 the audio source is currently a member of a pre-existing group, the user joins that group at block 1820, the group size of the pre-existing group is incremented by one, and the user's status is set to Grouped as described above. Once grouped, the process at block 1845 calculates a new group center based on the geometry of the group member positions, and calculates a group perimeter with a radius (or other perimeter geometry if, in some implementations, the perimeter is non-circular) based on the number (and in some implementations, the positions) of the group members.
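One hedged sketch of the calculation at block 1845, assuming a circular perimeter whose radius grows with member count (the base radius, per-member increment, and margin are illustrative values, not taken from the disclosure):

```python
import math

def group_center_and_perimeter(member_positions, base_radius=1.5, per_member=0.5):
    """Return the geometric center (centroid) of the members and a circular group
    perimeter radius sized from the member count, enlarged if necessary so that
    every current member falls inside the perimeter."""
    n = len(member_positions)
    cx = sum(p[0] for p in member_positions) / n
    cy = sum(p[1] for p in member_positions) / n
    radius = base_radius + per_member * n
    farthest = max(math.hypot(p[0] - cx, p[1] - cy) for p in member_positions)
    return (cx, cy), max(radius, farthest + 0.5)
```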
The process 1800 then queries at block 1850 whether the user's avatar is facing at least one group member or the group center within its focus area, and whether the user's avatar has remained within the group perimeter as calculated at block 1845. If so, the process 1800 loops back to block 1845 to recalculate the group center and perimeter at the next update interval of the process 1800, taking into account any new, departed, or repositioned members of the group.
If the user has not maintained the group requirements, the process can in some implementations alert the user, through the user's virtual reality apparatus, of the user's group requirement deficiency at block 1855, where the process waits a time T before moving on to block 1860, where the process queries whether or not the user has returned to a state of facing the group and being within the group perimeter. If the user has reestablished the group requirements, the process 1800 returns to block 1845 to recalculate the group center and perimeter. If the user has not reestablished the group requirements, the process moves to block 1865, removing the user from the group; the group size is reduced by one at block 1870, the user's group status is set to “Ungrouped” at block 1875, and the process either returns to block 1805 to restart or ends. In some implementations, when a user's group status is set to “Ungrouped,” all audio sources are considered primary audio sources providing primary audio objects.
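A hedged sketch of the maintenance test at blocks 1850/1860 might combine the two conditions as follows, reusing the cosine-threshold idea for the facing check (the focus half-angle and record shapes are again illustrative assumptions):

```python
import math

def maintains_group(user, members, center, radius, focus_half_angle_deg=30.0):
    """Return True when the user's avatar is both inside the group perimeter and
    facing the group center or at least one group member; a False result is a
    candidate for the Ungrouped path at blocks 1865-1875."""
    ux, uy = user["position"]
    fx, fy = user["facing"]                      # assumed unit facing vector
    if math.hypot(ux - center[0], uy - center[1]) > radius:
        return False                             # wandered outside the group perimeter

    cos_threshold = math.cos(math.radians(focus_half_angle_deg))
    for tx, ty in [center] + [m["position"] for m in members]:
        d = math.hypot(tx - ux, ty - uy) or 1.0
        if fx * (tx - ux) / d + fy * (ty - uy) / d >= cos_threshold:
            return True                          # facing the center or a member
    return False
```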
To summarize, a user's experience within a virtual environment may be improved through the transformation of audio delivered to the user. Specifically, when a user would like to focus on audio coming from a preferred source, whether that source be another user, a group of users, or even a non-user audio source, the system can recognize the desired target, group the user with that audio source, and increase the intelligibility of the audio source by filtering the audio of the primary audio source, the secondary (non-primary) audio sources, or both. By doing so, the user is better able to experience a virtual environment.
In some implementations, the features described herein technologically improve the virtual environment system through the use of directional vectors and dot product vector processing to rapidly assist audio processing, thereby creating an enhanced audio experience for avatars interacting within the virtual environment. Virtual environment processing is computationally intensive. Previously, virtual environment processing lacked sufficient audio processing resources to facilitate the user's ability, via the user's avatar, to clearly discern virtual environment audio from primary and secondary sources. The systems and methods described herein, including the use of directional vectors and dot product vector processing, greatly reduce computational loads for audio processing, thereby allowing computationally efficient avatar groupings (where primary audio sources are enhanced and secondary audio sources are diminished) and ungroupings as the avatar moves within the virtual environment. This technological improvement greatly enhances a user's virtual environment experience.
Implementations described herein may be implemented in hardware, firmware, software, or any combination thereof. Implementations of the disclosure herein may also be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a machine-readable medium may include read only memory (ROM); random access memory (RAM); hardware memory in handheld computers, PDAs, smart phones, and other portable devices; magnetic disk storage media; optical storage media; USB drives and other flash memory devices; Internet cloud storage, and others. Further, firmware, software, routines, instructions, may be described herein as performing certain actions. However, it should be appreciated that such descriptions are merely for convenience and that such actions in fact result from computing devices, processors, controllers or other devices executing the firmware, software, routines, instructions, etc.
Although method/process operations (e.g., blocks) may be described in a specific order, it should be understood that other housekeeping operations can be performed in between operations, or operations can be adjusted so that they occur at different times or can be distributed in a system which allows the occurrence of the processing operations at various intervals associated with the processing, as long as the processing of the overlay operations is performed in the desired way.
The present disclosure is not to be limited in terms of the particular implementations described in this disclosure, which are intended as illustrations of various aspects. Moreover, the various disclosed implementations can be interchangeably used with each other, unless otherwise noted. Many modifications and variations can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. Functionally equivalent methods and apparatuses within the scope of the disclosure, in addition to those enumerated herein will be apparent to those skilled in the art from the foregoing descriptions. Such modifications and variations are intended to fall within the scope of the appended claims. The present disclosure is to be limited only by the terms of the appended claims, along with the full scope of equivalents to which such claims are entitled. It is also to be understood that the terminology used herein is for the purpose of describing particular implementations only, and is not intended to be limiting.
With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.
It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to implementations containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, means at least two recitations, or two or more recitations). Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “ a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.” In addition, where features or aspects of the disclosure are described in terms of Markush groups, those skilled in the art will recognize that the disclosure is also thereby described in terms of any individual member or subgroup of members of the Markush group.
A number of implementations have been described. Various modifications may be made without departing from the spirit and scope of the disclosure. For example, various forms of the method/process flows shown above may be used, with operations or steps re-ordered, added, or removed. Accordingly, other implementations are within the scope of the following claims.