The present invention relates generally to virtual shared environments, artificial intelligence for sensing events in virtual shared environments, and artificial intelligence for producing media content to enhance virtual shared environments.
A method is provided for virtual environment enhancement. A signal is determined from one or more events in a virtual environment. In response to the determined signal triggering virtual environment enhancement, a request based on the determined signal is input to at least one generative artificial intelligence model that in response produces media content. The media content is presented within the virtual environment such that the media content includes one or more distinguishing sensory attributes distinguishing the media content from remaining portions of the virtual environment. A computer system and a computer program product corresponding to this method are also provided herein.
These and other objects, features, and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings. The various features of the drawings are not to scale as the illustrations are for clarity in facilitating one skilled in the art in understanding the invention in conjunction with the detailed description. In the drawings:
Detailed embodiments of the claimed structures and methods are disclosed herein; however, it can be understood that the disclosed embodiments are merely illustrative of the claimed structures and methods that may be embodied in various forms. This invention may be embodied in many different forms and should not be construed as limited to the exemplary embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of this invention to those skilled in the art. In the description, details of well-known features and techniques may be omitted to avoid unnecessarily obscuring the presented embodiments.
The present embodiments include a computer-implemented method for virtual environment enhancement. A signal is determined from one or more events in a virtual environment. In response to the determined signal triggering virtual environment enhancement, a request based on the determined signal is input to at least one generative artificial intelligence model that in response produces media content. The media content is presented within the virtual environment such that the media content includes one or more distinguishing sensory attributes distinguishing the media content from remaining portions of the virtual environment. In this manner, virtual presentations within a virtual environment are enhanced in a sensorily distinct manner so that observers can recognize the enhancements as presentation enhancements instead of continually persisting virtual environment structure.
In additional embodiments, the media content includes volumetric content and the presenting includes displaying the volumetric content within the virtual environment. In this manner, virtual presentations within a virtual environment are enhanced in a sensorily distinct manner relevant for the sense of vision so that observers can visually recognize the enhancements as presentation enhancements instead of continually persisting virtual environment structure.
In additional embodiments, the one or more distinguishing sensory attributes of a volumetric content that is media content as a virtual environment enhancement include one or more of a transparency factor, a scaling factor, a location factor, and a color factor. In this manner, presentation details are included to help a virtual environment participant better visually recognize enhancements as mere presentation enhancements instead of continually persisting virtual environment structure.
In additional embodiments, the one or more distinguishing sensory attributes of a volumetric content as a virtual environment enhancement include a transparency factor of the volumetric content that is different from a transparency factor of portions of the virtual environment surrounding the volumetric content. In this manner, presentation details are included to use light transmission qualities of a virtual object to help a virtual environment participant better visually recognize enhancements as mere presentation enhancements instead of continually persisting virtual environment structure.
In additional embodiments, the one or more distinguishing sensory attributes of a volumetric content as a virtual environment enhancement include a scaling factor of the volumetric content that is higher or lower than a scaling factor of elements of the virtual environment surrounding the volumetric content. In this manner, size aspects of 3D objects are used to help a virtual environment participant better visually recognize enhancements as mere presentation enhancements instead of continually persisting virtual environment structure.
In additional embodiments, the one or more distinguishing sensory attributes of a volumetric content as a virtual environment enhancement include a location factor of the volumetric content such that the volumetric content is displayed within a visually demarcated portion of the virtual environment that separates the volumetric content from the remaining portions of the virtual environment. In this manner, virtual positioning aspects of a 3D object are used to help a virtual environment participant better visually recognize enhancements as mere presentation enhancements instead of continually persisting virtual environment structure.
In additional embodiments, the visually demarcated portion of volumetric content as a virtual environment enhancement includes a thought bubble associated with one or more characters within the virtual environment. In this manner, socially understood structure is used for positioning a 3D object to help a virtual environment participant better visually recognize enhancements as mere presentation enhancements instead of continually persisting virtual environment structure.
In additional embodiments, the media content that is presented as a virtual environment enhancement changes over time during the presenting. In this manner, advanced enhancements are used to better enhance a presentation occurring within a virtual environment while still helping a virtual environment participant better visually recognize the enhancements as mere presentation enhancements instead of continually persisting virtual environment structure.
In additional embodiments, the media content that is presented as a virtual environment enhancement and that includes one or more distinguishing sensory attributes includes audio content. In this manner, alternative sensory content of hearing is used to help a virtual environment participant better recognize the enhancements as mere presentation enhancements instead of continually persisting virtual environment structure.
In additional embodiments, responsive to identifying a second predetermined signal, the media content is revoked. A third signal from the virtual environment is detected and decoded. The media content is evolved based on the third signal, thereby producing first evolved media content. The first evolved media content shares one or more elements with the revoked media content. The first evolved media content is presented within the virtual environment such that the first evolved media content includes one or more distinguishing sensory attributes distinguishing the first evolved media content from remaining portions of the virtual environment. In this manner, versatility of media content generation is achieved to allow a virtual environment presenter increased control over virtual environment enhancements used to enhance a virtual presentation.
In additional embodiments, the presentation of the media content is placed within the virtual environment based on the determined signal. In this manner, signal analysis techniques are implemented to allow a virtual environment presenter increased control over virtual environment enhancements used to enhance a virtual presentation.
In additional embodiments, the media content presented in the virtual environment includes a volumetric content and the presenting includes a display of the volumetric content within the virtual environment. The placement of the presentation includes a location placement of the volumetric content within the virtual environment. In this manner, signal analysis techniques are implemented to allow a virtual environment presenter increased control over virtual environment visible enhancements used to visibly enhance a virtual presentation.
In additional embodiments, the media content as a virtual environment enhancement presented in the virtual environment includes audio content and the presenting includes playing of the audio for the virtual environment. A placement of the presentation includes one or more of a location placement for dissemination of the audio content within the virtual environment, a timing placement for the audio content, and a pitch placement for the audio content. In this manner, alternative sensory content of hearing is used to help a virtual environment participant better recognize the enhancements as mere presentation enhancements instead of continually persisting virtual environment structure.
In additional embodiments, the media content as a virtual environment enhancement is presented in a default location placement within the virtual environment. In this manner, preprogramming is used to compensate for lack of input details regarding presentation of virtual environment enhancements that enhance a virtual presentation within a virtual world.
In additional embodiments, the one or more distinguishing sensory attributes of the presented media content match a message that was presented within the virtual environment and that indicated the one or more distinguishing sensory attributes. In this manner, increased control and understanding of virtual participants within a virtual shared environment are achieved via virtual communication techniques.
In additional embodiments, the one or more events in the virtual environment include one or more of audio, a movement, virtual environment character interaction, and virtual environment metadata. In this manner, data analysis techniques are harnessed to determine when virtual environment enhancement is desired to occur within the virtual environment.
In additional embodiments, a computer system and a computer program product for virtual environment enhancement are provided and are capable of causing a processor to perform the above-described methods to achieve the above-described advantages.
The following described exemplary embodiments provide a method, computer system, and computer program product for using artificial intelligence to enhance shared virtual environments. Society is pivoting towards use of collective shared virtual spaces to perform social interaction for a variety of pursuits related to personal recreation, education, governing, business, etc. Using such virtual spaces reduces travel requirements for in-person interactions. For these virtual environments, technical challenges from the physical world become less of a barrier. In the physical world, media content is traditionally delivered to and presented on special-purpose surfaces such as televisions and mobile displays. Immersive integration with the user is difficult for these traditional media deliveries. Virtual shared environments better achieve immersion for users, but seamless integration of personalized, highly responsive, contextual content has been a longstanding goal. Meeting consumer expectations for a virtual environment that facilitates personalized content, increased immersion, and speedy transmission and/or generation of content demands tremendous effort and imposes high costs on content providers. Difficulty in distinguishing genuine objects in a virtual environment from transient enhancement media presentations might inspire malicious actors and impose a security risk on users. The present embodiments use software and artificial intelligence to produce signal-triggered artificial intelligence generation of virtual environment enhancements which have distinguishing sensory attributes that help the virtual shared environment users to distinguish these enhancements from remaining portions of the virtual environment. In at least some embodiments the virtual environment enhancements take the form of a generative volumetric overlay that is infused into the collective shared virtual space. The present embodiments therefore achieve enhancement of the virtual experience with advances in media presentation and security.
The present embodiments, which provide virtual environment enhancements that have distinguishing sensory attributes and that may be generative volumetric overlays, allow users to share a story that plays out with additional visual and/or sound content that is presented and infused into the collective shared virtual space. Sensors within the virtual environment are able to sense events that occur such as spoken words and/or dialogue and/or virtual movements within the virtual environment. Based on the sensing of certain signals from such events, the present embodiments include the artificial intelligence generation of media content that supplements the trigger events and helps achieve a more robust virtual experience. The virtual environment enhancements such as the generative volumetric overlays facilitate presentation, e.g., projection, of the media content enhancements within the virtual environment, e.g., of the three-dimensional virtual environment. The enhancements are based on the sensed triggering signals such as a story being told by a virtual shared environment participant. The identification of the triggering signals triggers artificial intelligence to generate media content that is based on the triggering events, evolves with the triggering events, and is customizable and distinguishable from the remaining portions of the virtual environment. The present embodiments may help achieve more effective and interactive education, increased efficiency in pitching new ideas, and new opportunities for better capturing the attention of others within the collective shared virtual space.
The virtual environment enhancement as described herein includes one or more distinguishing sensory attributes distinguishing the media content from remaining portions of the virtual environment. The media content includes volumetric visual content and/or audio content. The one or more distinguishing sensory attributes of the volumetric content include one or more of a transparency factor, a scaling factor, a location factor, and a color factor that is distinct from the factors of other elements that appear visually and/or audibly within the virtual shared environment. For example, a volumetric content is generated with a color scheme that is unique and contrasts with the color of surrounding elements within the virtual shared environment. For example, a three-dimensional projection is generated with a neon color while no other element in the vicinity within the virtual environment includes a neon color. The distinct color helps virtual environment participants recognize that the virtual environment enhancement is being presented to supplement the one or more events such as a presentation being provided/story being told instead of being a standard portion of the virtual environment.
For virtual environment enhancement in the first enhanced virtual environment scene 100 shown in
In some embodiments, the virtual sensor 108 inputs the captured words and/or the natural language processing output produced from the captured words into a machine learning model associated with the first virtual shared environment 106. The machine learning model as output produces an indication that the words constitute a trigger signal for producing supplemental media content to supplement the event (the event in this case is the third avatar 110c telling a story about seeing the chair). Based on the monitored information being designated as trigger information, the collected information related to the event is input into artificial intelligence which in response generates and/or yields media content related to the event.
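The following is a non-limiting Python sketch of how such a trigger determination could be organized; the identifiers (e.g., TriggerClassifier, classify_utterance) are hypothetical, and a simple keyword heuristic stands in for the trained machine learning model solely so that the control flow can be illustrated end to end.

    # Illustrative sketch only: a stand-in for the machine learning model that
    # designates captured utterances as trigger signals. Names are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class TriggerDecision:
        is_trigger: bool      # True when the utterance should trigger enhancement
        topic: str            # coarse topic extracted from the utterance

    class TriggerClassifier:
        # In practice this would be a trained model; here a keyword heuristic
        # stands in so the surrounding control flow can be shown.
        TRIGGER_WORDS = {"imagine", "picture", "let me tell you about"}

        def classify_utterance(self, text: str) -> TriggerDecision:
            lowered = text.lower()
            hit = any(w in lowered for w in self.TRIGGER_WORDS)
            topic = lowered.split("about", 1)[-1].strip() if "about" in lowered else ""
            return TriggerDecision(is_trigger=hit, topic=topic)

    decision = TriggerClassifier().classify_utterance(
        "Let me tell you about the chair I saw yesterday")
    print(decision.is_trigger, decision.topic)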
The dotted lines used in
In other embodiments, surrounding portions of the virtual environment all include a higher transparency and the generated enhancement chair is non-transparent so that the non-transparent chair is sensorily distinct.
Due to the increased transparency of the transparent first chair 114, the first and second avatars 102a, 102b are better able to recognize that they should not virtually sit on the transparent first chair 114 but instead that this transparent first chair 114 is being presented within the first virtual shared environment 106 in order to help illustrate the story that is being told by the third avatar 110c. In other embodiments, the enhancements such as the transparent first chair 114 allow virtual interaction with participants within the respective virtual shared environment. For example, in other embodiments if the third avatar 110c is describing physical aspects of the transparent first chair 114 one of the other avatars is able to virtually sit on the transparent first chair 114 in order to better appreciate the description that is being given by the third avatar 110c. The virtual reality equipment used by a user to participate in the virtual environment in some embodiments includes one or more tactile sensors which provide feedback to mimic the physical sensation that would be present if the user were physically engaging (e.g., sitting on) such a chair in the physical world.
In some embodiments, a user preconfigures with the virtual environment enhancement program 816 particular text such as one or more wake-up words to trigger voice recording/text monitoring for monitoring of the words provided in the virtual shared environment. The monitoring can find suitable text content for virtual environment enhancement generation with one or more distinguishing sensory attributes. In some embodiments, a user preconfigures with the virtual environment enhancement program 816 particular text such as one or more trigger words whose subsequent detection in the virtual environment triggers generation of particular virtual environment enhancement content with distinguishing sensory attributes. In some embodiments, a user presents various words and stores particular media content for each respective word or set of words. In some embodiments, the virtual environment enhancement program 816 generates a graphical user interface with which a user interacts using a computer such as the client computer 801 shown in
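A non-limiting Python sketch of such preconfiguration follows; the class name EnhancementRegistry and the example word-to-content mapping are hypothetical and are shown only to illustrate the wake-word and trigger-word registration described above.

    # Illustrative sketch of the preconfiguration described above; identifiers
    # such as EnhancementRegistry are hypothetical.
    class EnhancementRegistry:
        def __init__(self):
            self.wake_words = set()   # words that start event monitoring
            self.trigger_map = {}     # trigger word -> stored media content id

        def register_wake_word(self, word: str) -> None:
            self.wake_words.add(word.lower())

        def register_trigger(self, word: str, media_content_id: str) -> None:
            self.trigger_map[word.lower()] = media_content_id

        def lookup(self, word: str):
            return self.trigger_map.get(word.lower())

    registry = EnhancementRegistry()
    registry.register_wake_word("storytime")
    registry.register_trigger("chair", "volumetric/chair_v1")
    print(registry.lookup("Chair"))   # -> "volumetric/chair_v1"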
In some embodiments, a word for waking the text monitoring of the virtual environment enhancement program 816 is publicly shared within the virtual environment for various avatars and/or other virtual environment participants to use for triggering the virtual environment enhancement with one or more distinguishing sensory attributes. For example, in some embodiments the wake word is publicly presented within the shared virtual space so that any virtual space participant who desires can use the wake word to trigger the virtual environment enhancement.
In some embodiments, a wake action is associated with particular media content for the virtual environment enhancement. In other embodiments, a wake action triggers the virtual environment enhancement program 816 to begin to monitor the shared content within the virtual shared environment so that contextual clues from the one or more events (such as words shared) within the virtual shared environment are captured and analyzed to determine appropriate virtual environment enhancements to generate and present to supplement and enhance the one or more events, e.g., to illustrate a story that is being verbally told.
For virtual environment enhancement in the second enhanced virtual environment scene 200 shown in
The altered scaling factor of the enlarged chair 214 helps the virtual environment participants recognize that this enhancement is a supplemental content enhancement and not a structured portion of the first virtual shared environment 106. For example, due to the increased size of the enlarged chair 214 the first and second avatars 102a, 102b are better able to recognize that they should not virtually sit on the enlarged chair 214 but instead that this enlarged chair 214 is being presented within the first virtual shared environment 106 in order to help illustrate the story that is being told by the third avatar 110c. In other embodiments, the enhancements such as the enlarged chair 214 allow virtual interaction with participants within the respective virtual shared environment. For example, in other embodiments if the third avatar 110c is describing physical aspects of the enlarged chair 214 one of the other avatars is able to climb up and virtually sit on the enlarged chair 214 in order to better appreciate the verbal description that is being given by the third avatar 110c. The virtual reality equipment used by a user to participate in the virtual environment in some embodiments includes one or more tactile sensors which provide feedback to mimic the physical sensation that would be present if the user were physically engaging (e.g., sitting on) such a chair in the physical world.
For virtual environment enhancement in the third enhanced virtual environment scene 300 shown in
In response to the monitored information being designated by the virtual sensors and/or the virtual environment enhancement program 816 as trigger information, the collected information related to the event is input into artificial intelligence which in response generates and/or yields media content related to the event.
The presentation of the chair volumetric content 314 within the designated location, e.g., within the visually demarcated enclosure such as the enclosed thought bubble 316, helps the virtual environment participants recognize that this enhancement is a supplemental content enhancement and not a typical structured portion of the first virtual shared environment 106. For example, due to the positioning within the designated location the first and second avatars 102a, 102b are better able to recognize that they should not virtually sit on the chair volumetric content 314 but instead that this chair volumetric content 314 is being presented within the first virtual shared environment 106 in order to help illustrate the story that is being told by the third avatar 110c. In other embodiments, the enhancements such as the chair volumetric content 314 within the designated location that is visually demarcated (such as the area within the enclosed thought bubble 316) allow virtual interaction with participants within the respective virtual shared environment. For example, in other embodiments if the third avatar 110c is describing physical aspects of the chair depicted with the chair volumetric content 314 one of the other avatars is able to enter the visually demarcated area, e.g., the enclosed thought bubble 316, and virtually sit on the chair volumetric content 314 in order to better appreciate the description that is being given by the third avatar 110c. The virtual reality equipment used by a user to participate in the virtual environment in some embodiments includes one or more tactile sensors which provide feedback to mimic the physical sensation that would be present if the user were physically engaging (e.g., sitting on) such a chair in the physical world.
In some embodiments, a user preconfigures with the virtual environment enhancement program 816 a visual action such as the first gesture 312 to trigger event monitoring such as voice recording/text monitoring for monitoring of the words provided in the virtual shared environment. In some embodiments, a user preconfigures with the virtual environment enhancement program 816 a visual action such as a gesture to trigger generation of particular virtual environment enhancement content with distinguishing sensory attributes. In some embodiments, a user presents various visual actions and stores particular media content for each respective visual action. In some embodiments, the virtual environment enhancement program 816 generates a graphical user interface with which a user interacts using a computer such as the client computer 801 shown in
For virtual environment enhancement in the fourth enhanced virtual environment scene 400 shown in
The accompanying audio 416 helps form the distinguishing sensory attribute for the audio-accompanied chair 414 generative volumetric overlay by being distinct from other sounds that are being presented within the first virtual shared environment 106.
The accompanying audio 416 helps the virtual environment participants recognize that this enhancement is a supplemental content enhancement and not a structured portion of the first virtual shared environment 106. For example, due to the accompanying audio 416 the first and second avatars 110a, 110b are better able to recognize that they should not virtually sit on the audio-accompanied chair 414 but instead that this audio-accompanied chair 414 is being presented within the first virtual shared environment 106 in order to help illustrate the story that is being told by the third avatar 110c. In other embodiments, the enhancements such as the audio-accompanied chair 414 allow virtual interaction with participants within the respective virtual shared environment. For example, in other embodiments if the third avatar 110c is describing physical aspects of the audio-accompanied chair 414 one of the other avatars is able to virtually sit on the audio-accompanied chair 414 in order to better appreciate the description that is being given by the third avatar 110c. The virtual reality equipment used by a user to participate in the virtual environment in some embodiments includes one or more tactile sensors which provide feedback to mimic the physical sensation that would be present if the user were physically engaging (e.g., sitting on) such a chair in the physical world.
For virtual environment enhancement in the fifth enhanced virtual environment scene 500 shown in
The “sole” of the sole audio content 514 refers to this embodiment not including a generative volumetric overlay to accompany the sole audio content 514. After being generated via the artificial intelligence media content generator, the sole audio content 514 is modified via the sensory distinguishing module to further include a distinguishing audio sensory attribute to distinguish this audio from other audio within the first virtual shared environment 106. A distinguishing sound 515, such as one or more of a beep, chirp, clap, snap, chime, or buzz, occurs preceding, following, and/or intermittently throughout a contextual sound enhancement such as the sole audio content 514. The distinguishing sound 515 may occur intermittently and/or at a beginning of the enhancement presentation before a contextually related audio clip is played. For example, a beep (which may be the distinguishing sound 515) is played initially and then contextual audio such as the actual song (which may be the sole audio content 514) which the third avatar 110c is describing plays after the initial enhancement commencement sound that is the distinguishing sound 515.
In the fifth virtual scene 500 depicted in
In some embodiments, the distinguishing sound 515 is presented with a timing placement selected by the virtual environment enhancement program 816 so that interference of the distinguishing sound 515 with the sole audio content 514 and/or words spoken by the storyteller/virtual world presenter, e.g., the third avatar 110c, is reduced. The virtual environment enhancement program 816 can generate an audio sequence of the distinguishing sound 515 and sole audio content 514 combination so that the two do not overlap. In some embodiments, the virtual environment enhancement program 816 also generates the distinguishing sound 515 to be at a higher or lower pitch and/or octave than the sole audio content 514 and/or the storyteller voice so that the distinguishing sound 515 is more distinct compared to one or both of the other two.
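A non-limiting Python sketch of such timing and pitch placement follows; the timing offsets, the pitch shift, and the identifiers (e.g., AudioCue, schedule_enhancement_audio) are hypothetical assumptions used only to illustrate sequencing the distinguishing sound 515 so that it does not overlap the sole audio content 514 or the storyteller's speech.

    # Illustrative sketch of sequencing the distinguishing sound 515 so that it
    # does not overlap the sole audio content 514 or the storyteller's speech.
    # All timing values and names are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class AudioCue:
        name: str
        start_s: float     # start time in seconds on the environment audio timeline
        duration_s: float
        pitch_shift_semitones: int = 0

    def schedule_enhancement_audio(speech_end_s: float,
                                   content_duration_s: float) -> list:
        # Place a short chime after the speech pause, then the contextual clip,
        # with the chime pitched above typical speech to keep it distinct.
        chime = AudioCue("distinguishing_sound_515", speech_end_s + 0.25, 0.5,
                         pitch_shift_semitones=+12)
        clip = AudioCue("sole_audio_content_514",
                        chime.start_s + chime.duration_s + 0.25,
                        content_duration_s)
        return [chime, clip]

    for cue in schedule_enhancement_audio(speech_end_s=12.0, content_duration_s=8.0):
        print(cue)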
To enable the virtual environment enhancement depicted in the other drawings, in some embodiments the respective virtual shared environment includes presentation of a message within the respective virtual shared environment that notifies the virtual participants of a particular enhancement sensory attribute to be used.
A computer system with the virtual environment enhancement program 816 operates as a special purpose computer system in which the virtual environment enhancement program 816 assists in improving the immersive content experience of a virtual environment. In particular, the virtual environment enhancement program 816 transforms a computer system into a special purpose computer system as compared to currently available general computer systems that do not have the virtual environment enhancement program.
It should be appreciated that
In a step 702 of the virtual environment enhancement process 700, a virtual environment is monitored. In at least some embodiments, step 702 is performed via one or more virtual sensors such as the virtual sensor 108 that was shown in
In a step 704 of the virtual environment enhancement process 700, a signal is determined that triggers virtual environment enhancement. The signal of step 704 is obtained from the virtual environment that is being monitored in step 702. In some embodiments, step 704 includes performing natural language processing on text such as spoken or displayed words that are presented within the virtual environment. In some embodiments, speech-to-text transcription is performed on audio words that are captured from the determined signals within the virtual environment. Such speech-to-text transcription uses linguistic algorithms to sort auditory signals and convert the audio signals into text such as Unicode text. Other natural language processing is then performed on the produced text. Various captured signals such as text, images, audio, and/or virtual environment metadata are input into one or more machine learning models to identify any signals for triggering the virtual environment enhancement. In some embodiments, the audio and/or images that are recorded represent virtual environment character interaction of virtual environment characters within the virtual environment. For example, the virtual environment enhancement program 816 recognizes multiple distinct voices, indicating that a conversation is taking place, before proceeding further to generate and present the virtual environment enhancement. Such confirmation of a conversation might be used because the virtual environment enhancement program 816 might make a resource-preserving choice to not generate the virtual environment enhancement if no other person/avatar is in the virtual vicinity to hear the story of the avatar. In some embodiments, similar to the audio confirmation of a conversation amongst multiple parties, an image confirmation of other avatars being present within the virtual vicinity (e.g., within a pre-determined distance threshold) is used as a confirmation to proceed with virtual environment enhancement. In some embodiments the one or more machine learning models are trained in a supervised manner using various input data (text, images, metadata, etc.) and labels of “signal” indicators that accompany certain input data. The identification of a trigger signal in step 704 causes the virtual environment enhancement process 700 to proceed to step 706 to evaluate the signal and/or to evaluate a request that is based on the determined signal. In a simpler embodiment, words are monitored and compared to a look-up table, and a match in the look-up table to a word designated as a trigger word achieves the triggering of step 704.
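A non-limiting Python sketch of the simpler embodiment of step 704 follows, combining the look-up table match with a nearby-listener confirmation; the trigger words, the distance threshold, and the function names are hypothetical.

    # Illustrative sketch of step 704 in its simpler form: trigger words are
    # matched against a look-up table and enhancement proceeds only when other
    # avatars are within a distance threshold. Names and values are hypothetical.
    import math

    TRIGGER_TABLE = {"chair", "castle", "song"}
    LISTENER_RADIUS = 10.0   # virtual-world distance units

    def has_nearby_listener(speaker_pos, avatar_positions, radius=LISTENER_RADIUS):
        return any(math.dist(speaker_pos, pos) <= radius for pos in avatar_positions)

    def detect_trigger(transcript_words, speaker_pos, avatar_positions):
        matched = [w for w in transcript_words if w.lower() in TRIGGER_TABLE]
        if matched and has_nearby_listener(speaker_pos, avatar_positions):
            return matched[0]          # proceed to step 706 with this signal
        return None                    # resource-preserving choice: no enhancement

    print(detect_trigger(["I", "saw", "a", "chair"], (0, 0, 0), [(3, 4, 0)]))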
In a step 706 of the virtual environment enhancement process 700, a request that is based on the determined signal is provided to an artificial intelligence media content generator. The determined signal refers to the signal determined in step 704. In some embodiments, the determined signal itself is input as the request into the artificial intelligence media content generator. In other embodiments, the identified signal points to a content creation request which represents the desired content and is provided to the artificial intelligence media content generator. In at least some embodiments, the virtual environment enhancement program 816 performs semantic natural language processing (NLP) analysis on the signals that are received in order to produce a request for specific content that semantically matches the signals. For example, the virtual environment enhancement program 816 performs semantic word vector analysis, e.g., with cosine similarity comparison, on vectors for words received in the signal to determine appropriate specific media content that should be generated to enhance the event.
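A non-limiting Python sketch of the cosine similarity comparison follows; the toy vectors and candidate content descriptions are hypothetical and stand in for learned word embeddings.

    # Illustrative sketch of the semantic matching described above: word vectors
    # for the signal are compared by cosine similarity against vectors for
    # candidate content descriptions. Vectors shown are toy values.
    import math

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    signal_vec = [0.9, 0.1, 0.3]                       # embedding of the captured words
    candidates = {
        "armchair, upholstered, wooden legs": [0.88, 0.15, 0.25],
        "bird flying over a lake":            [0.05, 0.9, 0.4],
    }
    request = max(candidates, key=lambda k: cosine(signal_vec, candidates[k]))
    print(request)   # content creation request passed to the media content generator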
In at least some embodiments the artificial intelligence media content generator is part of or accessible to the virtual environment enhancement program 816 and produces media content for virtual environment enhancements based on input data such as input text. The virtual environment enhancements in at least some embodiments include generative volumetric overlays that appear in three dimensions when stitched into the virtual environment. In some embodiments, the virtual environment enhancements include audio components. The artificial intelligence media content generator is trained to use multiple different image views of an object to stitch together a three-dimensional representation of the object. In some embodiments, the artificial intelligence media content generator accesses large-scale repositories of three-dimensional CAD models to produce the generative volumetric overlays. In some embodiments, the artificial intelligence media content generator is trained by scoring random image views of objects with frozen pretrained image and text encoders trained on web images and alt-text. In some embodiments, the artificial intelligence media content generator implements geometric priors including sparsity-inducing transmittance regularization, scene bounds, and multilayer perceptron architectures. In at least some embodiments, the artificial intelligence media content generator implements point clouds, voxel grids, triangle meshes, generative adversarial networks, neural rendering, delayed neural rendering, feature extraction, image landmarking, and/or image reconstruction to produce three-dimensional visual content and texture.
In some embodiments, a hidden-layer diffusion model is used that is conditioned on a multi-category shape vector to produce 3D volumetric presentations from 2D image inputs. In some embodiments, a diffusion and denoising process in a pixel space is transformed into operations in a neural radiance field parameter space in which an entire volume space is represented with a continuous function parameterized by a multilayer perceptron. In some embodiments, the artificial intelligence media content generator uses a text-to-3D algorithm to generate a generative volumetric overlay in response to receiving words and based on words that were captured from the virtual shared environment. In some embodiments, a control shape of a limit subdivided surface is obtained along with a texture map and a normal map, optimization on mesh parameters is performed directly, and these elements are used to produce a 3D volumetric presentation with plausible meshes and textures starting from a text embedding. In some embodiments, the artificial intelligence media content generator is divided into a text-to-multiple views generation module and a multiple views-to-3D model generation module. In some embodiments, the artificial intelligence media content generator uses a dynamic neural radiance field which is optimized for scene appearance, motion consistency, and density using a model trained on text-image pairs.
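The following non-limiting Python sketch shows how a request could be handed to the artificial intelligence media content generator; it does not correspond to any particular text-to-3D library or model, and the interface (MediaContentGenerator, VolumetricAsset) is hypothetical.

    # Illustrative, hypothetical interface for the artificial intelligence media
    # content generator; no particular text-to-3D library or API is implied.
    from dataclasses import dataclass, field

    @dataclass
    class VolumetricAsset:
        mesh_vertices: list = field(default_factory=list)
        texture_uri: str = ""
        description: str = ""

    class MediaContentGenerator:
        def generate_volumetric(self, request_text: str) -> VolumetricAsset:
            # A real implementation would run a text-to-3D pipeline (e.g.,
            # text-to-multiple-views generation followed by multi-view 3D
            # reconstruction, as described above). Here a placeholder asset
            # is returned so the surrounding flow can be exercised.
            return VolumetricAsset(description=request_text)

    asset = MediaContentGenerator().generate_volumetric("armchair, upholstered")
    print(asset.description)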
In some embodiments, the virtual environment enhancement program 816 performs web-scraping to obtain images that correspond to certain text and those images are input into the artificial intelligence media content generator to produce the generative volumetric overlays. For example, the virtual environment enhancement program 816 recognizes a story being told within the virtual shared environment about a chair, analyzes the words of the story to identify details about the chair, and uses the details identified to find pictures of the chair from the internet. The so-obtained pictures/images are then used to produce the volumetric media content to project for visual observation within the three-dimensional virtual environment.
In some embodiments, the artificial intelligence media content generator is trained for customization with respect to particular users. In some embodiments, the artificial intelligence media content generator is trained with images of the acquaintances of the user. These images are accessed to generate generative volumetric overlays depicting the acquaintances. Images of individuals are used subject to obtaining appropriate consent according to governing privacy laws. Such generative volumetric overlays depicting particular people are produced in some embodiments to illustrate stories being told about these people. When the monitored signals are provided, received, and determined to relate to a story about acquaintances of a virtual storyteller, in response a virtual environment enhancement showing images and/or voices of these acquaintances is produced. These generative volumetric overlays of people constitute 3D actors that act out a story being told within the virtual environment. Some embodiments also include (subject to legally required consent being obtained) capturing and storing voices of the acquaintances for use with/as the virtual environment enhancements. In some instances, as the story evolves to indicate different actions being performed by the people the generative volumetric overlay is updated to match the new different actions that are being explained in the story. In some instances, a user customizes the artificial intelligence media content generator by providing, e.g., uploading, one or more digital images of an object which the user would like to use to produce a generative volumetric overlay to be displayed within the virtual environment as presented herein.
In a step 708 of the virtual environment enhancement process 700, one or more virtual environment enhancements are received as output from the artificial intelligence media content generator. The artificial intelligence media content generator refers to that component to which the request was provided in step 706. Examples of the virtual environment enhancements were the basic chairs and/or accompanying audio provided above in describing the embodiments shown in
In a step 710 of the virtual environment enhancement process 700, the determined signal is analyzed for placement instructions. The determined signal refers to the signal that was determined in step 704. In some embodiments, step 710 includes inputting the determined signal into another machine learning model that is trained to identify placement instructions within the monitored data. In some embodiments this machine learning model is trained in a supervised manner using various input data (text, images, metadata, etc.) and labels of “placement instruction” indicators that accompany certain input data. For example, an avatar speaking within the virtual environment says that the enhancement should be positioned within a thought bubble for this avatar. The machine learning model recognizes this word instruction as a placement instruction, and the virtual environment enhancement program 816 in response generates the virtual environment enhancement within the thought bubble. In another example, an avatar shares a story about a bird or a flying experience, and the virtual environment enhancement program 816 recognizes a schematic environmental element of the story as being related to the air or sky and in response generates the virtual environment enhancement to be presented above the storyteller (so as to appear as occurring in the air or sky) within the virtual environment. The virtual environment enhancement program 816 receives data from the software program hosting the virtual environment in order to perform step 710. In some embodiments this data analyzed for step 710 is the same data that is analyzed as part of steps 704 and/or 706 to identify a trigger signal and to identify a content creation request for the artificial intelligence media content generator.
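A non-limiting Python sketch of such placement-instruction extraction follows; the phrase-to-placement rules are hypothetical examples and stand in for the trained machine learning model described above.

    # Illustrative sketch of step 710: scanning the determined signal for
    # placement instructions. The phrase lists are hypothetical examples.
    PLACEMENT_RULES = {
        "thought bubble": "inside_thought_bubble",
        "above me": "above_speaker",
        "in the sky": "above_speaker",
        "next to me": "beside_speaker",
    }

    def extract_placement(signal_text: str):
        lowered = signal_text.lower()
        for phrase, placement in PLACEMENT_RULES.items():
            if phrase in lowered:
                return placement       # step 712 affirmative -> step 716
        return None                    # step 712 negative -> step 714 (defaults)

    print(extract_placement("Show it in my thought bubble, please"))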
In a step 712 of the virtual environment enhancement process 700, a determination is made as to whether one or more placement instructions are identified in the signal. If the determination of step 712 is negative and no placement instruction is identified within the signal, then the virtual environment enhancement process 700 proceeds to step 714. If the determination of step 712 is affirmative and one or more placement instructions are identified within the signal, then the virtual environment enhancement process 700 proceeds to step 716. In some embodiments, the determination of step 712 may be performed using a machine learning model which receives virtual environment data as input and in response as output gives a determination as to whether placement instructions for the virtual environment enhancement have been provided.
In a step 714 of the virtual environment enhancement process 700, default placement instructions are used. In some embodiments, the default placement instructions are customized according to a virtual environment participant who provided the trigger signal. In a preliminary or registration step, the participant provides in a graphical user interface, generated by the virtual environment program, information about the desired virtual environment traits for the user such as virtual environment enhancement placements. In some embodiments, the default placement instructions are for the user thought bubble or another visually, e.g., linearly, demarcated area within the virtual shared environment which is sized to hold a generative volumetric overlay. In some embodiments, the default placement occurs via the virtual environment enhancement program 816 sensing the position of current objects such as other avatars listening to a story within the virtual environment and selecting the placement according to a free position, e.g., a nearest free position, to the speaker/enhancement trigger provider. The free position refers to a position that is not currently being occupied by an avatar or virtual environment visual structure within the virtual environment. In some embodiments, the default placement instructions are further specified by the virtual sensors and the virtual environment enhancement program 816 identifying listeners (e.g., virtual participants) to a story or presentation, identifying the positions of those listeners, and then choosing the enhancement placement for a position which maximizes visibility of the enhancement with respect to those listeners. The default placement instructions are combined with the instructions for generating the virtual environment enhancement so that the combination of these (content plus location) is usable in step 718.
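A non-limiting Python sketch of default placement selection follows; the scoring function is a simplified, hypothetical stand-in for selecting a free position near the speaker that keeps the enhancement visible to identified listeners.

    # Illustrative sketch of default placement (step 714): choose a free position
    # near the speaker that keeps the enhancement visible to identified listeners.
    # The scoring is a simplified stand-in for the selection described above.
    import math

    def choose_default_placement(speaker_pos, free_positions, listener_positions):
        def score(pos):
            near_speaker = -math.dist(pos, speaker_pos)                        # prefer close to speaker
            visibility = -sum(math.dist(pos, l) for l in listener_positions)   # prefer close to listeners
            return near_speaker + visibility
        return max(free_positions, key=score)

    speaker = (0.0, 0.0, 0.0)
    free = [(2.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
    listeners = [(1.0, 1.0, 0.0), (2.0, -1.0, 0.0)]
    print(choose_default_placement(speaker, free, listeners))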
In a step 716 of the virtual environment enhancement process 700, identified placement instructions are used. These placement instructions refer to those identified in steps 710 and 712. The identified placement instructions are combined with the instructions for generating the virtual environment enhancement so that the combination of these (content plus location) is usable in step 718.
In a step 718 of the virtual environment enhancement process 700, the enhancement is presented in the virtual environment so as to be sensorily distinct from other portions of the virtual environment and based on the placement instructions. The enhancement refers to the media content output from the artificial intelligence media content generator that was received in step 708. In at least some embodiments, step 718 is performed via inputting the media content output from step 708 into a sensory distinguishing module which adjusts the media content to imbue the media content with a sensory distinguishing attribute. The sensory distinguishing attributes of the enhancement(s) include one or more of a transparency factor, a scaling factor, a location factor, an accompanying audio factor, an audio factor, and a color factor as compared to other portions of the virtual shared environment. The sensory distinguishing module implements various media content producing techniques to imbue the change in transparency, size, location, etc.
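A non-limiting Python sketch of the sensory distinguishing module follows; the attribute adjustments (transparency offset, scaling multiplier, neon color) are hypothetical values used only to illustrate making the media content contrast with its surroundings.

    # Illustrative sketch of the sensory distinguishing module: the generated
    # media content is adjusted so that at least one attribute contrasts with the
    # surrounding portions of the virtual environment. Values are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class RenderAttributes:
        transparency: float   # 0.0 opaque .. 1.0 fully transparent
        scale: float
        color_rgb: tuple

    def distinguish(content: RenderAttributes,
                    surroundings: RenderAttributes) -> RenderAttributes:
        content.transparency = min(1.0, surroundings.transparency + 0.5)  # more transparent
        content.scale = surroundings.scale * 1.5                          # noticeably larger
        if content.color_rgb == surroundings.color_rgb:
            content.color_rgb = (57, 255, 20)                             # contrasting neon green
        return content

    enhanced = distinguish(RenderAttributes(0.0, 1.0, (120, 120, 120)),
                           RenderAttributes(0.0, 1.0, (120, 120, 120)))
    print(enhanced)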
In some embodiments, the virtual environment enhancement that is presented in the virtual shared environment evolves and changes over time during the presenting and based on updates to the one or more events that are occurring. For example, if a story being told by a virtual environment participant transitions to a different segment, the virtual environment enhancement program 816 continues to monitor the virtual environment content to identify changes for the virtual environment enhancement. For example, in the various embodiments shown, if a generative volumetric overlay of a chair is presented in the virtual shared environment and the speaker continues to speak about another object and/or person, the virtual environment enhancement program 816 generates another virtual environment enhancement to represent the new object and/or person and presents this new virtual environment enhancement in addition to the chair virtual environment enhancement. In some embodiments a user explains a different color and/or material (textile) used for the chair that he saw and the virtual environment enhancement program 816 updates the generative volumetric overlay to have the newly mentioned color and/or material for viewing and/or virtual touching by the virtual audience.
In some embodiments, the supplemental enhancement components have the same sensorily distinct attribute that the chair had. For example, for the transparent chair 114 a supplemental image of a person next to the chair (who is part of the story/presentation by the storyteller) also is generated with a transparency factor such that the 3D presentation of the person is more transparent than other elements of the virtual shared environment 106. In some embodiments, the supplemental enhancement components also appear within a designated location (e.g., within the visually demarcated area, e.g., the particular thought bubble) shared by the 3D chair presentation. In some embodiments, the supplemental virtual environment enhancement has one or more different sensory distinguishing attributes compared to the chair that are still distinct compared to remaining portions of the virtual shared environment, e.g., the chair is more transparent than the surroundings and a non-transparent 3D presentation of a person next to the chair is generated with some supplemental audio indication (e.g., a chirp, a beep, a narrator voice, etc.). The supplemental audio sound indicates that the person volumetric image is also an enhancement and not a main standard part of the virtual shared environment, e.g., not an actual avatar who is listening and can spontaneously respond using the thought patterns of the person being represented. Thus, in some embodiments the virtual environment media content enhancement evolves over time based on the continuation of the one or more events. In other embodiments, however, the initial virtual environment media content enhancement maintains the same form and does not evolve over time within the virtual shared environment.
In a step 720 of the virtual environment enhancement process 700, a determination is made as to whether the virtual environment continues. If the determination of step 720 is negative and the virtual environment does not continue, then the virtual environment enhancement process 700 ends. If the determination of step 720 is affirmative and the virtual environment continues, then the virtual environment enhancement process 700 proceeds to step 702 for further monitoring of the virtual environment for a possibility to generate further suitable and sensorily distinct virtual environment enhancements.
In some embodiments, the virtual sensor and the virtual environment enhancement program 816 facilitate revoking of a presented virtual environment enhancement. The virtual sensor senses and identifies another event such as a second predetermined signal which causes the virtual environment enhancement program 816 to revoke and remove the previously presented media content that was the virtual environment enhancement. Thereafter, an additional signal, e.g., a third signal, is received from the virtual environment, e.g., via a further word provided and/or a movement, e.g., a gesture, made. The third signal is detected and decoded via the virtual environment enhancement program 816 and a new virtual environment enhancement is generated via the artificial intelligence media content generator and based on the new third signal that is received. The new virtual environment enhancement that is media content shares one or more elements with the media content of the previous virtual environment enhancement that was revoked. Thus, the new virtual environment enhancement constitutes an evolution of the virtual environment enhancement. In at least some embodiments the first evolved media content is further supplemented, e.g., via a sensory distinguishing module of the virtual environment enhancement program 816, to include one or more distinguishing sensory attributes distinguishing the evolved media content from remaining portions of the virtual environment (see exemplary distinguishing sensory attributes described for the main enhancement throughout this disclosure). In some embodiments, for consistency the evolved media content maintains the one or more distinguishing sensory attributes that the original virtual environment enhancement had. In some embodiments, the evolved media content includes at least one different distinguishing sensory attribute as compared to the one or more distinguishing sensory attributes that the original virtual environment enhancement had.
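A non-limiting Python sketch of the revocation and evolution described above follows; the dictionary-based content representation and function names are hypothetical.

    # Illustrative sketch of revoking a presented enhancement and evolving it from
    # a later signal while retaining shared elements. All names are hypothetical.
    def revoke(presented_enhancements, enhancement_id):
        return presented_enhancements.pop(enhancement_id, None)

    def evolve(revoked_content: dict, third_signal: str) -> dict:
        evolved = dict(revoked_content)              # share elements with the revoked content
        evolved["description"] = third_signal        # regenerate subject matter from the new signal
        evolved["distinguishing_attributes"] = revoked_content.get(
            "distinguishing_attributes", {"transparency": 0.5})
        return evolved

    active = {"enh-1": {"description": "wooden chair", "texture": "oak",
                        "distinguishing_attributes": {"transparency": 0.5}}}
    old = revoke(active, "enh-1")
    print(evolve(old, "the same chair, now painted blue"))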
In various embodiments the one or more machine learning models involved in the virtual environment enhancement process 700 include one or more of naive Bayes models, random decision tree models, linear statistical query models, logistic regression models, neural network models, e.g., convolutional neural networks, multi-layer perceptrons, residual networks, long short-term memory architectures, algorithms, deep learning models, deep learning generative models, and other machine learning models. Training data includes samples of trigger signals, placement instructions, and specific content creation request instructions. The learning algorithm with which the machine learning models in question are trained finds patterns in input data about the samples in order to map the input data attributes to the target. The trained machine learning models contain or otherwise utilize these patterns so that the recommendations and recognition can be predicted for similar future inputs. A machine learning model may be used to obtain predictions on new trigger signals, placement instructions, enhancement type instructions, and instructions to create specific content for the virtual environment. The machine learning model uses the patterns that are identified to determine what the appropriate recognition and generation decisions are for future data to be received and analyzed. As samples are being provided, training of the one or more machine learning models may include supervised learning by submitting prior data sets to an untrained or previously trained machine learning model. In some instances, unsupervised and/or semi-supervised learning for the one or more machine learning models may also be implemented.
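As a non-limiting illustration of such supervised training, the following Python sketch uses scikit-learn to fit a small text classifier; the sample utterances and labels are toy values provided only to show the training pattern and are not representative training data.

    # Illustrative sketch of supervised training for a trigger-signal classifier
    # using scikit-learn; the tiny training set is fabricated purely for
    # demonstration and is not representative.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    samples = ["let me tell you about the chair I saw",
               "imagine a castle on a hill",
               "what time is the meeting tomorrow",
               "please mute your microphone"]
    labels = [1, 1, 0, 0]   # 1 = trigger signal, 0 = not a trigger

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(samples, labels)
    print(model.predict(["picture a wooden chair with red cushions"]))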
It may be appreciated that
Various aspects of the present disclosure are described by narrative text, flowcharts, block diagrams of computer systems and/or block diagrams of the machine logic included in computer program product (CPP) embodiments. With respect to any flowcharts, depending upon the technology involved, the operations can be performed in a different order than what is shown in a given flowchart. For example, again depending upon the technology involved, two operations shown in successive flowchart blocks may be performed in reverse order, as a single integrated step, concurrently, or in a manner at least partially overlapping in time.
A computer program product embodiment (“CPP embodiment” or “CPP”) is a term used in the present disclosure to describe any set of one, or more, storage media (also called “mediums”) collectively included in a set of one, or more, storage devices that collectively include machine readable code corresponding to instructions and/or data for performing computer operations specified in a given CPP claim. A “storage device” is any tangible device that can retain and store instructions for use by a computer processor. Without limitation, the computer readable storage medium may be an electronic storage medium, a magnetic storage medium, an optical storage medium, an electromagnetic storage medium, a semiconductor storage medium, a mechanical storage medium, or any suitable combination of the foregoing. Some known types of storage devices that include these mediums include: diskette, hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or Flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile disk (DVD), memory stick, floppy disk, mechanically encoded device (such as punch cards or pits/lands formed in a major surface of a disc) or any suitable combination of the foregoing. A computer readable storage medium, as that term is used in the present disclosure, is not to be construed as storage in the form of transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide, light pulses passing through a fiber optic cable, electrical signals communicated through a wire, and/or other transmission media. As will be understood by those of skill in the art, data is typically moved at some occasional points in time during normal operations of a storage device, such as during access, de-fragmentation or garbage collection, but this does not render the storage device as transitory because the data is not transitory while it is stored.
Computing environment 800 shown in
COMPUTER 801 may take the form of a desktop computer, laptop computer, tablet computer, smart phone, smart watch or other wearable computer, mainframe computer, quantum computer or any other form of computer or mobile device now known or to be developed in the future that is capable of running a program, accessing a network or querying a database, such as remote database 830. As is well understood in the art of computer technology, and depending upon the technology, performance of a computer-implemented method may be distributed among multiple computers and/or between multiple locations. On the other hand, in this presentation of computing environment 800, detailed discussion is focused on a single computer, specifically computer 801, to keep the presentation as simple as possible. Computer 801 may be located in a cloud, even though it is not shown in a cloud in
PROCESSOR SET 810 includes one, or more, computer processors of any type now known or to be developed in the future. Processing circuitry 820 may be distributed over multiple packages, for example, multiple, coordinated integrated circuit chips. Processing circuitry 820 may implement multiple processor threads and/or multiple processor cores. Cache 821 is memory that is located in the processor chip package(s) and is typically used for data or code that should be available for rapid access by the threads or cores running on processor set 810. Cache memories are typically organized into multiple levels depending upon relative proximity to the processing circuitry. Alternatively, some, or all, of the cache for the processor set may be located “off chip.” In some computing environments, processor set 810 may be designed for working with qubits and performing quantum computing.
Computer readable program instructions are typically loaded onto computer 801 to cause a series of operational steps to be performed by processor set 810 of computer 801 and thereby effect a computer-implemented method, such that the instructions thus executed will instantiate the methods specified in flowcharts and/or narrative descriptions of computer-implemented methods included in this document (collectively referred to as “the inventive methods”). These computer readable program instructions are stored in various types of computer readable storage media, such as cache 821 and the other storage media discussed below. The program instructions, and associated data, are accessed by processor set 810 to control and direct performance of the inventive methods. In computing environment 800, at least some of the instructions for performing the inventive methods may be stored in virtual environment enhancement program 816 in persistent storage 813.
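For illustration only, the following minimal Python sketch (the file path and entry-point name are hypothetical and not part of this disclosure) shows one way program instructions held in persistent storage could be loaded into memory and executed by a processor set, in the manner described above for a stored program such as virtual environment enhancement program 816.

    import importlib.util

    def run_stored_program(module_path: str = "/persistent_storage/enhancement_816.py") -> None:
        # Load the stored program instructions from persistent storage into memory.
        spec = importlib.util.spec_from_file_location("enhancement_816", module_path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)   # the processor set executes the loaded instructions
        module.main()                     # invoke the program's hypothetical entry point

    if __name__ == "__main__":
        run_stored_program()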
COMMUNICATION FABRIC 811 is the signal conduction path that allows the various components of computer 801 to communicate with each other. Typically, this fabric is made of switches and electrically conductive paths, such as the switches and electrically conductive paths that make up busses, bridges, physical input/output ports and the like. Other types of signal communication paths may be used, such as fiber optic communication paths and/or wireless communication paths.
VOLATILE MEMORY 812 is any type of volatile memory now known or to be developed in the future. Examples include dynamic type random access memory (RAM) or static type RAM. Typically, volatile memory 812 is characterized by random access, but this is not required unless affirmatively indicated. In computer 801, the volatile memory 812 is located in a single package and is internal to computer 801, but, alternatively or additionally, the volatile memory may be distributed over multiple packages and/or located externally with respect to computer 801.
PERSISTENT STORAGE 813 is any form of non-volatile storage for computers that is now known or to be developed in the future. The non-volatility of this storage means that the stored data is maintained regardless of whether power is being supplied to computer 801 and/or directly to persistent storage 813. Persistent storage 813 may be a read only memory (ROM), but typically at least a portion of the persistent storage allows writing of data, deletion of data and re-writing of data. Some familiar forms of persistent storage include magnetic disks and solid state storage devices. Operating system 822 may take several forms, such as various known proprietary operating systems or open source Portable Operating System Interface-type operating systems that employ a kernel. The code included in virtual environment enhancement program 816 typically includes at least some of the computer code involved in performing the inventive methods.
PERIPHERAL DEVICE SET 814 includes the set of peripheral devices of computer 801. Data communication connections between the peripheral devices and the other components of computer 801 may be implemented in various ways, such as Bluetooth connections, Near-Field Communication (NFC) connections, connections made by cables (such as universal serial bus (USB) type cables), insertion-type connections (for example, secure digital (SD) card), connections made through local area communication networks and even connections made through wide area networks such as the internet. In various embodiments, UI device set 823 may include components such as a display screen, speaker, microphone, wearable devices (such as goggles and smart watches), keyboard, mouse, printer, touchpad, game controllers, haptic devices, and virtual reality devices. Storage 824 is external storage, such as an external hard drive, or insertable storage, such as an SD card. Storage 824 may be persistent and/or volatile. In some embodiments, storage 824 may take the form of a quantum computing storage device for storing data in the form of qubits. In embodiments where computer 801 is required to have a large amount of storage (for example, where computer 801 locally stores and manages a large database), this storage may be provided by peripheral storage devices designed for storing exceptionally large amounts of data, such as a storage area network (SAN) that is shared by multiple, geographically distributed computers. IoT sensor set 825 is made up of sensors that can be used in Internet of Things applications. For example, one sensor may be a thermometer and another sensor may be a motion detector.
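For illustration only, the following minimal Python sketch (class names and reading values are hypothetical) models an IoT sensor set, such as IoT sensor set 825, exposing a thermometer and a motion detector to programs running on computer 801.

    from dataclasses import dataclass

    @dataclass
    class SensorReading:
        sensor: str
        value: float

    class IoTSensorSet:
        """Toy model of a set of Internet of Things sensors."""

        def read_thermometer(self) -> SensorReading:
            # A real device driver would query hardware; a fixed value stands in here.
            return SensorReading(sensor="thermometer", value=21.5)   # degrees Celsius

        def read_motion_detector(self) -> SensorReading:
            return SensorReading(sensor="motion_detector", value=0.0)   # 0.0 = no motion detected

    sensors = IoTSensorSet()
    print(sensors.read_thermometer(), sensors.read_motion_detector())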
NETWORK MODULE 815 is the collection of computer software, hardware, and firmware that allows computer 801 to communicate with other computers through WAN 802. Network module 815 may include hardware, such as modems or Wi-Fi signal transceivers, software for packetizing and/or de-packetizing data for communication network transmission, and/or web browser software for communicating data over the internet. In some embodiments, network control functions and network forwarding functions of network module 815 are performed on the same physical hardware device. In other embodiments (for example, embodiments that utilize software-defined networking (SDN)), the control functions and the forwarding functions of network module 815 are performed on physically separate devices, such that the control functions manage several different network hardware devices. Computer readable program instructions for performing the inventive methods can typically be downloaded to computer 801 from an external computer or external storage device through a network adapter card or network interface included in network module 815. Network module 815 also includes the software, hardware, and firmware necessary for communication using 5G New Radio (NR) signals.
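For illustration only, the following minimal Python sketch (all names hypothetical) illustrates the software-defined networking split described above, in which a centralized control function installs forwarding rules on several physically separate forwarding devices.

    class ForwardingDevice:
        def __init__(self, name: str):
            self.name = name
            self.rules: dict[str, str] = {}   # destination prefix -> next hop

        def forward(self, destination: str) -> str:
            # Forwarding plane: apply whatever rules the control plane installed.
            return self.rules.get(destination, "drop")

    class ControlFunction:
        """Centralized control plane pushing rules to many forwarding devices."""
        def __init__(self, devices: list[ForwardingDevice]):
            self.devices = devices

        def install_rule(self, destination: str, next_hop: str) -> None:
            for device in self.devices:
                device.rules[destination] = next_hop

    switches = [ForwardingDevice("switch-a"), ForwardingDevice("switch-b")]
    controller = ControlFunction(switches)
    controller.install_rule("10.0.0.0/8", "gateway-1")
    print(switches[0].forward("10.0.0.0/8"))   # -> "gateway-1"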
WAN 802 is any wide area network (for example, the internet) capable of communicating computer data over non-local distances by any technology for communicating computer data, now known or to be developed in the future. In some embodiments, the WAN 802 may be replaced and/or supplemented by local area networks (LANs) designed to communicate data between devices located in a local area, such as a Wi-Fi network. The WAN and/or LANs typically include computer hardware such as copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and edge servers. In the present embodiments, 5G NR network communication in a micro-cell or micro multi-access edge computing (micro-MEC) deployment is used for the functions of virtual environment enhancement program 816.
END USER DEVICE (EUD) 803 is any computer system that is used and controlled by an end user (for example, a customer of an enterprise that operates computer 801) and may take any of the forms discussed above in connection with computer 801. EUD 803 typically receives helpful and useful data from the operations of computer 801. For example, in a hypothetical case where computer 801 is designed to provide a natural language processing result to an end user, this result would typically be communicated from network module 815 of computer 801 through WAN 802 to EUD 803. In this way, EUD 803 can display, or otherwise present, the result to an end user. In some embodiments, EUD 803 may be a client device, such as thin client, heavy client, mainframe computer, desktop computer and so on.
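For illustration only, the following minimal Python sketch (the endpoint URL is hypothetical) shows how an end user device such as EUD 803 might retrieve a result produced by computer 801 over WAN 802 and present it to the end user.

    import json
    import urllib.request

    def fetch_and_display(url: str = "http://computer-801.example/api/result") -> None:
        # Retrieve the result over the wide area network.
        with urllib.request.urlopen(url, timeout=10) as response:
            payload = json.loads(response.read().decode("utf-8"))
        # Present the result to the end user, for example on the device's display.
        print(payload.get("result", "no result available"))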
REMOTE SERVER 804 is any computer system that serves at least some data and/or functionality to computer 801. Remote server 804 may be controlled and used by the same entity that operates computer 801. Remote server 804 represents the machine(s) that collect and store helpful and useful data for use by other computers, such as computer 801. For example, in a hypothetical case where computer 801 is designed and programmed to provide a recommendation based on historical data, then this historical data may be provided to computer 801 from remote database 830 of remote server 804.
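For illustration only, the following minimal Python sketch (table, column, and value names are hypothetical, and sqlite3 merely stands in for whatever database engine remote database 830 actually uses) shows how historical data could be queried and reduced to a simple recommendation.

    import sqlite3

    def most_frequent_choice(connection: sqlite3.Connection) -> str:
        # Derive a naive recommendation: the historically most frequent choice.
        row = connection.execute(
            "SELECT choice, COUNT(*) AS n FROM history GROUP BY choice ORDER BY n DESC LIMIT 1"
        ).fetchone()
        return row[0] if row else "no recommendation"

    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE history (choice TEXT)")
    connection.executemany("INSERT INTO history VALUES (?)",
                           [("option_a",), ("option_b",), ("option_a",)])
    print(most_frequent_choice(connection))   # -> "option_a"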
PUBLIC CLOUD 805 is any computer system available for use by multiple entities that provides on-demand availability of computer system resources and/or other computer capabilities, especially data storage (cloud storage) and computing power, without direct active management by the user. Cloud computing typically leverages sharing of resources to achieve coherence and economies of scale. The direct and active management of the computing resources of public cloud 805 is performed by the computer hardware and/or software of cloud orchestration module 841. The computing resources provided by public cloud 805 are typically implemented by virtual computing environments that run on various computers making up the computers of host physical machine set 842, which is the universe of physical computers in and/or available to public cloud 805. The virtual computing environments (VCEs) typically take the form of virtual machines from virtual machine set 843 and/or containers from container set 844. It is understood that these VCEs may be stored as images and may be transferred among and between the various physical machine hosts, either as images or after instantiation of the VCE. Cloud orchestration module 841 manages the transfer and storage of images, deploys new instantiations of VCEs and manages active instantiations of VCE deployments. Gateway 840 is the collection of computer software, hardware, and firmware that allows public cloud 805 to communicate through WAN 802.
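For illustration only, the following minimal Python sketch (all names hypothetical) models the role described for cloud orchestration module 841: storing VCE images and deploying new instantiations onto host physical machines.

    import itertools

    class OrchestrationModule:
        def __init__(self, hosts: list[str]):
            self.images: dict[str, bytes] = {}
            self.instances: dict[int, tuple[str, str]] = {}   # instance id -> (image name, host)
            self._hosts = itertools.cycle(hosts)              # simple round-robin placement
            self._next_id = itertools.count(1)

        def store_image(self, name: str, image: bytes) -> None:
            self.images[name] = image

        def deploy(self, name: str) -> int:
            """Instantiate a new VCE from a stored image on the next available host."""
            if name not in self.images:
                raise KeyError(f"unknown image: {name}")
            instance_id = next(self._next_id)
            self.instances[instance_id] = (name, next(self._hosts))
            return instance_id

    orchestrator = OrchestrationModule(hosts=["host-842-1", "host-842-2"])
    orchestrator.store_image("vce-container", b"...image bytes...")
    print(orchestrator.deploy("vce-container"))   # -> 1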
Some further explanation of virtualized computing environments (VCEs) will now be provided. VCEs can be stored as “images.” A new active instance of the VCE can be instantiated from the image. Two familiar types of VCEs are virtual machines and containers. A container is a VCE that uses operating-system-level virtualization. This refers to an operating system feature in which the kernel allows the existence of multiple isolated user-space instances, called containers. These isolated user-space instances typically behave as real computers from the point of view of programs running in them. A computer program running on an ordinary operating system can utilize all resources of that computer, such as connected devices, files and folders, network shares, CPU power, and quantifiable hardware capabilities. However, programs running inside a container can only use the contents of the container and devices assigned to the container, a feature which is known as containerization.
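For illustration only, the following conceptual Python toy (it is not real operating-system-level virtualization, and all names are hypothetical) illustrates the containment property described above: a program inside a container can only use the files and devices assigned to that container.

    class ToyContainer:
        def __init__(self, assigned_files: set[str], assigned_devices: set[str]):
            self.assigned_files = assigned_files
            self.assigned_devices = assigned_devices

        def open_file(self, path: str) -> str:
            if path not in self.assigned_files:
                raise PermissionError(f"{path} is outside this container")
            return f"contents of {path}"

        def use_device(self, device: str) -> str:
            if device not in self.assigned_devices:
                raise PermissionError(f"{device} is not assigned to this container")
            return f"using {device}"

    container = ToyContainer({"/app/config.yaml"}, {"gpu0"})
    print(container.open_file("/app/config.yaml"))   # allowed
    # container.open_file("/etc/passwd")             # would raise PermissionError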
PRIVATE CLOUD 806 is similar to public cloud 805, except that the computing resources are only available for use by a single enterprise. While private cloud 806 is depicted as being in communication with WAN 802, in other embodiments a private cloud may be disconnected from the internet entirely and only accessible through a local/private network. A hybrid cloud is a composition of multiple clouds of different types (for example, private, community or public cloud types), often respectively implemented by different vendors. Each of the multiple clouds remains a separate and discrete entity, but the larger hybrid cloud architecture is bound together by standardized or proprietary technology that enables orchestration, management, and/or data/application portability between the multiple constituent clouds. In this embodiment, public cloud 805 and private cloud 806 are both part of a larger hybrid cloud.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes,” “including,” “has,” “have,” “having,” “with,” and the like, when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The descriptions of the various embodiments of the present invention have been presented for purposes of illustration but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments. In this regard, each block in the flowchart, pipeline, and/or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).