This relates generally to systems and methods of displaying virtual environments and adding virtual content to the virtual environments.
Proliferation of software and hardware capable of creating digital scenes has enabled users of electronic devices to craft special effects backdrops, computer graphics, and immersive virtual environments for cinema, software applications, interactive exhibits, and the like. Current systems facilitate creation and editing of digital scenes using computing displays, but often require input directed to computing peripherals such as joysticks, computing mice, trackpads, keyboards, and the like. Notes and annotations directed to such digital scenes, however, are spatially divorced from relevant content within the digital scenes. Moreover, entry of such notes and annotations can be inefficient using conventional computing peripherals. It can therefore be appreciated that a system that facilitates annotation of digital assets in a simulated three-dimensional environment, and improves efficiency of user input, is desirable.
Some examples of the disclosure are directed to systems and methods for inserting annotations and facilitating collaboration when viewing an at least partially virtual, three-dimensional scene. In some examples, an electronic device can present and/or display a three-dimensional environment to a user of the electronic device. In some examples, the three-dimensional environment can include representations of physical objects and/or individuals. In some examples, the three-dimensional environment can include one or more virtual objects and/or virtual assets.
In some examples, the electronic device can display one or more representations of physical individuals. In some examples, the one or more representations include a representation of a user of another electronic device engaged in a communication session with the electronic device. In some examples, the electronic device displays a virtual scene that can be shared with the other electronic device. In some examples, the virtual scene is displayed as though the user of the electronic device is present within a physical equivalent of the virtual scene. In some examples, the representations of physical individuals move relative to the virtual scene.
In some examples, the electronic device displays indications of user attention within the three-dimensional environment. In some examples, the electronic device displays indications of attention corresponding to attention of other users inspecting the virtual scene. In some examples, the electronic device detects an input requesting insertion and display of an annotation into the virtual scene, and in response to the request, displays a representation of the annotation. In some examples, the user input requesting insertion of the annotation includes an air gesture, a voice command, movement of the user's body, and/or gaze of the user directed to virtual content included in the virtual scene. In some examples, the electronic device determines a context of the user's interaction with the virtual scene and/or three-dimensional environment in response to detecting the user input requesting insertion of the annotation. In some examples, the electronic device obtains information such as text, voice recordings, movement of the user's body, and/or movement of the user's attention relative to the virtual scene in response to the user input, and associates the provided information with the inserted annotation.
In some examples, the electronic device determines the context of the user's environment to determine placement and/or orientation of a representation of the annotation. In some examples, the electronic device uses the context to determine information that is stored, and that corresponds to the representation of the annotation. In some examples, the context of the user includes the modality of the user input. In some examples, the context of the user includes parsed speech and/or natural language processing to determine semantics of the user's speech. In some examples, the visual appearance of a representation of an annotation has visual characteristics or properties that indicate an author of the corresponding annotation. In some examples, in response to detecting user input directed to a representation of an annotation, the electronic device presents information such as a recording of audio, display of text, playback of a spatial recording of a representation of an individual, playback of a recording of attention of the individual, and/or some combination thereof.
In some examples, the electronic device captures a virtual screenshot of the virtual scene. In some examples, the virtual screenshot is a virtual object that the user can annotate, similar to or the same as annotations directed to the virtual scene. In some examples, annotations inserted into the virtual screenshot can be inserted into the virtual scene by the electronic device. In some examples, the electronic device can change visual properties or characteristics of the virtual screenshot in response to movement off-angle relative to the virtual screenshot. In some examples, the electronic device can export the screenshot to another device.
In some examples, the electronic device displays a user interface providing an overview of annotations included within a virtual scene. In some examples, the user interface can sort and/or group listings and/or icons corresponding to annotations in accordance with a type of the corresponding annotation, an author that provided the annotation, and/or a chronological order of entry of the annotations. In some examples, in response to detecting user input directed to a representation of an annotation within the user interface, the electronic device can display, visually emphasize, and/or present the annotation information associated with the target of the user input.
The full descriptions of these examples are provided in the Drawings and the Detailed Description, and it is understood that this Summary does not limit the scope of the disclosure in any way.
For improved understanding of the various examples described herein, reference should be made to the Detailed Description below along with the following drawings. Like reference numerals often refer to corresponding parts throughout the drawings.
Some examples of the disclosure are directed to systems and methods for inserting annotations and facilitating collaboration when viewing an at least partially virtual, three-dimensional scene. In some examples, an electronic device can present and/or display a three-dimensional environment to a user of the electronic device. In some examples, the three-dimensional environment can include representations of physical objects and/or individuals. In some examples, the three-dimensional environment can include one or more virtual objects and/or virtual assets.
In some examples, the electronic device can display one or more representations of physical individuals. In some examples, the one or more representations include a representation of a user of another electronic device engaged in a communication session with the electronic device. In some examples, the electronic device displays a virtual scene that can be shared with the other electronic device. In some examples, the virtual scene is displayed as though the user of the electronic device is present within a physical equivalent of the virtual scene. In some examples, the representations of physical individuals move relative to the virtual scene.
In some examples, the electronic device displays indications of user attention within the three-dimensional environment. In some examples, the electronic device displays indications of attention corresponding to attention of other users inspecting the virtual scene. In some examples, the electronic device detects an input requesting insertion and display of an annotation into the virtual scene, and in response to the request, displays a representation of the annotation. In some examples, the user input requesting insertion of the annotation includes an air gesture, a voice command, movement of the user's body, and/or gaze of the user directed to virtual content included in the virtual scene. In some examples, the electronic device determines a context of the user's interaction with the virtual scene and/or three-dimensional environment in response to detecting the user input requesting insertion of the annotation. In some examples, the electronic device obtains information such as text, voice recordings, movement of the user's body, and/or movement of the user's attention relative to the virtual scene in response to the user input, and associates the provided information with the inserted annotation.
In some examples, the electronic device determines the context of the user's environment to determine placement and/or orientation of a representation of the annotation. In some examples, the electronic device uses the context to determine information that is stored, and that corresponds to the representation of the annotation. In some examples, the context of the user includes the modality of the user input. In some examples, the context of the user includes parsed speech and/or natural language processing to determine semantics of the user's speech. In some examples, the visual appearance of a representation of an annotation has visual characteristics or properties that indicate an author of the corresponding annotation. In some examples, in response to detecting user input directed to a representation of an annotation, the electronic device presents information such as a recording of audio, display of text, playback of a spatial recording of a representation of an individual, playback of a recording of attention of the individual, and/or some combination thereof.
In some examples, the electronic device captures a virtual screenshot of the virtual scene. In some examples, the virtual screenshot is a virtual object that the user can annotate, similar to or the same as annotations directed to the virtual scene. In some examples, annotations inserted into the virtual screenshot can be inserted into the virtual scene by the electronic device. In some examples, the electronic device can change visual properties or characteristics of the virtual screenshot in response to movement off-angle relative to the virtual screenshot. In some examples, the electronic device can export the screenshot to another device.
In some examples, the electronic device displays a user interface providing an overview of annotations included within a virtual scene. In some examples, the user interface can sort and/or group listings and/or icons corresponding to annotations in accordance with a type of the corresponding annotation, an author that provided the annotation, and/or a chronological order of entry of the annotations. In some examples, in response to detecting user input directed to a representation of an annotation within the user interface, the electronic device can display, visually emphasize, and/or present the annotation information associated with the target of the user input. In some examples, the user interface can include selectable options that are selectable to insert and/or interact with the virtual scene.
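By way of illustration only, the following Python sketch shows one way the grouping and chronological sorting described above could be implemented; the AnnotationListing fields, the group keys, and the sample entries are hypothetical and are introduced here solely for clarity.

```python
from itertools import groupby
from operator import attrgetter
from dataclasses import dataclass

@dataclass
class AnnotationListing:
    author: str
    kind: str          # e.g., "text", "voice", "spatial"
    created_at: float  # timestamp controlling chronological ordering
    summary: str

def grouped_listings(listings: list[AnnotationListing], group_key: str):
    """Group annotation listings by the chosen key ("author" or "kind"),
    sorting entries within each group in chronological order of entry."""
    key = attrgetter(group_key)
    ordered = sorted(listings, key=lambda a: (key(a), a.created_at))
    return {k: list(group) for k, group in groupby(ordered, key=key)}

# Example usage with stand-in annotation listings.
listings = [
    AnnotationListing("user_318", "voice", 12.0, "Swap the crate"),
    AnnotationListing("user_302a", "text", 5.0, "Widen the street"),
    AnnotationListing("user_318", "text", 8.0, "Darken the saloon sign"),
]
for author, group in grouped_listings(listings, "author").items():
    print(author, [g.summary for g in group])
```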
It should be understood that virtual object 104 is a representative virtual object and one or more different virtual objects (e.g., of various dimensionality such as two-dimensional or other three-dimensional virtual objects) can be included and rendered in a three-dimensional computer-generated environment. For example, the virtual object can represent an application or a user interface displayed in the computer-generated environment. In some examples, the virtual object can represent content corresponding to the application and/or displayed via the user interface in the computer-generated environment. In some examples, the virtual object 104 is optionally configured to be interactive and responsive to user input, such that a user may virtually touch, tap, move, rotate, or otherwise interact with the virtual object 104. In some examples, the virtual object 104 may be displayed in a three-dimensional computer-generated environment with a particular orientation. For example, the virtual object 104 may be displayed in a tilt locked orientation, a head locked orientation, a body locked orientation, or a world locked orientation in the three-dimensional environment. In some such examples, as described in more detail below, while the virtual object 104 is displayed in the three-dimensional environment, the electronic device selectively moves the virtual object 104 in response to user input (e.g., direct input or indirect input) according to the particular orientation in which the virtual object is displayed. For example, the electronic device selectively moves the virtual object 104 in response to movement of a viewpoint of the user depending on whether the virtual object 104 is body locked, head locked, tilt locked, or world locked. Additionally, it should be understood that the 3D environment (or 3D virtual object) described herein may be a representation of a 3D environment (or three-dimensional virtual object) projected or presented at an electronic device.
In the discussion that follows, an electronic device that is in communication with a display generation component and one or more input devices is described. It should be understood that the electronic device optionally is in communication with one or more other physical user-interface devices, such as a touch-sensitive surface, a physical keyboard, a mouse, a joystick, a hand tracking device, an eye tracking device, a stylus, etc. Further, as described herein, it should be understood that the described electronic device, display and touch-sensitive surface are optionally distributed amongst two or more devices. Therefore, as used in this disclosure, information displayed on the electronic device or by the electronic device is optionally used to describe information output by the electronic device for display on a separate display device (touch-sensitive or not). Similarly, as referred to herein, input received on the electronic device (e.g., touch input received on a touch-sensitive surface of the electronic device, or touch input received on the surface of a stylus) is optionally used to describe input received on a separate input device that is communicated to and/or indicated to the electronic device.
The device typically supports a variety of applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disk authoring application, a spreadsheet application, a gaming application, a telephone application, a video conferencing application, an e-mail application, an instant messaging application, a workout support application, a photo management application, a digital camera application, a digital video camera application, a web browsing application, a digital music player application, a television channel browsing application, and/or a digital video player application.
As illustrated in
Communication circuitry 222 optionally includes circuitry for communicating with electronic devices, networks, such as the Internet, intranets, a wired network and/or a wireless network, cellular networks, and wireless local area networks (LANs). Communication circuitry 222 optionally includes circuitry for communicating using near-field communication (NFC) and/or short-range communication, such as Bluetooth®.
Processor(s) 218 include one or more general processors, one or more graphics processors, and/or one or more digital signal processors. In some examples, memory 220 is a non-transitory computer-readable storage medium (e.g., flash memory, random access memory, or other volatile or non-volatile memory or storage) that stores computer-readable instructions configured to be executed by processor(s) 218 to perform the techniques, processes, and/or methods described below. In some examples, memory 220 can include more than one non-transitory computer-readable storage medium. A non-transitory computer-readable storage medium can be any medium (e.g., excluding a signal) that can tangibly contain or store computer-executable instructions for use by or in connection with the instruction execution system, apparatus, or device. In some examples, the storage medium is a transitory computer-readable storage medium. In some examples, the storage medium is a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium can include, but is not limited to, magnetic, optical, and/or semiconductor storages. Examples of such storage include magnetic disks, optical discs based on CD, DVD, or Blu-ray technologies, as well as persistent solid-state memory such as flash, solid-state drives, and the like.
In some examples, display generation component(s) 214 include a single display (e.g., a liquid-crystal display (LCD), organic light-emitting diode (OLED), or other types of display). In some examples, display generation component(s) 214 includes multiple displays. In some examples, display generation component(s) 214 can include a display with touch capability (e.g., a touch screen), a projector, a holographic projector, a retinal projector, etc. In some examples, electronic device 201 includes touch-sensitive surface(s) 209 for receiving user inputs, such as tap inputs and swipe inputs or other gestures. In some examples, display generation component(s) 214 and touch-sensitive surface(s) 209 form touch-sensitive display(s) (e.g., a touch screen integrated with electronic device 201 or external to electronic device 201 that is in communication with electronic device 201).
Electronic device 201 optionally includes image sensor(s) 206. Image sensor(s) 206 optionally include one or more visible light image sensors, such as charge-coupled device (CCD) sensors, and/or complementary metal-oxide-semiconductor (CMOS) sensors operable to obtain images of physical objects from the real-world environment. Image sensor(s) 206 also optionally include one or more infrared (IR) sensors, such as a passive or an active IR sensor, for detecting infrared light from the real-world environment. For example, an active IR sensor includes an IR emitter for emitting infrared light into the real-world environment. Image sensor(s) 206 also optionally include one or more cameras configured to capture movement of physical objects in the real-world environment. Image sensor(s) 206 also optionally include one or more depth sensors configured to detect the distance of physical objects from electronic device 201. In some examples, information from one or more depth sensors can allow the device to identify and differentiate objects in the real-world environment from other objects in the real-world environment. In some examples, one or more depth sensors can allow the device to determine the texture and/or topography of objects in the real-world environment.
In some examples, electronic device 201 uses CCD sensors, event cameras, and depth sensors in combination to detect the physical environment around electronic device 201. In some examples, image sensor(s) 206 include a first image sensor and a second image sensor. The first image sensor and the second image sensor work in tandem and are optionally configured to capture different information of physical objects in the real-world environment. In some examples, the first image sensor is a visible light image sensor and the second image sensor is a depth sensor. In some examples, electronic device 201 uses image sensor(s) 206 to detect the position and orientation of electronic device 201 and/or display generation component(s) 214 in the real-world environment. For example, electronic device 201 uses image sensor(s) 206 to track the position and orientation of display generation component(s) 214 relative to one or more fixed objects in the real-world environment.
In some examples, electronic device 201 includes microphone(s) 213 or other audio sensors. Electronic device 201 optionally uses microphone(s) 213 to detect sound from the user and/or the real-world environment of the user. In some examples, microphone(s) 213 includes an array of microphones (a plurality of microphones) that optionally operate in tandem, such as to identify ambient noise or to locate the source of sound in space of the real-world environment.
Electronic device 201 includes location sensor(s) 204 for detecting a location of electronic device 201 and/or display generation component(s) 214. For example, location sensor(s) 204 can include a GPS receiver that receives data from one or more satellites and allows electronic device 201 to determine the device's absolute position in the physical world.
Electronic device 201 includes orientation sensor(s) 210 for detecting orientation and/or movement of electronic device 201 and/or display generation component(s) 214. For example, electronic device 201 uses orientation sensor(s) 210 to track changes in the position and/or orientation of electronic device 201 and/or display generation component(s) 214, such as with respect to physical objects in the real-world environment. Orientation sensor(s) 210 optionally include one or more gyroscopes and/or one or more accelerometers.
Electronic device 201 includes hand tracking sensor(s) 202 and/or eye tracking sensor(s) 212, in some examples. Hand tracking sensor(s) 202 are configured to track the position/location of one or more portions of the user's hands, and/or motions of one or more portions of the user's hands with respect to the extended reality environment, relative to the display generation component(s) 214, and/or relative to another defined coordinate system. Eye tracking sensor(s) 212 are configured to track the position and movement of a user's gaze (eyes, face, or head, more generally) with respect to the real-world or extended reality environment and/or relative to the display generation component(s) 214. In some examples, hand tracking sensor(s) 202 and/or eye tracking sensor(s) 212 are implemented together with the display generation component(s) 214. In some examples, the hand tracking sensor(s) 202 and/or eye tracking sensor(s) 212 are implemented separate from the display generation component(s) 214.
In some examples, the hand tracking sensor(s) 202 can use image sensor(s) 206 (e.g., one or more IR cameras, 3D cameras, depth cameras, etc.) that capture three-dimensional information from the real-world including one or more hands (e.g., of a human user). In some examples, the hands can be resolved with sufficient resolution to distinguish fingers and their respective positions. In some examples, one or more image sensor(s) 206 are positioned relative to the user to define a field of view of the image sensor(s) 206 and an interaction space in which finger/hand position, orientation and/or movement captured by the image sensors are used as inputs (e.g., to distinguish from a user's resting hand or other hands of other persons in the real-world environment). Tracking the fingers/hands for input (e.g., gestures, touch, tap, etc.) can be advantageous in that it does not require the user to touch, hold or wear any sort of beacon, sensor, or other marker.
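As a non-limiting illustration of how tracked finger positions could be resolved into a discrete input such as an air pinch, the following Python sketch compares the distance between a thumb tip and an index fingertip against a threshold; the joint representation and the threshold value are assumptions made for this example.

```python
import math
from dataclasses import dataclass

@dataclass
class Joint3D:
    x: float
    y: float
    z: float

def distance(a: Joint3D, b: Joint3D) -> float:
    """Euclidean distance between two tracked joints, in meters."""
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2)

# Hypothetical threshold: fingertips closer than ~1.5 cm count as a pinch.
PINCH_THRESHOLD_M = 0.015

def is_air_pinch(thumb_tip: Joint3D, index_tip: Joint3D) -> bool:
    """Return True when the tracked thumb and index fingertips touch."""
    return distance(thumb_tip, index_tip) < PINCH_THRESHOLD_M

# Example usage with stand-in joint positions from a hand tracking frame.
thumb = Joint3D(0.10, -0.02, 0.30)
index = Joint3D(0.11, -0.02, 0.30)
print(is_air_pinch(thumb, index))  # True: fingertips are ~1 cm apart
```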
In some examples, eye tracking sensor(s) 212 includes at least one eye tracking camera (e.g., infrared (IR) cameras) and/or illumination sources (e.g., IR light sources, such as LEDs) that emit light towards a user's eyes. The eye tracking cameras may be pointed towards a user's eyes to receive reflected IR light from the light sources directly or indirectly from the eyes. In some examples, both eyes are tracked separately by respective eye tracking cameras and illumination sources, and a focus/gaze can be determined from tracking both eyes. In some examples, one eye (e.g., a dominant eye) is tracked by a respective eye tracking camera/illumination source(s).
Electronic device 201 is not limited to the components and configuration of
Virtual scenes can be used as a digital backdrop when creating cinematic experiences, such as backgrounds for chroma key compositing and/or display on light emitting diode (LED) wall arrays. Virtual scenes can also be included in immersive virtual content, such as immersive virtual scenes for virtual reality (VR), extended reality (XR), and/or mixed reality (MR) applications in which virtual assets consume at least a portion of a view of a viewer's physical environment. Editing virtual scenes, especially when the editing process is collaborative, can be cumbersome and unintuitive using conventional approaches. The present disclosure contemplates methods and systems for improving efficiency of user interaction with virtual assets included in the scene, such as placement of annotations indicating virtual assets that can be added, deleted, or edited, or that are otherwise of interest to editors of the virtual scene. One or more of the examples of the disclosure are directed to inserting spatial annotations (e.g., annotations having a simulated position relative to the virtual scene, similar to placement of a physical object within a physical environment), drawing user attention toward the annotations, and facilitating communication between individuals that are concurrently or asynchronously reviewing the virtual scene and/or annotations associated with the virtual scene. In one or more examples, a device facilitating the annotation can communicate with one or more other devices, and users of the respective devices can insert respective annotations into the scene and/or inspect annotations placed by other users of other devices. In one or more examples, the device can determine a context of a user of the device to intelligently place annotations within the virtual scene. In one or more examples, the device can intelligently select among one or more categories of user annotation that will be inserted into the virtual scene.
Attention is now directed towards methods and systems of facilitating annotation of a virtual scene displayed in a three-dimensional environment presented at an electronic device (e.g., corresponding to electronic device 201). As described previously, it can be appreciated that extended reality (XR) editing of virtual scenes improves efficiency of user interaction when editing, annotating, and reviewing annotations related to the virtual scenes. Further, it can be appreciated that engaging a plurality of devices in a communication session to collaboratively edit and/or annotate a virtual scene improves clarity and efficiency of communication between the users of the devices. The present disclosure contemplates examples of methods and systems of editing and/or annotating a virtual scene, thus improving human-computer interaction when synthesizing the virtual scene.
For example, an electronic device can display a virtual scene that entirely replaces a view of the physical environment, as though the user were physically within a physical equivalent of the virtual scene. In some examples, display of the virtual scene replacing the view of the user's physical environment can correspond to displaying the virtual scene with a level of immersion greater than a threshold level of immersion, the level(s) of immersion described further herein. In some examples, the virtual scene can only partially replace the view of the physical environment, such that a portion of the user's physical environment remains visible to the user, and/or can be displayed with a level of immersion (e.g., opacity) that is less than the threshold level of immersion. In some examples, in response to detecting a change in the user's viewpoint (e.g., changes to the user's position and/or orientation) in the physical environment, the electronic device can change the perspective view of the virtual scene, as though the user were changing positions within the virtual scene. In some examples, a user input such as a gaze, a gesture, or a hand movement can be detected along with a context of that user input, and a corresponding virtual annotation such as a textual note, a voice recording, a simulated marking, and/or relevant media can be added to the virtual scene.
It can be appreciated that the placement of annotations in the virtual scene and/or the appearance of those annotations can be determined in accordance with characteristics of the user input. For example, in response to detecting an air gesture performed by a hand, the electronic device can display a positional indicator pointing toward and/or otherwise emphasizing a portion of the virtual scene. In another example, in response to detecting user input, the electronic device can determine a context relating the user and/or their input to the three-dimensional environment. For instance, in response to detecting speech of the user, the electronic device can determine a likely target (e.g., a particular virtual object, a portion of the virtual scene, and/or some combination thereof) in view of the remarks included in the speech using one or more natural language processing methods (e.g., algorithms, machine learning techniques, and/or predefined behaviors dictating annotation placement in the absence of a likely target), and display an annotation at a position corresponding to the likely target.
The discussion of examples that follow initially will focus on inputs, “user context,” and/or operations performed from the perspective of a user of an electronic device in accordance with examples illustrated in
Turning back to
In some examples, the virtual scene is displayed relative to the user's physical environment. The physical environment—visible outside of a housing of electronic device 101—can include a physical room that the user 318 occupies. In some examples, the virtual scene can be displayed within display 120 at least partially replacing a view of a representation of the user's physical environment, thus “consuming” a view of the physical environment. For example, electronic device 101 can include one or more outward facing cameras that obtain images of the user's physical environment, and the images can be displayed via display 120 as if the user were able to view the physical environment directly, without the assistance of electronic device 101. At least a portion or all of such a view of the physical environment can be displayed at corresponding positions of display 120 and with a level of opacity less than a threshold level (e.g., 0, 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, or 50% opacity), and the virtual scene can be displayed at those corresponding positions with a level of opacity greater than a threshold level of opacity (e.g., 0, 1, 5, 15, 25, 40, 50, 60, 65, 75, 90, or 100% opacity). In some examples, the physical environment can include one or more physical objects in the user's environment, physical individuals in the user's environment, physical walls, a physical floor, and the like. In some examples, representations of the user's physical environment can be displayed. For example, the virtual screen included in three-dimensional environment 300 in
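A minimal Python sketch of the opacity relationship described above is shown below, assuming a single scalar immersion level and a single threshold; the function name, threshold, and linear blend are illustrative assumptions rather than a required implementation.

```python
def scene_opacity(immersion_level: float, threshold: float = 0.5) -> tuple[float, float]:
    """
    Illustrative mapping from an immersion level (0.0-1.0) to opacities of the
    virtual scene and of the passthrough view of the physical environment.
    At or above the threshold the scene fully consumes the view; below it the
    two are blended. The threshold and blend curve are assumptions.
    """
    immersion_level = max(0.0, min(1.0, immersion_level))
    if immersion_level >= threshold:
        return 1.0, 0.0  # scene fully opaque, passthrough hidden
    blend = immersion_level / threshold
    return blend, 1.0 - blend  # partial scene opacity, remaining passthrough

# Example: at 30% immersion with a 50% threshold, the scene is drawn at about
# 60% opacity over an approximately 40% opaque view of the physical environment.
print(scene_opacity(0.3))  # approximately (0.6, 0.4)
```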
As described previously, the virtual scene can include one or more virtual objects in some examples of the disclosure. The virtual objects can include digital assets modeling physical objects, virtual placeholder objects (e.g., polygons, prisms, and/or simulated two or three-dimensional shapes), virtual objects including user interfaces for applications (e.g., stored in memory by electronic device 101), and/or other virtual objects that can be displayed within a VR, XR, and/or MR environment. As an example, three-dimensional environment 300 includes barrel 316, which optionally is a virtual asset displayed within the virtual scene at a simulated position similar to a physical position and orientation of a physical barrel relative to a viewpoint of user 318. Similarly, crate 312 is included in the three-dimensional environment at a different simulated position and/or orientation than that of barrel 316. In
In some examples, electronic device 101 displays one or more representations of individuals other than user 318 via display 120. For example, representations 302a and 302b are included in three-dimensional environment 300 in
In some examples, the representations 302a and/or 302b can correspond to individuals that are not in the user's physical environment but are represented using spatial information. In some examples, electronic device 101 uses the spatial information to map portions of the physical environment of user 318 to portions of the virtual scene, and/or to map portions of the physical environments of representations 302a and/or 302b to the portions of the virtual scene. As an example, a communication session between electronic device 101, a first computer system used by representation 302a, and a second computer system used by representation 302b can be ongoing to facilitate the mapping between physical environments of respective users of respective computer systems. In some examples, the communication session includes communication of information corresponding to real-time or nearly real-time communication of sounds detected by the computer systems (e.g., speech, sounds made by users, and/or ambient sounds). In some examples, the communication session includes communication of information corresponding to real-time or nearly real-time movement and/or requests for movement of representations (e.g., avatars) corresponding to users participating in the communication session.
For example, the first computer system can detect movement of a user corresponding to representation 302a in the physical environment of the user (e.g., different from the physical environment of user 318) and can communicate information indicative of that movement via the communication session. Prior to the movement, the first computer system can display the virtual scene relative to a viewpoint of the first computer system (e.g., a position and/or orientation relative to the virtual scene, similar to a physical position and/or orientation of the user relative to a physical equivalent of the virtual scene). In response to the movement (e.g., obtaining information indicative of the movement from the other computer system), the first computer system can update the viewpoint of the user of the first computer system in accordance with the physical movement (e.g., in a direction, and/or by a magnitude of movement) to an updated viewpoint, as though the user of the first computer system were physically moving through a physical equivalent of the virtual scene. It can be appreciated that requests for such movement can be directed to an input device (e.g., a virtual joystick, a trackpad, a physical joystick, a virtual button, a physical button, and/or another suitable control) in addition to or in the alternative to detecting physical movement of the user. Electronic device 101 can receive such information, and in response can move the representation 302a relative to the virtual scene by a magnitude and/or direction of movement that mimics the physical movement of the user of the first computer system relative to a physical equivalent of the virtual scene. It is understood that other computer systems (such as a computer system corresponding to representation 302b and/or electronic device 101) can also detect inputs similar to those described with reference to the first computer system, and cause movement of their corresponding representation within the virtual scene.
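The following Python sketch illustrates, under simplifying assumptions, how a locally displayed representation could be moved by the magnitude and direction reported over the communication session; the Pose structure and the shared coordinate frame handling are hypothetical simplifications.

```python
from dataclasses import dataclass

@dataclass
class Pose:
    """Position (meters) and yaw (radians) in a shared scene coordinate frame."""
    x: float
    y: float
    z: float
    yaw: float

def apply_remote_movement(representation: Pose, dx: float, dy: float, dz: float,
                          dyaw: float) -> Pose:
    """
    Move the locally displayed representation of a remote participant by the
    magnitude and direction reported over the communication session, mirroring
    that participant's physical (or requested) movement.
    """
    return Pose(representation.x + dx,
                representation.y + dy,
                representation.z + dz,
                representation.yaw + dyaw)

# Example: the first computer system reports that its user stepped 0.5 m forward
# (here, along +z) and turned slightly; the local device updates representation 302a.
rep_302a = Pose(x=1.2, y=0.0, z=-2.0, yaw=0.0)
rep_302a = apply_remote_movement(rep_302a, dx=0.0, dy=0.0, dz=0.5, dyaw=0.1)
print(rep_302a)
```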
In some examples, a device in communication with electronic device 101 can cause display of representations of users, a virtual scene, and/or virtual annotations. For example, electronic device 101 can display a virtual annotation received from a computer system other than electronic device 101, such as a desktop or laptop computer. In some examples, the computer system can display a view of the virtual scene, such as a representation of the virtual scene on a planar display. In some examples, the computer system can detect input requesting entry of an annotation directed toward the virtual scene, and can communicate an indication of the input to electronic device 101. In response to receiving the input, electronic device 101 can display a virtual annotation within three-dimensional environment 300 and/or within the virtual scene at a location within the virtual scene that corresponds to (e.g., is the same as) the location indicated in the input. Thus, electronic device 101 and the computer system can synchronize annotations within the virtual scene. Both devices can place annotations in the virtual scene, and/or can display indications of received annotations based on indications received from the other device. It is understood that some or all of the operations described herein with reference to electronic device 101 detecting input for, displaying representations of, and/or recording content for virtual annotations can be performed at the computer system, and that electronic device 101 can synchronize placement of the virtual annotations provided by the computer system with annotations inserted locally to electronic device 101 (e.g., in real-time, or nearly real-time).
It is also understood that movement and/or placement of representations of users participating in the communication session can be defined relative to a shared coordinate system, rather than strictly relative to virtual dimensions of the virtual scene. For example, the electronic device 101 can present a view of the physical environment of user 318 not including a virtual scene, and can display representations at positions within the view of the physical environment and/or movement of the representations within the view of the physical environment. It is understood that the examples described with respect to
In some examples, electronic device 101 displays one or more visual indications indicating user attention within three-dimensional environment 300. For example, electronic device 101 detects a virtual position of a target of the user's attention 306a (e.g., gaze), and displays a visual indication 304c at the virtual position, thus presenting a visual indication of the portion of three-dimensional environment 300 that the user's attention is directed to. In some examples, the target of the user's attention is indicated using one or more portions of the user's body other than the eyes. For example, although not shown in
In some examples, electronic device 101 can display indications of attention of the other users. For example, in
In some examples, electronic device 101 displays one or more visual indications of attention illustrating an aggregation of one or more targets of the attention over time. For example, electronic device 101 can display a heatmap indicative of a location of the attention over a period of time (e.g., 1, 3, 5, 10, 15, 30, or 60 seconds, and/or over a period of time that a user is speaking and/or pointing finger(s) toward three-dimensional environment 300). Electronic device 101 can display the heatmap, which can encode duration of the attention using a gradient of colors and/or levels of saturation. For example, a dark red portion of the heatmap overlaying the three-dimensional environment 300 can indicate that attention was directed to the corresponding first portion of the three-dimensional environment 300 for a relatively long period of time. A dark blue portion of the heatmap can indicate that attention was directed to a corresponding second portion of three-dimensional environment 300 for relatively less time than was directed to the first portion of three-dimensional environment 300. In some examples, intermediate shades of light blue, light red, and additional or alternative colors can indicate that attention was directed to corresponding one or more portions of three-dimensional environment 300 for respective periods of time greater than that of the dark blue portion and/or less than that of the dark red portion. It can be appreciated that electronic device 101 can implement the heatmap using a range of saturation, opacity, fill patterns, other colors, and/or some combination thereof to indicate duration of attention toward portions of three-dimensional environment 300.
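One illustrative way to accumulate dwell time and encode it as a color gradient, as in the heatmap described above, is sketched below in Python; the grid-cell keying, the blue-to-red mapping, and the normalization constant are assumptions for the example only.

```python
from collections import defaultdict

# Accumulated attention dwell time, in seconds, per portion (e.g., grid cell)
# of the three-dimensional environment. Cell keys are illustrative.
dwell_seconds = defaultdict(float)

def accumulate(cell: tuple[int, int], frame_dt: float) -> None:
    """Add one frame's worth of dwell time to the cell attention is directed to."""
    dwell_seconds[cell] += frame_dt

def heat_color(cell: tuple[int, int], max_seconds: float) -> tuple[int, int, int]:
    """
    Map dwell time to an RGB color on a blue-to-red gradient: long dwell times
    render toward dark red, short dwell times toward blue, and intermediate
    values blend between the two.
    """
    t = min(dwell_seconds[cell] / max_seconds, 1.0) if max_seconds > 0 else 0.0
    red = int(255 * t)
    blue = int(255 * (1.0 - t))
    return (red, 0, blue)

# Example: attention lingers on cell (4, 7) for about 3 seconds of 60 fps frames.
for _ in range(180):
    accumulate((4, 7), 1.0 / 60.0)
print(heat_color((4, 7), max_seconds=5.0))  # mostly red, some blue
```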
In some examples, electronic device 101 selectively displays the visual indication of attention (e.g., visual indication 304c in
As an additional example, in
In some examples, electronic device 101 detects one or more user inputs associated with requesting display of virtual annotations. In some examples, the one or more user inputs optionally include an air pinch performed by fingers of hand 308 as shown in
In some examples, the one or more user inputs include attention of the user. For example, electronic device 101 in
In some examples, the one or more user inputs include speech of the user. For example, in
In some examples, electronic device 101 determines a context of the user's interaction with three-dimensional environment 300. For example, electronic device 101 determines the user's context—partially or entirely—based upon the position and/or orientation of visual indication 304c when an input is detected. For example, because the user's attention is directed to a position on the virtual floor in
In some examples, either in response to, concurrently occurring with, and/or after insertion of a representation of an annotation into the virtual scene, electronic device 101 detects and/or prompts user 318 for information corresponding to the annotation. For example, electronic device 101 displays a user interface prompting the user to provide speech, air gesture(s), text entry (e.g., via a virtual or physical keyboard), movement, attention, and/or other suitable modalities of information. Such a user interface can include one or more virtual buttons to initiate text entry, recordings of voice, recordings of movement, and/or recordings of the user's attention, and/or to cease such text entry and/or recordings. After text entry and/or recordings provided by the user 318 are complete, electronic device 101 can cease display of the user interface and/or associate the provided information with a corresponding representation of an annotation. In some examples, electronic device 101 begins recording and/or initiates text entry without display of a dedicated user interface in response to insertion of the representation of the annotation into the virtual scene.
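A simplified Python sketch of a record that associates an inserted annotation with the information the user subsequently provides (text, audio, attention recordings, and so on) is shown below; the field names and types are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Annotation:
    """
    Illustrative record associating a displayed annotation representation with
    the content provided by the user after insertion. Field names are
    assumptions; a real system could store references into a shared scene graph.
    """
    author: str
    position: tuple[float, float, float]        # simulated position in the scene
    target_asset: Optional[str] = None          # e.g., "crate_1", if a target was resolved
    text: Optional[str] = None
    audio_recording: Optional[bytes] = None
    attention_recording: list[tuple[float, tuple[float, float, float]]] = field(
        default_factory=list)                   # (timestamp, gaze target) samples

# Example: after the user dictates a note, the provided content is associated
# with the annotation that was just inserted near a crate-like asset.
note = Annotation(author="user_318", position=(0.4, 0.0, -1.8), target_asset="crate_1")
note.text = "Swap this crate for a weathered version."
note.attention_recording.append((0.0, (0.4, 0.2, -1.8)))
print(note.author, note.target_asset, note.text)
```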
In some examples, after inserting a virtual annotation into the virtual scene, the virtual annotation will be associated with the virtual scene and can be viewed again independently of an ongoing communication session. For example, one or more users can exit the virtual scene, and when again displaying the virtual scene, an electronic device presenting the virtual scene can display annotations inserted during a previous communication session, with or without communication with additional electronic devices.
In some examples, electronic device 101 can map user speech to virtual objects in accordance with a determination that the user speech describes an object that is similar to a virtual object. For example, speech referring to a box, a rectangular prism, a cuboid, a container, a basket (e.g., if the virtual object includes an opening on at least one side of crate 312), and the like can be determined to correspond to crate 312. Additionally, speech referencing a name assigned to crate 312 can be detected (e.g., "crate 1") and determined to correspond to crate 312. In such example(s), electronic device 101 can interpret the pronoun "this" as referring to a virtual object that the user 318 directed their attention to within a threshold amount of time (e.g., 0, 0.01, 0.05, 0.1, 0.5, 1, 1.5, 2, 3, 5, 10, or 30 seconds), a virtual object that the user physically gestured toward (e.g., pointing at, moving their fingers and/or hands toward, moving their lips toward the virtual object, leaning their head toward the virtual object, moving their arm toward the virtual object, and/or pointing their leg and/or foot toward the virtual object), and/or a virtual object that is within a threshold distance (e.g., 0, 0.01, 0.05, 0.1, 0.5, 1, 1.5, or 3 m) of the user. Similarly, speech referencing a cask, cylinder, drum, barrel, tub, and/or keg can be determined to correspond to barrel 316 in
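The following Python sketch illustrates one possible resolution strategy consistent with the description above: a synonym table for named objects, plus recency-of-attention and distance thresholds for demonstratives such as "this"; the table contents, thresholds, and identifiers are assumptions for illustration.

```python
import time
from typing import Optional

# Hypothetical synonym table mapping spoken nouns to virtual asset identifiers.
SYNONYMS = {
    "crate_1": {"box", "crate", "cuboid", "container", "basket", "crate 1"},
    "barrel_1": {"barrel", "cask", "cylinder", "drum", "tub", "keg"},
}

ATTENTION_RECENCY_S = 2.0   # "this" can refer to a recently attended object...
NEARBY_THRESHOLD_M = 1.5    # ...or to an object sufficiently close to the user

def resolve_target(utterance: str,
                   last_attention: dict[str, float],
                   distances: dict[str, float],
                   now: float) -> Optional[str]:
    """Return the asset identifier the utterance most likely refers to, or None."""
    words = utterance.lower()
    # 1) Direct or synonymous reference ("the barrel", "crate 1", ...).
    for asset_id, names in SYNONYMS.items():
        if any(name in words for name in names):
            return asset_id
    # 2) Demonstrative reference ("this"/"that"): prefer the most recently
    #    attended asset, then fall back to the nearest sufficiently close asset.
    if "this" in words or "that" in words:
        recent = [a for a, t in last_attention.items()
                  if now - t <= ATTENTION_RECENCY_S]
        if recent:
            return max(recent, key=lambda a: last_attention[a])
        nearby = [a for a, d in distances.items() if d <= NEARBY_THRESHOLD_M]
        if nearby:
            return min(nearby, key=lambda a: distances[a])
    return None

now = time.time()
# Attention was on the crate half a second ago, so "this" resolves to it.
print(resolve_target("move this a little to the left",
                     last_attention={"crate_1": now - 0.5, "barrel_1": now - 10.0},
                     distances={"crate_1": 1.2, "barrel_1": 0.8},
                     now=now))  # crate_1
```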
In some examples, electronic device 101 can determine user context in accordance with movement, indications of attention, and/or other factors. For example, in
In some examples, user context can be used to determine a likely target between a plurality of potential targets. For example, user speech indicating "this" building can be detected, and electronic device 101 can be used to determine whether the speech corresponds to building 310a or building 310b in accordance with a determined user context (e.g., described with reference to crate 312). As an example, when such speech is detected, user attention is or was recently directed to building 310a. In response to detecting such speech, electronic device 101 can display a virtual annotation pointing toward building 310a, rather than pointing toward building 310b. Similarly, speech referencing "the saloon" can be detected by electronic device 101, and determined to correspond to building 310b due to a name assigned to building 310b and/or text included in building 310b (e.g., "SALOON").
In some examples, user context can be determined to correspond to a plurality of likely targets, and electronic device 101 can display an annotation directed toward the plurality of targets. For example, electronic device 101 can detect speech directed to “these” or “those” buildings while user 318 has a viewpoint as illustrated in
In some examples, one or more factors indicative of user context (e.g., speech, air gesture(s), air pose(s), gaze, previous interactions with the three-dimensional environment 300, and the like) can be used in combination to determine a likely set of one or more virtual objects that are a likely target of annotation. The electronic device 101, for example, can use gaze in conjunction with speech, can disregard one or more factors associated with user context (e.g., an air gesture overriding gaze and/or speech), and/or can probabilistically determine user context based on placement (e.g., display) of previous annotations in view of previous determinations of user context.
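A minimal Python sketch of combining several context factors into a per-target score, with an explicit gesture able to override the weighted combination, is shown below; the weights, the override rule, and the confidence values are illustrative assumptions.

```python
# Hypothetical weights for combining context factors into a per-object score.
WEIGHTS = {"gaze": 0.4, "speech": 0.35, "gesture": 0.25}
GESTURE_OVERRIDE = True  # an explicit pointing gesture can override other factors

def score_targets(evidence: dict[str, dict[str, float]]) -> str:
    """
    evidence maps factor name -> {asset_id: confidence in [0, 1]}.
    Returns the asset with the highest combined score; an unambiguous pointing
    gesture short-circuits the weighted combination.
    """
    gesture = evidence.get("gesture", {})
    if GESTURE_OVERRIDE and gesture:
        best = max(gesture, key=gesture.get)
        if gesture[best] > 0.9:
            return best
    scores: dict[str, float] = {}
    for factor, per_asset in evidence.items():
        for asset_id, confidence in per_asset.items():
            scores[asset_id] = scores.get(asset_id, 0.0) + WEIGHTS.get(factor, 0.0) * confidence
    return max(scores, key=scores.get)

# Example: gaze slightly favors one building, speech is ambiguous, no gesture.
print(score_targets({
    "gaze": {"building_310a": 0.8, "building_310b": 0.3},
    "speech": {"building_310a": 0.5, "building_310b": 0.5},
}))  # prints building_310a
```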
In some examples, user context can be determined to be generic, and not expressly referencing a virtual object within the three-dimensional environment. For example,
In some examples, user context is determined to be generic and/or directional, such as "to my left," "to my right," "in front of me," and/or in a simulated cardinal direction as specified by user speech. In such examples, electronic device 101 can display a virtual annotation in accordance with a determination of a meaning of the user's speech. For example, electronic device 101 can parse the user's speech, determine a relative portion of three-dimensional environment 300 that the speech can refer to relative to a viewpoint of user 318 when the speech is received, and display and/or place an annotation a predetermined distance from the viewpoint of user 318 toward the relative portion of the three-dimensional environment in response to detecting the speech. For example, discussion of the virtual scene linked to the user's left optionally is mapped to portions of the three-dimensional environment 300 to the left of the center of the user's viewpoint (e.g., including barrel 316 and crate 312 in
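The directional placement described above could be computed as in the following Python sketch, which offsets an annotation a predetermined distance to the left, right, or front of the viewpoint based on its yaw; the distance, the coordinate convention, and the function name are assumptions.

```python
import math

PLACEMENT_DISTANCE_M = 1.0  # illustrative distance from the viewpoint

def place_relative(viewpoint_xz: tuple[float, float], yaw: float,
                   direction: str) -> tuple[float, float]:
    """
    Return an (x, z) position a fixed distance from the viewpoint in the
    direction the user named ("in front of me", "to my left", "to my right"),
    where yaw is the viewpoint's facing angle in radians about the vertical axis.
    """
    offsets = {"front": 0.0, "left": math.pi / 2, "right": -math.pi / 2}
    angle = yaw + offsets[direction]
    x, z = viewpoint_xz
    # Facing direction for yaw 0 is taken to be -z, a common graphics convention.
    return (x - PLACEMENT_DISTANCE_M * math.sin(angle),
            z - PLACEMENT_DISTANCE_M * math.cos(angle))

# Example: "put a note to my left" with the user at the origin facing -z
# places the annotation roughly one meter along -x.
print(place_relative((0.0, 0.0), yaw=0.0, direction="left"))  # approx (-1.0, 0.0)
```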
In some examples, electronic device 101 displays and/or places virtual annotations within the virtual scene based upon the determined user context. For example, in
In some examples, the virtual annotation is displayed in accordance with a target of the user's gaze. For example, in
In
In some examples, a virtual annotation is displayed at a position in accordance with user context, such as speech. For example, in
In some examples, electronic device 101 determines a position and/or an orientation of the virtual annotations in accordance with a determination of the user's context and/or in accordance with virtual content included in the virtual scene. For example, annotations 320a, 320b, 320c, and 320d are displayed pointing downwards toward a virtual object and/or portion of the virtual scene, which optionally can be a default orientation of such annotations. Additionally or alternatively, the annotations can point downward toward the virtual object to further indicate that a particular aspect of the virtual scene is of interest, and/or a particular position on a surface (e.g., virtual floor) included in the virtual scene. Annotation 320f in
Additionally or alternatively, the annotation can lack a precise indication such as an arrow pointing toward a particular position within the virtual scene. In such examples, the annotation can be displayed overlaying, surrounding, and/or extending from a virtual object of interest, the orientation of which can be modified to precisely draw attention toward the portion of the virtual object and/or portion of the virtual scene of interest. For example, in response to detecting speech directed toward a wall included in building 310a, electronic device 101 can display a simulated glowing and/or lighting effect illuminating the wall, extending along one or more dimensions of the wall. Additionally or alternatively, a halo and/or a rounded annotation can be displayed surrounding a base of a virtual object of interest, and/or hovering above a virtual object of interest. It is understood that the spatial profile, visual appearance, position and/or orientation of a representation of an annotation can be the same as or different from those described herein, without departing from the scope of the disclosure.
In some examples, a representation of an annotation is displayed with one or more visual characteristics indicating an author of the annotation. For example, in
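As a simple illustration of visual characteristics that indicate authorship, the following Python sketch deterministically maps an author identifier to a color from a small palette; the palette and the hashing scheme are hypothetical.

```python
import hashlib

# Illustrative palette; a real system might use each participant's chosen color.
PALETTE = [(229, 57, 53), (30, 136, 229), (67, 160, 71), (251, 140, 0)]

def author_color(author_id: str) -> tuple[int, int, int]:
    """Deterministically map an author identifier to a palette color so all of
    that author's annotations share one visual characteristic."""
    digest = hashlib.sha256(author_id.encode("utf-8")).digest()
    return PALETTE[digest[0] % len(PALETTE)]

# Example: annotations from two different authors render in stable colors
# (collisions are possible with such a small palette).
print(author_color("user_318"), author_color("user_302a"))
```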
In some examples, electronic device 101 presents information associated with an annotation in response to detecting user input. In
In some examples, electronic device 101 initiates playback of a recording in response to user input directed toward an annotation. For example, electronic device 101 plays audio 326 in
In
In some examples, the appearance of the representation 328 is different from an appearance of representation 302a. For example, the representation 328 can be displayed with a different set of one or more visual characteristics, and/or can be displayed with an avatar having a different spatial profile. As an example, representation 328 can be a wireframe model, animated to convey movement of line-shaped limbs and polygonal appendages, rather than a realistic avatar illustrating details of the user of representation 302a (e.g., facial features, skin tone, musculature, and other characteristics of the user). In some examples, representation 328 has one or more characteristics that are similar to or the same as those of representation 302a.
Additionally or alternatively, the spatial recording can include display of an animation of the attention of a representation of a user. Attention 306b, for example, can include gaze of the user directed toward annotation 320d, an air gesture (e.g., air pinch) directed toward annotation 320d, a voice command directed toward annotation 320d, and/or another factor determining the target and/or context of the user corresponding to representation 328.
In some examples, electronic device 101 can capture a virtual screenshot of the virtual scene. In
In
In some examples, electronic device 101 facilitates annotation of the screenshot 334. For example, from
In some examples, electronic device 101 can export the virtual screenshot. For example, the virtual screenshot 334 can be communicated to another electronic device engaged in a communication with electronic device 101, and/or to another device that is similar to or different from electronic device 101. For example, a planar screenshot can be communicated and displayed via a planar display, such as a touchscreen included in a mobile phone, a laptop computer, and/or a head-mounted device capable of displaying immersive virtual content within an at least partially virtual three-dimensional environment.
In some examples, a level of visual prominence of screenshot 334 can change in response to detecting changes in the viewpoint of user 318. For example, from
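One way the change in visual prominence could be computed is sketched below in Python, fading the screenshot's opacity as the viewing direction moves off-angle from the screenshot's surface normal; the fade angles and linear falloff are assumptions for illustration.

```python
import math

def screenshot_opacity(view_dir: tuple[float, float, float],
                       screenshot_normal: tuple[float, float, float],
                       fade_start_deg: float = 30.0,
                       fade_end_deg: float = 75.0) -> float:
    """
    Return an opacity in [0, 1] for a planar virtual screenshot: full opacity
    when viewed nearly head-on, fading toward transparent as the viewpoint
    moves off-angle. The fade angles are illustrative assumptions.
    """
    dot = sum(v * n for v, n in zip(view_dir, screenshot_normal))
    mag = math.sqrt(sum(v * v for v in view_dir)) * math.sqrt(
        sum(n * n for n in screenshot_normal))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, abs(dot) / mag))))
    if angle <= fade_start_deg:
        return 1.0
    if angle >= fade_end_deg:
        return 0.0
    return 1.0 - (angle - fade_start_deg) / (fade_end_deg - fade_start_deg)

# Example: viewing the screenshot 45 degrees off its normal yields roughly
# two-thirds opacity with the assumed fade angles.
print(round(screenshot_opacity((1.0, 0.0, 1.0), (0.0, 0.0, 1.0)), 2))  # 0.67
```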
In some examples, electronic device 101 displays a user interface directed to viewing and interacting with an overview of annotations associated with a virtual scene. For example, in
In some examples, in response to selection of a particular representation and/or a plurality of representations, electronic device 101 ceases display of user interface 348 and/or visually emphasizes a corresponding representation of an annotation within the virtual scene, such as annotation 320g in
In some examples, a representation of an annotation included in user interface 348 can have one or more visual indications (e.g., icons, visual characteristic(s), fonts, and the like) to convey an author of a corresponding annotation, similar to that described with reference to the other virtual annotations herein.
In
In
In some examples, the overhead view of the three-dimensional environment and/or of the virtual scene is displayed by electronic device 101. For example, the glyph illustrating three-dimensional environment 300 can be displayed by electronic device 101 in response to detecting a suitable user input requesting display of a user interface and/or virtual object including the glyph. In some examples, the virtual object including the glyph can be displayed in a manner similar to or the same as that described with reference to the virtual screenshots herein. In some examples, electronic device 101 can detect input directed to the virtual screenshot, such as a selection of a representation of an annotation, and can present information similar to or the same as that described with reference to annotations 320a, 320b, 320c, 320d, 320e, 320f, 320g, and/or 320h. For example, electronic device 101 can generate audio from a voice recording, can display text, can display visual indications of attention, and/or can display a movement of a representation of a user throughout the virtual scene, corresponding to a recorded movement of the user, from the overhead perspective. In some examples, the glyph is displayed concurrently with other virtual content, including the virtual scene, one or more virtual screenshots, virtual object(s), representations of annotations, and the like.
Some examples of the disclosure are directed to a virtual, three-dimensional model of the virtual scene.
In some examples, electronic device 101 displays a representation of a virtual scene within a three-dimensional environment and a plurality of selectable options for interacting with the representation of the virtual scene. For example, model 505 is a virtual object that includes virtual assets corresponding to a virtual scene. In some examples, model 505 is part of and/or is displayed concurrently with a plurality of selectable options for inserting annotation(s) into the virtual scene and/or into the model 505. For example, menu 502 and menu 520 include selectable options that are respectively selectable to initiate modes of inserting annotations and/or review of the annotations.
Some examples described herein refer to selectable options that are selectable to cause some behavior or operations to be performed. It is understood that in such examples, electronic device 101 can detect an input directed to a particular selectable option, and in response to detecting the input directed to that selectable option, can initiate a mode, disable the mode, and/or initiate an operation (e.g., displaying a menu and/or selectable options, initiating display of virtual content, toggling selection of a virtual tool, initiating playback of virtual content, and/or some combination thereof). It is understood that the aforementioned input can have one or more characteristics similar to or the same as other inputs described herein, such as an air gesture and/or an input directed to a peripheral computing device such as a stylus and/or an electronic pointing device used to direct a cursor in accordance with the position of the electronic device.
Menu 502 in
Menu 502 can include additional or alternative selectable options. For example, tool 508 corresponds to a simulated erasing tool. While selected, electronic device 101 can detect an input targeting an annotation within model 505, and can delete the targeted annotation or a portion thereof. Selectable options 512 and 514 can correspond to simulated writing implements, such as simulated markers, pens, pencils, highlighters, and/or some combination thereof. Selectable option 516 can correspond to a simulated ruler, and when selected, can initiate display of a ruler that can be positioned within the three-dimensional environment 500 and/or model 505. While displayed, simulated marks can align along the dimensions of the ruler.
Selectable option 518 can correspond to a plurality of selectable options, each corresponding to a different visual characteristic of an annotation, such as a color, opacity, fill pattern, saturation, and/or some combination thereof. While a particular selectable option of selectable options 518 is selected, annotations inserted into the virtual scene represented by model 505 can be displayed with the visual characteristic(s) of the selected selectable option of selectable options 518. For example, simulated handwriting can be displayed with a blue, red, or green color, when a corresponding colored selectable option of selectable options 518 is selected. Additionally or alternatively, a virtual pin corresponding to selectable option 512 can be displayed with the color of the selected selectable option of selectable options 518.
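As a non-limiting illustration of the foregoing, the following sketch (written in Swift) shows one way a currently selected style option could be carried onto newly inserted annotations. The names used here (e.g., MarkingStyle, AnnotationStyler) are hypothetical and are not elements of the examples described above.

    // Hypothetical sketch: the style corresponding to the selected option of
    // selectable options 518 is applied to new simulated handwriting.
    struct MarkingStyle {
        var color: String      // e.g., "blue", "red", "green"
        var opacity: Double    // 0.0 (transparent) through 1.0 (opaque)
    }

    struct AnnotationStroke {
        let points: [(x: Double, y: Double, z: Double)]
        let style: MarkingStyle
    }

    final class AnnotationStyler {
        // Updated whenever a different selectable option of 518 is selected.
        var selectedStyle = MarkingStyle(color: "blue", opacity: 1.0)

        // New strokes inherit the currently selected visual characteristics.
        func makeStroke(points: [(x: Double, y: Double, z: Double)]) -> AnnotationStroke {
            AnnotationStroke(points: points, style: selectedStyle)
        }
    }

In such a sketch, changing the selected option only affects annotations inserted afterward; previously inserted annotations keep the style with which they were created.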
Menu 521 can include one or more selectable options which, when selected, can change a currently selected interaction and/or review mode of the virtual scene. For example, selectable option 522 can correspond to review playback operations. In response to detecting selection of selectable option 522, electronic device 101 can initiate a replay of a spatial recording as described herein.
Menu 521 can further include selectable options 526, 528, and 530, which can respectively be selected to enable an interaction mode associated with the virtual scene represented by model 505. For example, in response to detecting input selecting selectable option 526, electronic device 101 can initiate display of menu 520 and/or can enable an annotation insertion mode. In response to detecting input selecting selectable option 528, electronic device 101 can initiate display of a listing of the annotations associated with the virtual scene represented by model 505.
Menu 521 can further include a selectable option 534 that is selectable to initiate recording of a potentially collaborative inspection and/or annotation of the virtual scene and/or model 505. For example, in response to detecting input selecting selectable option 534, electronic device 101 can initiate a recording of annotations and/or commentary provided via a microphone and directed toward the virtual scene. The recording can span a period of time, and electronic device 101 can facilitate insertion of annotations into the virtual scene and/or model 505. While the recording is ongoing, electronic device 101 can detect input ceasing the recording, and in response, can cease the recording, thereby defining an endpoint of a “review session” spanning the duration of the recording. After the review session is recorded, electronic device 101 can load the review session, and facilitate playback of the review session (e.g., using selectable options 522 and 524).
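The following is a minimal, hypothetical sketch (in Swift) of how a review session could be recorded as a set of timestamped events spanning the duration of the recording; the names (e.g., ReviewSessionRecorder, TimedEvent) are illustrative assumptions and are not elements of the examples described above.

    // Hypothetical sketch: annotations and commentary are timestamped relative
    // to the start of the recording so the review session can be replayed.
    struct TimedEvent {
        let offset: Double        // seconds from the start of the recording
        let description: String   // e.g., an annotation identifier or audio clip
    }

    struct ReviewSession {
        var events: [TimedEvent] = []
        var duration: Double = 0
    }

    final class ReviewSessionRecorder {
        private var startTime: Double?
        private var session = ReviewSession()

        func begin(at time: Double) { startTime = time }

        // Called whenever an annotation or commentary is captured while recording.
        func record(_ description: String, at time: Double) {
            guard let start = startTime else { return }
            session.events.append(TimedEvent(offset: time - start, description: description))
        }

        // Ceasing the recording defines the endpoint of the review session.
        func end(at time: Double) -> ReviewSession {
            if let start = startTime { session.duration = time - start }
            startTime = nil
            return session
        }
    }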
In some examples, concurrent with the display of menu 502 and/or 521, electronic device 101 can display selectable option 540. Electronic device 101 can detect input selecting selectable option 540, and in response, can initiate display of the virtual scene represented by model 505 with a level of immersion relative to three-dimensional environment 500 that is greater than the current level of immersion.
In some examples, the length and/or orientation of the lead line varies in accordance with movement of hand 538. For example, the lead line can move to extend toward a point defined by the air pinch gesture. The terminal point of the lead line (e.g., the end of the lead line that is not anchored) can move in one or more directions and/or by one or more distances that are similar to, or the same as, components of the movement of hand 538 while the air gesture is maintained. Thus, the lead line can scale upward, scale downward, and/or be reoriented in accordance with movement of hand 538.
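As a non-limiting illustration, the following Swift sketch shows one way the unanchored terminal point of a lead line could track frame-to-frame movement of hand 538 while an air pinch gesture is maintained; the types and function (Vector3, LeadLine, updateLeadLine) are hypothetical.

    // Hypothetical sketch: the free end of the lead line follows the hand's
    // movement delta, so the line scales and reorients accordingly.
    struct Vector3 {
        var x, y, z: Double
        static func + (a: Vector3, b: Vector3) -> Vector3 {
            Vector3(x: a.x + b.x, y: a.y + b.y, z: a.z + b.z)
        }
    }

    struct LeadLine {
        let anchor: Vector3      // fixed end, e.g., at the annotated point in model 505
        var terminal: Vector3    // free end that follows hand 538
    }

    // Apply the hand's frame-to-frame movement to the free end of the lead line.
    func updateLeadLine(_ line: inout LeadLine, handDelta: Vector3, pinchMaintained: Bool) {
        guard pinchMaintained else { return }
        line.terminal = line.terminal + handDelta
    }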
In some examples, in response to detecting input directed toward selectable option 528, electronic device 101 displays annotation markers 548a through 548d. It is understood that annotation markers 548a through 548d can respectively correspond to annotations 546a through 546d (e.g., annotation marker 548a corresponds to annotation 546a, annotation marker 548b corresponds to annotation 546b, etc.). Annotation markers 548a through 548d respectively indicate locations at which annotations were inserted into the virtual scene, and can each be included in and/or overlay model 505.
Selectable option 552b can correspond to a previsualization viewing mode of model 505, which when selected, can cause electronic device 101 to display virtual content for creating and/or using model 505. For example, the virtual content can include text, images, models, lists of video shots, and/or storyboards relating to content that can be generated using the virtual scene represented by model 505. Selectable option 552c can correspond to a reference image viewing mode, which when selected, can cause electronic device 101 to display reference images and/or two- and three-dimensional objects used as visual references to create content, such as virtual models of physical objects, textures, and/or animations.
In some examples, the review session user interface includes a plurality of visual indications and/or selectable options, such as those included in controls menu 558. Controls menu 558 includes a selectable option 560 that is selectable to cease the playback of the review session.
In order to indicate the progression of the playback of a review session, controls menu 558 can include scrubber bar 562. Scrubber bar 562 can include a visual play head that moves along the lateral dimension of scrubber bar 562 to illustrate the progression of playback relative to a duration of the recording. Additionally or alternatively, scrubber bar 562 can include visual indications overlaying scrubber bar 562, indicating the times at which annotations were inserted into the virtual scene. In some examples, the annotation markers 548a through 548d are displayed overlaying the model 505, irrespective of the times at which the annotations were inserted into the scene. In some examples, electronic device 101 displays annotation markers 548a through 548d when the review session playback reaches the times at which annotations associated with annotation markers 548a through 548d were inserted into the virtual scene. Additionally or alternatively, when review session playback reaches a time corresponding to an annotation, electronic device 101 can display the annotation, such as annotation 556.
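The following hypothetical Swift sketch illustrates one way annotation markers could be revealed as the play head advances, by comparing the current playback time against the times at which annotations were inserted; the names used (RecordedAnnotation, visibleMarkers, playheadFraction) are illustrative assumptions.

    // Hypothetical sketch: markers appear once playback reaches the time at
    // which the corresponding annotation was inserted.
    struct RecordedAnnotation {
        let identifier: String     // e.g., "548a"
        let insertionTime: Double  // seconds from the start of the review session
    }

    // Returns the identifiers of markers that should currently be displayed.
    func visibleMarkers(in annotations: [RecordedAnnotation],
                        playbackTime: Double) -> [String] {
        annotations
            .filter { $0.insertionTime <= playbackTime }
            .map { $0.identifier }
    }

    // Normalized play head position used to lay out the play head along the bar.
    func playheadFraction(playbackTime: Double, duration: Double) -> Double {
        guard duration > 0 else { return 0 }
        return min(max(playbackTime / duration, 0), 1)
    }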
In some examples, controls menu 558 can include a selectable option 564 which, when selected, initiates display of captions transcribing vocal annotations directed toward the virtual scene. For example, before or during the review session playback, electronic device 101 can detect input selecting selectable option 564. In response to detecting such input, electronic device 101 can initiate display of captions when the review session playback reaches a time at which a voice annotation was directed toward the virtual scene.
The view of the three-dimensional environment is typically visible to the user via one or more display generation components (e.g., a display or a pair of display modules that provide stereoscopic content to different eyes of the same user) through a virtual viewport that has a viewport boundary that defines an extent of the three-dimensional environment that is visible to the user via the one or more display generation components. In some examples, the region defined by the viewport boundary is smaller than a range of vision of the user in one or more dimensions (e.g., based on the range of vision of the user, size, optical properties or other physical characteristics of the one or more display generation components, and/or the location and/or orientation of the one or more display generation components relative to the eyes of the user). In some examples, the region defined by the viewport boundary is larger than a range of vision of the user in one or more dimensions (e.g., based on the range of vision of the user, size, optical properties or other physical characteristics of the one or more display generation components, and/or the location and/or orientation of the one or more display generation components relative to the eyes of the user). The viewport and viewport boundary typically move as the one or more display generation components move (e.g., moving with a head of the user for a head mounted device or moving with a hand of a user for a handheld device such as a tablet or smartphone).
A viewpoint of a user determines what content is visible in the viewport; a viewpoint generally specifies a location and a direction relative to the three-dimensional environment, and as the viewpoint shifts, the view of the three-dimensional environment will also shift in the viewport. For a head mounted device, a viewpoint is typically based on a location and direction of the head, face, and/or eyes of a user to provide a view of the three-dimensional environment that is perceptually accurate and provides an immersive experience when the user is using the head-mounted device. For a handheld or stationed device, the viewpoint shifts as the handheld or stationed device is moved and/or as a position of a user relative to the handheld or stationed device changes (e.g., a user moving toward, away from, up, down, to the right, and/or to the left of the device). For devices that include display generation components with virtual passthrough, portions of the physical environment that are visible (e.g., displayed, and/or projected) via the one or more display generation components are based on a field of view of one or more cameras in communication with the display generation components which typically move with the display generation components (e.g., moving with a head of the user for a head mounted device or moving with a hand of a user for a handheld device such as a tablet or smartphone) because the viewpoint of the user moves as the field of view of the one or more cameras moves (and the appearance of one or more virtual objects displayed via the one or more display generation components is updated based on the viewpoint of the user (e.g., displayed positions and poses of the virtual objects are updated based on the movement of the viewpoint of the user)). For display generation components with optical passthrough, portions of the physical environment that are visible (e.g., optically visible through one or more partially or fully transparent portions of the display generation component) via the one or more display generation components are based on a field of view of a user through the partially or fully transparent portion(s) of the display generation component (e.g., moving with a head of the user for a head mounted device or moving with a hand of a user for a handheld device such as a tablet or smartphone) because the viewpoint of the user moves as the field of view of the user through the partially or fully transparent portions of the display generation components moves (and the appearance of one or more virtual objects is updated based on the viewpoint of the user).
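As a simplified, non-limiting illustration of the relationship between a viewpoint and the viewport, the following Swift sketch tests whether a point in the three-dimensional environment falls within an assumed angular extent of the viewport around the viewing direction. The types (Point3, Viewpoint) and the purely angular visibility test are illustrative assumptions rather than a description of any particular display generation component.

    import Foundation   // for acos and sqrt

    struct Point3 { var x, y, z: Double }

    struct Viewpoint {
        var location: Point3
        var direction: Point3        // unit vector of the facing direction
        var halfAngleDegrees: Double // half of the viewport's assumed angular extent
    }

    func isVisible(_ point: Point3, from viewpoint: Viewpoint) -> Bool {
        // Direction from the viewpoint to the point of interest.
        let dx = point.x - viewpoint.location.x
        let dy = point.y - viewpoint.location.y
        let dz = point.z - viewpoint.location.z
        let length = sqrt(dx * dx + dy * dy + dz * dz)
        guard length > 0 else { return true }

        // Angle between the facing direction and the direction to the point.
        let dot = (dx * viewpoint.direction.x + dy * viewpoint.direction.y
                   + dz * viewpoint.direction.z) / length
        let angleDegrees = acos(min(max(dot, -1), 1)) * 180 / .pi
        return angleDegrees <= viewpoint.halfAngleDegrees
    }

Under this sketch, moving the viewpoint's location or direction changes which points satisfy the test, mirroring how the view shifts in the viewport as the viewpoint shifts.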
In some examples a representation of a physical environment (e.g., displayed via virtual passthrough or optical passthrough) can be partially or fully obscured by a virtual environment. In some examples, the amount of virtual environment that is displayed (e.g., the amount of physical environment that is not displayed) is based on an immersion level for the virtual environment (e.g., with respect to the representation of the physical environment). For example, increasing the immersion level optionally causes more of the virtual environment to be displayed, replacing and/or obscuring more of the physical environment, and reducing the immersion level optionally causes less of the virtual environment to be displayed, revealing portions of the physical environment that were previously not displayed and/or obscured. In some examples, at a particular immersion level, one or more first background objects (e.g., in the representation of the physical environment) are visually de-emphasized (e.g., dimmed, blurred, and/or displayed with increased transparency) more than one or more second background objects, and one or more third background objects cease to be displayed.
In some examples, a level of immersion includes an associated degree to which the virtual content displayed by the computer system (e.g., the virtual environment and/or the virtual content) obscures background content (e.g., content other than the virtual environment and/or the virtual content) around/behind the virtual content, optionally including the number of items of background content displayed and/or the visual characteristics (e.g., colors, contrast, and/or opacity) with which the background content is displayed, the angular range of the virtual content displayed via the display generation component (e.g., 60 degrees of content displayed at low immersion, 120 degrees of content displayed at medium immersion, or 180 degrees of content displayed at high immersion), and/or the proportion of the field of view displayed via the display generation component that is consumed by the virtual content (e.g., 33% of the field of view consumed by the virtual content at low immersion, 66% of the field of view consumed by the virtual content at medium immersion, or 100% of the field of view consumed by the virtual content at high immersion). In some examples, the background content is included in a background over which the virtual content is displayed (e.g., background content in the representation of the physical environment). In some examples, the background content includes user interfaces (e.g., user interfaces generated by the computer system corresponding to applications), virtual objects (e.g., files or representations of other users generated by the computer system) not associated with or included in the virtual environment and/or virtual content, and/or real objects (e.g., pass-through objects representing real objects in the physical environment around the user that are visible such that they are displayed via the display generation component and/or are visible via a transparent or translucent component of the display generation component because the computer system does not obscure/prevent visibility of them through the display generation component). In some examples, at a low level of immersion (e.g., a first level of immersion), the background, virtual and/or real objects are displayed in an unobscured manner. For example, a virtual environment with a low level of immersion is optionally displayed concurrently with the background content, which is optionally displayed with full brightness, color, and/or translucency.
In some examples, at a higher level of immersion (e.g., a second level of immersion higher than the first level of immersion), the background, virtual and/or real objects are displayed in an obscured manner (e.g., dimmed, blurred, or removed from display). For example, a respective virtual environment with a high level of immersion is displayed without concurrently displaying the background content (e.g., in a full screen or fully immersive mode). As another example, a virtual environment displayed with a medium level of immersion is displayed concurrently with darkened, blurred, or otherwise de-emphasized background content. In some examples, the visual characteristics of the background objects vary among the background objects. For example, at a particular immersion level, one or more first background objects are visually de-emphasized (e.g., dimmed, blurred, and/or displayed with increased transparency) more than one or more second background objects, and one or more third background objects cease to be displayed. In some examples, a null or zero level of immersion corresponds to the virtual environment ceasing to be displayed and instead a representation of a physical environment is displayed (optionally with one or more virtual objects such as application windows or virtual three-dimensional objects) without the representation of the physical environment being obscured by the virtual environment. Adjusting the level of immersion using a physical input element provides for a quick and efficient method of adjusting immersion, which enhances the operability of the computer system and makes the user-device interface more efficient.
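The following hypothetical Swift sketch maps a normalized immersion level (0 through 1) onto the example presentation values given above (60/120/180 degrees of angular range and 33%/66%/100% of the field of view), together with an assumed degree of background de-emphasis; the breakpoints, dimming values, and names are illustrative assumptions only, and an actual mapping could equally be continuous rather than stepped.

    // Hypothetical sketch of an immersion-level-to-presentation mapping.
    struct ImmersionPresentation {
        let angularRangeDegrees: Double   // extent of displayed virtual content
        let fieldOfViewFraction: Double   // proportion of the viewport consumed
        let backgroundDimming: Double     // 0 = unobscured, 1 = not displayed
    }

    func presentation(forImmersionLevel level: Double) -> ImmersionPresentation {
        let clamped = min(max(level, 0), 1)
        switch clamped {
        case 0:
            // Null immersion: the physical environment is not obscured.
            return ImmersionPresentation(angularRangeDegrees: 0,
                                         fieldOfViewFraction: 0,
                                         backgroundDimming: 0)
        case ..<0.34:
            return ImmersionPresentation(angularRangeDegrees: 60,
                                         fieldOfViewFraction: 0.33,
                                         backgroundDimming: 0.25)
        case ..<0.67:
            return ImmersionPresentation(angularRangeDegrees: 120,
                                         fieldOfViewFraction: 0.66,
                                         backgroundDimming: 0.6)
        default:
            return ImmersionPresentation(angularRangeDegrees: 180,
                                         fieldOfViewFraction: 1.0,
                                         backgroundDimming: 1.0)
        }
    }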
Therefore, according to the above, some examples of the disclosure are directed to a method performed at a computer system in communication with one or more input devices and a display. In some examples, the method comprises, while the computer system is participating in a communication session that includes one or more participants, and while displaying a virtual scene from a viewpoint of a user of the computer system within a three-dimensional environment of the user, obtaining, via the one or more input devices, first information including a request to display a first annotation at a position within the three-dimensional environment wherein correspondence of the first annotation to a respective position within the three-dimensional environment is determined based on a context of the user's interaction with the three-dimensional environment. In some examples, in response to obtaining the first information, and in accordance with a determination that the first annotation corresponds to a first position in the virtual scene, the method can comprise displaying a first representation of the first annotation at the first position. In some examples, in response to obtaining the first information, and in accordance with a determination that the first annotation corresponds to a second position in the virtual scene, the method can comprise displaying the first representation of the first annotation at the second position.
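As a non-limiting illustration of the placement determination summarized above, the following Swift sketch resolves a position for a representation of an annotation from an assumed enumeration of interaction contexts (e.g., a gaze target, a hand position, or an offset from the viewpoint); the enumeration, the forward-offset convention, and the names are hypothetical.

    // Hypothetical sketch: the respective position for the first representation
    // is chosen based on the context of the user's interaction.
    struct ScenePosition { var x, y, z: Double }

    enum InteractionContext {
        case gaze(target: ScenePosition)
        case hand(position: ScenePosition)
        case viewpoint(position: ScenePosition, forwardOffset: Double)
    }

    func annotationPosition(for context: InteractionContext) -> ScenePosition {
        switch context {
        case .gaze(let target):
            return target
        case .hand(let position):
            return position
        case .viewpoint(let position, let offset):
            // Place the annotation a predetermined distance in front of the
            // viewpoint; here "forward" is assumed to be the negative z axis.
            return ScenePosition(x: position.x, y: position.y, z: position.z - offset)
        }
    }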
Additionally or alternatively, in some examples, the first representation includes a positional marker including a graphical representation indicating one or more portions of the virtual scene.
Additionally or alternatively, in some examples, the first representation includes a simulated marking having one or more visual characteristics based on a simulated marking implement selected while the first information is obtained.
Additionally or alternatively, in some examples, the context of the user's interaction with the three-dimensional environment is based on a target of attention of the user. In some examples, the method can further comprise before displaying the first representation of the first annotation, displaying, via the display, a visual indication of the target of the attention of the user.
Additionally or alternatively, in some examples, the target of the attention of the user is based on a position of a gaze of the user relative to the three-dimensional environment.
Additionally or alternatively, in some examples, the target of the attention of the user is based on a spatial relationship between one or more portions of a body of the user, the viewpoint of the user, and the three-dimensional environment.
Additionally or alternatively, in some examples, the first information is provided by a participant of the communication session different from the user of the computer system, and the first annotation has one or more visual characteristics that visually indicate that the participant provided the first information.
Additionally or alternatively, in some examples, the one or more visual characteristics include a color of the first representation.
Additionally or alternatively, in some examples, the method can further comprise, while displaying the first representation of the first annotation, detecting, via the one or more input devices, an input directed to the first representation of the first annotation, and in response to detecting the input directed to the first representation, presenting contextual information included in the first information.
Additionally or alternatively, in some examples, the presenting of the contextual information includes presenting audio associated with the first annotation recorded by a participant of the communication session.
Additionally or alternatively, in some examples, the presenting of the contextual information includes displaying text associated with the first annotation recorded by a participant of the communication session.
Additionally or alternatively, in some examples, the presenting of the contextual information includes displaying a representation of a participant that provided the first information moving within the three-dimensional environment.
Additionally or alternatively, in some examples, the method can further comprise while the first representation of the first annotation is included within the three-dimensional environment, obtaining second information, different from the first information, requesting display of a user interface directed to displaying a plurality of representations of a plurality of annotations including the first annotation. In some examples, the method can further comprise, in response to obtaining the second information, displaying, via the display, at least a portion of the plurality of representations of the plurality of annotations, including a second representation of the first annotation, different from the first representation of the first annotation.
Additionally or alternatively, in some examples, one or more of the plurality of representations are visually grouped in accordance with one or more characteristics of respective annotations of the plurality of annotations.
Additionally or alternatively, in some examples, the one or more characteristics include an originator of respective annotations of the plurality of annotations.
Additionally or alternatively, in some examples, the one or more characteristics include one or more times that information corresponding to respective annotations of the plurality of annotations was obtained.
Additionally or alternatively, in some examples, the one or more characteristics include a category of information included in respective annotations of the plurality of annotations.
Additionally or alternatively, in some examples, the first information includes an air gesture performed by one or more portions of a body of the user.
Additionally or alternatively, in some examples, the method can further comprise, while the computer system is participating in the communication session that includes the one or more participants, and while displaying the virtual scene from the viewpoint of the user of the computer system, obtaining, via the one or more input devices, second information including a request to include a second annotation within the three-dimensional environment. In some examples, the method can further comprise, in response to obtaining the second information, presenting feedback indicating a spatial relationship between a position of the second annotation within the three-dimensional environment and the viewpoint of the user.
Additionally or alternatively, in some examples, the feedback includes audio that is generated with a three-dimensional environment effect emulating the sensation of a physical audio source generating the audio from the position of the second annotation.
Additionally or alternatively, in some examples, the method can further comprise, while the computer system is participating in the communication session that includes the one or more participants, displaying, via the display, a representation of a first participant of the one or more participants of the communication session at a second position within the three-dimensional environment.
Additionally or alternatively, in some examples, the representation of the first participant is an expressive avatar including one or more simulated body parts corresponding to physical body parts of the first participant.
Additionally or alternatively, in some examples, the method can further comprise, while displaying the representation of the first participant at the second position within the three-dimensional environment, obtaining second information including a request to move the representation of the first participant within the three-dimensional environment. In some examples, the method can further comprise, in response to obtaining the second information, moving the representation of the first participant from the second position to a third position in accordance with the request to move the representation of the first participant.
Additionally or alternatively, in some examples, the method can further comprise, while displaying the representation of the first participant at the second position within the three-dimensional environment, obtaining second information including a request to display a target of attention of the first participant. In some examples, the method can further comprise, in response to obtaining the second information, displaying, via the display, a visual indication of the attention at a third position within the three-dimensional environment.
Additionally or alternatively, in some examples, a visual appearance of the first representation of the first annotation corresponds to content included in the first annotation.
Additionally or alternatively, in some examples, the correspondence between the first annotation and the respective position within the three-dimensional environment is based on a spatial relationship between one or more portions of the user's body relative to the three-dimensional environment.
Additionally or alternatively, in some examples, the context of the user's interaction with the three-dimensional environment is associated with one or more words spoken by the user of the computer system included in the first information.
Additionally or alternatively, in some examples, the first position is a predetermined distance from the viewpoint of the user.
Additionally or alternatively, in some examples, the method can further comprise, while displaying the virtual scene from the viewpoint of the user, obtaining second information including a request to capture a two-dimensional image of the virtual scene, wherein a portion of the three-dimensional environment is visible via a viewport of the computer system when the second information is obtained. In some examples, the method can further comprise, in response to obtaining the second information, displaying, via the display, a virtual object including a two-dimensional image of the three-dimensional environment including the portion of the three-dimensional environment.
Additionally or alternatively, in some examples, the displaying of the virtual object includes displaying an animation that includes displaying the virtual object with a first size corresponding to the viewport of the computer system in response to the second information. In some examples, the animation includes, after displaying the virtual object, shrinking the virtual object to a second size, and concurrent with the shrinking of the virtual object, maintaining visibility of the portion of the three-dimensional environment via the viewport.
Additionally or alternatively, in some examples, a simulated thickness of the virtual object is less than a simulated thickness threshold.
Additionally or alternatively, in some examples, the method can further comprise, while displaying the virtual object with a level of visual prominence that is a first level of visual prominence relative to the three-dimensional environment and while the viewpoint of the user is a first viewpoint, detecting, via the one or more input devices, a change in the viewpoint of the user to a second viewpoint. In some examples, the method can further comprise, in response to detecting the change in the viewpoint, decreasing the level of visual prominence to a second level of visual prominence, less than the first level of visual prominence.
Additionally or alternatively, in some examples, the method can further comprise, while displaying the virtual object, detecting, via the one or more input devices, a request to display a simulated marking within the virtual object including movement of one or more portions of a body of the user. In some examples, the method can further comprise, in response to detecting the request to display the simulated marking, displaying the simulated marking in accordance with the movement.
Additionally or alternatively, in some examples, the method can further comprise, while displaying the virtual object, detecting, via the one or more input devices, a request to select virtual content included in the virtual object. In some examples, the method can further comprise, in response to the request to select virtual content, displaying a visual indication of a target of the request within the virtual object.
Some examples of the disclosure are directed to an electronic device comprising one or more processors, memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing a method as described herein.
Some examples of the disclosure are directed to a non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to perform a method as described herein.
Some examples of the disclosure are directed to an electronic device, comprising, one or more processors, memory, and means for performing a method as described herein.
Some examples of the disclosure are directed to an information processing apparatus for use in an electronic device, the information processing apparatus comprising means for performing a method as described herein.
Some examples of the disclosure are directed to an electronic device in communication with one or more input devices and a display, the electronic device comprising one or more processors, memory, one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing a method. In some examples, the method can comprise, while the computer system is participating in a communication session that includes one or more participants, and while displaying a virtual scene from a viewpoint of a user of the computer system within a three-dimensional environment of the user, obtaining, via the one or more input devices, first information including a request to display a first annotation at a position within the three-dimensional environment wherein correspondence of the first annotation to a respective position within the three-dimensional environment is determined based on a context of the user's interaction with the three-dimensional environment. In some examples, the method can comprise, in response to obtaining the first information, in accordance with a determination that the first annotation corresponds to a first position in the virtual scene, displaying a first representation of the first annotation at the first position, and in accordance with a determination that the first annotation corresponds to a second position in the virtual scene, displaying the first representation of the first annotation at the second position.
Some examples of the disclosure are directed to a non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device in communication with one or more input devices and a display, cause the electronic device to perform a method. In some examples, the method can comprise, while the computer system is participating in a communication session that includes one or more participants, and while displaying a virtual scene from a viewpoint of a user of the computer system within a three-dimensional environment of the user, obtaining, via the one or more input devices, first information including a request to display a first annotation at a position within the three-dimensional environment wherein correspondence of the first annotation to a respective position within the three-dimensional environment is determined based on a context of the user's interaction with the three-dimensional environment. In some examples, the method can comprise, in response to obtaining the first information, in accordance with a determination that the first annotation corresponds to a first position in the virtual scene, displaying a first representation of the first annotation at the first position, and in accordance with a determination that the first annotation corresponds to a second position in the virtual scene, displaying the first representation of the first annotation at the second position.
The foregoing description, for purpose of explanation, has been described with reference to specific examples. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The examples were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best use the invention and various described examples with various modifications as are suited to the particular use contemplated.
This application claims the benefit of U.S. Provisional Application No. 63/586,783, filed Sep. 29, 2023, the entire disclosure of which is herein incorporated by reference for all purposes.