The present disclosure relates generally to computer systems that provide computer-generated experiences and are in communication with a display generation component, a first camera, and, optionally, a second camera, including, but not limited to, electronic devices that provide virtual reality and mixed reality experiences via a display.
The development of computer systems for augmented reality has increased significantly in recent years. Example augmented reality environments include at least some virtual elements that replace or augment the physical world. Input devices, such as cameras, controllers, joysticks, touch-sensitive surfaces, and touch-screen displays for computer systems and other electronic computing devices are used to interact with virtual/augmented reality environments. Example virtual elements include virtual objects, such as digital images, video, text, icons, and control elements such as buttons and other graphics.
Some methods and interfaces for capturing media with a camera application (e.g., while interacting with environments that include at least some virtual elements such as applications, augmented reality environments, mixed reality environments, and/or virtual reality environments) are cumbersome, inefficient, and limited. For example, systems that provide insufficient feedback on the state of the computer system while the user is trying to capture media (e.g., the readiness of the computer system to capture media, the orientation of the camera(s) used for media capture, and/or the current capture quality), systems that excessively obscure the environment while the user is trying to capture media, and systems in which inputs for controlling media capture are complex, tedious, and/or error-prone, create a significant cognitive burden on a user, and detract from the media capture experience. In addition, these methods take longer than necessary, thereby wasting energy of the computer system. This latter consideration is particularly important in battery-operated devices.
Accordingly, there is a need for computer systems with improved methods and interfaces for providing computer-generated experiences to users that make capturing media with the computer systems more efficient and intuitive for a user. Such methods and interfaces optionally complement or replace conventional methods for media capture with a camera application. Such methods and interfaces reduce the number, extent, and/or nature of the inputs from a user by helping the user to understand the connection between provided inputs and device responses to the inputs, thereby creating a more efficient human-machine interface.
The above deficiencies and other problems associated with user interfaces for computer systems are reduced or eliminated by the disclosed systems. In some embodiments, the computer system is a desktop computer with an associated display. In some embodiments, the computer system is a portable device (e.g., a notebook computer, tablet computer, or handheld device). In some embodiments, the computer system is a personal electronic device (e.g., a wearable electronic device, such as a watch, or a head-mounted device). In some embodiments, the computer system has a touchpad. In some embodiments, the computer system has one or more cameras. In some embodiments, the computer system has a touch-sensitive display (also known as a “touch screen” or “touch-screen display”). In some embodiments, the computer system has one or more eye-tracking components. In some embodiments, the computer system has one or more hand-tracking components. In some embodiments, the computer system has one or more output devices in addition to the display generation component, the output devices including one or more tactile output generators and/or one or more audio output devices. In some embodiments, the computer system has a graphical user interface (GUI), one or more processors, memory, and one or more modules, programs, or sets of instructions stored in the memory for performing multiple functions. In some embodiments, the user interacts with the GUI through a stylus and/or finger contacts and gestures on the touch-sensitive surface, movement of the user's eyes and hand in space relative to the GUI (and/or computer system) or the user's body as captured by cameras and other movement sensors, and/or voice inputs as captured by one or more audio input devices. In some embodiments, the functions performed through the interactions optionally include image editing, drawing, presenting, word processing, spreadsheet making, game playing, telephoning, video conferencing, e-mailing, instant messaging, workout support, digital photographing, digital videoing, web browsing, digital music playing, note taking, and/or digital video playing. Executable instructions for performing these functions are, optionally, included in a transitory and/or non-transitory computer-readable storage medium or other computer program product configured for execution by one or more processors.
There is a need for electronic devices with improved methods and interfaces for capturing media with a camera application. Such methods and interfaces may complement or replace conventional methods for capturing media with a camera application. Such methods and interfaces provide a user with improved feedback on the state of the computer system, reduce the number, extent, and/or the nature of the inputs from a user and produce a more efficient human-machine interface. For battery-operated computing devices, such methods and interfaces conserve power and increase the time between battery charges. Such methods and interfaces reduce energy usage, thereby reducing heat emitted by the computing devices, which is particularly important for wearable computing devices such as head-mounted devices (HMDs) that can become uncomfortable for a user to wear if too much heat is produced, even when operating well within operational parameters for the device components.
In some embodiments, a computer system displays a set of controls associated with controlling playback of media content (e.g., transport controls and/or other types of controls) in response to detecting a gaze and/or gesture of the user. In some embodiments, the computer system initially displays a first set of controls in a reduced-prominence state (e.g., with reduced visual prominence) in response to detecting a first input, and then displays a second set of controls (which optionally includes additional controls) in an increased-prominence state in response to detecting a second input. In this manner, the computer system optionally provides feedback to the user that they have begun to invoke display of the controls without unduly distracting the user from the content (e.g., by initially displaying controls in a less visually prominent manner), and then, based on detecting a user input indicating that the user wishes to further interact with the controls, displaying the controls in a more visually prominent manner to allow for easier and more-accurate interactions with the computer system.
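For illustration only, the following non-limiting sketch outlines one way the two-stage prominence behavior described above could be modeled. The `ControlsState` class, the prominence values, and the input names (`gaze_dwell`, `pinch`) are hypothetical and are not part of the disclosure.

```python
# Minimal, illustrative sketch of two-stage control prominence (not the disclosed implementation).
from dataclasses import dataclass, field
from typing import List


@dataclass
class ControlsState:
    visible_controls: List[str] = field(default_factory=list)
    prominence: float = 0.0  # 0.0 = hidden, 1.0 = fully prominent

    def handle_input(self, event: str) -> None:
        if event == "gaze_dwell" and self.prominence == 0.0:
            # First input: show a reduced set of controls with low visual prominence,
            # providing feedback without unduly obscuring the content.
            self.visible_controls = ["play_pause", "scrubber"]
            self.prominence = 0.3
        elif event == "pinch" and 0.0 < self.prominence < 1.0:
            # Second input: expand to a fuller control set at full prominence
            # for easier, more accurate interaction.
            self.visible_controls = ["play_pause", "scrubber", "volume", "captions"]
            self.prominence = 1.0


state = ControlsState()
state.handle_input("gaze_dwell")  # reduced-prominence state
state.handle_input("pinch")       # increased-prominence state
print(state.visible_controls, state.prominence)
```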
In accordance with some embodiments, a method performed at a computer system that is in communication with a display generation component and a first camera is described. The method includes: while displaying, via the display generation component, a first user interface that includes a camera viewfinder: detecting a first input; and in response to detecting the first input: in accordance with a determination that a gaze of a user of the computer system is directed to a respective region of the camera viewfinder when the first input is detected, initiating capture of first media content using the first camera; and in accordance with a determination that the gaze of the user of the computer system is not directed to the respective region of the camera viewfinder when the first input is detected, forgoing initiating capture of the first media content.
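For illustration only, the following non-limiting sketch models the conditional capture logic of the method described above. The `Rect`, `handle_first_input`, and `start_capture` names are hypothetical placeholders rather than actual system interfaces.

```python
# Illustrative sketch: initiate capture only when the user's gaze is directed to the
# respective region of the camera viewfinder at the moment the first input is detected.
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


def handle_first_input(gaze_point: Tuple[float, float],
                       viewfinder_region: Rect,
                       start_capture: Callable[[], None]) -> bool:
    """Start capture if gaze is inside the respective region; otherwise forgo capture."""
    if viewfinder_region.contains(*gaze_point):
        start_capture()
        return True
    return False


region = Rect(0.25, 0.25, 0.5, 0.5)  # normalized viewfinder region
print(handle_first_input((0.4, 0.5), region, lambda: print("capture started")))  # True
print(handle_first_input((0.9, 0.9), region, lambda: print("capture started")))  # False
```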
In accordance with some embodiments, a non-transitory computer-readable storage medium is described. The non-transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component and a first camera, the one or more programs including instructions for: while displaying, via the display generation component, a first user interface that includes a camera viewfinder: detecting a first input; and in response to detecting the first input: in accordance with a determination that a gaze of a user of the computer system is directed to a respective region of the camera viewfinder when the first input is detected, initiating capture of first media content using the first camera; and in accordance with a determination that the gaze of the user of the computer system is not directed to the respective region of the camera viewfinder when the first input is detected, forgoing initiating capture of the first media content.
In accordance with some embodiments, a transitory computer-readable storage medium is described. The transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component and a first camera, the one or more programs including instructions for: while displaying, via the display generation component, a first user interface that includes a camera viewfinder: detecting a first input; and in response to detecting the first input: in accordance with a determination that a gaze of a user of the computer system is directed to a respective region of the camera viewfinder when the first input is detected, initiating capture of first media content using the first camera; and in accordance with a determination that the gaze of the user of the computer system is not directed to the respective region of the camera viewfinder when the first input is detected, forgoing initiating capture of the first media content.
In accordance with some embodiments, a computer system is described. The computer system is configured to communicate with a display generation component and a first camera, and the computer system comprises: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: while displaying, via the display generation component, a first user interface that includes a camera viewfinder: detecting a first input; and in response to detecting the first input: in accordance with a determination that a gaze of a user of the computer system is directed to a respective region of the camera viewfinder when the first input is detected, initiating capture of first media content using the first camera; and in accordance with a determination that the gaze of the user of the computer system is not directed to the respective region of the camera viewfinder when the first input is detected, forgoing initiating capture of the first media content.
In accordance with some embodiments, a computer system is described. The computer system is configured to communicate with a display generation component and a first camera, and the computer system comprises: means, while displaying, via the display generation component, a first user interface that includes a camera viewfinder, for: detecting a first input; and in response to detecting the first input: in accordance with a determination that a gaze of a user of the computer system is directed to a respective region of the camera viewfinder when the first input is detected, initiating capture of first media content using the first camera; and in accordance with a determination that the gaze of the user of the computer system is not directed to the respective region of the camera viewfinder when the first input is detected, forgoing initiating capture of the first media content.
In accordance with some embodiments, a computer program product is described. The computer program product comprises one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component and a first camera, the one or more programs including instructions for: while displaying, via the display generation component, a first user interface that includes a camera viewfinder: detecting a first input; and in response to detecting the first input: in accordance with a determination that a gaze of a user of the computer system is directed to a respective region of the camera viewfinder when the first input is detected, initiating capture of first media content using the first camera; and in accordance with a determination that the gaze of the user of the computer system is not directed to the respective region of the camera viewfinder when the first input is detected, forgoing initiating capture of the first media content.
In accordance with some embodiments, a method performed at a computer system that is in communication with a display generation component and a first camera is described. The method includes: displaying, via the display generation component, a first user interface that includes a camera preview of at least a portion of a field-of-view of the first camera; detecting a change in an orientation of the field-of-view of the first camera with respect to a respective orientation; and in response to detecting the change in the orientation: in accordance with a determination that a first set of criteria are met, displaying a first indicator representing the orientation of the field-of-view of the first camera, wherein the first set of criteria includes a first criterion that is met when a difference between a current orientation of the field-of-view of the first camera and the respective orientation exceeds a first threshold amount.
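For illustration only, the following non-limiting sketch shows the threshold criterion of the method described above. The threshold value and the `orientation_indicator_visible` helper are hypothetical, and the sketch assumes orientation differences small enough that a simple absolute difference suffices.

```python
# Illustrative sketch: display an orientation indicator only when the camera's current
# orientation deviates from the respective (reference) orientation beyond a threshold.
THRESHOLD_DEGREES = 2.0  # assumed threshold, for illustration only


def orientation_indicator_visible(current_deg: float,
                                  reference_deg: float,
                                  threshold: float = THRESHOLD_DEGREES) -> bool:
    """First criterion: the difference between the current and respective orientation
    exceeds a threshold amount (small-angle case; angular wrap-around is ignored here)."""
    return abs(current_deg - reference_deg) > threshold


print(orientation_indicator_visible(1.5, 0.0))  # False: within threshold, no indicator
print(orientation_indicator_visible(5.0, 0.0))  # True: indicator is displayed
```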
In accordance with some embodiments, a non-transitory computer-readable storage medium is described. The non-transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component and a first camera, the one or more programs including instructions for: displaying, via the display generation component, a first user interface that includes a camera preview of at least a portion of a field-of-view of the first camera; detecting a change in an orientation of the field-of-view of the first camera with respect to a respective orientation; and in response to detecting the change in the orientation: in accordance with a determination that a first set of criteria are met, displaying a first indicator representing the orientation of the field-of-view of the first camera, wherein the first set of criteria includes a first criterion that is met when a difference between a current orientation of the field-of-view of the first camera and the respective orientation exceeds a first threshold amount.
In accordance with some embodiments, a transitory computer-readable storage medium is described. The transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component and a first camera, the one or more programs including instructions for: displaying, via the display generation component, a first user interface that includes a camera preview of at least a portion of a field-of-view of the first camera; detecting a change in an orientation of the field-of-view of the first camera with respect to a respective orientation; and in response to detecting the change in the orientation: in accordance with a determination that a first set of criteria are met, displaying a first indicator representing the orientation of the field-of-view of the first camera, wherein the first set of criteria includes a first criterion that is met when a difference between a current orientation of the field-of-view of the first camera and the respective orientation exceeds a first threshold amount.
In accordance with some embodiments, a computer system is described. The computer system is configured to communicate with a display generation component and a first camera, and the computer system comprises: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: displaying, via the display generation component, a first user interface that includes a camera preview of at least a portion of a field-of-view of the first camera; detecting a change in an orientation of the field-of-view of the first camera with respect to a respective orientation; and in response to detecting the change in the orientation: in accordance with a determination that a first set of criteria are met, displaying a first indicator representing the orientation of the field-of-view of the first camera, wherein the first set of criteria includes a first criterion that is met when a difference between a current orientation of the field-of-view of the first camera and the respective orientation exceeds a first threshold amount.
In accordance with some embodiments, a computer system is described. The computer system is configured to communicate with a display generation component and a first camera, and the computer system comprises: means for displaying, via the display generation component, a first user interface that includes a camera preview of at least a portion of a field-of-view of the first camera; means for detecting a change in an orientation of the field-of-view of the first camera with respect to a respective orientation; and means, in response to detecting the change in the orientation, for: in accordance with a determination that a first set of criteria are met, displaying a first indicator representing the orientation of the field-of-view of the first camera, wherein the first set of criteria includes a first criterion that is met when a difference between a current orientation of the field-of-view of the first camera and the respective orientation exceeds a first threshold amount.
In accordance with some embodiments, a computer program product is described. The computer program product comprises one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component and a first camera, the one or more programs including instructions for: displaying, via the display generation component, a first user interface that includes a camera preview of at least a portion of a field-of-view of the first camera; detecting a change in an orientation of the field-of-view of the first camera with respect to a respective orientation; and in response to detecting the change in the orientation: in accordance with a determination that a first set of criteria are met, displaying a first indicator representing the orientation of the field-of-view of the first camera, wherein the first set of criteria includes a first criterion that is met when a difference between a current orientation of the field-of-view of the first camera and the respective orientation exceeds a first threshold amount.
In accordance with some embodiments, a method performed at a computer system that is in communication with a display generation component and a plurality of cameras including a first camera and a second camera is described. The method includes: displaying, via the display generation component, a capture preview for spatial media capture, wherein a capture input detected while the capture preview is displayed will cause the computer system to capture media from the first camera and the second camera to generate a spatial media item that includes one or more images for a right eye and one or more images for a left eye that when viewed concurrently create an illusion of a spatial representation of a field-of-view of the plurality of cameras; while displaying the capture preview for spatial media capture, detecting a location of a subject in the field-of-view of the plurality of cameras; and in response to detecting the location of the subject in the field-of-view of the plurality of cameras, in accordance with a determination that the subject location relative to the field-of-view of the plurality of cameras does not meet criteria for capturing spatial media with a threshold level of quality, displaying, via the display generation component, a prompt to change a distance between the subject and the plurality of cameras.
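For illustration only, the following non-limiting sketch shows one way the distance-based quality criteria and prompt described above could be expressed. The distance bounds and the `distance_prompt` helper are assumed values chosen for illustration; the disclosure does not specify particular distances.

```python
# Illustrative sketch: prompt the user to change the subject distance when the subject's
# location does not meet quality criteria for spatial (stereoscopic) capture.
from typing import Optional

MIN_SUBJECT_DISTANCE_M = 1.0  # assumed lower bound, for illustration only
MAX_SUBJECT_DISTANCE_M = 8.0  # assumed upper bound, for illustration only


def distance_prompt(subject_distance_m: float) -> Optional[str]:
    """Return a prompt when the criteria are not met; return None when no prompt is needed."""
    if subject_distance_m < MIN_SUBJECT_DISTANCE_M:
        return "Move farther from the subject"
    if subject_distance_m > MAX_SUBJECT_DISTANCE_M:
        return "Move closer to the subject"
    return None  # criteria met: forgo displaying the prompt


for distance in (0.4, 3.0, 12.0):
    print(distance, "->", distance_prompt(distance))
```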
In accordance with some embodiments, a non-transitory computer-readable storage medium is described. The non-transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component and a plurality of cameras including a first camera and a second camera, the one or more programs including instructions for: displaying, via the display generation component, a capture preview for spatial media capture, wherein a capture input detected while the capture preview is displayed will cause the computer system to capture media from the first camera and the second camera to generate a spatial media item that includes one or more images for a right eye and one or more images for a left eye that when viewed concurrently create an illusion of a spatial representation of a field-of-view of the plurality of cameras; while displaying the capture preview for spatial media capture, detecting a location of a subject in the field-of-view of the plurality of cameras; and in response to detecting the location of the subject in the field-of-view of the plurality of cameras, in accordance with a determination that the subject location relative to the field-of-view of the plurality of cameras does not meet criteria for capturing spatial media with a threshold level of quality, displaying, via the display generation component, a prompt to change a distance between the subject and the plurality of cameras.
In accordance with some embodiments, a transitory computer-readable storage medium is described. The transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component and a plurality of cameras including a first camera and a second camera, the one or more programs including instructions for: displaying, via the display generation component, a capture preview for spatial media capture, wherein a capture input detected while the capture preview is displayed will cause the computer system to capture media from the first camera and the second camera to generate a spatial media item that includes one or more images for a right eye and one or more images for a left eye that when viewed concurrently create an illusion of a spatial representation of a field-of-view of the plurality of cameras; while displaying the capture preview for spatial media capture, detecting a location of a subject in the field-of-view of the plurality of cameras; and in response to detecting the location of the subject in the field-of-view of the plurality of cameras, in accordance with a determination that the subject location relative to the field-of-view of the plurality of cameras does not meet criteria for capturing spatial media with a threshold level of quality, displaying, via the display generation component, a prompt to change a distance between the subject and the plurality of cameras.
In accordance with some embodiments, a computer system is described. The computer system is configured to communicate with a display generation component and a plurality of cameras including a first camera and a second camera, and the computer system comprises: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: displaying, via the display generation component, a capture preview for spatial media capture, wherein a capture input detected while the capture preview is displayed will cause the computer system to capture media from the first camera and the second camera to generate a spatial media item that includes one or more images for a right eye and one or more images for a left eye that when viewed concurrently create an illusion of a spatial representation of a field-of-view of the plurality of cameras; while displaying the capture preview for spatial media capture, detecting a location of a subject in the field-of-view of the plurality of cameras; and in response to detecting the location of the subject in the field-of-view of the plurality of cameras, in accordance with a determination that the subject location relative to the field-of-view of the plurality of cameras does not meet criteria for capturing spatial media with a threshold level of quality, displaying, via the display generation component, a prompt to change a distance between the subject and the plurality of cameras.
In accordance with some embodiments, a computer system is described. The computer system is configured to communicate with a display generation component and a plurality of cameras including a first camera and a second camera, and the computer system comprises: means for displaying, via the display generation component, a capture preview for spatial media capture, wherein a capture input detected while the capture preview is displayed will cause the computer system to capture media from the first camera and the second camera to generate a spatial media item that includes one or more images for a right eye and one or more images for a left eye that when viewed concurrently create an illusion of a spatial representation of a field-of-view of the plurality of cameras; means for, while displaying the capture preview for spatial media capture, detecting a location of a subject in the field-of-view of the plurality of cameras; and means for, in response to detecting the location of the subject in the field-of-view of the plurality of cameras, in accordance with a determination that the subject location relative to the field-of-view of the plurality of cameras does not meet criteria for capturing spatial media with a threshold level of quality, displaying, via the display generation component, a prompt to change a distance between the subject and the plurality of cameras.
In accordance with some embodiments, a computer program product is described. The computer program product comprises one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component and a plurality of cameras including a first camera and a second camera, the one or more programs including instructions for: displaying, via the display generation component, a capture preview for spatial media capture, wherein a capture input detected while the capture preview is displayed will cause the computer system to capture media from the first camera and the second camera to generate a spatial media item that includes one or more images for a right eye and one or more images for a left eye that when viewed concurrently create an illusion of a spatial representation of a field-of-view of the plurality of cameras; while displaying the capture preview for spatial media capture, detecting a location of a subject in the field-of-view of the plurality of cameras; and in response to detecting the location of the subject in the field-of-view of the plurality of cameras, in accordance with a determination that the subject location relative to the field-of-view of the plurality of cameras does not meet criteria for capturing spatial media with a threshold level of quality, displaying, via the display generation component, a prompt to change a distance between the subject and the plurality of cameras.
In accordance with some embodiments, a method performed at a computer system that is in communication with a display generation component and one or more sensors, the one or more sensors including one or more cameras, is described. The method includes: capturing video media using the one or more cameras; and while capturing the video media: detecting, via the one or more sensors, a movement of the one or more cameras; and in response to detecting the movement of the one or more cameras: in accordance with a determination that the movement of the one or more cameras meets a set of one or more movement criteria, displaying, via the display generation component, a movement of a visual indicator relative to a displayed reference object, wherein displaying the movement of the visual indicator includes: in accordance with a determination that the movement of the one or more cameras is a movement in a first direction of camera movement, displaying the visual indicator moving, relative to the displayed reference object, in a first direction of indicator movement; and in accordance with a determination that the movement of the one or more cameras is a movement in a second direction of camera movement that is different from the first direction of camera movement, displaying the visual indicator moving, relative to the displayed reference object, in a second direction of indicator movement that is different from the first direction of indicator movement.
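For illustration only, the following non-limiting sketch shows one possible mapping of camera movement to indicator movement for the method described above. The mirrored direction mapping and the `MOVEMENT_THRESHOLD` constant are assumptions; the method only requires that different camera-movement directions produce different indicator-movement directions.

```python
# Illustrative sketch: while recording, move a visual indicator relative to a displayed
# reference object in a direction that depends on the direction of camera movement.
from dataclasses import dataclass

MOVEMENT_THRESHOLD = 0.5  # assumed magnitude for the movement criteria


@dataclass
class Indicator:
    offset_x: float = 0.0  # displacement relative to the displayed reference object
    offset_y: float = 0.0


def update_indicator(indicator: Indicator, camera_dx: float, camera_dy: float) -> None:
    """Move the indicator only when the camera movement meets the movement criteria."""
    if (camera_dx ** 2 + camera_dy ** 2) ** 0.5 < MOVEMENT_THRESHOLD:
        return  # movement criteria not met: the indicator does not move
    # Different camera-movement directions map to different indicator-movement directions
    # (here the indicator is mirrored, suggesting how to re-center the camera).
    indicator.offset_x -= camera_dx
    indicator.offset_y -= camera_dy


ind = Indicator()
update_indicator(ind, 1.0, 0.0)   # rightward camera movement -> indicator moves left
update_indicator(ind, 0.0, -2.0)  # downward camera movement -> indicator moves up
print(ind)
```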
In accordance with some embodiments, a non-transitory computer-readable storage medium is described. The non-transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component and one or more sensors, the one or more sensors including one or more cameras, the one or more programs including instructions for: capturing video media using the one or more cameras; and while capturing the video media: detecting, via the one or more sensors, a movement of the one or more cameras; and in response to detecting the movement of the one or more cameras: in accordance with a determination that the movement of the one or more cameras meets a set of one or more movement criteria, displaying, via the display generation component, a movement of a visual indicator relative to a displayed reference object, wherein displaying the movement of the visual indicator includes: in accordance with a determination that the movement of the one or more cameras is a movement in a first direction of camera movement, displaying the visual indicator moving, relative to the displayed reference object, in a first direction of indicator movement; and in accordance with a determination that the movement of the one or more cameras is a movement in a second direction of camera movement that is different from the first direction of camera movement, displaying the visual indicator moving, relative to the displayed reference object, in a second direction of indicator movement that is different from the first direction of indicator movement.
In accordance with some embodiments, a transitory computer-readable storage medium is described. The transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component and one or more sensors, the one or more sensors including one or more cameras, the one or more programs including instructions for: capturing video media using the one or more cameras; and while capturing the video media: detecting, via the one or more sensors, a movement of the one or more cameras; and in response to detecting the movement of the one or more cameras: in accordance with a determination that the movement of the one or more cameras meets a set of one or more movement criteria, displaying, via the display generation component, a movement of a visual indicator relative to a displayed reference object, wherein displaying the movement of the visual indicator includes: in accordance with a determination that the movement of the one or more cameras is a movement in a first direction of camera movement, displaying the visual indicator moving, relative to the displayed reference object, in a first direction of indicator movement; and in accordance with a determination that the movement of the one or more cameras is a movement in a second direction of camera movement that is different from the first direction of camera movement, displaying the visual indicator moving, relative to the displayed reference object, in a second direction of indicator movement that is different from the first direction of indicator movement.
In accordance with some embodiments, a computer system is described. The computer system is configured to communicate with a display generation component and one or more sensors, the one or more sensors including one or more cameras, and the computer system comprises: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: capturing video media using the one or more cameras; and while capturing the video media: detecting, via the one or more sensors, a movement of the one or more cameras; and in response to detecting the movement of the one or more cameras: in accordance with a determination that the movement of the one or more cameras meets a set of one or more movement criteria, displaying, via the display generation component, a movement of a visual indicator relative to a displayed reference object, wherein displaying the movement of the visual indicator includes: in accordance with a determination that the movement of the one or more cameras is a movement in a first direction of camera movement, displaying the visual indicator moving, relative to the displayed reference object, in a first direction of indicator movement; and in accordance with a determination that the movement of the one or more cameras is a movement in a second direction of camera movement that is different from the first direction of camera movement, displaying the visual indicator moving, relative to the displayed reference object, in a second direction of indicator movement that is different from the first direction of indicator movement.
In accordance with some embodiments, a computer system is described. The computer system is configured to communicate with a display generation component and one or more sensors, the one or more sensors including one or more cameras, and the computer system comprises: means for capturing video media using the one or more cameras; and means for, while capturing the video media: detecting, via the one or more sensors, a movement of the one or more cameras; and in response to detecting the movement of the one or more cameras: in accordance with a determination that the movement of the one or more cameras meets a set of one or more movement criteria, displaying, via the display generation component, a movement of a visual indicator relative to a displayed reference object, wherein displaying the movement of the visual indicator includes: in accordance with a determination that the movement of the one or more cameras is a movement in a first direction of camera movement, displaying the visual indicator moving, relative to the displayed reference object, in a first direction of indicator movement; and in accordance with a determination that the movement of the one or more cameras is a movement in a second direction of camera movement that is different from the first direction of camera movement, displaying the visual indicator moving, relative to the displayed reference object, in a second direction of indicator movement that is different from the first direction of indicator movement.
In accordance with some embodiments, a computer program product is described. The computer program product comprises one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component and one or more sensors, the one or more sensors including one or more cameras, the one or more programs including instructions for: capturing video media using the one or more cameras; and while capturing the video media: detecting, via the one or more sensors, a movement of the one or more cameras; and in response to detecting the movement of the one or more cameras: in accordance with a determination that the movement of the one or more cameras meets a set of one or more movement criteria, displaying, via the display generation component, a movement of a visual indicator relative to a displayed reference object, wherein displaying the movement of the visual indicator includes: in accordance with a determination that the movement of the one or more cameras is a movement in a first direction of camera movement, displaying the visual indicator moving, relative to the displayed reference object, in a first direction of indicator movement; and in accordance with a determination that the movement of the one or more cameras is a movement in a second direction of camera movement that is different from the first direction of camera movement, displaying the visual indicator moving, relative to the displayed reference object, in a second direction of indicator movement that is different from the first direction of indicator movement.
In accordance with some embodiments, a method performed at a computer system that is in communication with a display generation component is described. The method includes: while playback of a video media item is ongoing, wherein playback of the video media item includes displaying the video media item concurrently with a border region that is outside of the video media item, changing a visual prominence of the video media item relative to the border region based on a representation of movement of a viewpoint corresponding to the video media item that occurred while the video media item was being captured, wherein changing the visual prominence of the video media item relative to the border region based on the representation of the movement of the viewpoint corresponding to the video media item includes: in accordance with a determination that the movement of the viewpoint corresponding to the video media item corresponds to a first amount of movement, changing the visual prominence of the video media item relative to the border region to a first level of relative visual prominence; and in accordance with a determination that the movement of the viewpoint corresponding to the video media item corresponds to a second amount of movement different from the first amount of movement, changing the visual prominence of the video media item relative to the border region to a second level of relative visual prominence that is different from the first level of relative visual prominence.
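For illustration only, the following non-limiting sketch shows one way the relationship between recorded viewpoint movement and relative visual prominence described above could be expressed. The linear mapping and the prominence range are assumptions chosen for illustration.

```python
# Illustrative sketch: during playback, lower the video's visual prominence relative to
# the surrounding border region in proportion to how much the capturing viewpoint moved.
def relative_prominence(movement_amount: float, max_movement: float = 10.0) -> float:
    """Map recorded viewpoint movement to a relative prominence in [0.2, 1.0];
    larger recorded movement yields a lower prominence for the video item."""
    normalized = min(max(movement_amount / max_movement, 0.0), 1.0)
    return 1.0 - 0.8 * normalized


print(relative_prominence(1.0))   # small movement -> near full prominence
print(relative_prominence(10.0))  # large movement -> reduced prominence (0.2)
```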
In accordance with some embodiments, a non-transitory computer-readable storage medium is described. The non-transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component, the one or more programs including instructions for: while playback of a video media item is ongoing, wherein playback of the video media item includes displaying the video media item concurrently with a border region that is outside of the video media item, changing a visual prominence of the video media item relative to the border region based on a representation of movement of a viewpoint corresponding to the video media item that occurred while the video media item was being captured, wherein changing the visual prominence of the video media item relative to the border region based on the representation of the movement of the viewpoint corresponding to the video media item includes: in accordance with a determination that the movement of the viewpoint corresponding to the video media item corresponds to a first amount of movement, changing the visual prominence of the video media item relative to the border region to a first level of relative visual prominence; and in accordance with a determination that the movement of the viewpoint corresponding to the video media item corresponds to a second amount of movement different from the first amount of movement, changing the visual prominence of the video media item relative to the border region to a second level of relative visual prominence that is different from the first level of relative visual prominence.
In accordance with some embodiments, a transitory computer-readable storage medium is described. The transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component, the one or more programs including instructions for: while playback of a video media item is ongoing, wherein playback of the video media item includes displaying the video media item concurrently with a border region that is outside of the video media item, changing a visual prominence of the video media item relative to the border region based on a representation of movement of a viewpoint corresponding to the video media item that occurred while the video media item was being captured, wherein changing the visual prominence of the video media item relative to the border region based on the representation of the movement of the viewpoint corresponding to the video media item includes: in accordance with a determination that the movement of the viewpoint corresponding to the video media item corresponds to a first amount of movement, changing the visual prominence of the video media item relative to the border region to a first level of relative visual prominence; and in accordance with a determination that the movement of the viewpoint corresponding to the video media item corresponds to a second amount of movement different from the first amount of movement, changing the visual prominence of the video media item relative to the border region to a second level of relative visual prominence that is different from the first level of relative visual prominence.
In accordance with some embodiments, a computer system is described. The computer system is configured to communicate with a display generation component, and the computer system comprises: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: while playback of a video media item is ongoing, wherein playback of the video media item includes displaying the video media item concurrently with a border region that is outside of the video media item, changing a visual prominence of the video media item relative to the border region based on a representation of movement of a viewpoint corresponding to the video media item that occurred while the video media item was being captured, wherein changing the visual prominence of the video media item relative to the border region based on the representation of the movement of the viewpoint corresponding to the video media item includes: in accordance with a determination that the movement of the viewpoint corresponding to the video media item corresponds to a first amount of movement, changing the visual prominence of the video media item relative to the border region to a first level of relative visual prominence; and in accordance with a determination that the movement of the viewpoint corresponding to the video media item corresponds to a second amount of movement different from the first amount of movement, changing the visual prominence of the video media item relative to the border region to a second level of relative visual prominence that is different from the first level of relative visual prominence.
In accordance with some embodiments, a computer system is described. The computer system is configured to communicate with a display generation component, and the computer system comprises: means for, while playback of a video media item is ongoing, wherein playback of the video media item includes displaying the video media item concurrently with a border region that is outside of the video media item, changing a visual prominence of the video media item relative to the border region based on a representation of movement of a viewpoint corresponding to the video media item that occurred while the video media item was being captured, wherein changing the visual prominence of the video media item relative to the border region based on the representation of the movement of the viewpoint corresponding to the video media item includes: in accordance with a determination that the movement of the viewpoint corresponding to the video media item corresponds to a first amount of movement, changing the visual prominence of the video media item relative to the border region to a first level of relative visual prominence; and in accordance with a determination that the movement of the viewpoint corresponding to the video media item corresponds to a second amount of movement different from the first amount of movement, changing the visual prominence of the video media item relative to the border region to a second level of relative visual prominence that is different from the first level of relative visual prominence.
In accordance with some embodiments, a computer program product is described. The computer program product comprises one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component, the one or more programs including instructions for: while playback of a video media item is ongoing, wherein playback of the video media item includes displaying the video media item concurrently with a border region that is outside of the video media item, changing a visual prominence of the video media item relative to the border region based on a representation of movement of a viewpoint corresponding to the video media item that occurred while the video media item was being captured, wherein changing the visual prominence of the video media item relative to the border region based on the representation of the movement of the viewpoint corresponding to the video media item includes: in accordance with a determination that the movement of the viewpoint corresponding to the video media item corresponds to a first amount of movement, changing the visual prominence of the video media item relative to the border region to a first level of relative visual prominence; and in accordance with a determination that the movement of the viewpoint corresponding to the video media item corresponds to a second amount of movement different from the first amount of movement, changing the visual prominence of the video media item relative to the border region to a second level of relative visual prominence that is different from the first level of relative visual prominence.
In accordance with some embodiments, a method performed at a computer system that is in communication with a display generation component and one or more cameras is described. The method includes: while capturing spatial video media of an environment using the one or more cameras, wherein the spatial video media includes a first video component corresponding to a viewpoint of a right eye and a second video component, different from the first video component, corresponding to a viewpoint of a left eye that when viewed concurrently create an illusion of a spatial representation of the environment, displaying, via the display generation component, a virtual indicator element of an anchor location in the environment that represents a respective viewpoint corresponding to the spatial video media, wherein the virtual indicator element is displayed while the environment is visible via the display generation component; while displaying the virtual indicator element while the environment is visible via the display generation component, detecting a first change in a viewpoint from which the spatial video media is being captured; and in response to detecting the first change in the viewpoint from which the spatial video media is being captured, changing an appearance of the virtual indicator element to indicate the respective viewpoint corresponding to the spatial video media.
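For illustration only, the following non-limiting sketch shows one possible appearance change for the anchored virtual indicator element described above. The opacity-based appearance change and the `falloff_m` parameter are assumptions; any perceptible appearance change tied to viewpoint drift would fit the description.

```python
# Illustrative sketch: a virtual indicator anchored at a fixed location in the environment
# changes its appearance as the capture viewpoint drifts from the viewpoint of the
# spatial video being captured.
from dataclasses import dataclass
from typing import Tuple
import math


@dataclass
class ViewpointIndicator:
    anchor: Tuple[float, float, float]             # fixed anchor location in the environment
    initial_viewpoint: Tuple[float, float, float]  # viewpoint when capture began
    opacity: float = 1.0

    def update(self, current_viewpoint: Tuple[float, float, float],
               falloff_m: float = 0.5) -> None:
        """Change the indicator's appearance based on how far the viewpoint has moved."""
        drift = math.dist(self.initial_viewpoint, current_viewpoint)
        # More drift -> more attenuated indicator (one possible appearance change).
        self.opacity = max(0.0, 1.0 - drift / falloff_m)


indicator = ViewpointIndicator(anchor=(0.0, 0.0, 1.0), initial_viewpoint=(0.0, 1.6, 0.0))
indicator.update((0.1, 1.6, 0.0))
print(round(indicator.opacity, 2))  # slightly attenuated after a small viewpoint change
```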
In accordance with some embodiments, a non-transitory computer-readable storage medium is described. The non-transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component and one or more cameras, the one or more programs including instructions for: while capturing spatial video media of an environment using the one or more cameras, wherein the spatial video media includes a first video component corresponding to a viewpoint of a right eye and a second video component, different from the first video component, corresponding to a viewpoint of a left eye that when viewed concurrently create an illusion of a spatial representation of the environment, displaying, via the display generation component, a virtual indicator element of an anchor location in the environment that represents a respective viewpoint corresponding to the spatial video media, wherein the virtual indicator element is displayed while the environment is visible via the display generation component; while displaying the virtual indicator element while the environment is visible via the display generation component, detecting a first change in a viewpoint from which the spatial video media is being captured; and in response to detecting the first change in the viewpoint from which the spatial video media is being captured, changing an appearance of the virtual indicator element to indicate the respective viewpoint corresponding to the spatial video media.
In accordance with some embodiments, a transitory computer-readable storage medium is described. The transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component and one or more cameras, the one or more programs including instructions for: while capturing spatial video media of an environment using the one or more cameras, wherein the spatial video media includes a first video component corresponding to a viewpoint of a right eye and a second video component, different from the first video component, corresponding to a viewpoint of a left eye that when viewed concurrently create an illusion of a spatial representation of the environment, displaying, via the display generation component, a virtual indicator element of an anchor location in the environment that represents a respective viewpoint corresponding to the spatial video media, wherein the virtual indicator element is displayed while the environment is visible via the display generation component; while displaying the virtual indicator element while the environment is visible via the display generation component, detecting a first change in a viewpoint from which the spatial video media is being captured; and in response to detecting the first change in the viewpoint from which the spatial video media is being captured, changing an appearance of the virtual indicator element to indicate the respective viewpoint corresponding to the spatial video media.
In accordance with some embodiments, a computer system is described. The computer system is configured to communicate with a display generation component and one or more cameras, and the computer system comprises: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: while capturing spatial video media of an environment using the one or more cameras, wherein the spatial video media includes a first video component corresponding to a viewpoint of a right eye and a second video component, different from the first video component, corresponding to a viewpoint of a left eye that when viewed concurrently create an illusion of a spatial representation of the environment, displaying, via the display generation component, a virtual indicator element of an anchor location in the environment that represents a respective viewpoint corresponding to the spatial video media, wherein the virtual indicator element is displayed while the environment is visible via the display generation component; while displaying the virtual indicator element while the environment is visible via the display generation component, detecting a first change in a viewpoint from which the spatial video media is being captured; and in response to detecting the first change in the viewpoint from which the spatial video media is being captured, changing an appearance of the virtual indicator element to indicate the respective viewpoint corresponding to the spatial video media.
In accordance with some embodiments, a computer system is described. The computer system is configured to communicate with a display generation component and one or more cameras, and the computer system comprises: means for, while capturing spatial video media of an environment using the one or more cameras, wherein the spatial video media includes a first video component corresponding to a viewpoint of a right eye and a second video component, different from the first video component, corresponding to a viewpoint of a left eye that when viewed concurrently create an illusion of a spatial representation of the environment, displaying, via the display generation component, a virtual indicator element of an anchor location in the environment that represents a respective viewpoint corresponding to the spatial video media, wherein the virtual indicator element is displayed while the environment is visible via the display generation component; means for, while displaying the virtual indicator element while the environment is visible via the display generation component, detecting a first change in a viewpoint from which the spatial video media is being captured; and means for, in response to detecting the first change in the viewpoint from which the spatial video media is being captured, changing an appearance of the virtual indicator element to indicate the respective viewpoint corresponding to the spatial video media.
In accordance with some embodiments, a computer program product is described. The computer program product comprises one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component and one or more cameras, the one or more programs including instructions for: while capturing spatial video media of an environment using the one or more cameras, wherein the spatial video media includes a first video component corresponding to a viewpoint of a right eye and a second video component, different from the first video component, corresponding to a viewpoint of a left eye that when viewed concurrently create an illusion of a spatial representation of the environment, displaying, via the display generation component, a virtual indicator element of an anchor location in the environment that represents a respective viewpoint corresponding to the spatial video media, wherein the virtual indicator element is displayed while the environment is visible via the display generation component; while displaying the virtual indicator element while the environment is visible via the display generation component, detecting a first change in a viewpoint from which the spatial video media is being captured; and in response to detecting the first change in the viewpoint from which the spatial video media is being captured, changing an appearance of the virtual indicator element to indicate the respective viewpoint corresponding to the spatial video media.
In accordance with some embodiments, a method performed at a computer system that is in communication with a display generation component is described. The method includes: while displaying, via the display generation component, a representation of a spatial media item, wherein the spatial media item includes a first component corresponding to a viewpoint of a right eye and a second component, different from the first component, corresponding to a viewpoint of a left eye that when viewed concurrently create an illusion of a spatial representation: in accordance with a determination that the spatial media item meets a set of one or more stability criteria, displaying a spatial viewing indicator with a first appearance concurrently with the representation of the spatial media item; and in accordance with a determination that the spatial media item does not meet the set of one or more stability criteria, forgoing displaying the spatial viewing indicator with the first appearance concurrently with the representation of the spatial media item.
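For illustration only, the following non-limiting sketch shows one possible form of the stability check described above. The specific rotation and translation limits are assumed; the disclosure refers only to "a set of one or more stability criteria."

```python
# Illustrative sketch: display a spatial viewing indicator only for spatial media items
# whose recorded motion satisfies a set of stability criteria.
from dataclasses import dataclass


@dataclass
class SpatialMediaItem:
    max_rotation_deg_per_s: float   # peak rotational speed recorded during capture
    max_translation_m_per_s: float  # peak translational speed recorded during capture


def meets_stability_criteria(item: SpatialMediaItem,
                             rotation_limit: float = 30.0,
                             translation_limit: float = 0.75) -> bool:
    """Assumed criteria: recorded motion stays below rotation and translation limits."""
    return (item.max_rotation_deg_per_s <= rotation_limit and
            item.max_translation_m_per_s <= translation_limit)


steady_clip = SpatialMediaItem(max_rotation_deg_per_s=10.0, max_translation_m_per_s=0.2)
shaky_clip = SpatialMediaItem(max_rotation_deg_per_s=90.0, max_translation_m_per_s=1.5)
print(meets_stability_criteria(steady_clip))  # True: display the indicator with the first appearance
print(meets_stability_criteria(shaky_clip))   # False: forgo displaying it with the first appearance
```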
In accordance with some embodiments, a non-transitory computer-readable storage medium is described. The non-transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component, the one or more programs including instructions for: while displaying, via the display generation component, a representation of a spatial media item, wherein the spatial media item includes a first component corresponding to a viewpoint of a right eye and a second component, different from the first component, corresponding to a viewpoint of a left eye that when viewed concurrently create an illusion of a spatial representation: in accordance with a determination that the spatial media item meets a set of one or more stability criteria, displaying a spatial viewing indicator with a first appearance concurrently with the representation of the spatial media item; and in accordance with a determination that the spatial media item does not meet the set of one or more stability criteria, forgoing displaying the spatial viewing indicator with the first appearance concurrently with the representation of the spatial media item.
In accordance with some embodiments, a transitory computer-readable storage medium is described. The transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component, the one or more programs including instructions for: while displaying, via the display generation component, a representation of a spatial media item, wherein the spatial media item includes a first component corresponding to a viewpoint of a right eye and a second component, different from the first component, corresponding to a viewpoint of a left eye that when viewed concurrently create an illusion of a spatial representation: in accordance with a determination that the spatial media item meets a set of one or more stability criteria, displaying a spatial viewing indicator with a first appearance concurrently with the representation of the spatial media item; and in accordance with a determination that the spatial media item does not meet the set of one or more stability criteria, forgoing displaying the spatial viewing indicator with the first appearance concurrently with the representation of the spatial media item.
In accordance with some embodiments, a computer system is described. The computer system is configured to communicate with a display generation component, and the computer system comprises: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: while displaying, via the display generation component, a representation of a spatial media item, wherein the spatial media item includes a first component corresponding to a viewpoint of a right eye and a second component, different from the first component, corresponding to a viewpoint of a left eye that when viewed concurrently create an illusion of a spatial representation: in accordance with a determination that the spatial media item meets a set of one or more stability criteria, displaying a spatial viewing indicator with a first appearance concurrently with the representation of the spatial media item; and in accordance with a determination that the spatial media item does not meet the set of one or more stability criteria, forgoing displaying the spatial viewing indicator with the first appearance concurrently with the representation of the spatial media item.
In accordance with some embodiments, a computer system is described. The computer system is configured to communicate with a display generation component, and the computer system comprises: means for, while displaying, via the display generation component, a representation of a spatial media item, wherein the spatial media item includes a first component corresponding to a viewpoint of a right eye and a second component, different from the first component, corresponding to a viewpoint of a left eye that when viewed concurrently create an illusion of a spatial representation: in accordance with a determination that the spatial media item meets a set of one or more stability criteria, displaying a spatial viewing indicator with a first appearance concurrently with the representation of the spatial media item; and in accordance with a determination that the spatial media item does not meet the set of one or more stability criteria, forgoing displaying the spatial viewing indicator with the first appearance concurrently with the representation of the spatial media item.
In accordance with some embodiments, a computer program product is described. The computer program product comprises one or more programs configured to be executed by one or more processors of a computer system that is in communication with a display generation component, the one or more programs including instructions for: while displaying, via the display generation component, a representation of a spatial media item, wherein the spatial media item includes a first component corresponding to a viewpoint of a right eye and a second component, different from the first component, corresponding to a viewpoint of a left eye that when viewed concurrently create an illusion of a spatial representation: in accordance with a determination that the spatial media item meets a set of one or more stability criteria, displaying a spatial viewing indicator with a first appearance concurrently with the representation of the spatial media item; and in accordance with a determination that the spatial media item does not meet the set of one or more stability criteria, forgoing displaying the spatial viewing indicator with the first appearance concurrently with the representation of the spatial media item.
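For illustration, the stability gating described in the preceding embodiments might be sketched in Swift as follows. The SpatialMediaItem type, its fields, and the threshold values are hypothetical stand-ins introduced here and are not defined by this disclosure.

```swift
// Minimal sketch of stability-criteria gating, assuming a hypothetical
// SpatialMediaItem type that exposes per-frame motion metadata.
struct SpatialMediaItem {
    var interFrameMotion: [Float]   // e.g., average pixel displacement per frame
    var parallaxError: Float        // mismatch between left- and right-eye components
}

func meetsStabilityCriteria(_ item: SpatialMediaItem,
                            motionLimit: Float = 2.0,
                            parallaxLimit: Float = 0.05) -> Bool {
    let worstMotion = item.interFrameMotion.max() ?? 0
    return worstMotion <= motionLimit && item.parallaxError <= parallaxLimit
}

func shouldShowSpatialViewingIndicator(for item: SpatialMediaItem) -> Bool {
    // The indicator with the first appearance is shown only when the item
    // meets the stability criteria; otherwise its display is forgone.
    meetsStabilityCriteria(item)
}
```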
In some embodiments, a method is described. In some embodiments, the method is performed at a computer system that is in communication with one or more display generation components and one or more cameras. The method comprises: while the computer system is in a spatial media capture mode that corresponds to spatial media that includes a first visual component corresponding to a viewpoint of a right eye and a second visual component different from the first visual component and that corresponds to a viewpoint of a left eye that, when viewed concurrently, create an illusion of a spatial representation of captured visual content and while the computer system is not capturing spatial media: in accordance with a determination that an orientation of the computer system is outside of a threshold range of orientations, outputting a first prompt that prompts the user to rotate the computer system into the threshold range of orientations.
In some embodiments, a non-transitory computer-readable storage medium is described. The non-transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with one or more display generation components and one or more cameras, the one or more programs including instructions for: while the computer system is in a spatial media capture mode that corresponds to spatial media that includes a first visual component corresponding to a viewpoint of a right eye and a second visual component different from the first visual component and that corresponds to a viewpoint of a left eye that, when viewed concurrently, create an illusion of a spatial representation of captured visual content and while the computer system is not capturing spatial media: in accordance with a determination that an orientation of the computer system is outside of a threshold range of orientations, outputting a first prompt that prompts the user to rotate the computer system into the threshold range of orientations.
In some embodiments, a transitory computer-readable storage medium is described. The transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with one or more display generation components and one or more cameras, the one or more programs including instructions for: while the computer system is in a spatial media capture mode that corresponds to spatial media that includes a first visual component corresponding to a viewpoint of a right eye and a second visual component different from the first visual component and that corresponds to a viewpoint of a left eye that, when viewed concurrently, create an illusion of a spatial representation of captured visual content and while the computer system is not capturing spatial media: in accordance with a determination that an orientation of the computer system is outside of a threshold range of orientations, outputting a first prompt that prompts the user to rotate the computer system into the threshold range of orientations.
In some embodiments, a computer system is described. The computer system is configured to communicate with one or more display generation components and one or more cameras, and comprises: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: while the computer system is in a spatial media capture mode that corresponds to spatial media that includes a first visual component corresponding to a viewpoint of a right eye and a second visual component different from the first visual component and that corresponds to a viewpoint of a left eye that, when viewed concurrently, create an illusion of a spatial representation of captured visual content and while the computer system is not capturing spatial media: in accordance with a determination that an orientation of the computer system is outside of a threshold range of orientations, outputting a first prompt that prompts the user to rotate the computer system into the threshold range of orientations.
In some embodiments, a computer system is described. The computer system is configured to communicate with one or more display generation components and one or more cameras, and comprises: means for, while the computer system is in a spatial media capture mode that corresponds to spatial media that includes a first visual component corresponding to a viewpoint of a right eye and a second visual component different from the first visual component and that corresponds to a viewpoint of a left eye that, when viewed concurrently, create an illusion of a spatial representation of captured visual content and while the computer system is not capturing spatial media: in accordance with a determination that an orientation of the computer system is outside of a threshold range of orientations, outputting a first prompt that prompts the user to rotate the computer system into the threshold range of orientations.
In some embodiments, a computer program product is described. The computer program product comprises one or more programs configured to be executed by one or more processors of a computer system that is in communication with one or more display generation components and one or more cameras, the one or more programs including instructions for: while the computer system is in a spatial media capture mode that corresponds to spatial media that includes a first visual component corresponding to a viewpoint of a right eye and a second visual component different from the first visual component and that corresponds to a viewpoint of a left eye that, when viewed concurrently, create an illusion of a spatial representation of captured visual content and while the computer system is not capturing spatial media: in accordance with a determination that an orientation of the computer system is outside of a threshold range of orientations, outputting a first prompt that prompts the user to rotate the computer system into the threshold range of orientations.
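As an illustrative sketch of the orientation check described in the preceding embodiments, the Swift snippet below prompts the user to rotate the device when its roll is outside an allowed window around level. The ±10° range, the function name, and the prompt strings are assumptions made for illustration, not values from this disclosure.

```swift
// Sketch: while in a spatial media capture mode and not yet capturing, output a
// prompt if the device orientation is outside a threshold range of orientations.
func orientationPrompt(rollDegrees: Double,
                       isCapturing: Bool,
                       allowedRange: ClosedRange<Double> = -10...10) -> String? {
    guard !isCapturing else { return nil }               // prompts apply before capture starts
    guard !allowedRange.contains(rollDegrees) else { return nil }
    return rollDegrees > 0
        ? "Rotate the device counterclockwise to level it for spatial capture."
        : "Rotate the device clockwise to level it for spatial capture."
}
```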
In some embodiments, a method is described. In some embodiments, the method is performed at a computer system that is in communication with one or more display generation components and one or more cameras. The method comprises: displaying, via the one or more display generation components, a first user interface corresponding to a camera application of the computer system; and while displaying the first user interface corresponding to the camera application of the computer system: in accordance with a determination that the computer system is associated with a head-mounted device separate from the computer system, providing a spatial media capture mode option corresponding to a spatial media capture mode for capturing spatial media that includes a first visual component corresponding to a viewpoint of a right eye and a second visual component different from the first visual component and that corresponds to a viewpoint of a left eye that, when viewed concurrently, create an illusion of a spatial representation of captured visual content; and in accordance with a determination that the computer system is not associated with a head-mounted device separate from the computer system, forgoing providing the spatial media capture mode option.
In some embodiments, a non-transitory computer-readable storage medium is described. The non-transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with one or more display generation components and one or more cameras, the one or more programs including instructions for: displaying, via the one or more display generation components, a first user interface corresponding to a camera application of the computer system; and while displaying the first user interface corresponding to the camera application of the computer system: in accordance with a determination that the computer system is associated with a head-mounted device separate from the computer system, providing a spatial media capture mode option corresponding to a spatial media capture mode for capturing spatial media that includes a first visual component corresponding to a viewpoint of a right eye and a second visual component different from the first visual component and that corresponds to a viewpoint of a left eye that, when viewed concurrently, create an illusion of a spatial representation of captured visual content; and in accordance with a determination that the computer system is not associated with a head-mounted device separate from the computer system, forgoing providing the spatial media capture mode option.
In some embodiments, a transitory computer-readable storage medium is described. The transitory computer-readable storage medium stores one or more programs configured to be executed by one or more processors of a computer system that is in communication with one or more display generation components and one or more cameras, the one or more programs including instructions for: displaying, via the one or more display generation components, a first user interface corresponding to a camera application of the computer system; and while displaying the first user interface corresponding to the camera application of the computer system: in accordance with a determination that the computer system is associated with a head-mounted device separate from the computer system, providing a spatial media capture mode option corresponding to a spatial media capture mode for capturing spatial media that includes a first visual component corresponding to a viewpoint of a right eye and a second visual component different from the first visual component and that corresponds to a viewpoint of a left eye that, when viewed concurrently, create an illusion of a spatial representation of captured visual content; and in accordance with a determination that the computer system is not associated with a head-mounted device separate from the computer system, forgoing providing the spatial media capture mode option.
In some embodiments, a computer system is described. The computer system is configured to communicate with one or more display generation components and one or more cameras, and comprises: one or more processors; and memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: displaying, via the one or more display generation components, a first user interface corresponding to a camera application of the computer system; and while displaying the first user interface corresponding to the camera application of the computer system: in accordance with a determination that the computer system is associated with a head-mounted device separate from the computer system, providing a spatial media capture mode option corresponding to a spatial media capture mode for capturing spatial media that includes a first visual component corresponding to a viewpoint of a right eye and a second visual component different from the first visual component and that corresponds to a viewpoint of a left eye that, when viewed concurrently, create an illusion of a spatial representation of captured visual content; and in accordance with a determination that the computer system is not associated with a head-mounted device separate from the computer system, forgoing providing the spatial media capture mode option.
In some embodiments, a computer system is described. The computer system is configured to communicate with one or more display generation components and one or more cameras, and comprises: means for displaying, via the one or more display generation components, a first user interface corresponding to a camera application of the computer system; and means for, while displaying the first user interface corresponding to the camera application of the computer system: in accordance with a determination that the computer system is associated with a head-mounted device separate from the computer system, providing a spatial media capture mode option corresponding to a spatial media capture mode for capturing spatial media that includes a first visual component corresponding to a viewpoint of a right eye and a second visual component different from the first visual component and that corresponds to a viewpoint of a left eye that, when viewed concurrently, create an illusion of a spatial representation of captured visual content; and in accordance with a determination that the computer system is not associated with a head-mounted device separate from the computer system, forgoing providing the spatial media capture mode option.
In some embodiments, a computer program product is described. The computer program product comprises one or more programs configured to be executed by one or more processors of a computer system that is in communication with one or more display generation components and one or more cameras, the one or more programs including instructions for: displaying, via the one or more display generation components, a first user interface corresponding to a camera application of the computer system; and while displaying the first user interface corresponding to the camera application of the computer system: in accordance with a determination that the computer system is associated with a head-mounted device separate from the computer system, providing a spatial media capture mode option corresponding to a spatial media capture mode for capturing spatial media that includes a first visual component corresponding to a viewpoint of a right eye and a second visual component different from the first visual component and that corresponds to a viewpoint of a left eye that, when viewed concurrently, create an illusion of a spatial representation of captured visual content; and in accordance with a determination that the computer system is not associated with a head-mounted device separate from the computer system, forgoing providing the spatial media capture mode option.
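The conditional availability of the spatial capture mode option described in the preceding embodiments could be pictured with the following Swift sketch. The CameraCaptureModes type and the pairedHeadMountedDevice flag are hypothetical stand-ins for however a system tracks an associated headset; they are not APIs defined by this disclosure.

```swift
// Sketch: offer the spatial capture mode only when the computer system is
// associated with a separate head-mounted device that can view the result.
struct CameraCaptureModes {
    var modes: [String] = ["photo", "video"]

    mutating func configure(pairedHeadMountedDevice: Bool) {
        if pairedHeadMountedDevice {
            // Spatial media is only useful when it can be viewed stereoscopically,
            // so the option is provided only with an associated head-mounted device.
            modes.append("spatial")
        }
    }
}
```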
Note that the various embodiments described above can be combined with any other embodiments described herein. The features and advantages described in the specification are not all-inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter.
For a better understanding of the various described embodiments, reference should be made to the Description of Embodiments below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.
FIGS. 11A1-11H illustrate example techniques for displaying a camera preview for spatial media capture with prompts to improve capture quality, in some embodiments.
The present disclosure relates to user interfaces for providing an extended reality (XR) experience to a user, in some embodiments.
The systems, methods, and GUIs described herein improve capturing media with a camera application in multiple ways.
In some embodiments, while displaying a user interface for media capture, a computer system tracks a gaze of a user while detecting potential user inputs (e.g., hardware button presses, inputs on touch-sensitive surfaces, and/or air gestures). If the user is gazing at a particular region of the user interface (e.g., a central region, such as at or near a media capture affordance displayed at the center of the user interface) when an input is detected, the computer system initiates media capture; if the user is not gazing at the particular region of the user interface when an input is detected, the computer system does not initiate media capture. By initiating media capture only if the user is gazing at the particular region when an input is detected, unintended and/or undesirable media captures are reduced, allowing the user to freely interact with the computer system and/or the environment without accidentally capturing media. In addition, initiating media capture only if the user is gazing at the particular region when an input is detected allows the user to efficiently and intuitively capture media when desired, for example, quickly enabling media capture during transient media capture opportunities by gazing at the particular region and providing an input.
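A minimal sketch of this gaze gate, in Swift, is shown below; the CaptureUI type, the size of the capture region, and the handleInput name are illustrative assumptions and not part of this disclosure.

```swift
import CoreGraphics

// Sketch: an input only triggers media capture when the user's gaze falls
// inside the capture region of the user interface.
struct CaptureUI {
    let captureRegion: CGRect        // e.g., a central region around the shutter affordance
    private(set) var isCapturing = false

    mutating func handleInput(gazeLocation: CGPoint) {
        if captureRegion.contains(gazeLocation) {
            isCapturing = true       // gaze on the capture region: initiate capture
        } else {
            // Gaze elsewhere: treat the input as unrelated to media capture,
            // avoiding accidental captures while the user interacts with other UI.
        }
    }
}
```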
In some embodiments, a computer system detects how an orientation of the camera(s) used for media capture (e.g., an orientation of the field-of-view of the camera(s)) changes with respect to a target orientation (e.g., an orientation that is level to the horizon of the environment). In response to detecting a change in the orientation of the camera(s), if the orientation of the camera(s) differs from the target orientation by more than a threshold amount, the computer system displays a level indicator that represents the orientation of the camera(s). Displaying the level indicator when the orientation of the camera(s), and thus of any media captured by the camera(s), differs from the target orientation by more than a threshold amount reduces unintended and/or undesirable media captures by alerting the user when captured media will not appear level. In addition, displaying the level indicator allows the user to efficiently and intuitively compose a media capture with the desired orientation.
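One way to express this behavior is the short Swift sketch below; the 3° threshold and the LevelIndicator type are assumptions chosen for illustration rather than values taken from this disclosure.

```swift
// Sketch: show a level indicator only when the camera's roll relative to the
// horizon-level target orientation exceeds a threshold amount.
struct LevelIndicator {
    var isVisible: Bool
    var displayedRollDegrees: Double   // angle the indicator renders to show current tilt
}

func levelIndicator(cameraRollDegrees: Double,
                    thresholdDegrees: Double = 3.0) -> LevelIndicator {
    LevelIndicator(isVisible: abs(cameraRollDegrees) > thresholdDegrees,
                   displayedRollDegrees: cameraRollDegrees)
}
```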
In some embodiments, a computer system displays a user interface for spatial media capture, where multiple cameras are used to generate different images for the right and left eye of a user, creating an appearance/illusion of depth (e.g., three-dimensionality, such that the relative distance of objects from the plane of capture can be perceived) in the captured media (e.g., the different images generated for the right eye and left eye of the user mimic the different images of a physical environment received at the right eye and left eye due to the positional differences between the eyes). The computer system detects where a subject of the media capture is located relative to the multiple cameras and determines whether the relative location of the subject will adversely affect the quality of the spatial media capture (e.g., the appearance/illusion of depth). For example, if a subject is located too close to the multiple cameras, the generated images for the left and right eye will differ too much to create a quality appearance/illusion of depth, while if the subject is located too far from the multiple cameras, the generated images for the left and right eye will not differ enough to create a quality appearance/illusion of depth. If the computer system determines that the relative location of the subject will adversely affect the quality of the spatial media capture, the computer system displays a prompt to the user to change the distance from the subject. Displaying the prompt to change the distance reduces unintended and/or undesirable media captures by alerting the user when capturing spatial media will not result in a quality appearance/illusion of depth. In addition, displaying the prompt allows the user to efficiently and intuitively compose a spatial media capture of the desired quality.
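The subject-distance check can be pictured with the Swift sketch below. The near and far bounds are illustrative stand-ins for whatever range yields a convincing stereo effect for a given camera baseline; they are not values from this disclosure.

```swift
// Sketch: prompt the user when the subject's distance would degrade the
// appearance/illusion of depth in captured spatial media.
enum SpatialCapturePrompt {
    case moveFartherFromSubject   // subject too close: eye images diverge too much
    case moveCloserToSubject      // subject too far: eye images barely differ
}

func distancePrompt(subjectDistanceMeters: Double,
                    usableRange: ClosedRange<Double> = 1.0...8.0) -> SpatialCapturePrompt? {
    if subjectDistanceMeters < usableRange.lowerBound { return .moveFartherFromSubject }
    if subjectDistanceMeters > usableRange.upperBound { return .moveCloserToSubject }
    return nil   // distance is acceptable; no prompt needed
}
```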
In some embodiments, a computer system displays content in a first region of a user interface. In some embodiments, while the computer system is displaying the content and while a first set of one or more controls is not displayed in a first state, the computer system detects a first input from a first portion of a user. In some embodiments, in response to detecting the first input, and in accordance with a determination that a gaze of the user is directed to a second region of the user interface when the first input is detected, the computer system displays, in the user interface, the first set of one or more controls in the first state, and in accordance with a determination that the gaze of the user is not directed to the second region of the user interface when the first input is detected, the computer system forgoes displaying the first set of one or more controls in the first state.
In some embodiments, a computer system displays content in a user interface. In some embodiments, while displaying the content, the computer system detects a first input based on movement of a first portion of a user of the computer system. In some embodiments, in response to detecting the first input, the computer system displays, in the user interface, a first set of one or more controls, where the first set of one or more controls are displayed in a first state and are displayed within a first region of the user interface. In some embodiments, while displaying the first set of one or more controls in the first state: in accordance with a determination that one or more first criteria are satisfied, including a criterion that is satisfied when attention of the user is directed to the first region of the user interface based on a movement of a second portion of the user that is different from the first portion of the user, the computer system transitions from displaying the first set of one or more controls in the first state to displaying a second set of one or more controls in a second state, where the second state is different from the first state.
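The two-stage control behavior described in the preceding two paragraphs might be sketched in Swift as follows: a first input summons controls in a compact first state, and directing attention (e.g., gaze) to the region they occupy expands them into a second state. The state names, regions, and handler names are illustrative assumptions, not part of this disclosure.

```swift
import CoreGraphics

enum ControlState { case hidden, compact, expanded }

// Sketch: controls appear in a first state in response to a hand input and
// transition to a second state when the user's attention reaches their region.
struct PlaybackControls {
    let controlsRegion: CGRect
    var state: ControlState = .hidden

    mutating func handleHandInput() {
        if state == .hidden { state = .compact }   // first input: show compact controls
    }

    mutating func handleGaze(at location: CGPoint) {
        if state == .compact, controlsRegion.contains(location) {
            state = .expanded                      // attention on the controls: expand them
        }
    }
}
```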
The processes described below enhance the operability of the devices and make the user-device interfaces more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the device) through various techniques, including by providing improved visual feedback to the user, reducing the number of inputs needed to perform an operation, providing additional control options without cluttering the user interface with additional displayed controls, performing an operation when a set of conditions has been met without requiring further user input, improving privacy and/or security, providing a more varied, detailed, and/or realistic user experience while saving storage space, and/or additional techniques. These techniques also reduce power usage and improve battery life of the device by enabling the user to use the device more quickly and efficiently. Reducing power usage also allows for a smaller, lighter battery, which improves the ergonomics of the device. These techniques also enable real-time communication, allow for the use of fewer and/or less precise sensors resulting in a more compact, lighter, and cheaper device, and enable the device to be used in a variety of lighting conditions. These techniques reduce energy usage, thereby reducing heat emitted by the device, which is particularly important for a wearable device where a device well within operational parameters for device components can become uncomfortable for a user to wear if it is producing too much heat.
In addition, in methods described herein where one or more steps are contingent upon one or more conditions having been met, it should be understood that the described method can be repeated in multiple repetitions so that over the course of the repetitions all of the conditions upon which steps in the method are contingent have been met in different repetitions of the method. For example, if a method requires performing a first step if a condition is satisfied, and a second step if the condition is not satisfied, then a person of ordinary skill would appreciate that the claimed steps are repeated until the condition has been both satisfied and not satisfied, in no particular order. Thus, a method described with one or more steps that are contingent upon one or more conditions having been met could be rewritten as a method that is repeated until each of the conditions described in the method has been met. This, however, is not required of system or computer readable medium claims where the system or computer readable medium contains instructions for performing the contingent operations based on the satisfaction of the corresponding one or more conditions and thus is capable of determining whether the contingency has or has not been satisfied without explicitly repeating steps of a method until all of the conditions upon which steps in the method are contingent have been met. A person having ordinary skill in the art would also understand that, similar to a method with contingent steps, a system or computer readable storage medium can repeat the steps of a method as many times as are needed to ensure that all of the contingent steps have been performed.
In some embodiments, as shown in
When describing a XR experience, various terms are used to differentially refer to several related but distinct environments that the user may sense and/or with which a user may interact (e.g., with inputs detected by a computer system 101 generating the XR experience that cause the computer system generating the XR experience to generate audio, visual, and/or tactile feedback corresponding to various inputs provided to the computer system 101). The following is a subset of these terms:
Extended reality: In contrast, an extended reality (XR) environment refers to a wholly or partially simulated environment that people sense and/or interact with via an electronic system. In XR, a subset of a person's physical motions, or representations thereof, are tracked, and, in response, one or more characteristics of one or more virtual objects simulated in the XR environment are adjusted in a manner that comports with at least one law of physics. For example, a XR system may detect a person's head turning and, in response, adjust graphical content and an acoustic field presented to the person in a manner similar to how such views and sounds would change in a physical environment. In some situations (e.g., for accessibility reasons), adjustments to characteristic(s) of virtual object(s) in a XR environment may be made in response to representations of physical motions (e.g., vocal commands). A person may sense and/or interact with a XR object using any one of their senses, including sight, sound, touch, taste, and smell. For example, a person may sense and/or interact with audio objects that create a 3D or spatial audio environment that provides the perception of point audio sources in 3D space. In another example, audio objects may enable audio transparency, which selectively incorporates ambient sounds from the physical environment with or without computer-generated audio. In some XR environments, a person may sense and/or interact only with audio objects.
Examples of XR include virtual reality and mixed reality.
Virtual reality: A virtual reality (VR) environment refers to a simulated environment that is designed to be based entirely on computer-generated sensory inputs for one or more senses. A VR environment comprises a plurality of virtual objects with which a person may sense and/or interact. For example, computer-generated imagery of trees, buildings, and avatars representing people are examples of virtual objects. A person may sense and/or interact with virtual objects in the VR environment through a simulation of the person's presence within the computer-generated environment, and/or through a simulation of a subset of the person's physical movements within the computer-generated environment.
Mixed reality: In contrast to a VR environment, which is designed to be based entirely on computer-generated sensory inputs, a mixed reality (MR) environment refers to a simulated environment that is designed to incorporate sensory inputs from the physical environment, or a representation thereof, in addition to including computer-generated sensory inputs (e.g., virtual objects). On a virtuality continuum, a mixed reality environment is anywhere between, but not including, a wholly physical environment at one end and a virtual reality environment at the other end. In some MR environments, computer-generated sensory inputs may respond to changes in sensory inputs from the physical environment. Also, some electronic systems for presenting an MR environment may track location and/or orientation with respect to the physical environment to enable virtual objects to interact with real objects (that is, physical articles from the physical environment or representations thereof). For example, a system may account for movements so that a virtual tree appears stationary with respect to the physical ground.
Examples of mixed realities include augmented reality and augmented virtuality.
Augmented reality: An augmented reality (AR) environment refers to a simulated environment in which one or more virtual objects are superimposed over a physical environment, or a representation thereof. For example, an electronic system for presenting an AR environment may have a transparent or translucent display through which a person may directly view the physical environment. The system may be configured to present virtual objects on the transparent or translucent display, so that a person, using the system, perceives the virtual objects superimposed over the physical environment. Alternatively, a system may have an opaque display and one or more imaging sensors that capture images or video of the physical environment, which are representations of the physical environment. The system composites the images or video with virtual objects, and presents the composition on the opaque display. A person, using the system, indirectly views the physical environment by way of the images or video of the physical environment, and perceives the virtual objects superimposed over the physical environment. As used herein, a video of the physical environment shown on an opaque display is called “pass-through video,” meaning a system uses one or more image sensor(s) to capture images of the physical environment, and uses those images in presenting the AR environment on the opaque display. Further alternatively, a system may have a projection system that projects virtual objects into the physical environment, for example, as a hologram or on a physical surface, so that a person, using the system, perceives the virtual objects superimposed over the physical environment. An augmented reality environment also refers to a simulated environment in which a representation of a physical environment is transformed by computer-generated sensory information. For example, in providing pass-through video, a system may transform one or more sensor images to impose a select perspective (e.g., viewpoint) different than the perspective captured by the imaging sensors. As another example, a representation of a physical environment may be transformed by graphically modifying (e.g., enlarging) portions thereof, such that the modified portion may be representative but not photorealistic versions of the originally captured images. As a further example, a representation of a physical environment may be transformed by graphically eliminating or obfuscating portions thereof.
Augmented virtuality: An augmented virtuality (AV) environment refers to a simulated environment in which a virtual or computer-generated environment incorporates one or more sensory inputs from the physical environment. The sensory inputs may be representations of one or more characteristics of the physical environment. For example, an AV park may have virtual trees and virtual buildings, but people with faces photorealistically reproduced from images taken of physical people. As another example, a virtual object may adopt a shape or color of a physical article imaged by one or more imaging sensors. As a further example, a virtual object may adopt shadows consistent with the position of the sun in the physical environment.
In an augmented reality, mixed reality, or virtual reality environment, a view of a three-dimensional environment is visible to a user. The view of the three-dimensional environment is typically visible to the user via one or more display generation components (e.g., a display or a pair of display modules that provide stereoscopic content to different eyes of the same user) through a virtual viewport that has a viewport boundary that defines an extent of the three-dimensional environment that is visible to the user via the one or more display generation components. In some embodiments, the region defined by the viewport boundary is smaller than a range of vision of the user in one or more dimensions (e.g., based on the range of vision of the user, size, optical properties or other physical characteristics of the one or more display generation components, and/or the location and/or orientation of the one or more display generation components relative to the eyes of the user). In some embodiments, the region defined by the viewport boundary is larger than a range of vision of the user in one or more dimensions (e.g., based on the range of vision of the user, size, optical properties or other physical characteristics of the one or more display generation components, and/or the location and/or orientation of the one or more display generation components relative to the eyes of the user). The viewport and viewport boundary typically move as the one or more display generation components move (e.g., moving with a head of the user for a head mounted device or moving with a hand of a user for a handheld device such as a tablet or smartphone). A viewpoint of a user determines what content is visible in the viewport, a viewpoint generally specifies a location and a direction relative to the three-dimensional environment, and as the viewpoint shifts, the view of the three-dimensional environment will also shift in the viewport. For a head mounted device, a viewpoint is typically based on a location and direction of the head, face, and/or eyes of a user to provide a view of the three-dimensional environment that is perceptually accurate and provides an immersive experience when the user is using the head-mounted device. For a handheld or stationed device, the viewpoint shifts as the handheld or stationed device is moved and/or as a position of a user relative to the handheld or stationed device changes (e.g., a user moving toward, away from, up, down, to the right, and/or to the left of the device). For devices that include display generation components with virtual passthrough, portions of the physical environment that are visible (e.g., displayed, and/or projected) via the one or more display generation components are based on a field of view of one or more cameras in communication with the display generation components which typically move with the display generation components (e.g., moving with a head of the user for a head mounted device or moving with a hand of a user for a handheld device such as a tablet or smartphone) because the viewpoint of the user moves as the field of view of the one or more cameras moves (and the appearance of one or more virtual objects displayed via the one or more display generation components is updated based on the viewpoint of the user (e.g., displayed positions and poses of the virtual objects are updated based on the movement of the viewpoint of the user)).
For display generation components with optical passthrough, portions of the physical environment that are visible (e.g., optically visible through one or more partially or fully transparent portions of the display generation component) via the one or more display generation components are based on a field of view of a user through the partially or fully transparent portion(s) of the display generation component (e.g., moving with a head of the user for a head mounted device or moving with a hand of a user for a handheld device such as a tablet or smartphone) because the viewpoint of the user moves as the field of view of the user through the partially or fully transparent portions of the display generation components moves (and the appearance of one or more virtual objects is updated based on the viewpoint of the user).
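To make the viewport relationship concrete, the following is a minimal Swift sketch of testing whether a point in the three-dimensional environment falls inside a viewport defined by a viewpoint's position and direction. The Viewpoint type, the cone-shaped simplification of the viewport boundary, and the 100° extent are illustrative assumptions, not definitions from this disclosure.

```swift
import Foundation
import simd

struct Viewpoint {
    var position: SIMD3<Float>
    var forward: SIMD3<Float>     // unit vector in the viewing direction
}

/// Returns whether a point in the environment is inside a cone-shaped viewport
/// centered on the viewpoint's direction; as the viewpoint shifts, the set of
/// visible points shifts with it.
func isInsideViewport(_ point: SIMD3<Float>,
                      from viewpoint: Viewpoint,
                      angularExtentDegrees: Double = 100) -> Bool {
    let toPoint = simd_normalize(point - viewpoint.position)
    let cosAngle = Double(simd_dot(toPoint, simd_normalize(viewpoint.forward)))
    let halfExtent = angularExtentDegrees * .pi / 180 / 2
    return cosAngle >= cos(halfExtent)   // within the viewport boundary
}
```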
In some embodiments, a representation of a physical environment (e.g., displayed via virtual passthrough or optical passthrough) can be partially or fully obscured by a virtual environment. In some embodiments, the amount of virtual environment that is displayed (e.g., the amount of physical environment that is not displayed) is based on an immersion level for the virtual environment (e.g., with respect to the representation of the physical environment). For example, increasing the immersion level optionally causes more of the virtual environment to be displayed, replacing and/or obscuring more of the physical environment, and reducing the immersion level optionally causes less of the virtual environment to be displayed, revealing portions of the physical environment that were previously not displayed and/or obscured. In some embodiments, at a particular immersion level, one or more first background objects (e.g., in the representation of the physical environment) are visually de-emphasized (e.g., dimmed, blurred, and/or displayed with increased transparency) more than one or more second background objects, and one or more third background objects cease to be displayed. In some embodiments, a level of immersion includes an associated degree to which the virtual content displayed by the computer system (e.g., the virtual environment and/or the virtual content) obscures background content (e.g., content other than the virtual environment and/or the virtual content) around/behind the virtual content, optionally including the number of items of background content displayed and/or the visual characteristics (e.g., colors, contrast, and/or opacity) with which the background content is displayed, the angular range of the virtual content displayed via the display generation component (e.g., 60 degrees of content displayed at low immersion, 120 degrees of content displayed at medium immersion, or 180 degrees of content displayed at high immersion), and/or the proportion of the field of view displayed via the display generation component that is consumed by the virtual content (e.g., 33% of the field of view consumed by the virtual content at low immersion, 66% of the field of view consumed by the virtual content at medium immersion, or 100% of the field of view consumed by the virtual content at high immersion). In some embodiments, the background content is included in a background over which the virtual content is displayed (e.g., background content in the representation of the physical environment). In some embodiments, the background content includes user interfaces (e.g., user interfaces generated by the computer system corresponding to applications), virtual objects (e.g., files or representations of other users generated by the computer system) not associated with or included in the virtual environment and/or virtual content, and/or real objects (e.g., pass-through objects representing real objects in the physical environment around the user that are visible such that they are displayed via the display generation component and/or are visible via a transparent or translucent component of the display generation component because the computer system does not obscure/prevent visibility of them through the display generation component). In some embodiments, at a low level of immersion (e.g., a first level of immersion), the background, virtual and/or real objects are displayed in an unobscured manner.
For example, a virtual environment with a low level of immersion is optionally displayed concurrently with the background content, which is optionally displayed with full brightness, color, and/or translucency. In some embodiments, at a higher level of immersion (e.g., a second level of immersion higher than the first level of immersion), the background, virtual and/or real objects are displayed in an obscured manner (e.g., dimmed, blurred, or removed from display). For example, a respective virtual environment with a high level of immersion is displayed without concurrently displaying the background content (e.g., in a full screen or fully immersive mode). As another example, a virtual environment displayed with a medium level of immersion is displayed concurrently with darkened, blurred, or otherwise de-emphasized background content. In some embodiments, the visual characteristics of the background objects vary among the background objects. For example, at a particular immersion level, one or more first background objects are visually de-emphasized (e.g., dimmed, blurred, and/or displayed with increased transparency) more than one or more second background objects, and one or more third background objects cease to be displayed. In some embodiments, a null or zero level of immersion corresponds to the virtual environment ceasing to be displayed and instead a representation of a physical environment is displayed (optionally with one or more virtual objects such as applications, windows, or virtual three-dimensional objects) without the representation of the physical environment being obscured by the virtual environment. Adjusting the level of immersion using a physical input element provides for a quick and efficient method of adjusting immersion, which enhances the operability of the computer system and makes the user-device interface more efficient.
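The mapping from immersion level to presentation parameters can be sketched as follows in Swift; the enumeration and function names are illustrative, while the angular ranges and field-of-view fractions mirror the example values given above.

```swift
// Sketch: tie an immersion level to the angular range of virtual content and
// the fraction of the field of view it consumes, per the examples above.
enum ImmersionLevel { case none, low, medium, high }

func presentation(for level: ImmersionLevel) -> (angularRangeDegrees: Double,
                                                 fieldOfViewFraction: Double) {
    switch level {
    case .none:   return (0, 0)        // virtual environment not displayed
    case .low:    return (60, 0.33)
    case .medium: return (120, 0.66)
    case .high:   return (180, 1.0)    // fully immersive; background content obscured
    }
}
```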
Viewpoint-locked virtual object: A virtual object is viewpoint-locked when a computer system displays the virtual object at the same location and/or position in the viewpoint of the user, even as the viewpoint of the user shifts (e.g., changes). In embodiments where the computer system is a head-mounted device, the viewpoint of the user is locked to the forward facing direction of the user's head (e.g., the viewpoint of the user is at least a portion of the field-of-view of the user when the user is looking straight ahead); thus, the viewpoint of the user remains fixed even as the user's gaze is shifted, without moving the user's head. In embodiments where the computer system has a display generation component (e.g., a display screen) that can be repositioned with respect to the user's head, the viewpoint of the user is the augmented reality view that is being presented to the user on a display generation component of the computer system. For example, a viewpoint-locked virtual object that is displayed in the upper left corner of the viewpoint of the user, when the viewpoint of the user is in a first orientation (e.g., with the user's head facing north) continues to be displayed in the upper left corner of the viewpoint of the user, even as the viewpoint of the user changes to a second orientation (e.g., with the user's head facing west). In other words, the location and/or position at which the viewpoint-locked virtual object is displayed in the viewpoint of the user is independent of the user's position and/or orientation in the physical environment. In embodiments in which the computer system is a head-mounted device, the viewpoint of the user is locked to the orientation of the user's head, such that the virtual object is also referred to as a “head-locked virtual object.”
Environment-locked virtual object: A virtual object is environment-locked (alternatively, “world-locked”) when a computer system displays the virtual object at a location and/or position in the viewpoint of the user that is based on (e.g., selected in reference to and/or anchored to) a location and/or object in the three-dimensional environment (e.g., a physical environment or a virtual environment). As the viewpoint of the user shifts, the location and/or object in the environment relative to the viewpoint of the user changes, which results in the environment-locked virtual object being displayed at a different location and/or position in the viewpoint of the user. For example, an environment-locked virtual object that is locked onto a tree that is immediately in front of a user is displayed at the center of the viewpoint of the user. When the viewpoint of the user shifts to the right (e.g., the user's head is turned to the right) so that the tree is now left-of-center in the viewpoint of the user (e.g., the tree's position in the viewpoint of the user shifts), the environment-locked virtual object that is locked onto the tree is displayed left-of-center in the viewpoint of the user. In other words, the location and/or position at which the environment-locked virtual object is displayed in the viewpoint of the user is dependent on the position and/or orientation of the location and/or object in the environment onto which the virtual object is locked. In some embodiments, the computer system uses a stationary frame of reference (e.g., a coordinate system that is anchored to a fixed location and/or object in the physical environment) in order to determine the position at which to display an environment-locked virtual object in the viewpoint of the user. An environment-locked virtual object can be locked to a stationary part of the environment (e.g., a floor, wall, table, or other stationary object) or can be locked to a moveable part of the environment (e.g., a vehicle, animal, person, or even a representation of a portion of the user's body that moves independently of a viewpoint of the user, such as a user's hand, wrist, arm, or foot) so that the virtual object is moved as the viewpoint or the portion of the environment moves to maintain a fixed relationship between the virtual object and the portion of the environment.
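The contrast between the two locking behaviors defined above can be pictured with the Swift sketch below: a viewpoint-locked object keeps a fixed offset in the viewport, while an environment-locked object is re-projected from its anchor in the environment whenever the viewpoint moves. The types and the projection closure are illustrative assumptions introduced here.

```swift
import simd

enum LockBehavior {
    case viewpointLocked(offsetInViewport: SIMD2<Float>)   // e.g., upper-left corner of the viewport
    case environmentLocked(anchor: SIMD3<Float>)           // e.g., a point on a tree in the environment
}

/// Computes where the object appears in the viewport, given a projection from
/// the current viewpoint into viewport coordinates.
func displayPosition(for behavior: LockBehavior,
                     project: (SIMD3<Float>) -> SIMD2<Float>) -> SIMD2<Float> {
    switch behavior {
    case .viewpointLocked(let offset):
        return offset                 // unchanged as the viewpoint shifts
    case .environmentLocked(let anchor):
        return project(anchor)        // moves in the viewport as the viewpoint shifts
    }
}
```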
In some embodiments a virtual object that is environment-locked or viewpoint-locked exhibits lazy follow behavior which reduces or delays motion of the environment-locked or viewpoint-locked virtual object relative to movement of a point of reference which the virtual object is following. In some embodiments, when exhibiting lazy follow behavior the computer system intentionally delays movement of the virtual object when detecting movement of a point of reference (e.g., a portion of the environment, the viewpoint, or a point that is fixed relative to the viewpoint, such as a point that is between 5-300 cm from the viewpoint) which the virtual object is following. For example, when the point of reference (e.g., the portion of the environment or the viewpoint) moves with a first speed, the virtual object is moved by the device to remain locked to the point of reference but moves with a second speed that is slower than the first speed (e.g., until the point of reference stops moving or slows down, at which point the virtual object starts to catch up to the point of reference). In some embodiments, when a virtual object exhibits lazy follow behavior the device ignores small amounts of movement of the point of reference (e.g., ignoring movement of the point of reference that is below a threshold amount of movement such as movement by 0-5 degrees or movement by 0-50 cm). For example, when the point of reference (e.g., the portion of the environment or the viewpoint to which the virtual object is locked) moves by a first amount, a distance between the point of reference and the virtual object increases (e.g., because the virtual object is being displayed so as to maintain a fixed or substantially fixed position relative to a viewpoint or portion of the environment that is different from the point of reference to which the virtual object is locked) and when the point of reference (e.g., the portion of the environment or the viewpoint to which the virtual object is locked) moves by a second amount that is greater than the first amount, a distance between the point of reference and the virtual object initially increases (e.g., because the virtual object is being displayed so as to maintain a fixed or substantially fixed position relative to a viewpoint or portion of the environment that is different from the point of reference to which the virtual object is locked) and then decreases as the amount of movement of the point of reference increases above a threshold (e.g., a “lazy follow” threshold) because the virtual object is moved by the computer system to maintain a fixed or substantially fixed position relative to the point of reference. In some embodiments the virtual object maintaining a substantially fixed position relative to the point of reference includes the virtual object being displayed within a threshold distance (e.g., 1, 2, 3, 5, 15, 20, 50 cm) of the point of reference in one or more dimensions (e.g., up/down, left/right, and/or forward/backward relative to the position of the point of reference).
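A minimal Swift sketch of the lazy follow behavior described above is given below: small movements of the point of reference are ignored, and larger movements are followed at a reduced rate so the object gradually catches up. The dead-zone and catch-up constants are illustrative assumptions rather than values prescribed by this disclosure.

```swift
import simd

// Sketch: an object that lazily follows a point of reference.
struct LazyFollower {
    var objectPosition: SIMD3<Float>
    let deadZone: Float = 0.05        // metres of reference movement to ignore
    let catchUpFactor: Float = 0.2    // fraction of the remaining gap closed per update

    mutating func update(referencePosition: SIMD3<Float>) {
        let gap = referencePosition - objectPosition
        guard simd_length(gap) > deadZone else { return }   // ignore small movements
        objectPosition += gap * catchUpFactor                // follow at a slower speed than the reference
    }
}
```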
Hardware: There are many different types of electronic systems that enable a person to sense and/or interact with various XR environments. Examples include head-mounted systems, projection-based systems, heads-up displays (HUDs), vehicle windshields having integrated display capability, windows having integrated display capability, displays formed as lenses designed to be placed on a person's eyes (e.g., similar to contact lenses), headphones/earphones, speaker arrays, input systems (e.g., wearable or handheld controllers with or without haptic feedback), smartphones, tablets, and desktop/laptop computers. A head-mounted system may include speakers and/or other audio output devices integrated into the head-mounted system for providing audio output. A head-mounted system may have one or more speaker(s) and an integrated opaque display. Alternatively, a head-mounted system may be configured to accept an external opaque display (e.g., a smartphone). The head-mounted system may incorporate one or more imaging sensors to capture images or video of the physical environment, and/or one or more microphones to capture audio of the physical environment. Rather than an opaque display, a head-mounted system may have a transparent or translucent display. The transparent or translucent display may have a medium through which light representative of images is directed to a person's eyes. The display may utilize digital light projection, OLEDs, LEDs, uLEDs, liquid crystal on silicon, laser scanning light source, or any combination of these technologies. The medium may be an optical waveguide, a hologram medium, an optical combiner, an optical reflector, or any combination thereof. In one embodiment, the transparent or translucent display may be configured to become opaque selectively. Projection-based systems may employ retinal projection technology that projects graphical images onto a person's retina. Projection systems also may be configured to project virtual objects into the physical environment, for example, as a hologram or on a physical surface. In some embodiments, the controller 110 is configured to manage and coordinate a XR experience for the user. In some embodiments, the controller 110 includes a suitable combination of software, firmware, and/or hardware. The controller 110 is described in greater detail below with respect to
In some embodiments, the display generation component 120 is configured to provide the XR experience (e.g., at least a visual component of the XR experience) to the user. In some embodiments, the display generation component 120 includes a suitable combination of software, firmware, and/or hardware. The display generation component 120 is described in greater detail below with respect to
According to some embodiments, the display generation component 120 provides a XR experience to the user while the user is virtually and/or physically present within the scene 105.
In some embodiments, the display generation component is worn on a part of the user's body (e.g., on his/her head, on his/her hand, etc.). As such, the display generation component 120 includes one or more XR displays provided to display the XR content. For example, in various embodiments, the display generation component 120 encloses the field-of-view of the user. In some embodiments, the display generation component 120 is a handheld device (such as a smartphone or tablet) configured to present XR content, and the user holds the device with a display directed towards the field-of-view of the user and a camera directed towards the scene 105. In some embodiments, the handheld device is optionally placed within an enclosure that is worn on the head of the user. In some embodiments, the handheld device is optionally placed on a support (e.g., a tripod) in front of the user. In some embodiments, the display generation component 120 is a XR chamber, enclosure, or room configured to present XR content in which the user does not wear or hold the display generation component 120. Many user interfaces described with reference to one type of hardware for displaying XR content (e.g., a handheld device or a device on a tripod) could be implemented on another type of hardware for displaying XR content (e.g., an HMD or other wearable computing device). For example, a user interface showing interactions with XR content triggered based on interactions that happen in a space in front of a handheld or tripod mounted device could similarly be implemented with an HMD where the interactions happen in a space in front of the HMD and the responses of the XR content are displayed via the HMD. Similarly, a user interface showing interactions with XR content triggered based on movement of a handheld or tripod mounted device relative to the physical environment (e.g., the scene 105 or a part of the user's body (e.g., the user's eye(s), head, or hand)) could similarly be implemented with an HMD where the movement is caused by movement of the HMD relative to the physical environment (e.g., the scene 105 or a part of the user's body (e.g., the user's eye(s), head, or hand)).
While pertinent features of the operating environment 100 are shown in
In at least one example, the band assembly 1-106 can include a first band 1-116 configured to wrap around the rear side of a user's head and a second band 1-117 configured to extend over the top of a user's head. The second band 1-117 can extend between first and second electronic straps 1-105a, 1-105b of the electronic strap assembly 1-104 as shown. The strap assembly 1-104 and the band assembly 1-106 can be part of a securement mechanism extending rearward from the display unit 1-102 and configured to hold the display unit 1-102 against a face of a user.
In at least one example, the securement mechanism includes a first electronic strap 1-105a including a first proximal end 1-134 coupled to the display unit 1-102, for example a housing 1-150 of the display unit 1-102, and a first distal end 1-136 opposite the first proximal end 1-134. The securement mechanism can also include a second electronic strap 1-105b including a second proximal end 1-138 coupled to the housing 1-150 of the display unit 1-102 and a second distal end 1-140 opposite the second proximal end 1-138. The securement mechanism can also include the first band 1-116 including a first end 1-142 coupled to the first distal end 1-136 and a second end 1-144 coupled to the second distal end 1-140 and the second band 1-117 extending between the first electronic strap 1-105a and the second electronic strap 1-105b. The straps 1-105a-b and band 1-116 can be coupled via connection mechanisms or assemblies 1-114. In at least one example, the second band 1-117 includes a first end 1-146 coupled to the first electronic strap 1-105a between the first proximal end 1-134 and the first distal end 1-136 and a second end 1-148 coupled to the second electronic strap 1-105b between the second proximal end 1-138 and the second distal end 1-140.
In at least one example, the first and second electronic straps 1-105a-b include plastic, metal, or other structural materials forming the shape of the substantially rigid straps 1-105a-b. In at least one example, the first and second bands 1-116, 1-117 are formed of elastic, flexible materials including woven textiles, rubbers, and the like. The first and second bands 1-116, 1-117 can be flexible to conform to the shape of the user's head when donning the HMD 1-100.
In at least one example, one or more of the first and second electronic straps 1-105a-b can define internal strap volumes and include one or more electronic components disposed in the internal strap volumes. In one example, as shown in
In at least one example, the housing 1-150 defines a first, front-facing opening 1-152. The front-facing opening is labeled in dotted lines at 1-152 in
In at least one example, the housing 1-150 can define a first aperture 1-126 between the first and second openings 1-152, 1-154 and a second aperture 1-130 between the first and second openings 1-152, 1-154. The HMD 1-100 can also include a first button 1-128 disposed in the first aperture 1-126 and a second button 1-132 disposed in the second aperture 1-130. The first and second buttons 1-128, 1-132 can be depressible through the respective apertures 1-126, 1-130. In at least one example, the first button 1-128 and/or second button 1-132 can be twistable dials as well as depressible buttons. In at least one example, the first button 1-128 is a depressible and twistable dial button and the second button 1-132 is a depressible button.
In at least one example, referring to both
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
In addition, the HMD 1-200 can include a light seal 1-210 configured to be removably coupled to the display unit 1-202. The HMD 1-200 can also include lenses 1-218 which can be removably coupled to the display unit 1-202, for example over first and second display assemblies including display screens. The lenses 1-218 can include customized prescription lenses configured for corrective vision. As noted, each part shown in the exploded view of
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
In at least one example, the display unit 1-306 can also include a motor assembly 1-362 configured as an adjustment mechanism for adjusting the positions of the display screens 1-322a-b of the display assembly 1-320 relative to the frame 1-350. In at least one example, the display assembly 1-320 is mechanically coupled to the motor assembly 1-362, with at least one motor for each display screen 1-322a-b, such that the motors can translate the display screens 1-322a-b to match an interpupillary distance of the user's eyes.
In at least one example, the display unit 1-306 can include a dial or button 1-328 depressible relative to the frame 1-350 and accessible to the user outside the frame 1-350. The button 1-328 can be electronically connected to the motor assembly 1-362 via a controller such that the button 1-328 can be manipulated by the user to cause the motors of the motor assembly 1-362 to adjust the positions of the display screens 1-322a-b.
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
The various parts, systems, and assemblies shown in the exploded view of
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
In at least one example, as shown in
In at least one example, the shroud 3-104 can include a transparent or semi-transparent material through which the display assembly 3-108 projects light. In one example, the shroud 3-104 can include one or more opaque portions, for example opaque ink-printed portions or other opaque film portions on the rear surface of the shroud 3-104. The rear surface can be the surface of the shroud 3-104 facing the user's eyes when the HMD device is donned. In at least one example, opaque portions can be on the front surface of the shroud 3-104 opposite the rear surface. In at least one example, the opaque portion or portions of the shroud 3-104 can include perimeter portions visually hiding any components around an outside perimeter of the display screen of the display assembly 3-108. In this way, the opaque portions of the shroud hide any other components, including electronic components, structural components, and so forth, of the HMD device that would otherwise be visible through the transparent or semi-transparent cover 3-102 and/or shroud 3-104.
In at least one example, the shroud 3-104 can define one or more apertures or transparent portions 3-120 through which sensors can send and receive signals. In one example, the portions 3-120 are apertures through which the sensors can extend or send and receive signals. In one example, the portions 3-120 are transparent portions, or portions more transparent than surrounding semi-transparent or opaque portions of the shroud, through which sensors can send and receive signals through the shroud and through the transparent cover 3-102. In one example, the sensors can include cameras, IR sensors, LUX sensors, or any other visual or non-visual environmental sensors of the HMD device.
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
In at least one example, the transparent cover 6-104 can define a front, external surface of the HMD device 6-100 and the sensor system 6-102, including the various sensors and components thereof, can be disposed behind the cover 6-104 in the Y-axis/direction. The cover 6-104 can be transparent or semi-transparent to allow light to pass through the cover 6-104, both light detected by the sensor system 6-102 and light emitted thereby.
As noted elsewhere herein, the HMD device 6-100 can include one or more controllers including processors for electrically coupling the various sensors and emitters of the sensor system 6-102 with one or more mother boards, processing units, and other electronic devices such as display screens and the like. In addition, as will be shown in more detail below with reference to other figures, the various sensors, emitters, and other components of the sensor system 6-102 can be coupled to various structural frame members, brackets, and so forth of the HMD device 6-100 not shown in
In at least one example, the device can include one or more controllers having processors configured to execute instructions stored on memory components electrically coupled to the processors. The instructions can include, or cause the processor to execute, one or more algorithms for self-correcting angles and positions of the various cameras described herein over time with use as the initial positions, angles, or orientations of the cameras get bumped or deformed due to unintended drop events or other events.
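One way to picture such self-correction is a small per-measurement adjustment of the stored mounting angle whenever repeated estimates disagree with it. The Python sketch below assumes a hypothetical routine elsewhere supplies a measured relative yaw between two cameras; the gain and trigger threshold are illustrative values, not values from this disclosure.

```python
# Hedged sketch of a self-correction loop for camera mounting angles. The stored
# calibration is nudged toward an online estimate only when the discrepancy exceeds
# a tolerance, so corrections accumulate slowly across many measurements.

CORRECTION_GAIN = 0.02      # assumed small gain so corrections accumulate slowly
TRIGGER_DEG = 0.1           # assumed discrepancy needed before any correction

def update_calibration(stored_yaw_deg, measured_yaw_deg):
    error = measured_yaw_deg - stored_yaw_deg
    if abs(error) < TRIGGER_DEG:
        return stored_yaw_deg                  # within tolerance; leave calibration alone
    return stored_yaw_deg + CORRECTION_GAIN * error

yaw = 30.0                                     # stored relative yaw between two cameras
for measurement in (30.4, 30.4, 30.35, 30.4):  # repeated estimates after a drop event
    yaw = update_calibration(yaw, measurement)
```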
In at least one example, the sensor system 6-102 can include one or more scene cameras 6-106. The system 6-102 can include two scene cameras 6-106 disposed on either side of the nasal bridge or arch of the HMD device 6-100 such that each of the two cameras 6-106 corresponds generally in position with left and right eyes of the user behind the cover 6-104. In at least one example, the scene cameras 6-106 are oriented generally forward in the Y-direction to capture images in front of the user during use of the HMD 6-100. In at least one example, the scene cameras are color cameras and provide images and content for MR video pass-through to the display screens facing the user's eyes when using the HMD device 6-100. The scene cameras 6-106 can also be used for environment and object reconstruction.
In at least one example, the sensor system 6-102 can include a first depth sensor 6-108 pointed generally forward in the Y-direction. In at least one example, the first depth sensor 6-108 can be used for environment and object reconstruction as well as user hand and body tracking. In at least one example, the sensor system 6-102 can include a second depth sensor 6-110 disposed centrally along the width (e.g., along the X-axis) of the HMD device 6-100. For example, the second depth sensor 6-110 can be disposed above the central nasal bridge or accommodating features over the nose of the user when donning the HMD 6-100. In at least one example, the second depth sensor 6-110 can be used for environment and object reconstruction as well as hand and body tracking. In at least one example, the second depth sensor can include a LIDAR sensor.
In at least one example, the sensor system 6-102 can include a depth projector 6-112 facing generally forward to project electromagnetic waves, for example in the form of a predetermined pattern of light dots, out into and within a field of view of the user and/or the scene cameras 6-106 or a field of view including and beyond the field of view of the user and/or scene cameras 6-106. In at least one example, the depth projector can project electromagnetic waves of light in the form of a dotted light pattern to be reflected off objects and back into the depth sensors noted above, including the depth sensors 6-108, 6-110. In at least one example, the depth projector 6-112 can be used for environment and object reconstruction as well as hand and body tracking.
In at least one example, the sensor system 6-102 can include downward facing cameras 6-114 with a field of view pointed generally downward relative to the HMD device 6-100 in the Z-axis. In at least one example, the downward cameras 6-114 can be disposed on left and right sides of the HMD device 6-100 as shown and used for hand and body tracking, headset tracking, and facial avatar detection and creation for displaying a user avatar on the forward facing display screen of the HMD device 6-100 described elsewhere herein. The downward cameras 6-114, for example, can be used to capture facial expressions and movements for the face of the user below the HMD device 6-100, including the cheeks, mouth, and chin.
In at least one example, the sensor system 6-102 can include jaw cameras 6-116. In at least one example, the jaw cameras 6-116 can be disposed on left and right sides of the HMD device 6-100 as shown and used for hand and body tracking, headset tracking, and facial avatar detection and creation for displaying a user avatar on the forward facing display screen of the HMD device 6-100 described elsewhere herein. The jaw cameras 6-116, for example, can be used to capture facial expressions and movements for the face of the user below the HMD device 6-100, including the user's jaw, cheeks, mouth, and chin.
In at least one example, the sensor system 6-102 can include side cameras 6-118. The side cameras 6-118 can be oriented to capture side views left and right in the X-axis or direction relative to the HMD device 6-100. In at least one example, the side cameras 6-118 can be used for hand and body tracking, headset tracking, and facial avatar detection and re-creation.
In at least one example, the sensor system 6-102 can include a plurality of eye tracking and gaze tracking sensors for determining an identity, status, and gaze direction of a user's eyes during and/or before use. In at least one example, the eye/gaze tracking sensors can include nasal eye cameras 6-120 disposed on either side of the user's nose and adjacent the user's nose when donning the HMD device 6-100. The eye/gaze sensors can also include bottom eye cameras 6-122 disposed below respective user eyes for capturing images of the eyes for facial avatar detection and creation, gaze tracking, and iris identification functions.
In at least one example, the sensor system 6-102 can include infrared illuminators 6-124 pointed outward from the HMD device 6-100 to illuminate the external environment and any object therein with IR light for IR detection with one or more IR sensors of the sensor system 6-102. In at least one example, the sensor system 6-102 can include a flicker sensor 6-126 and an ambient light sensor 6-128. In at least one example, the flicker sensor 6-126 can detect overhead light refresh rates to avoid display flicker. In one example, the infrared illuminators 6-124 can include light emitting diodes and can be used especially for low light environments for illuminating user hands and other objects in low light for detection by infrared sensors of the sensor system 6-102.
In at least one example, multiple sensors, including the scene cameras 6-106, the downward cameras 6-114, the jaw cameras 6-116, the side cameras 6-118, the depth projector 6-112, and the depth sensors 6-108, 6-110 can be used in combination with an electrically coupled controller to combine depth data with camera data for hand tracking and for size determination for better hand tracking and object recognition and tracking functions of the HMD device 6-100. In at least one example, the downward cameras 6-114, jaw cameras 6-116, and side cameras 6-118 described above and shown in
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
In some examples, the shroud 6-204 includes a transparent portion 6-205 and an opaque portion 6-207, as described above and elsewhere herein. In at least one example, the opaque portion 6-207 of the shroud 6-204 can define one or more transparent regions 6-209 through which the sensors 6-203 of the sensor system 6-202 can send and receive signals. In the illustrated example, the sensors 6-203 of the sensor system 6-202 sending and receiving signals through the shroud 6-204, or more specifically through the transparent regions 6-209 of (or defined by) the opaque portion 6-207 of the shroud 6-204, can include the same or similar sensors as those shown in the example of
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
In at least one example, the various sensors of the sensor system 6-302 are coupled to the brackets 6-336, 6-338. In at least one example, the scene cameras 6-306 are mounted with tight angular tolerances relative to one another. For example, the tolerance of mounting angles between the two scene cameras 6-306 can be 0.5 degrees or less, for example 0.3 degrees or less. In order to achieve and maintain such a tight tolerance, in one example, the scene cameras 6-306 can be mounted to the bracket 6-338 and not the shroud. The bracket can include cantilevered arms on which the scene cameras 6-306 and other sensors of the sensor system 6-302 can be mounted to remain un-deformed in position and orientation in the case of a drop event by a user resulting in any deformation of the other bracket 6-336, housing 6-330, and/or shroud.
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
In at least one example, the first and second optical modules 11.1.1-104a-b can include respective display screens configured to project light toward the user's eyes when donning the HMD 11.1.1-100. In at least one example, the user can manipulate (e.g., depress and/or rotate) the button 11.1.1-114 to activate a positional adjustment of the optical modules 11.1.1-104a-b to match the inter-pupillary distance of the user's eyes. The optical modules 11.1.1-104a-b can also include one or more cameras or other sensors/sensor systems for imaging and measuring the IPD of the user such that the optical modules 11.1.1-104a-b can be adjusted to match the IPD.
In one example, the user can manipulate the button 11.1.1-114 to cause an automatic positional adjustment of the first and second optical modules 11.1.1-104a-b. In one example, the user can manipulate the button 11.1.1-114 to cause a manual adjustment such that the optical modules 11.1.1-104a-b move farther apart or closer together, for example when the user rotates the button 11.1.1-114 one way or the other, until the user visually matches her/his own IPD. In one example, the manual adjustment is electronically communicated via one or more circuits, and power for the movements of the optical modules 11.1.1-104a-b via the motors 11.1.1-110a-b is provided by an electrical power source. In one example, the adjustment and movement of the optical modules 11.1.1-104a-b via a manipulation of the button 11.1.1-114 is mechanically actuated via the movement of the button 11.1.1-114.
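For illustration only, the mapping from dial rotation to symmetric optical-module travel can be sketched as follows; the detent step size, IPD range, and the name adjust_ipd are assumptions for the sketch, not values or terms from this disclosure.

```python
# Illustrative sketch of mapping a dial rotation to symmetric optical-module travel
# for interpupillary-distance (IPD) adjustment.

MM_PER_DETENT = 0.25        # assumed travel per dial detent, per module
IPD_RANGE_MM = (54.0, 74.0) # assumed supported IPD range

def adjust_ipd(current_ipd_mm, detents):
    """Return the new IPD and the per-module offsets the motors must apply.
       Positive detents widen the modules, negative detents narrow them."""
    target = current_ipd_mm + 2 * MM_PER_DETENT * detents   # both modules move
    target = max(IPD_RANGE_MM[0], min(IPD_RANGE_MM[1], target))
    per_module_offset = (target - current_ipd_mm) / 2
    return target, (-per_module_offset, +per_module_offset)

# Four detents outward from a 63 mm setting widen the IPD to 65 mm,
# moving each module 1 mm away from center.
new_ipd, (left_move, right_move) = adjust_ipd(63.0, detents=4)
```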
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
The mounting bracket 11.1.2-108 can include a middle or central portion 11.1.2-109 coupled to the inner frame 11.1.2-104. In some examples, the middle or central portion 11.1.2-109 may not be the geometric middle or center of the bracket 11.1.2-108. Rather, the middle/central portion 11.1.2-109 can be disposed between first and second cantilevered extension arms extending away from the middle portion 11.1.2-109. In at least one example, the mounting bracket 11.1.2-108 includes a first cantilever arm 11.1.2-112 and a second cantilever arm 11.1.2-114 extending away from the middle portion 11.1.2-109 of the mounting bracket 11.1.2-108 coupled to the inner frame 11.1.2-104.
As shown in
The first cantilever arm 11.1.2-112 can extend away from the middle portion 11.1.2-109 of the mounting bracket 11.1.2-108 in a first direction and the second cantilever arm 11.1.2-114 can extend away from the middle portion 11.1.2-109 of the mounting bracket 11.1.2-108 in a second direction opposite the first direction. The first and second cantilever arms 11.1.2-112, 11.1.2-114 are referred to as “cantilevered” or “cantilever” arms because each arm 11.1.2-112, 11.1.2-114 includes a distal free end 11.1.2-116, 11.1.2-118, respectively, which are free of affixation from the inner and outer frames 11.1.2-104, 11.1.2-102. In this way, the arms 11.1.2-112, 11.1.2-114 are cantilevered from the middle portion 11.1.2-109, which can be connected to the inner frame 11.1.2-104, with distal ends 11.1.2-116, 11.1.2-118 unattached.
In at least one example, the HMD 11.1.2-100 can include one or more components coupled to the mounting bracket 11.1.2-108. In one example, the components include a plurality of sensors 11.1.2-110a-f. Each sensor of the plurality of sensors 11.1.2-110a-f can include various types of sensors, including cameras, IR sensors, and so forth. In some examples, one or more of the sensors 11.1.2-110a-f can be used for object recognition in three-dimensional space such that it is important to maintain a precise relative position of two or more of the plurality of sensors 11.1.2-110a-f. The cantilevered nature of the mounting bracket 11.1.2-108 can protect the sensors 11.1.2-110a-f from damage and altered positioning in the case of accidental drops by the user. Because the sensors 11.1.2-110a-f are cantilevered on the arms 11.1.2-112, 11.1.2-114 of the mounting bracket 11.1.2-108, stresses and deformations of the inner and/or outer frames 11.1.2-104, 11.1.2-102 are not transferred to the cantilevered arms 11.1.2-112, 11.1.2-114 and thus do not affect the relative positioning of the sensors 11.1.2-110a-f coupled/mounted to the mounting bracket 11.1.2-108.
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
In at least one example, the optical module 11.3.2-100 can include an optical frame or housing 11.3.2-102, which can also be referred to as a barrel or optical module barrel. The optical module 11.3.2-100 can also include a display 11.3.2-104, including a display screen or multiple display screens, coupled to the housing 11.3.2-102. The display 11.3.2-104 can be coupled to the housing 11.3.2-102 such that the display 11.3.2-104 is configured to project light toward the eye of a user when the HMD of which the display module 11.3.2-100 is a part is donned during use. In at least one example, the housing 11.3.2-102 can surround the display 11.3.2-104 and provide connection features for coupling other components of optical modules described herein.
In one example, the optical module 11.3.2-100 can include one or more cameras 11.3.2-106 coupled to the housing 11.3.2-102. The camera 11.3.2-106 can be positioned relative to the display 11.3.2-104 and housing 11.3.2-102 such that the camera 11.3.2-106 is configured to capture one or more images of the user's eye during use. In at least one example, the optical module 11.3.2-100 can also include a light strip 11.3.2-108 surrounding the display 11.3.2-104. In one example, the light strip 11.3.2-108 is disposed between the display 11.3.2-104 and the camera 11.3.2-106. The light strip 11.3.2-108 can include a plurality of lights 11.3.2-110. The plurality of lights can include one or more light emitting diodes (LEDs) or other lights configured to project light toward the user's eye when the HMD is donned. The individual lights 11.3.2-110 of the light strip 11.3.2-108 can be spaced about the strip 11.3.2-108 and thus spaced about the display 11.3.2-104 uniformly or non-uniformly at various locations on the strip 11.3.2-108 and around the display 11.3.2-104.
In at least one example, the housing 11.3.2-102 defines a viewing opening 11.3.2-101 through which the user can view the display 11.3.2-104 when the HMD device is donned. In at least one example, the LEDs are configured and arranged to emit light through the viewing opening 11.3.2-101 and onto the user's eye. In one example, the camera 11.3.2-106 is configured to capture one or more images of the user's eye through the viewing opening 11.3.2-101.
As noted above, each of the components and features of the optical module 11.3.2-100 shown in
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
In at least one example, the optical module 11.3.2-200 can also include a lens 11.3.2-216 coupled to the housing 11.3.2-202 and disposed between the display assembly 11.3.2-204 and the user's eyes when the HMD is donned. The lens 11.3.2-216 can be configured to direct light from the display assembly 11.3.2-204 to the user's eye. In at least one example, the lens 11.3.2-216 can be a part of a lens assembly including a corrective lens removably attached to the optical module 11.3.2-200. In at least one example, the lens 11.3.2-216 is disposed over the light strip 11.3.2-208 and the one or more eye-tracking cameras 11.3.2-206 such that the camera 11.3.2-206 is configured to capture images of the user's eye through the lens 11.3.2-216 and the light strip 11.3.2-208 includes lights configured to project light through the lens 11.3.2-216 to the user's eye during use.
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
In some embodiments, the one or more communication buses 204 include circuitry that interconnects and controls communications between system components. In some embodiments, the one or more I/O devices 206 include at least one of a keyboard, a mouse, a touchpad, a joystick, one or more microphones, one or more speakers, one or more image sensors, one or more displays, and/or the like.
The memory 220 includes high-speed random-access memory, such as dynamic random-access memory (DRAM), static random-access memory (SRAM), double-data-rate random-access memory (DDR RAM), or other random-access solid-state memory devices. In some embodiments, the memory 220 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 220 optionally includes one or more storage devices remotely located from the one or more processing units 202. The memory 220 comprises a non-transitory computer readable storage medium. In some embodiments, the memory 220 or the non-transitory computer readable storage medium of the memory 220 stores the following programs, modules and data structures, or a subset thereof including an optional operating system 230 and a XR experience module 240.
The operating system 230 includes instructions for handling various basic system services and for performing hardware dependent tasks. In some embodiments, the XR experience module 240 is configured to manage and coordinate one or more XR experiences for one or more users (e.g., a single XR experience for one or more users, or multiple XR experiences for respective groups of one or more users). To that end, in various embodiments, the XR experience module 240 includes a data obtaining unit 241, a tracking unit 242, a coordination unit 246, and a data transmitting unit 248.
In some embodiments, the data obtaining unit 241 is configured to obtain data (e.g., presentation data, interaction data, sensor data, location data, etc.) from at least the display generation component 120 of
In some embodiments, the tracking unit 242 is configured to map the scene 105 and to track the position/location of at least the display generation component 120 with respect to the scene 105 of
In some embodiments, the coordination unit 246 is configured to manage and coordinate the XR experience presented to the user by the display generation component 120, and optionally, by one or more of the output devices 155 and/or peripheral devices 195. To that end, in various embodiments, the coordination unit 246 includes instructions and/or logic therefor, and heuristics and metadata therefor.
In some embodiments, the data transmitting unit 248 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the display generation component 120, and optionally, to one or more of the input devices 125, output devices 155, sensors 190, and/or peripheral devices 195. To that end, in various embodiments, the data transmitting unit 248 includes instructions and/or logic therefor, and heuristics and metadata therefor.
Although the data obtaining unit 241, the tracking unit 242 (e.g., including the eye tracking unit 243 and the hand tracking unit 244), the coordination unit 246, and the data transmitting unit 248 are shown as residing on a single device (e.g., the controller 110), it should be understood that in other embodiments, any combination of the data obtaining unit 241, the tracking unit 242 (e.g., including the eye tracking unit 243 and the hand tracking unit 244), the coordination unit 246, and the data transmitting unit 248 may be located in separate computing devices.
Moreover,
In some embodiments, the one or more communication buses 304 include circuitry that interconnects and controls communications between system components. In some embodiments, the one or more I/O devices and sensors 306 include at least one of an inertial measurement unit (IMU), an accelerometer, a gyroscope, a thermometer, one or more physiological sensors (e.g., blood pressure monitor, heart rate monitor, blood oxygen sensor, blood glucose sensor, etc.), one or more microphones, one or more speakers, a haptics engine, one or more depth sensors (e.g., a structured light, a time-of-flight, or the like), and/or the like.
In some embodiments, the one or more XR displays 312 are configured to provide the XR experience to the user. In some embodiments, the one or more XR displays 312 correspond to holographic, digital light processing (DLP), liquid-crystal display (LCD), liquid-crystal on silicon (LCoS), organic light-emitting field-effect transistor (OLET), organic light-emitting diode (OLED), surface-conduction electron-emitter display (SED), field-emission display (FED), quantum-dot light-emitting diode (QD-LED), micro-electro-mechanical system (MEMS), and/or the like display types. In some embodiments, the one or more XR displays 312 correspond to diffractive, reflective, polarized, holographic, etc. waveguide displays. For example, the display generation component 120 (e.g., HMD) includes a single XR display. In another example, the display generation component 120 includes a XR display for each eye of the user. In some embodiments, the one or more XR displays 312 are capable of presenting MR and VR content. In some embodiments, the one or more XR displays 312 are capable of presenting MR or VR content.
In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the face of the user that includes the eyes of the user (and may be referred to as an eye-tracking camera). In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the user's hand(s) and optionally arm(s) of the user (and may be referred to as a hand-tracking camera). In some embodiments, the one or more image sensors 314 are configured to be forward-facing so as to obtain image data that corresponds to the scene as would be viewed by the user if the display generation component 120 (e.g., HMD) was not present (and may be referred to as a scene camera). The one or more optional image sensors 314 can include one or more RGB cameras (e.g., with a complementary metal-oxide-semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor), one or more infrared (IR) cameras, one or more event-based cameras, and/or the like.
The memory 320 includes high-speed random-access memory, such as DRAM, SRAM, DDR RAM, or other random-access solid-state memory devices. In some embodiments, the memory 320 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 320 optionally includes one or more storage devices remotely located from the one or more processing units 302. The memory 320 comprises a non-transitory computer readable storage medium. In some embodiments, the memory 320 or the non-transitory computer readable storage medium of the memory 320 stores the following programs, modules and data structures, or a subset thereof including an optional operating system 330 and a XR presentation module 340.
The operating system 330 includes instructions for handling various basic system services and for performing hardware dependent tasks. In some embodiments, the XR presentation module 340 is configured to present XR content to the user via the one or more XR displays 312. To that end, in various embodiments, the XR presentation module 340 includes a data obtaining unit 342, a XR presenting unit 344, a XR map generating unit 346, and a data transmitting unit 348.
In some embodiments, the data obtaining unit 342 is configured to obtain data (e.g., presentation data, interaction data, sensor data, location data, etc.) from at least the controller 110 of
In some embodiments, the XR presenting unit 344 is configured to present XR content via the one or more XR displays 312. To that end, in various embodiments, the XR presenting unit 344 includes instructions and/or logic therefor, and heuristics and metadata therefor.
In some embodiments, the XR map generating unit 346 is configured to generate a XR map (e.g., a 3D map of the mixed reality scene or a map of the physical environment into which computer-generated objects can be placed to generate the extended reality) based on media content data. To that end, in various embodiments, the XR map generating unit 346 includes instructions and/or logic therefor, and heuristics and metadata therefor.
In some embodiments, the data transmitting unit 348 is configured to transmit data (e.g., presentation data, location data, etc.) to at least the controller 110, and optionally one or more of the input devices 125, output devices 155, sensors 190, and/or peripheral devices 195. To that end, in various embodiments, the data transmitting unit 348 includes instructions and/or logic therefor, and heuristics and metadata therefor.
Although the data obtaining unit 342, the XR presenting unit 344, the XR map generating unit 346, and the data transmitting unit 348 are shown as residing on a single device (e.g., the display generation component 120 of
Moreover,
In some embodiments, the hand tracking device 140 includes image sensors 404 (e.g., one or more IR cameras, 3D cameras, depth cameras, and/or color cameras, etc.) that capture three-dimensional scene information that includes at least a hand 406 of a human user. The image sensors 404 capture the hand images with sufficient resolution to enable the fingers and their respective positions to be distinguished. The image sensors 404 typically capture images of other parts of the user's body, as well, or possibly all of the body, and may have either zoom capabilities or a dedicated sensor with enhanced magnification to capture images of the hand with the desired resolution. In some embodiments, the image sensors 404 also capture 2D color video images of the hand 406 and other elements of the scene. In some embodiments, the image sensors 404 are used in conjunction with other image sensors to capture the physical environment of the scene 105, or serve as the image sensors that capture the physical environments of the scene 105. In some embodiments, the image sensors 404 are positioned relative to the user or the user's environment in a way that a field of view of the image sensors or a portion thereof is used to define an interaction space in which hand movements captured by the image sensors are treated as inputs to the controller 110.
In some embodiments, the image sensors 404 output a sequence of frames containing 3D map data (and possibly color image data, as well) to the controller 110, which extracts high-level information from the map data. This high-level information is typically provided via an Application Program Interface (API) to an application running on the controller, which drives the display generation component 120 accordingly. For example, the user may interact with software running on the controller 110 by moving his hand 406 and changing his hand posture.
In some embodiments, the image sensors 404 project a pattern of spots onto a scene containing the hand 406 and capture an image of the projected pattern. In some embodiments, the controller 110 computes the 3D coordinates of points in the scene (including points on the surface of the user's hand) by triangulation, based on transverse shifts of the spots in the pattern. This approach is advantageous in that it does not require the user to hold or wear any sort of beacon, sensor, or other marker. It gives the depth coordinates of points in the scene relative to a predetermined reference plane, at a certain distance from the image sensors 404. In the present disclosure, the image sensors 404 are assumed to define an orthogonal set of x, y, z axes, so that depth coordinates of points in the scene correspond to z components measured by the image sensors. Alternatively, the image sensors 404 (e.g., a hand tracking device) may use other methods of 3D mapping, such as stereoscopic imaging or time-of-flight measurements, based on single or multiple cameras or other types of sensors.
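The triangulation step can be illustrated with the standard relation depth = focal length × baseline / transverse shift for a rectified projector and camera pair; the focal length and baseline in the sketch below are placeholder values, not parameters of the described system.

```python
# Minimal sketch of depth-by-triangulation from the transverse shift (disparity) of a
# projected spot, assuming a rectified projector/camera pair.

FOCAL_LENGTH_PX = 600.0   # assumed camera focal length in pixels
BASELINE_M = 0.05         # assumed projector-to-camera baseline in meters

def depth_from_shift(shift_px):
    """Depth (meters) of the surface a spot landed on, from its transverse shift in pixels."""
    if shift_px <= 0:
        raise ValueError("shift must be positive for a point in front of the sensor")
    return FOCAL_LENGTH_PX * BASELINE_M / shift_px

# A spot shifted by 40 px corresponds to a surface about 0.75 m away with these values.
z = depth_from_shift(40.0)
```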
In some embodiments, the hand tracking device 140 captures and processes a temporal sequence of depth maps containing the user's hand, while the user moves his hand (e.g., whole hand or one or more fingers). Software running on a processor in the image sensors 404 and/or the controller 110 processes the 3D map data to extract patch descriptors of the hand in these depth maps. The software matches these descriptors to patch descriptors stored in a database 408, based on a prior learning process, in order to estimate the pose of the hand in each frame. The pose typically includes 3D locations of the user's hand joints and finger tips.
The software may also analyze the trajectory of the hands and/or fingers over multiple frames in the sequence in order to identify gestures. The pose estimation functions described herein may be interleaved with motion tracking functions, so that patch-based pose estimation is performed only once in every two (or more) frames, while tracking is used to find changes in the pose that occur over the remaining frames. The pose, motion, and gesture information are provided via the above-mentioned API to an application program running on the controller 110. This program may, for example, move and modify images presented on the display generation component 120, or perform other functions, in response to the pose and/or gesture information.
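The interleaving of expensive patch-based pose estimation with cheaper frame-to-frame tracking can be pictured as a simple scheduling loop; in the sketch below, full_estimate and track are stand-ins for routines this disclosure does not spell out.

```python
# Sketch of running full patch-based pose estimation only every Nth frame and
# filling the remaining frames with cheaper frame-to-frame tracking.

def process_sequence(depth_maps, database, full_estimate, track, every_n=2):
    """full_estimate(depth_map, database) -> pose   (expensive, patch-based)
       track(previous_pose, depth_map) -> pose      (cheap, frame-to-frame)"""
    poses, pose = [], None
    for i, depth_map in enumerate(depth_maps):
        if pose is None or i % every_n == 0:
            pose = full_estimate(depth_map, database)
        else:
            pose = track(pose, depth_map)
        poses.append(pose)
    return poses

# Toy usage with placeholder callables:
poses = process_sequence(
    depth_maps=[object(), object(), object(), object()],
    database=None,
    full_estimate=lambda dm, db: "full",
    track=lambda prev, dm: "tracked",
    every_n=2)
# -> ['full', 'tracked', 'full', 'tracked']
```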
In some embodiments, a gesture includes an air gesture. An air gesture is a gesture that is detected without the user touching (or independently of) an input element that is part of a device (e.g., computer system 101, one or more input device 125, and/or hand tracking device 140) and is based on detected motion of a portion (e.g., the head, one or more arms, one or more hands, one or more fingers, and/or one or more legs) of the user's body through the air including motion of the user's body relative to an absolute reference (e.g., an angle of the user's arm relative to the ground or a distance of the user's hand relative to the ground), relative to another portion of the user's body (e.g., movement of a hand of the user relative to a shoulder of the user, movement of one hand of the user relative to another hand of the user, and/or movement of a finger of the user relative to another finger or portion of a hand of the user), and/or absolute motion of a portion of the user's body (e.g., a tap gesture that includes movement of a hand in a predetermined pose by a predetermined amount and/or speed, or a shake gesture that includes a predetermined speed or amount of rotation of a portion of the user's body).
In some embodiments, input gestures used in the various examples and embodiments described herein include air gestures performed by movement of the user's finger(s) relative to other finger(s) (or part(s) of the user's hand) for interacting with an XR environment (e.g., a virtual or mixed-reality environment), in some embodiments. In some embodiments, an air gesture is a gesture that is detected without the user touching an input element that is part of the device (or independently of an input element that is a part of the device) and is based on detected motion of a portion of the user's body through the air including motion of the user's body relative to an absolute reference (e.g., an angle of the user's arm relative to the ground or a distance of the user's hand relative to the ground), relative to another portion of the user's body (e.g., movement of a hand of the user relative to a shoulder of the user, movement of one hand of the user relative to another hand of the user, and/or movement of a finger of the user relative to another finger or portion of a hand of the user), and/or absolute motion of a portion of the user's body (e.g., a tap gesture that includes movement of a hand in a predetermined pose by a predetermined amount and/or speed, or a shake gesture that includes a predetermined speed or amount of rotation of a portion of the user's body).
In some embodiments in which the input gesture is an air gesture (e.g., in the absence of physical contact with an input device that provides the computer system with information about which user interface element is the target of the user input, such as contact with a user interface element displayed on a touchscreen, or contact with a mouse or trackpad to move a cursor to the user interface element), the gesture takes into account the user's attention (e.g., gaze) to determine the target of the user input (e.g., for direct inputs, as described below). Thus, in implementations involving air gestures, the input gesture is, for example, detected attention (e.g., gaze) toward the user interface element in combination (e.g., concurrent) with movement of a user's finger(s) and/or hands to perform a pinch and/or tap input, as described in more detail below.
In some embodiments, input gestures that are directed to a user interface object are performed directly or indirectly with reference to a user interface object. For example, a user input is performed directly on the user interface object by performing the input gesture with the user's hand at a position that corresponds to the position of the user interface object in the three-dimensional environment (e.g., as determined based on a current viewpoint of the user). In some embodiments, the input gesture is performed indirectly on the user interface object in accordance with the user performing the input gesture while a position of the user's hand is not at the position that corresponds to the position of the user interface object in the three-dimensional environment while detecting the user's attention (e.g., gaze) on the user interface object. For example, for a direct input gesture, the user is enabled to direct the user's input to the user interface object by initiating the gesture at, or near, a position corresponding to the displayed position of the user interface object (e.g., within 0.5 cm, 1 cm, 5 cm, or a distance between 0-5 cm, as measured from an outer edge of the option or a center portion of the option). For an indirect input gesture, the user is enabled to direct the user's input to the user interface object by paying attention to the user interface object (e.g., by gazing at the user interface object) and, while paying attention to the option, the user initiates the input gesture (e.g., at any position that is detectable by the computer system) (e.g., at a position that does not correspond to the displayed position of the user interface object).
In some embodiments, input gestures (e.g., air gestures) used in the various examples and embodiments described herein include pinch inputs and tap inputs, for interacting with a virtual or mixed-reality environment, in some embodiments. For example, the pinch inputs and tap inputs described below are performed as air gestures.
In some embodiments, a pinch input is part of an air gesture that includes one or more of: a pinch gesture, a long pinch gesture, a pinch and drag gesture, or a double pinch gesture. For example, a pinch gesture that is an air gesture includes movement of two or more fingers of a hand to make contact with one another, that is, optionally, followed by an immediate (e.g., within 0-1 seconds) break in contact from each other. A long pinch gesture that is an air gesture includes movement of two or more fingers of a hand to make contact with one another for at least a threshold amount of time (e.g., at least 1 second), before detecting a break in contact with one another. For example, a long pinch gesture includes the user holding a pinch gesture (e.g., with the two or more fingers making contact), and the long pinch gesture continues until a break in contact between the two or more fingers is detected. In some embodiments, a double pinch gesture that is an air gesture comprises two (e.g., or more) pinch inputs (e.g., performed by the same hand) detected in immediate (e.g., within a predefined time period) succession of each other. For example, the user performs a first pinch input (e.g., a pinch input or a long pinch input), releases the first pinch input (e.g., breaks contact between the two or more fingers), and performs a second pinch input within a predefined time period (e.g., within 1 second or within 2 seconds) after releasing the first pinch input.
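A rough way to picture how pinch, long pinch, and double pinch inputs could be told apart is to compare contact durations and the gap between successive pinches against thresholds like those mentioned above; the function name, event format, and exact threshold values below are illustrative assumptions, not part of this disclosure.

```python
# Hedged sketch of classifying pinch-style air gestures from timestamps of finger
# contact and release (a long pinch held at least ~1 s; a double pinch with the
# second pinch starting within ~1 s of the first pinch's release).

LONG_PINCH_S = 1.0
DOUBLE_PINCH_GAP_S = 1.0

def classify_pinches(events):
    """events: list of (contact_time, release_time) pairs for one hand, in order.
       Returns a list of gesture labels."""
    labels, i = [], 0
    while i < len(events):
        contact, release = events[i]
        held = release - contact
        next_contact = events[i + 1][0] if i + 1 < len(events) else None
        if held >= LONG_PINCH_S:
            labels.append("long pinch")
            i += 1
        elif next_contact is not None and next_contact - release <= DOUBLE_PINCH_GAP_S:
            labels.append("double pinch")
            i += 2                                    # consumes both pinch inputs
        else:
            labels.append("pinch")
            i += 1
    return labels

print(classify_pinches([(0.0, 0.2), (0.6, 0.8), (3.0, 4.5)]))
# ['double pinch', 'long pinch']
```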
In some embodiments, a pinch and drag gesture that is an air gesture includes a pinch gesture (e.g., a pinch gesture or a long pinch gesture) performed in conjunction with (e.g., followed by) a drag input that changes a position of the user's hand from a first position (e.g., a start position of the drag) to a second position (e.g., an end position of the drag). In some embodiments, the user maintains the pinch gesture while performing the drag input, and releases the pinch gesture (e.g., opens their two or more fingers) to end the drag gesture (e.g., at the second position). In some embodiments, the pinch input and the drag input are performed by the same hand (e.g., the user pinches two or more fingers to make contact with one another and moves the same hand to the second position in the air with the drag gesture). In some embodiments, the pinch input is performed by a first hand of the user and the drag input is performed by the second hand of the user (e.g., the user's second hand moves from the first position to the second position in the air while the user continues the pinch input with the user's first hand). In some embodiments, an input gesture that is an air gesture includes inputs (e.g., pinch and/or tap inputs) performed using both of the user's two hands. For example, the input gesture includes two (e.g., or more) pinch inputs performed in conjunction with (e.g., concurrently with, or within a predefined time period of) each other. For example, a first pinch input is performed using a first hand of the user (e.g., a pinch input, a long pinch input, or a pinch and drag input), and, in conjunction with performing the pinch input using the first hand, a second pinch input is performed using the other hand (e.g., the second hand of the user's two hands). In some embodiments, the input gesture includes movement between the user's two hands (e.g., movement to increase and/or decrease a distance or relative orientation between the user's two hands).
In some embodiments, a tap input (e.g., directed to a user interface element) performed as an air gesture includes movement of a user's finger(s) toward the user interface element, movement of the user's hand toward the user interface element optionally with the user's finger(s) extended toward the user interface element, a downward motion of a user's finger (e.g., mimicking a mouse click motion or a tap on a touchscreen), or other predefined movement of the user's hand. In some embodiments, a tap input that is performed as an air gesture is detected based on movement characteristics of the finger or hand performing the tap gesture: movement of a finger or hand away from the viewpoint of the user and/or toward an object that is the target of the tap input, followed by an end of the movement. In some embodiments, the end of the movement is detected based on a change in movement characteristics of the finger or hand performing the tap gesture (e.g., an end of movement away from the viewpoint of the user and/or toward the object that is the target of the tap input, a reversal of direction of movement of the finger or hand, and/or a reversal of a direction of acceleration of movement of the finger or hand).
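The end-of-movement test for an air tap can be pictured as watching the fingertip's velocity toward the target and reporting the frame where that motion stops or reverses; the function name and the stop threshold below are assumed for the sketch.

```python
# Sketch of detecting the end of an air tap from per-frame fingertip velocities along
# the direction toward the target: the tap ends when motion toward the target stops
# or reverses.

def find_tap_end(velocities_toward_target, stop_threshold=0.02):
    """velocities_toward_target: m/s per frame, positive = moving toward the target.
       Returns the index of the frame where the tap movement ends, or None."""
    moving = False
    for i, v in enumerate(velocities_toward_target):
        if v > stop_threshold:
            moving = True                      # the finger is advancing toward the target
        elif moving and v <= stop_threshold:
            return i                           # movement stopped or reversed: tap end
    return None

print(find_tap_end([0.0, 0.3, 0.4, 0.1, -0.2]))  # -> 4 (motion reverses at that frame)
```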
In some embodiments, attention of a user is determined to be directed to a portion of the three-dimensional environment based on detection of gaze directed to the portion of the three-dimensional environment (optionally, without requiring other conditions). In some embodiments, attention of a user is determined to be directed to a portion of the three-dimensional environment based on detection of gaze directed to the portion of the three-dimensional environment with one or more additional conditions such as requiring that gaze is directed to the portion of the three-dimensional environment for at least a threshold duration (e.g., a dwell duration) and/or requiring that the gaze is directed to the portion of the three-dimensional environment while the viewpoint of the user is within a distance threshold from the portion of the three-dimensional environment in order for the device to determine that attention of the user is directed to the portion of the three-dimensional environment, where if one of the additional conditions is not met, the device determines that attention is not directed to the portion of the three-dimensional environment toward which gaze is directed (e.g., until the one or more additional conditions are met).
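The dwell-based attention check described above can be sketched as follows; the dwell duration, distance limit, sample format, and function name are assumptions for illustration only.

```python
# Minimal sketch of a dwell-based attention check: gaze must stay on the same region
# for at least a threshold duration, and (optionally) the viewpoint must be close enough.

DWELL_S = 0.3
MAX_VIEWPOINT_DISTANCE_M = 3.0

def attention_directed(gaze_samples, region, viewpoint_distance_m):
    """gaze_samples: list of (timestamp, region_id) in chronological order."""
    if viewpoint_distance_m > MAX_VIEWPOINT_DISTANCE_M:
        return False
    dwell_start = None
    for t, r in gaze_samples:
        if r == region:
            dwell_start = t if dwell_start is None else dwell_start
            if t - dwell_start >= DWELL_S:
                return True
        else:
            dwell_start = None                 # gaze left the region; restart the dwell timer
    return False

samples = [(0.0, "button"), (0.1, "button"), (0.25, "button"), (0.4, "button")]
print(attention_directed(samples, "button", viewpoint_distance_m=1.0))  # True
```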
In some embodiments, the detection of a ready state configuration of a user or a portion of a user is detected by the computer system. Detection of a ready state configuration of a hand is used by a computer system as an indication that the user is likely preparing to interact with the computer system using one or more air gesture inputs performed by the hand (e.g., a pinch, tap, pinch and drag, double pinch, long pinch, or other air gesture described herein). For example, the ready state of the hand is determined based on whether the hand has a predetermined hand shape (e.g., a pre-pinch shape with a thumb and one or more fingers extended and spaced apart ready to make a pinch or grab gesture or a pre-tap with one or more fingers extended and palm facing away from the user), based on whether the hand is in a predetermined position relative to a viewpoint of the user (e.g., below the user's head and above the user's waist and extended out from the body by at least 15, 20, 25, 30, or 50 cm), and/or based on whether the hand has moved in a particular manner (e.g., moved toward a region in front of the user above the user's waist and below the user's head or moved away from the user's body or leg). In some embodiments, the ready state is used to determine whether interactive elements of the user interface respond to attention (e.g., gaze) inputs.
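One way to combine the ready-state cues described above (hand shape, position relative to the body, and forward extension) is sketched below; the field names and thresholds are illustrative assumptions rather than values from this disclosure.

```python
# Hedged sketch of a ready-state check combining a pre-pinch or pre-tap hand shape
# with a hand position raised in front of the body and extended outward.

MIN_FORWARD_EXTENSION_M = 0.20   # e.g., at least ~20 cm out from the body

def is_ready_state(hand_shape, hand_height_m, waist_height_m, head_height_m,
                   forward_extension_m):
    in_zone = waist_height_m < hand_height_m < head_height_m
    extended = forward_extension_m >= MIN_FORWARD_EXTENSION_M
    return hand_shape in ("pre-pinch", "pre-tap") and in_zone and extended

ready = is_ready_state("pre-pinch", hand_height_m=1.2, waist_height_m=1.0,
                       head_height_m=1.7, forward_extension_m=0.3)
```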
In scenarios where inputs are described with reference to air gestures, it should be understood that similar gestures could be detected using a hardware input device that is attached to or held by one or more hands of a user, where the position of the hardware input device in space can be tracked using optical tracking, one or more accelerometers, one or more gyroscopes, one or more magnetometers, and/or one or more inertial measurement units and the position and/or movement of the hardware input device is used in place of the position and/or movement of the one or more hands in the corresponding air gesture(s). In scenarios where inputs are described with reference to air gestures, it should be understood that similar gestures could be detected using a hardware input device that is attached to or held by one or more hands of a user, where user inputs can be detected with controls contained in the hardware input device such as one or more touch-sensitive input elements, one or more pressure-sensitive input elements, one or more buttons, one or more knobs, one or more dials, one or more joysticks, one or more hand or finger coverings that can detect a position or change in position of portions of a hand and/or fingers relative to each other, relative to the user's body, and/or relative to a physical environment of the user, and/or other hardware input device controls, wherein the user inputs with the controls contained in the hardware input device are used in place of hand and/or finger gestures such as air taps or air pinches in the corresponding air gesture(s). For example, a selection input that is described as being performed with an air tap or air pinch input could be alternatively detected with a button press, a tap on a touch-sensitive surface, a press on a pressure-sensitive surface, or other hardware input. As another example, a movement input that is described as being performed with an air pinch and drag could be alternatively detected based on an interaction with the hardware input control such as a button press and hold, a touch on a touch-sensitive surface, a press on a pressure-sensitive surface, or other hardware input that is followed by movement of the hardware input device (e.g., along with the hand with which the hardware input device is associated) through space. Similarly, a two-handed input that includes movement of the hands relative to each other could be performed with one air gesture and one hardware input device in the hand that is not performing the air gesture, two hardware input devices held in different hands, or two air gestures performed by different hands using various combinations of air gestures and/or the inputs detected by one or more hardware input devices that are described above.
In some embodiments, the software may be downloaded to the controller 110 in electronic form, over a network, for example, or it may alternatively be provided on tangible, non-transitory media, such as optical, magnetic, or electronic memory media. In some embodiments, the database 408 is likewise stored in a memory associated with the controller 110. Alternatively or additionally, some or all of the described functions of the computer may be implemented in dedicated hardware, such as a custom or semi-custom integrated circuit or a programmable digital signal processor (DSP). Although the controller 110 is shown in
In some embodiments, the display generation component 120 uses a display mechanism (e.g., left and right near-eye display panels) for displaying frames including left and right images in front of a user's eyes to thus provide 3D virtual views to the user. For example, a head-mounted display generation component may include left and right optical lenses (referred to herein as eye lenses) located between the display and the user's eyes. In some embodiments, the display generation component may include or be coupled to one or more external video cameras that capture video of the user's environment for display. In some embodiments, a head-mounted display generation component may have a transparent or semi-transparent display through which a user may view the physical environment directly and display virtual objects on the transparent or semi-transparent display. In some embodiments, the display generation component projects virtual objects into the physical environment. The virtual objects may be projected, for example, on a physical surface or as a holograph, so that an individual using the system observes the virtual objects superimposed over the physical environment. In such cases, separate display panels and image frames for the left and right eyes may not be necessary.
As shown in
In some embodiments, the eye tracking device 130 is calibrated using a device-specific calibration process to determine parameters of the eye tracking device for the specific operating environment 100, for example the 3D geometric relationship and parameters of the LEDs, cameras, hot mirrors (if present), eye lenses, and display screen. The device-specific calibration process may be performed at the factory or another facility prior to delivery of the AR/VR equipment to the end user. The device-specific calibration process may be an automated calibration process or a manual calibration process. A user-specific calibration process may include an estimation of a specific user's eye parameters, for example the pupil location, fovea location, optical axis, visual axis, eye spacing, etc. Once the device-specific and user-specific parameters are determined for the eye tracking device 130, images captured by the eye tracking cameras can be processed using a glint-assisted method to determine the current visual axis and point of gaze of the user with respect to the display, in some embodiments.
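Purely for illustration, the device-specific and user-specific parameters described above could be grouped into containers such as the following; the field names and types are assumptions and do not describe the actual calibration data:

```swift
// Hypothetical containers for the two calibration passes described above.
struct DeviceCalibration {
    // 3D geometric relationship and parameters determined at the factory or
    // another facility for a specific device.
    var ledPositions: [SIMD3<Float>]
    var cameraIntrinsics: [[Float]]      // per eye tracking camera
    var hotMirrorTransform: [Float]?     // nil if no hot mirror is present
    var eyeLensParameters: [Float]
    var displayGeometry: [Float]
}

struct UserCalibration {
    // Per-user eye parameters estimated during a user-specific calibration.
    var pupilLocation: SIMD3<Float>
    var foveaLocation: SIMD3<Float>
    var opticalAxis: SIMD3<Float>
    var visualAxis: SIMD3<Float>
    var eyeSpacing: Float
}
```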
As shown in
In some embodiments, the controller 110 renders AR or VR frames 562 (e.g., left and right frames for left and right display panels) and provides the frames 562 to the display 510. The controller 110 uses gaze tracking input 542 from the eye tracking cameras 540 for various purposes, for example in processing the frames 562 for display. The controller 110 optionally estimates the user's point of gaze on the display 510 based on the gaze tracking input 542 obtained from the eye tracking cameras 540 using the glint-assisted methods or other suitable methods. The point of gaze estimated from the gaze tracking input 542 is optionally used to determine the direction in which the user is currently looking.
The following describes several possible use cases for the user's current gaze direction, and is not intended to be limiting. As an example use case, the controller 110 may render virtual content differently based on the determined direction of the user's gaze. For example, the controller 110 may generate virtual content at a higher resolution in a foveal region determined from the user's current gaze direction than in peripheral regions. As another example, the controller may position or move virtual content in the view based at least in part on the user's current gaze direction. As another example, the controller may display particular virtual content in the view based at least in part on the user's current gaze direction. As another example use case in AR applications, the controller 110 may direct external cameras for capturing the physical environments of the XR experience to focus in the determined direction. The autofocus mechanism of the external cameras may then focus on an object or surface in the environment that the user is currently looking at on the display 510. As another example use case, the eye lenses 520 may be focusable lenses, and the gaze tracking information is used by the controller to adjust the focus of the eye lenses 520 so that the virtual object that the user is currently looking at has the proper vergence to match the convergence of the user's eyes 592. The controller 110 may leverage the gaze tracking information to direct the eye lenses 520 to adjust focus so that close objects that the user is looking at appear at the right distance.
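As a minimal sketch of the foveated-rendering use case above (with assumed names and an assumed foveal radius in normalized display coordinates), the per-pixel resolution choice could be expressed as:

```swift
// Sketch: render at higher resolution in a foveal region around the estimated
// point of gaze and at lower resolution in peripheral regions.
struct PointOfGaze { var x: Float; var y: Float }   // normalized display coordinates

enum RenderResolution { case full, reduced }

func resolution(forPixelAt x: Float, _ y: Float,
                gaze: PointOfGaze,
                fovealRadius: Float = 0.15) -> RenderResolution {
    let dx = x - gaze.x
    let dy = y - gaze.y
    let distanceFromGaze = (dx * dx + dy * dy).squareRoot()
    // Content inside the foveal region determined from the current gaze
    // direction is rendered at full resolution; peripheral content is not.
    return distanceFromGaze <= fovealRadius ? .full : .reduced
}

// Example usage with an assumed gaze estimate near the display center.
let choice = resolution(forPixelAt: 0.4, 0.5, gaze: PointOfGaze(x: 0.5, y: 0.5))
print(choice) // full
```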
In some embodiments, the eye tracking device is part of a head-mounted device that includes a display (e.g., display 510), two eye lenses (e.g., eye lens(es) 520), eye tracking cameras (e.g., eye tracking camera(s) 540), and light sources (e.g., illumination sources 530 (e.g., IR or NIR LEDs)), mounted in a wearable housing. The light sources emit light (e.g., IR or NIR light) towards the user's eye(s) 592. In some embodiments, the light sources may be arranged in rings or circles around each of the lenses as shown in
In some embodiments, the display 510 emits light in the visible light range and does not emit light in the IR or NIR range, and thus does not introduce noise in the gaze tracking system. Note that the location and angle of eye tracking camera(s) 540 are given by way of example, and are not intended to be limiting. In some embodiments, a single eye tracking camera 540 is located on each side of the user's face. In some embodiments, two or more NIR cameras 540 may be used on each side of the user's face. In some embodiments, a camera 540 with a wider field of view (FOV) and a camera 540 with a narrower FOV may be used on each side of the user's face. In some embodiments, a camera 540 that operates at one wavelength (e.g., 850 nm) and a camera 540 that operates at a different wavelength (e.g., 940 nm) may be used on each side of the user's face.
Embodiments of the gaze tracking system as illustrated in
As shown in
At 610, for the current captured images, if the tracking state is YES, then the method proceeds to element 640. At 610, if the tracking state is NO, then as indicated at 620 the images are analyzed to detect the user's pupils and glints in the images. At 630, if the pupils and glints are successfully detected, then the method proceeds to element 640. Otherwise, the method returns to element 610 to process next images of the user's eyes.
At 640, if proceeding from element 610, the current frames are analyzed to track the pupils and glints based in part on prior information from the previous frames. At 640, if proceeding from element 630, the tracking state is initialized based on the detected pupils and glints in the current frames. Results of processing at element 640 are checked to verify that the results of tracking or detection can be trusted. For example, results may be checked to determine if the pupil and a sufficient number of glints to perform gaze estimation are successfully tracked or detected in the current frames. At 650, if the results cannot be trusted, then the tracking state is set to NO at element 660, and the method returns to element 610 to process next images of the user's eyes. At 650, if the results are trusted, then the method proceeds to element 670. At 670, the tracking state is set to YES (if not already YES), and the pupil and glint information is passed to element 680 to estimate the user's point of gaze.
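For illustration, the tracking loop of elements 610-680 could be sketched as follows, with the detection, tracking, and gaze-estimation steps left as placeholder closures (assumed signatures, not the actual pipeline):

```swift
// Compact sketch of the glint-assisted tracking loop described at elements 610-680.
struct EyeFrames { /* left and right eye images for one time step */ }
struct PupilsAndGlints { var trustworthy: Bool }

var trackingState = false   // NO until pupils and glints are detected

func processNextFrames(_ frames: EyeFrames,
                       detect: (EyeFrames) -> PupilsAndGlints?,
                       track: (EyeFrames) -> PupilsAndGlints,
                       estimateGaze: (PupilsAndGlints) -> Void) {
    let result: PupilsAndGlints
    if trackingState {
        // 640 (from 610): track pupils/glints using prior-frame information.
        result = track(frames)
    } else {
        // 620/630: analyze the images to detect pupils and glints.
        guard let detected = detect(frames) else { return } // retry on next frames
        // 640 (from 630): initialize the tracking state from the detections.
        result = detected
    }
    // 650: verify the results can be trusted, e.g. the pupil and a sufficient
    // number of glints were tracked or detected in the current frames.
    guard result.trustworthy else {
        trackingState = false          // 660
        return
    }
    trackingState = true               // 670
    estimateGaze(result)               // 680: estimate the user's point of gaze
}
```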
In some embodiments, the captured portions of real-world environment 602 are used to provide an XR experience to the user, for example, a mixed reality environment in which one or more virtual objects are superimposed over representations of real-world environment 602.
Thus, the description herein describes some embodiments of three-dimensional environments (e.g., XR environments) that include representations of real-world objects and representations of virtual objects. For example, a three-dimensional environment optionally includes a representation of a table that exists in the physical environment, which is captured and displayed in the three-dimensional environment (e.g., actively via cameras and displays of a computer system, or passively via a transparent or translucent display of the computer system). As described previously, the three-dimensional environment is optionally a mixed reality system in which the three-dimensional environment is based on the physical environment that is captured by one or more sensors of the computer system and displayed via a display generation component. As a mixed reality system, the computer system is optionally able to selectively display portions and/or objects of the physical environment such that the respective portions and/or objects of the physical environment appear as if they exist in the three-dimensional environment displayed by the computer system. Similarly, the computer system is optionally able to display virtual objects in the three-dimensional environment to appear as if the virtual objects exist in the real world (e.g., physical environment) by placing the virtual objects at respective locations in the three-dimensional environment that have corresponding locations in the real world. For example, the computer system optionally displays a vase such that it appears as if a real vase is placed on top of a table in the physical environment. In some embodiments, a respective location in the three-dimensional environment has a corresponding location in the physical environment. Thus, when the computer system is described as displaying a virtual object at a respective location with respect to a physical object (e.g., such as a location at or near the hand of the user, or at or near a physical table), the computer system displays the virtual object at a particular location in the three-dimensional environment such that it appears as if the virtual object is at or near the physical object in the physical world (e.g., the virtual object is displayed at a location in the three-dimensional environment that corresponds to a location in the physical environment at which the virtual object would be displayed if it were a real object at that particular location).
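Purely as an illustrative sketch, the correspondence between locations in the physical environment and locations in the displayed three-dimensional environment could be modeled with a simple, assumed transform (an axis-aligned offset here; a real system would use a full pose):

```swift
// Sketch of mapping between physical-environment locations and corresponding
// locations in the three-dimensional environment. The transform is an assumption.
struct Location {
    var x: Float
    var y: Float
    var z: Float
}

struct EnvironmentMapping {
    // Offset of the three-dimensional environment's origin expressed in
    // physical-environment coordinates.
    var originOffset: Location

    func environmentLocation(forPhysical p: Location) -> Location {
        Location(x: p.x - originOffset.x, y: p.y - originOffset.y, z: p.z - originOffset.z)
    }
    func physicalLocation(forEnvironment e: Location) -> Location {
        Location(x: e.x + originOffset.x, y: e.y + originOffset.y, z: e.z + originOffset.z)
    }
}

// Displaying a virtual vase "on the table": place it at the environment location
// that corresponds to the physical location of the tabletop.
let mapping = EnvironmentMapping(originOffset: Location(x: 0, y: 0, z: 0))
let tabletopPhysical = Location(x: 1.2, y: 0.75, z: -0.5)
let vaseEnvironmentLocation = mapping.environmentLocation(forPhysical: tabletopPhysical)
```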
In some embodiments, real-world objects that exist in the physical environment that are displayed in the three-dimensional environment (e.g., and/or visible via the display generation component) can interact with virtual objects that exist only in the three-dimensional environment. For example, a three-dimensional environment can include a table and a vase placed on top of the table, with the table being a view of (or a representation of) a physical table in the physical environment, and the vase being a virtual object.
In a three-dimensional environment (e.g., a real environment, a virtual environment, or an environment that includes a mix of real and virtual objects), objects are sometimes referred to as having a depth or simulated depth, or objects are referred to as being visible, displayed, or placed at different depths. In this context, depth refers to a dimension other than height or width. In some embodiments, depth is defined relative to a fixed set of coordinates (e.g., where a room or an object has a height, depth, and width defined relative to the fixed set of coordinates). In some embodiments, depth is defined relative to a location or viewpoint of a user, in which case, the depth dimension varies based on the location of the user and/or the location and angle of the viewpoint of the user. In some embodiments where depth is defined relative to a location of a user that is positioned relative to a surface of an environment (e.g., a floor of an environment, or a surface of the ground), objects that are further away from the user along a line that extends parallel to the surface are considered to have a greater depth in the environment, and/or the depth of an object is measured along an axis that extends outward from a location of the user and is parallel to the surface of the environment (e.g., depth is defined in a cylindrical or substantially cylindrical coordinate system with the position of the user at the center of the cylinder that extends from a head of the user toward feet of the user). In some embodiments where depth is defined relative to a viewpoint of a user (e.g., a direction relative to a point in space that determines which portion of an environment is visible via a head-mounted device or other display), objects that are further away from the viewpoint of the user along a line that extends parallel to the direction of the viewpoint of the user are considered to have a greater depth in the environment, and/or the depth of an object is measured along an axis that extends outward from a line that extends from the viewpoint of the user and is parallel to the direction of the viewpoint of the user (e.g., depth is defined in a spherical or substantially spherical coordinate system with the origin of the viewpoint at the center of the sphere that extends outwardly from a head of the user). In some embodiments, depth is defined relative to a user interface container (e.g., a window or application in which application and/or system content is displayed) where the user interface container has a height and/or width, and depth is a dimension that is orthogonal to the height and/or width of the user interface container. In some embodiments, in circumstances where depth is defined relative to a user interface container, the height and/or width of the container are typically orthogonal or substantially orthogonal to a line that extends from a location based on the user (e.g., a viewpoint of the user or a location of the user) to the user interface container (e.g., the center of the user interface container, or another characteristic point of the user interface container) when the container is placed in the three-dimensional environment or is initially displayed (e.g., so that the depth dimension for the container extends outward away from the user or the viewpoint of the user).
In some embodiments, in situations where depth is defined relative to a user interface container, depth of an object relative to the user interface container refers to a position of the object along the depth dimension for the user interface container. In some embodiments, multiple different containers can have different depth dimensions (e.g., different depth dimensions that extend away from the user or the viewpoint of the user in different directions and/or from different starting points). In some embodiments, when depth is defined relative to a user interface container, the direction of the depth dimension remains constant for the user interface container as the location of the user interface container, the user and/or the viewpoint of the user changes (e.g., or when multiple different viewers are viewing the same container in the three-dimensional environment such as during an in-person collaboration session and/or when multiple participants are in a real-time communication session with shared virtual content including the container). In some embodiments, for curved containers (e.g., including a container with a curved surface or curved content region), the depth dimension optionally extends into a surface of the curved container. In some situations, z-separation (e.g., separation of two objects in a depth dimension), z-height (e.g., distance of one object from another in a depth dimension), z-position (e.g., position of one object in a depth dimension), z-depth (e.g., position of one object in a depth dimension), or simulated z dimension (e.g., depth used as a dimension of an object, dimension of an environment, a direction in space, and/or a direction in simulated space) are used to refer to the concept of depth as described above.
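The three depth conventions above can be summarized, under simplified assumed geometry, by the following sketch (illustrative only; the vector types and axis conventions are assumptions):

```swift
// Sketch: depth relative to a user standing on a floor (cylindrical-style),
// relative to a viewpoint direction (spherical-style), and relative to a user
// interface container's depth axis.
struct Vec3 {
    var x: Float
    var y: Float
    var z: Float
}

func dot(_ a: Vec3, _ b: Vec3) -> Float { a.x * b.x + a.y * b.y + a.z * b.z }
func minus(_ a: Vec3, _ b: Vec3) -> Vec3 { Vec3(x: a.x - b.x, y: a.y - b.y, z: a.z - b.z) }

// Depth relative to the user's location: distance measured parallel to the floor
// (the vertical axis, assumed to be y, is ignored).
func depthRelativeToUser(object: Vec3, user: Vec3) -> Float {
    let d = minus(object, user)
    return (d.x * d.x + d.z * d.z).squareRoot()
}

// Depth relative to a viewpoint: distance along the (unit) view direction.
func depthRelativeToViewpoint(object: Vec3, viewpoint: Vec3, viewDirection: Vec3) -> Float {
    dot(minus(object, viewpoint), viewDirection)
}

// Depth relative to a user interface container: position along the container's
// depth axis, which is orthogonal to the container's height and width.
func depthRelativeToContainer(object: Vec3, containerCenter: Vec3, depthAxis: Vec3) -> Float {
    dot(minus(object, containerCenter), depthAxis)
}
```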
In some embodiments, a user is optionally able to interact with virtual objects in the three-dimensional environment using one or more hands as if the virtual objects were real objects in the physical environment. For example, as described above, one or more sensors of the computer system optionally capture one or more of the hands of the user and display representations of the hands of the user in the three-dimensional environment (e.g., in a manner similar to displaying a real-world object in the three-dimensional environment described above), or in some embodiments, the hands of the user are visible via the display generation component via the ability to see the physical environment through the user interface due to the transparency/translucency of a portion of the display generation component that is displaying the user interface or due to projection of the user interface onto a transparent/translucent surface or projection of the user interface onto the user's eye or into a field of view of the user's eye. Thus, in some embodiments, the hands of the user are displayed at a respective location in the three-dimensional environment and are treated as if they were objects in the three-dimensional environment that are able to interact with the virtual objects in the three-dimensional environment as if they were physical objects in the physical environment. In some embodiments, the computer system is able to update display of the representations of the user's hands in the three-dimensional environment in conjunction with the movement of the user's hands in the physical environment.
In some of the embodiments described below, the computer system is optionally able to determine the “effective” distance between physical objects in the physical world and virtual objects in the three-dimensional environment, for example, for the purpose of determining whether a physical object is directly interacting with a virtual object (e.g., whether a hand is touching, grabbing, holding, etc. a virtual object or within a threshold distance of a virtual object). For example, a hand directly interacting with a virtual object optionally includes one or more of a finger of a hand pressing a virtual button, a hand of a user grabbing a virtual vase, two fingers of a hand of the user coming together and pinching/holding a user interface of an application, and any of the other types of interactions described here. For example, the computer system optionally determines the distance between the hands of the user and virtual objects when determining whether the user is interacting with virtual objects and/or how the user is interacting with virtual objects. In some embodiments, the computer system determines the distance between the hands of the user and a virtual object by determining the distance between the location of the hands in the three-dimensional environment and the location of the virtual object of interest in the three-dimensional environment. For example, the one or more hands of the user are located at a particular position in the physical world, which the computer system optionally captures and displays at a particular corresponding position in the three-dimensional environment (e.g., the position in the three-dimensional environment at which the hands would be displayed if the hands were virtual, rather than physical, hands). The position of the hands in the three-dimensional environment is optionally compared with the position of the virtual object of interest in the three-dimensional environment to determine the distance between the one or more hands of the user and the virtual object. In some embodiments, the computer system optionally determines a distance between a physical object and a virtual object by comparing positions in the physical world (e.g., as opposed to comparing positions in the three-dimensional environment). For example, when determining the distance between one or more hands of the user and a virtual object, the computer system optionally determines the corresponding location in the physical world of the virtual object (e.g., the position at which the virtual object would be located in the physical world if it were a physical object rather than a virtual object), and then determines the distance between the corresponding physical position and the one or more hands of the user. In some embodiments, the same techniques are optionally used to determine the distance between any physical object and any virtual object. Thus, as described herein, when determining whether a physical object is in contact with a virtual object or whether a physical object is within a threshold distance of a virtual object, the computer system optionally performs any of the techniques described above to map the location of the physical object to the three-dimensional environment and/or map the location of the virtual object to the physical environment.
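As an illustrative sketch only, the two comparison strategies described above (comparing positions in the three-dimensional environment, or comparing positions in the physical world) could be expressed as follows; the mapping functions and the threshold are assumptions:

```swift
// Sketch: determine whether a hand is "directly interacting" with a virtual
// object by comparing positions in a shared coordinate space.
struct Point3 {
    var x: Float
    var y: Float
    var z: Float
}

func distance(_ a: Point3, _ b: Point3) -> Float {
    let dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z
    return (dx * dx + dy * dy + dz * dz).squareRoot()
}

func isDirectlyInteracting(handPhysical: Point3,
                           virtualObjectInEnvironment: Point3,
                           environmentFromPhysical: (Point3) -> Point3,
                           physicalFromEnvironment: (Point3) -> Point3,
                           compareInEnvironment: Bool = true,
                           threshold: Float = 0.05) -> Bool {
    let effectiveDistance: Float
    if compareInEnvironment {
        // Map the hand into the three-dimensional environment and compare there.
        let handInEnvironment = environmentFromPhysical(handPhysical)
        effectiveDistance = distance(handInEnvironment, virtualObjectInEnvironment)
    } else {
        // Map the virtual object to its corresponding physical-world location
        // and compare against the hand's physical position.
        let objectInPhysical = physicalFromEnvironment(virtualObjectInEnvironment)
        effectiveDistance = distance(handPhysical, objectInPhysical)
    }
    return effectiveDistance <= threshold
}
```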
In some embodiments, the same or similar technique is used to determine where and at what the gaze of the user is directed and/or where and at what a physical stylus held by a user is pointed. For example, if the gaze of the user is directed to a particular position in the physical environment, the computer system optionally determines the corresponding position in the three-dimensional environment (e.g., the virtual position of the gaze), and if a virtual object is located at that corresponding virtual position, the computer system optionally determines that the gaze of the user is directed to that virtual object. Similarly, the computer system is optionally able to determine, based on the orientation of a physical stylus, to where in the physical environment the stylus is pointing. In some embodiments, based on this determination, the computer system determines the virtual position in the three-dimensional environment that corresponds to the location in the physical environment to which the stylus is pointing, and optionally determines that the stylus is pointing at the corresponding virtual position in the three-dimensional environment.
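For illustration, resolving what a gaze direction or stylus orientation is pointed at can be sketched as a simple ray test against assumed bounding spheres of virtual objects (the types and the bounding-sphere simplification are assumptions):

```swift
// Sketch: cast a ray from an origin along a pointing direction and pick the
// nearest virtual object whose bounding sphere the ray passes through.
struct V3 {
    var x: Float
    var y: Float
    var z: Float
}
struct VirtualObject {
    var name: String
    var center: V3
    var radius: Float
}

func sub(_ a: V3, _ b: V3) -> V3 { V3(x: a.x - b.x, y: a.y - b.y, z: a.z - b.z) }
func dotp(_ a: V3, _ b: V3) -> Float { a.x * b.x + a.y * b.y + a.z * b.z }

// Returns the object (if any) that the ray from `origin` along unit `direction`
// points at, i.e. whose volume contains the corresponding virtual position.
func pointedAtObject(origin: V3, direction: V3, objects: [VirtualObject]) -> VirtualObject? {
    var best: (object: VirtualObject, along: Float)?
    for object in objects {
        let toCenter = sub(object.center, origin)
        let along = dotp(toCenter, direction)        // distance along the ray
        guard along > 0 else { continue }            // behind the origin
        // Squared perpendicular distance from the sphere center to the ray.
        let perpendicularSquared = dotp(toCenter, toCenter) - along * along
        if perpendicularSquared <= object.radius * object.radius {
            if best == nil || along < best!.along {
                best = (object, along)
            }
        }
    }
    return best?.object
}
```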
Similarly, the embodiments described herein may refer to the location of the user (e.g., the user of the computer system) and/or the location of the computer system in the three-dimensional environment. In some embodiments, the user of the computer system is holding, wearing, or otherwise located at or near the computer system. Thus, in some embodiments, the location of the computer system is used as a proxy for the location of the user. In some embodiments, the location of the computer system and/or user in the physical environment corresponds to a respective location in the three-dimensional environment. For example, the location of the computer system would be the location in the physical environment (and its corresponding location in the three-dimensional environment) from which, if a user were to stand at that location facing a respective portion of the physical environment that is visible via the display generation component, the user would see the objects in the physical environment in the same positions, orientations, and/or sizes as they are displayed by or visible via the display generation component of the computer system in the three-dimensional environment (e.g., in absolute terms and/or relative to each other). Similarly, if the virtual objects displayed in the three-dimensional environment were physical objects in the physical environment (e.g., placed at the same locations in the physical environment as they are in the three-dimensional environment, and having the same sizes and orientations in the physical environment as in the three-dimensional environment), the location of the computer system and/or user is the position from which the user would see the virtual objects in the physical environment in the same positions, orientations, and/or sizes as they are displayed by the display generation component of the computer system in the three-dimensional environment (e.g., in absolute terms and/or relative to each other and the real-world objects).
In the present disclosure, various input methods are described with respect to interactions with a computer system. When an example is provided using one input device or input method and another example is provided using another input device or input method, it is to be understood that each example may be compatible with and optionally utilizes the input device or input method described with respect to another example. Similarly, various output methods are described with respect to interactions with a computer system. When an example is provided using one output device or output method and another example is provided using another output device or output method, it is to be understood that each example may be compatible with and optionally utilizes the output device or output method described with respect to another example. Similarly, various methods are described with respect to interactions with a virtual environment or a mixed reality environment through a computer system. When an example is provided using interactions with a virtual environment and another example is provided using a mixed reality environment, it is to be understood that each example may be compatible with and optionally utilizes the methods described with respect to another example. As such, the present disclosure discloses embodiments that are combinations of the features of multiple examples, without exhaustively listing all features of an embodiment in the description of each example embodiment.
Attention is now directed towards embodiments of user interfaces (“UI”) and associated processes that may be implemented on a computer system, such as a portable multifunction device or a head-mounted device, in communication with a display generation component and at least one camera.
At
As illustrated in
Media capture user interface 710 further includes border 714, which visually indicates the edges of camera viewfinder 712, and darkened area 716, which overlays a second portion of the representation of the field-of-view of first camera 704A that falls outside of camera viewfinder 712. In some embodiments, border 714 visually indicates the edges of camera viewfinder 712 by modifying (e.g., blurring, darkening, obscuring, and/or otherwise visually indicating edges of the camera viewfinder) the appearance of a third portion of the representation of the field-of-view of first camera 704A. For example, the appearance of the third portion of the representation of the field-of-view of first camera 704A can be modified to create a soft gradient between camera viewfinder 712 and darkened area 716, vignetting the first portion of the representation of the field-of-view of first camera 704A.
Media capture user interface 710 further includes a shutter affordance 718 displayed in a central region of camera viewfinder 712. The two concentric rings included in shutter affordance 718 are at least partially transparent or translucent, such that the portion of the representation of the field-of-view of first camera 704A underlying the concentric rings remains at least partially visible to the user. Media capture user interface 710 further includes options affordance 720, first status indicator 722A, and second status indicator 722B (e.g., as described in further detail with respect to
As illustrated in
As illustrated in
As illustrated in
As illustrated in
As illustrated in FIG. 7I1, while displaying shutter affordance 718 with increased visual prominence, computer system 700 detects a second potential media capture input, such as button press input 736A of hardware button 706, air gesture input 736B (e.g., an air gesture performed with the thumb and forefinger of hand 726, such as a pinch air gesture), and/or tap input 736C (e.g., on a touch sensitive surface of display 708). At the time the second potential media capture input is detected, gaze 732 is directed at shutter affordance 718 (e.g., gaze 732 is within a central region of camera viewfinder 712). Accordingly, in response to the second potential media capture input, computer system 700 initiates media capture. In some embodiments, computer system 700 initiates the capture of both photo media and video media, and later determines (e.g., as described below) which mode of capture was requested by the user. In some embodiments, the media capture is a spatial media capture, performed using both first camera 704A and second camera 704B to capture virtual reality media (e.g., photos and/or videos) that can be presented with the appearance/illusion of depth. Computer system 700 further decreases the translucency and size of the concentric rings of shutter affordance 718 in response to the second potential media capture input, reflecting the state of the detected second potential media capture input (e.g., further increasing the prominence of the concentric rings and squeezing the concentric rings closer together as hardware button 706 is further depressed and/or air gesture input 736B is further pinched together).
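Purely as an illustrative sketch (the curve and constants below are assumptions, not values from the disclosure), the way the concentric rings of a shutter affordance might track gaze and input progress could be expressed as:

```swift
// Sketch: ring appearance as a function of press/pinch progress
// (0 = not started, 1 = fully pressed/pinched) and gaze state.
struct ShutterRingAppearance {
    var scale: Float     // 1.0 = default ring size
    var opacity: Float   // 0.0 = fully transparent, 1.0 = fully opaque
}

func ringAppearance(forInputProgress progress: Float,
                    gazeOnShutter: Bool) -> ShutterRingAppearance {
    let p = min(max(progress, 0), 1)
    // Rings become more prominent when gaze is on the affordance, and they
    // shrink/squeeze together and grow more opaque as the press or pinch deepens.
    let baseOpacity: Float = gazeOnShutter ? 0.7 : 0.4
    return ShutterRingAppearance(scale: 1.0 - 0.2 * p,
                                 opacity: min(1.0, baseOpacity + 0.3 * p))
}

// Example: halfway-depressed button or half-closed pinch with gaze on the shutter.
let appearance = ringAppearance(forInputProgress: 0.5, gazeOnShutter: true)
print(appearance.scale, appearance.opacity) // 0.9 0.85
```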
In some embodiments, the techniques and user interface(s) described in FIG. 7I1 are provided by one or more of the devices described in
As illustrated in FIG. 7I2, while displaying shutter affordance X718 with increased visual prominence, HMD X700 detects a second potential media capture input, such as button press input X736A of hardware button X706, air gesture input X736B (e.g., an air gesture performed with the thumb and forefinger of X750A, such as a pinch air gesture, discussed in further detail below), and/or tap input X736C (e.g., on a touch sensitive surface of display 708). At the time the second potential media capture input is detected, gaze X732 is directed at shutter affordance X718 (e.g., gaze X732 is within a central region of camera viewfinder X712). Accordingly, in response to the second potential media capture input, HMD X700 initiates media capture. In some embodiments, HMD X700 initiates the capture of both photo media and video media, and later determines (e.g., as described below) which mode of capture was requested by the user. In some embodiments, the media capture is a spatial media capture (e.g., performed using cameras such as first camera 704A and second camera 704B) to capture virtual reality media (e.g., photos and/or videos) that can be presented with the appearance/illusion of depth. HMD X700 further decreases the translucency and size of the concentric rings of shutter affordance X718 in response to the second potential media capture input, reflecting the state of the detected second potential media capture input (e.g., further increasing the prominence of the concentric rings and squeezing the concentric rings closer together as hardware button X706 is further depressed and/or air gesture input X736B is further pinched together).
In some embodiments, HMD X700 detects the second potential media capture input based on air gesture input X736B performed by a user of HMD X700. In some embodiments, HMD X700 detects hand X750A (e.g., the left or right hand of a user or both hands of the user) of the user of HMD X700 and determines whether motion of hand X750A performs a predetermined air gesture corresponding to the second potential media capture input. In some embodiments, the predetermined air gesture includes a pinch gesture. In some embodiments, the pinch gesture includes detecting movement of finger X750C and thumb X750D toward one another.
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
As illustrated in
Computer system 700 additionally initiates a photo media capture animation, as illustrated in
As illustrated in
As illustrated in
As illustrated in
As illustrated in
As illustrated in
As illustrated in
FIGS. 7AA1-7AA9 illustrate a version of media capture user interface 710 including mode control affordance 746 and video status affordance 750, in accordance with some embodiments. As illustrated in FIG. 7AA1, media capture user interface 710 includes mode control affordance 746, displayed in a lower region of camera viewfinder 712. Mode control affordance 746 includes photo mode affordance 746A and video mode affordance 746B. Computer system 700 initially displays mode control affordance 746 opaquely. Computer system 700 highlights photo mode affordance 746A with a backing platter to indicate that a photo media capture mode is currently selected (e.g., that computer system 700 would currently initiate photo media capture in response to an appropriate media capture input). In some embodiments, computer system 700 displays the text of photo mode affordance 746A with a different color than the text of video mode affordance 746B to indicate the currently-selected media capture mode.
As illustrated in FIG. 7AA2, after a threshold amount of time (e.g., a few seconds) without detecting gaze 732 directed at mode control affordance 746, computer system 700 increases the translucency (e.g., reduces the opacity) of mode control affordance 746 (e.g., as described above with respect to decreasing the visual prominence of captured media icon 738). As illustrated in FIG. 7AA3, in response to detecting gaze 732 directed at mode control affordance 746, computer system 700 decreases the translucency (e.g., increases the opacity) of mode control affordance 746.
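As an illustrative sketch only, the gaze-based fade behavior described above could be modeled as follows; the fade delay and opacity values are assumptions:

```swift
// Sketch: an affordance becomes more translucent after a threshold time without
// gaze directed at it, and regains opacity when gaze returns.
struct FadingAffordance {
    var opacity: Double = 1.0
    var lastGazeTime: Double = 0

    mutating func update(now: Double,
                         gazeIsOnAffordance: Bool,
                         fadeDelay: Double = 3.0) {
        if gazeIsOnAffordance {
            lastGazeTime = now
            opacity = 1.0                       // decrease translucency
        } else if now - lastGazeTime > fadeDelay {
            opacity = 0.4                       // increase translucency
        }
    }
}

var modeControl = FadingAffordance()
modeControl.update(now: 0.0, gazeIsOnAffordance: true)   // displayed opaquely
modeControl.update(now: 5.0, gazeIsOnAffordance: false)  // faded after the threshold
print(modeControl.opacity) // 0.4
```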
As illustrated in FIG. 7AA3, while displaying mode control affordance 746 with photo mode affordance 746A highlighted, computer system 700 detects a fifth potential media capture input, such as button press input 747A of hardware button 706, air gesture input 747B (e.g., an air gesture performed with the thumb and forefinger of hand 726, such as a pinch air gesture), and/or tap input 747C (e.g., on a touch sensitive surface of display 708). At the time the fifth potential media capture input is detected, gaze 732 is directed at mode control affordance 746. Accordingly, in response to the fifth potential media capture input, computer system 700 switches the currently-selected media capture mode from the photo media capture mode to a video media capture mode.
As illustrated in FIG. 7AA4, computer system 700 highlights video mode affordance 746B with a backing platter to indicate that the video media capture mode is currently selected. In some embodiments, computer system 700 changes the color of the text of photo mode affordance 746A and of the text of video mode affordance 746B to indicate the currently-selected media capture mode. Additionally, computer system 700 alters the appearance of shutter affordance 718 to indicate that the video media capture mode is currently selected, for example, displaying shutter affordance 718 with a translucent or opaque color fill (e.g., red, or another color).
As illustrated in FIG. 7AA5, while displaying mode control affordance 746 with video mode affordance 746B highlighted and shutter affordance 718 with the color fill, computer system 700 detects a sixth potential media capture input, such as button press input 748A of hardware button 706, air gesture input 748B, and/or tap input 748C while gaze 732 is directed at shutter affordance 718 and accordingly initiates media capture (e.g., as described with respect to
As illustrated in FIG. 7AA6, while capturing video media, computer system 700 ceases display of mode control affordance 746 and instead displays video status affordance 750 in the lower region of camera viewfinder 712. Video status affordance 750 indicates that video media is currently being captured and includes the currently elapsed time of the video capture. Upon beginning the video capture, computer system 700 initially displays video status affordance 750 opaquely. As illustrated in FIG. 7AA7, after a threshold amount of time (e.g., a few seconds) without detecting gaze 732 directed at video status affordance 750, computer system 700 increases the translucency (e.g., reduces the opacity) of video status affordance 750 (e.g., as described above with respect to changing the visual prominence of captured media icon 738 and/or mode control affordance 746). As illustrated in FIG. 7AA8, in response to detecting gaze 732 directed at video status affordance 750, computer system 700 decreases the translucency (e.g., increases the opacity) of video status affordance 750.
As illustrated in FIG. 7AA8, while displaying video status affordance 750 with increased visual prominence, computer system 700 detects a seventh potential media capture input, such as button press input 752A of hardware button 706, air gesture input 752B, and/or tap input 752C. At the time the seventh potential media capture input is detected, gaze 732 is directed at video status affordance 750. Accordingly, in response to the seventh potential media capture input, computer system 700 ends the video media capture. As illustrated in FIG. 7AA9, upon ending the video media capture, computer system 700 ceases display of video status affordance 750 and again displays mode control affordance 746 in the lower region of camera viewfinder 712.
As illustrated in
In some embodiments where computer system 700 is a head-mounted device, the techniques illustrated in
Additional descriptions regarding
The computer system (e.g., 101, 1-100, 1-200, 3-100, 6-100, 6-200, 6-300, 6-400, 11.1.2-100, 700, X700, and/or 702), while displaying (802), via the display generation component (e.g., 1-102, 1-120a, 1-120b, 11.1.1-104a, 11.1.1-104b, 1-108, 1-122a, 1-122b, 1-202, 1-306, 1-308, 1-320, 1-322a, 1-322b, 1-406, 1-402, 1-421, 3-108, 6-334, 11.3.2-100, 11.3.2-104, 11.3.2-200, 11.3.2-204, 708, and/or X702), a first user interface (e.g., 710 and/or X710) (e.g., a camera/capture UI including a representation of at least a portion of a field-of-view of the first camera) that includes a camera viewfinder (e.g., 712 and/or X712) (e.g., a viewfinder/camera preview object, such as an object framing/encompassing a region for media capture) (in some embodiments, overlaying at least a portion of an environment via a transparent display, pass-through camera data and/or virtual content (e.g., a physical environment, a virtual environment, and/or a mixed-reality environment)), detects (804) a first input (e.g., 730A, 730B, 730C, 736A, 736B, 736C, X736A, X736B, X736C, 740A, 740B, and/or 740C) (an activation or selection input that is used to trigger operations at the device when the input is detected while the user's attention is directed to a selectable user interface object; in some embodiments, a press of a hardware button; in some embodiments, a gesture input, an air gesture (e.g., an air pinch); in some embodiments, a speech input; in some embodiments, the activation input does not include location-based inputs, such as touch inputs on a touch-sensitive display or mouse clicks).
The computer system, in response to detecting the first input (806) and in accordance with a determination that a gaze of a user of the computer system (e.g., 732 and/or X732) (e.g., detected using a camera (e.g., a rear camera) and/or another gaze detection sensor; in some embodiments, gaze detection is always on (e.g., not just detected in response to the activation input)) is directed to (e.g., a determination that the user is looking at or near) a respective region of the camera viewfinder (e.g., a capture activation region of the UI (e.g., a predetermined and/or predefined region), such as the center region of the viewfinder or UI; in some embodiments, some or part of the respective region is indicated by a media capture affordance, such as a ring at the center of the viewfinder) when the first input is detected (e.g., if the user is both (e.g., simultaneously) requesting capture and looking at the appropriate portion of the UI), initiates (808) capture of first media content (in some embodiments, taking a photo; in some embodiments, initiating video capture) using the first camera (e.g., as illustrated in
The computer system, in response to detecting the first input and in accordance with a determination that the gaze of the user of the computer system is not directed to the respective region of the camera viewfinder when the first input is detected, forgoes (810) initiating capture of the first media content (e.g., as illustrated in
In some embodiments, the respective region is a center region within the camera viewfinder (e.g., 712 and/or X712) (e.g., a region that includes a center of the camera viewfinder) (in some embodiments, within or near the capture affordance at the center of the viewfinder; in some embodiments, the respective region is also centered in the first user interface). Only initiating media capture in response to a respective user input when the user is looking at a respective region of the camera viewfinder provides the user with improved control of media capture. For example, by forgoing media capture when the user is not looking at the central region of the viewfinder, media is not captured in response to user inputs that are not intended to trigger media capture (e.g., “false positive” captures are reduced) or when the user isn't paying attention to the central region.
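For illustration, the gaze-gated capture decision described above (items 806-810) could be sketched as follows, with an assumed circular center region and normalized viewfinder coordinates:

```swift
// Sketch: a capture input only initiates capture when the user's gaze is within
// the respective (e.g., center) region of the viewfinder at the time of the input.
struct GazePoint {
    var x: Float
    var y: Float
}
struct Region {
    var centerX: Float
    var centerY: Float
    var radius: Float
}

func shouldInitiateCapture(captureInputDetected: Bool,
                           gaze: GazePoint,
                           respectiveRegion: Region) -> Bool {
    guard captureInputDetected else { return false }
    let dx = gaze.x - respectiveRegion.centerX
    let dy = gaze.y - respectiveRegion.centerY
    let gazeInRegion = (dx * dx + dy * dy).squareRoot() <= respectiveRegion.radius
    // If gaze is outside the respective region, forgo initiating capture.
    return gazeInRegion
}

// Center region of the viewfinder used as the capture-activation region.
let centerRegion = Region(centerX: 0.5, centerY: 0.5, radius: 0.15)
let capture = shouldInitiateCapture(captureInputDetected: true,
                                    gaze: GazePoint(x: 0.52, y: 0.48),
                                    respectiveRegion: centerRegion)
print(capture) // true
```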
In some embodiments, the first media content includes photo media content (e.g., a still photo and/or media capture of a limited duration) (e.g., as illustrated in FIGS. 7I1-7N).
In some embodiments, the first media content includes video content (e.g., as illustrated in
In some embodiments, the first user interface (e.g., 710) includes a first media capture selectable interface object (e.g., 718 and/or X718) (e.g., an affordance (e.g., two concentric rings)), and the first media capture selectable interface object is displayed in the respective region (e.g., at the center of the camera viewfinder) (in some embodiments, the respective region is substantially the same region at which the media capture selectable interface object is displayed). Displaying a media capture affordance provides the user with improved visual feedback on a state of the computer system (e.g., whether the system is ready to capture media, whether the system will respond to the respective input) and visually indicates at least a portion of the respective region to the user for the gaze activation.
In some embodiments, at least a first element of the first media capture selectable interface object (e.g., 718 and/or X718) (e.g., the concentric rings) is at least partially translucent (e.g., at least a portion of the representation of a field-of-view of the first camera (e.g., of one or more cameras) is partially visible through the media capture selectable interface object) (e.g., the media capture affordance is semi-transparent, such that the environment (in some embodiments, a physical environment; in some embodiments, a virtual environment; in some embodiments, a mixed-reality environment) is visible through, but at least partially obscured by, the media capture affordance). Translucently overlaying the media capture affordance over the field-of-view of the camera provides improved visual feedback on a state of the computer system without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, while displaying the camera viewfinder, the computer system detects the gaze of the user of the computer system (e.g., 732 and/or X732) is directed to the respective region of the camera viewfinder; and in response to detecting the gaze of the user of the computer system is directed to the respective region of the camera viewfinder (e.g., if a respective input would initiate media capture), the computer system makes a first change to one or more visual features (e.g., altering and/or changing the appearance of the media capture affordance; e.g., the translucency/brightness of the rings, the size of the rings, and/or the color/fill of the rings) of the first media capture selectable interface object (e.g., as illustrated in
In some embodiments, making the first change to the one or more visual features of the first media capture selectable interface object (e.g., 718 and/or X718) includes increasing a brightness of at least a portion of the first media capture selectable interface object (e.g., increasing the brightness/opacity of the rings). Changing the appearance of the media capture affordance when the user is looking at the respective region provides improved visual feedback on a state of the computer system (e.g., with respect to readiness to capture on input) to the user, indicating that the user is able to capture media using a respective input. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed.
In some embodiments, making the first change to the one or more visual features of the first media capture selectable interface object (e.g., 718 and/or X718) includes decreasing a size of at least a portion of the first media capture selectable interface object (e.g., shrinking the rings and/or squeezing the rings closer together). Changing the appearance of the media capture affordance when the user is looking at the respective region provides improved visual feedback on a state of the computer system (e.g., with respect to readiness to capture on input) to the user, indicating that the user is able to capture media using a respective input. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed.
In some embodiments, in response to detecting the first input (e.g., 730A, 730B, 730C, 736A, 736B, 736C, X736A, X736B, X736C, 740A, 740B, and/or 740C) (in some embodiments, in response to detecting the respective user input and in accordance with a determination the gaze of the user of the computer system is directed to the respective region of the camera viewfinder), the computer system makes a second change to the one or more visual features of the first media capture selectable interface object based on the first input (in some embodiments, the first input has a first input characteristic (e.g., input speed, input length, and/or input force) and changing one or more visual features of the media capture selectable interface object varies based on a value of the input characteristic) (e.g., as illustrated in FIGS. 7I1-7J). Changing the appearance of the media capture affordance based on the respective input provides improved visual feedback on a state of the computer system to the user (e.g., with respect to the user's input being detected by the system), indicating that the system is responding to the respective input. Providing feedback to the user about the state of the respective input makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing unnecessary additional user inputs) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the first input includes a first stage (e.g., an initial stage, such as an at least partially closed pinch or an at least partially depressed hardware button) and a second stage (e.g., a later stage, such as the release of a pinch or a hardware button). In some embodiments, making the second change to the one or more visual features of the first media capture selectable interface object based on the first input includes, in response to detecting the first stage of the first input (e.g., in response to detecting the initiation of the pinch or button press), making a third change to the one or more visual features of the first media capture selectable interface object (e.g., decreasing the size of the rings and squeezing them together while the pinch is closing or hardware button is being depressed) (e.g., as illustrated in FIGS. 7I1-7I2), and in response to detecting the second stage of the first input (e.g., in response to detecting the release of the pinch or button press), making a fourth change to the one or more visual features of the first media capture selectable interface object (e.g., increasing the size of the rings to a default size and temporarily increasing the opacity/brightness of the rings when the pinch or hardware button is released) (e.g., as illustrated in
In some embodiments, after making the third change to the one or more visual features of the first media capture selectable interface object, in accordance with a determination that a duration of the first stage of the first input exceeds a first duration threshold (e.g., in response to detecting that the pinch or button press is being held by the user (e.g., as opposed to being immediately/quickly released)), the computer system makes a fifth change to the one or more visual features of the first media capture selectable interface object (e.g., as illustrated in
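Purely for illustration, the staged handling described in the two preceding paragraphs could be sketched as follows; the stage and change names mirror the description, and the duration threshold value is an assumption:

```swift
// Sketch: a third change when the first stage begins (press/pinch starts), a
// fifth change if the first stage is held past a duration threshold rather than
// being immediately released, and a fourth change on the second stage (release).
enum InputStage {
    case firstStageBegan
    case firstStageHeld(duration: Double)
    case secondStageReleased
}

enum AffordanceChange { case thirdChange, fourthChange, fifthChange, none }

func affordanceChange(for stage: InputStage,
                      firstDurationThreshold: Double = 0.5) -> AffordanceChange {
    switch stage {
    case .firstStageBegan:
        // e.g., shrink the rings and squeeze them together as the press begins.
        return .thirdChange
    case .firstStageHeld(let duration):
        // Held past the threshold, as opposed to being immediately released.
        return duration > firstDurationThreshold ? .fifthChange : .none
    case .secondStageReleased:
        // e.g., restore the rings to their default size on release.
        return .fourthChange
    }
}
```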
In some embodiments, in response to detecting the first input, in accordance with a determination that the gaze of the user of the computer system is directed to the respective region of the camera viewfinder when the first input is detected, the computer system displays an animation (e.g., as illustrated in
In some embodiments, the first input includes an air gesture (e.g., 730B, 736B, X736B, and/or 740B) (e.g., an air pinch (e.g., performed with the thumb and forefinger), an air tap, and/or an air double tap) (in some embodiments, the air gesture is detected using at least the first camera (e.g., of one or more cameras)). Initiating media capture in response to an air gesture while the gaze of the user is directed to a respective region provides the user with additional control options without cluttering the user interface and/or without having to interact with a specific hardware control element (e.g., a button or touch-sensitive surface of the computer system that may not be visible to the user (e.g., when operating an HMD)). Doing so also reduces the risk that transient media capture opportunities are missed due to a failure to locate and/or activate a necessary input element.
In some embodiments, the first input includes an activation of a hardware button in communication with the computer system (e.g., 730A, 736A, X736A, and/or 740A) (e.g., pressing a button on a device or headset; in some embodiments, the hardware button is a multifunction button; in some embodiments, the button includes capacitive sensors to detect the proximity and/or touch of a finger). Initiating media capture in response to a user activating a hardware button while the gaze of the user is directed to a respective region provides the user with additional control options without cluttering the user interface. Doing so also makes the user-system interface more efficient (e.g., a single hardware button may be used for multiple different functions, where the media capture function is activated based on gaze) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, while displaying the first user interface and prior to detecting the first input, the computer system detects a second input (e.g., an input indicating that the user may initiate media capture; in some embodiments, when the user raises their hand into view of the camera; in some embodiments, when the user's finger is near or on the hardware button) (in some embodiments, a user intent, such as an intent to prepare to capture media or an intent to place the system in a ready-to-capture state, is determined based on the second respective input). In some embodiments, in response to detecting the second input (in some embodiments, in response to determining the user intent), the computer system makes a first change to one or more visual features of the first user interface (e.g., transitioning the UI to a “ready-to-capture” state, e.g., by darkening the vignetting around the camera preview; increasing/decreasing the prominence (e.g., opacity/brightness/size) of the capture affordance; and/or increasing or decreasing the prominence of other UI elements) (e.g., as illustrated in
In some embodiments, detecting the second input includes detecting a position of a user's hand (e.g., 724, 726, and/or 750A) (e.g., a position in the field-of-view of the camera; a position near or on the hardware button) (in some embodiments, detecting the second respective input includes determining that the position of the user's hand indicates an intent to capture media). Changing the appearance of the UI in response to detecting the position of user's hand provides intuitive and efficient control of media capture, for example, entering a ready-to-capture state in response to detecting that the user's hand has moved into a position to provide further inputs (e.g., the respective input) without requiring the user to provide additional inputs. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed.
In some embodiments, making the first change to the one or more visual features of the first user interface includes altering an appearance (e.g., increasing the size, increasing the opacity, and/or darkening) of an area of the first user interface outside the camera viewfinder (e.g., 716 and/or X716) (e.g., vignetting the camera viewfinder or otherwise modifying the visual appearance of an edge of the camera viewfinder; in some embodiments, the area of the user interface overlays a portion of a field-of-view of the first camera (e.g., of the one or more cameras) that would not currently be included in a media capture). Altering the appearance of the UI outside of the camera viewfinder provides improved visual feedback on a state of the computer system without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events (e.g., by framing an approximate capture region) and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, making the first change to the one or more visual features of the first user interface includes altering an appearance of a second media capture user interface object (e.g., 718 and/or X718) (e.g., increasing or decreasing the opacity of the rings; darkening or brightening the rings; and/or changing the size of the rings to increase or decrease the prominence of the media capture affordance (e.g., relative to one or more other portions of the first user interface)) (in some embodiments, the second media capture user interface object is the same as the first media capture user interface object). Altering the appearance of a media capture affordance provides improved visual feedback on a state of the computer system without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, making the first change to the one or more visual features of the first user interface includes altering an appearance (e.g., increasing or decreasing the prominence by altering the opacity, brightness/contrast, and/or size) of one or more user interface elements (e.g., 722A, 722B, X722A, X722B, 720, X720, and/or 738) other than a third media capture selectable interface object (e.g., 718 and/or X718) (e.g., photo well, settings affordances, text, and/or status indicators) (in some embodiments, the third media capture selectable interface object is the same as the second and first media capture selectable interface objects). Changing the appearance of the first user interface in response to determining a user intent provides improved visual feedback on a state of the computer system (e.g., with respect to preparing for potential media capture) to the user, indicating that the user may be able to capture media using a respective input. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed.
In some embodiments, while displaying the first user interface, the computer system detects a third input (in some embodiments, a cessation of the second respective input, such as when the user lowers their hand out of view of the camera or when the user's finger is not near or on the hardware button; in some embodiments, an input that conveys user intent, such as an intent to not capture media/to remove the system from a ready-to-capture state, is determined based on the third respective input; in some embodiments, lowering the hand and/or removing the finger from the hardware button is not detected and/or determined to correspond to an intent to return to the idle state if a video capture is ongoing, so, e.g., the vignetting does not change in appearance during the course of video capture). In some embodiments, in response to detecting the third input, the computer system makes a change to the one or more visual features of the first user interface based on the third input (e.g., as illustrated in
In some embodiments, the camera viewfinder (e.g., 712 and/or X712) overlays (e.g., includes or frames; in some embodiments, at least semi-transparently) a first portion of a representation of a field-of-view of the first camera (e.g., of one or more cameras) (e.g., approximately the portion of the field-of-view of the first camera that would currently be included in a media capture; in some embodiments, the field-of-view of the first camera is displayed using the display-generation component (e.g., displaying the camera data stream, such as pass-through video); in some embodiments, the UI is overlaid over the field-of-view of the first camera using a transparent display). In some embodiments, the first user interface includes a first visual indication (e.g., 714, X714, 716, and/or X716) (e.g., a transition or gradient to darkening and/or blurring) along a first edge of the camera viewfinder (e.g., the border/frame of the capture region), wherein the first visual indication modifies (e.g., vignettes (e.g., blurs or darkens)) a visual appearance of a second portion of the representation of the field-of-view of the first camera that corresponds to a location of the first visual indication. Displaying a visual indication that modifies the appearance of an edge of the camera viewfinder provides improved visual feedback on a state of the computer system without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, initiating the capture of the first media content using the first camera (e.g., of one or more cameras) includes displaying, via the display generation component, a first media capture animation (e.g., as illustrated in
In some embodiments, displaying the first media capture animation includes darkening (e.g., reducing the brightness of and/or overlaying with at least partially-translucent shading) a second area (e.g., 716 and/or X716) of the first user interface (e.g., an area outside of the camera viewfinder (e.g., an area at or near an outer edge of the first user interface)) and expanding the darkened second area towards a center of the camera viewfinder (e.g., as illustrated in
In some embodiments, displaying the first media capture animation includes darkening (e.g., reducing the brightness of and/or overlaying with at least partially-translucent shading) a third area (e.g., 716 and/or X716) of the first user interface and contracting the darkened third area away from a center of the camera viewfinder (e.g., as illustrated in
In some embodiments, prior to displaying the first media capture animation, the first user interface includes a second visual indication (e.g., a transition or gradient to darkening and/or blurring; in some embodiments, the second visual indication is the same as the first visual indication (e.g., the vignetting)) along a second edge of the camera viewfinder (e.g., the border/frame of the capture region; in some embodiments, the second edge of the camera viewfinder is the same as the first edge) that darkens a first portion of the user interface (e.g., 716 and/or X716) to a first level of darkening (e.g., a low- or intermediate-level of vignetting is displayed when the camera is in an idle or ready-to-capture state). In some embodiments, displaying the first media capture animation includes darkening at least a second portion of the user interface (e.g., 716 and/or X716) to a second level of darkening, wherein the second level of darkening appears darker than the first level of darkening (e.g., the shutter animation vignetting is the highest level (e.g., darkest and/or most opaque) of vignetting). Displaying a capture animation that darkens the user interface further than the user interface is darkened in other states (e.g., idle or ready-to-capture) provides improved visual feedback on the state of the computer system to the user, for example, indicating that the capture has been successfully initiated and/or is in progress. Providing the user with improved visual feedback on the state of media capture makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing unnecessary additional user inputs) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
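For illustration, a minimal Swift sketch (hypothetical state names and darkening values, not taken from the source) of the progressively darker vignette levels described above, with the capture animation as the darkest level:

```swift
// Minimal sketch (hypothetical values): the vignette along the viewfinder edge
// darkens progressively, with the shutter/capture animation as the darkest level.
enum ViewfinderState { case idle, readyToCapture, captureAnimation }

/// Returns the vignette darkening level (0 = none, 1 = fully dark) for a state.
func vignetteDarkening(for state: ViewfinderState) -> Double {
    switch state {
    case .idle:             return 0.15 // low-level vignetting
    case .readyToCapture:   return 0.30 // intermediate level
    case .captureAnimation: return 0.60 // darkest level, used by the shutter animation
    }
}
```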
In some embodiments, in response to initiating the capture of the first media content using the first camera (e.g., of one or more cameras) and in accordance with a determination that the first input corresponds to a request to capture a first type of media (e.g., 736A, 736B, 736C, X736A, X736B, and/or X736C) (e.g., an input for still capture; in some embodiments, a quick/immediate release, such as if the input is released before the red dot appears in the first place; in some embodiments, release before a threshold is reached, such as before the red dot has faded in more than a threshold amount), the computer system displays a media capture animation of a first type (e.g., as illustrated in
In some embodiments, in response to detecting an initial portion of the first input (in some embodiments, detecting that the user input has been held/maintained for longer than a certain duration, but not yet long enough to correspond to a video input), the computer system displays a first visual indicator for a respective period of time (e.g., as illustrated in
In some embodiments, after capturing the first media content, the computer system displays, via the display generation component, a representation (e.g., a thumbnail) of the first media content (e.g., 738) (in some embodiments, the photo well is located in the camera viewfinder region, e.g., in the lower left corner, lower right corner, or other corner; in some embodiments, the representation of the first media content is displayed until subsequent media content is captured; in some embodiments, the photo well is only displayed after the first media capture in a given media capture session (e.g., it is not displayed when a new media capture session is initiated)). Displaying a representation of the first media content provides improved visual feedback on a state of the computer system, for example, confirming to the user that media was captured and allowing the user to assess aspects of the captured media (e.g., exposure and/or framing). Displaying the representation of the first media content after capture (e.g., automatically) reduces the number of inputs needed to perform an operation (e.g., for the user to view the recently captured media).
In some embodiments, displaying the representation of the first media content (e.g., 738) includes (in some embodiments, initially includes, e.g., immediately after capture is completed) displaying an animation of a change of appearance (e.g., resolving from overexposed to a target exposure level, decreasing in size, and/or decreasing in opacity) of the representation of the first media content (e.g., as illustrated in
In some embodiments, at least a portion of the representation of the first media content is at least partially transparent (e.g., as illustrated in
In some embodiments, while displaying the representation of the first media content (e.g., 738), the computer system detects a user interaction with the representation of the first media content (e.g., a user input, a gaze, and/or an action taken using the user interface). In some embodiments, in response to detecting the user interaction with the representation of the first media content, the computer system changes (e.g., increasing or decreasing) a degree of transparency of the representation of the first media content (in some embodiments, decreasing the degree of transparency (e.g., increasing opacity) in response to the user looking at the photo well) (in some embodiments, the degree of transparency is also changed (e.g., increased) in response to detecting a user interaction with something other than the representation, such as in response to the user starting a media capture). Changing the transparency of the photo well in response to interactions provides improved visual feedback on a state of the computer system without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, detecting the user interaction with the representation of the first media content (e.g., 738) includes determining that a gaze of the user of the computer system is directed to the representation of the first media content (e.g., the user interacts with the photo well by looking at it; in some embodiments, the degree of transparency of the representation of the first media content is decreased, making the photo well more opaque while the user is looking at it; in some embodiments, the photo well fades (increases transparency) after a threshold period of time without gaze). Decreasing the transparency of the photo well in response to the user looking at the photo well provides improved visual feedback on a state of the computer system without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
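As a non-limiting sketch in Swift (the timeout value and property names are assumptions, not taken from the source), the gaze-driven opacity of the photo well could be modeled as:

```swift
import Foundation

// Minimal sketch (hypothetical names): the captured-media thumbnail ("photo well")
// becomes more opaque while gazed at and fades after a dwell timeout without gaze.
struct PhotoWellFader {
    var gazedOpacity: Double = 1.0
    var fadedOpacity: Double = 0.35
    var fadeTimeout: TimeInterval = 1.0   // assumed threshold; not from the source
    private var lastGaze: Date = .distantPast

    mutating func gazeSample(isOnPhotoWell: Bool, at time: Date = Date()) {
        if isOnPhotoWell { lastGaze = time }
    }

    func opacity(at time: Date = Date()) -> Double {
        // Opaque while the gaze is on the thumbnail or within the timeout window.
        return time.timeIntervalSince(lastGaze) < fadeTimeout ? gazedOpacity : fadedOpacity
    }
}
```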
In some embodiments, detecting the user interaction with the representation of the first media content (e.g., 738) includes detecting an initiation of a capture of second media content using the first camera (e.g., as illustrated in
In some embodiments, the first user interface includes a fourth media capture selectable interface object (e.g., 718 and/or X718) (in some embodiments, the fourth media capture selectable interface object is the same as the first, second, and/or third media capture selectable interface object) displayed at a first location (e.g., a center of the camera viewfinder) in the first user interface, the capture of the first media content using the first camera (e.g., of one or more cameras) includes a video capture, and initiating the capture of the first media content using the first camera includes displaying, via the display generation component, an animation of the fourth media capture selectable object (e.g., the concentric circles) moving towards a second location (e.g., as illustrated in
In some embodiments, at least a third portion of the first user interface (e.g., 710 and/or X710) overlays at least a portion of a field-of-view of the first camera (e.g., of one or more cameras), and wherein displaying the first user interface includes changing one or more visual features of the at least third portion of the user interface (e.g., the color, size, and/or weight of text, lines, and/or icons) based on (e.g., sampling background color to match UI and/or changing UI from light mode to dark mode) the at least portion of the field-of-view of the first camera. Changing the appearance of the UI based on the appearance of background content provides improved control of media capture, for example, by partially blending UI elements into the background content for a less obtrusive UI. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
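For illustration, one plausible way to adapt the overlaid UI to the underlying camera content is to sample the background luminance and pick a contrasting style; the following Swift sketch uses an assumed Rec. 709 luminance heuristic that is not specified in the source:

```swift
// Minimal sketch (hypothetical heuristic): choosing a light or dark UI style for
// overlaid elements by sampling the average luminance of the camera region they cover.
struct RGB { var r: Double; var g: Double; var b: Double } // components in 0...1

enum UIStyle { case light, dark }

/// Relative luminance of one sample (Rec. 709 weights).
func luminance(_ c: RGB) -> Double {
    return 0.2126 * c.r + 0.7152 * c.g + 0.0722 * c.b
}

/// Picks a UI style that contrasts with the sampled background pixels.
func style(forBackgroundSamples samples: [RGB]) -> UIStyle {
    guard !samples.isEmpty else { return .dark }
    let mean = samples.map(luminance).reduce(0, +) / Double(samples.count)
    return mean > 0.5 ? .dark : .light // dark text/lines over bright backgrounds
}
```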
In some embodiments, the computer system detects a movement of a field-of-view of the first camera (e.g., of one or more cameras) in a first direction (e.g., a movement vector normalized to the plane of the media capture). In some embodiments, while detecting the movement of the field-of-view of the first camera in the first direction, the computer system moves a second portion of the first user interface (e.g., 712) away from an initial position of the second portion of the first user interface (e.g., the default/non-moving position of the UI portion when the camera movement begins) in a second direction opposite to the first direction (e.g., as illustrated in
In some embodiments, the first user interface includes one or more mode control user interface objects (e.g., 746) (e.g., a mode control affordance, such as a button, switch, and/or slider). In some embodiments, while displaying the one or more mode control user interface objects, the computer system detects an input directed toward the one or more mode control user interface objects, and, in response to detecting the input directed toward the one or more mode control user interface objects, switches (e.g., toggles or otherwise navigates) from a current capture mode to a different capture mode (e.g., toggling between a first capture mode and a second capture mode or selecting a capture mode from three or more capture modes based on a direction and/or magnitude of the input directed toward the one or more mode control user interface objects) (e.g., if the camera was in photo capture mode, the mode control affordance will toggle to video capture mode, and if the camera was in video capture mode, the mode control affordance will toggle to photo capture mode). Displaying a mode control affordance allowing the user to toggle between different capture modes provides improved control of media capture, for example, allowing a user to efficiently and intuitively switch between a photo capture mode and a video capture mode. Providing improved control of media capture assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or captured in an undesired format or mode.
In some embodiments, initiating capture of first media content using the first camera includes: in accordance with a determination that the computer system is configured to capture media using a first capture mode (in some embodiments, a photo capture mode is currently selected), capturing a type of media that corresponds to the first capture mode (in some embodiments, capturing the first media content includes capturing a photo); and in accordance with a determination that the computer system is configured to capture media using a second capture mode that is different from the first capture mode (in some embodiments, a video capture mode is currently selected), capturing a different type of media that corresponds to the second capture mode and is different from the type of media that corresponds to the first capture mode (e.g., as illustrated in FIGS. 7AA5-7AA6) (in some embodiments, capturing the first media content includes capturing a video). Capturing media in accordance with the current media capture mode provides improved control of media capture, for example, allowing a user to capture media of different types (e.g., photo or video) based on the current media capture mode state set using the mode control affordance. Providing improved control of media capture assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or captured in an undesired format or mode.
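A minimal Swift sketch of the mode-dependent capture dispatch described above (type and function names are hypothetical):

```swift
// Minimal sketch (hypothetical functions): dispatching a capture request based on
// the currently configured capture mode, as selected with the mode control affordance.
enum CaptureMode { case photo, video }

enum CapturedMedia { case stillPhoto, videoRecording }

func initiateCapture(in mode: CaptureMode) -> CapturedMedia {
    switch mode {
    case .photo:
        // First capture mode: a still photo is captured.
        return .stillPhoto
    case .video:
        // Second capture mode: a video recording is started.
        return .videoRecording
    }
}
```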
In some embodiments, the one or more mode control user interface objects (e.g., 746) include an element representing the first capture mode (e.g., 746A) (e.g., text, an icon, and/or a side of the mode control affordance representing the photo capture mode) and an element representing the second capture mode (e.g., 746B) (e.g., text, an icon, and/or a side of the mode control affordance representing the video capture mode). In some embodiments, displaying the mode control user interface object includes: in accordance with a determination that the computer system is configured to capture media using the first capture mode (in some embodiments, a photo capture mode is currently selected), visually emphasizing the element representing the first capture mode (e.g., 746A) in a respective manner (e.g., as illustrated in FIG. 7AA1) (e.g., displaying the element representing the first capture mode with a first appearance (e.g., with increased visual prominence; in some embodiments, the text is displayed with a particular color, such as yellow; in some embodiments, the element representing the first state is highlighted (e.g., underlaid by a backing platter, circled, and/or bordered); in some embodiments, the element representing the first state is displayed at a larger size than the element representing the second state) and displaying the element representing the second capture mode (e.g., 746B) with a second appearance (e.g., with decreased visual prominence; in some embodiments, the text is displayed with a different color, such as white; in some embodiments, the element representing the second state is not highlighted)). In some embodiments, displaying the mode control user interface object includes: in accordance with a determination that the computer system is configured to capture media using the second capture mode (in some embodiments, a video capture mode is currently selected), visually emphasizing the element representing the second capture mode in the respective manner (e.g., as illustrated in FIG. 7AA4) (e.g., displaying the element representing the first capture mode with the second appearance (e.g., with decreased visual prominence) and displaying the element representing the second capture mode with the first appearance (e.g., with increased visual prominence)). Changing the appearance of a mode control affordance to indicate the current state provides improved visual feedback on a state of the computer system without excessively obscuring the field-of-view, for example, by increasing the visual prominence of an element representing the currently-selected mode while decreasing the visual prominence of an element representing an unselected mode. Providing the user with improved visual feedback on the state of media capture assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed (e.g., due to initiating capture in an unintended state and/or an element of the UI obscuring the environment), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, in accordance with a determination that a first set of criteria are met, the computer system decreases a visual emphasis of (e.g., fading, blurring, desaturating, increasing a degree of transparency, and/or otherwise visually deemphasizing) a first set of one or more elements of the first user interface (e.g., 738, 746, and/or 750) (e.g., a dynamic fading UI; in some embodiments, the first set of one or more elements includes a mode control affordance, a photo well, and/or a video status indicator), wherein the first set of criteria includes a criterion that is met when the gaze (e.g., 732 and/or X732) of the user of the computer system is not directed to (e.g., the user is not looking at) the first set of one or more elements of the first user interface for over a threshold period of time (e.g., as illustrated in FIG. 7AA2) (e.g., 0.5 seconds, 1 second, and/or 3 seconds) (in some embodiments, the computer system does not increase the transparency of (e.g., does not fade) the video status indicator when video is currently being captured). Increasing the transparency of portions of the media capture user interface after a threshold period of time without user attention provides improved visual feedback on a state of the computer system without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, in accordance with a determination that a second set of criteria are met, the computer system increases a visual emphasis (e.g., unfading, unblurring, saturating, decreasing a degree of transparency, and/or otherwise visually emphasizing) of a second set of one or more elements of the first user interface (e.g., 738, 746, and/or 750) (e.g., a dynamic fading UI; in some embodiments, the second set of one or more elements includes a mode control affordance, a photo well, and/or a video status indicator; in some embodiments, the second set of one or more elements is the same as the first set of one or more elements), wherein the second set of criteria includes a criterion that is met when the gaze of the user of the computer system is directed to (e.g., the user looks at, points toward, or moves a selection indicator such as a cursor or a finger over) the second set of one or more elements of the first user interface (e.g., as illustrated in FIG. 7AA3). Decreasing the transparency of portions of the media capture user interface when the user looks at them provides improved control of media capture, e.g., using the user interface. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or captured with unintended settings, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
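As an illustration, the dwell-based fading behavior described in the preceding paragraphs could be modeled as follows in Swift; the specific opacity values and the one-second dwell threshold are assumptions chosen from the example values above:

```swift
import Foundation

// Minimal sketch (hypothetical names and values): dynamically fading secondary UI
// elements after a period without the user's attention, and restoring emphasis
// when the gaze returns to them.
struct DynamicFadingElement {
    var emphasizedOpacity: Double = 1.0
    var deemphasizedOpacity: Double = 0.2
    var dwellThreshold: TimeInterval = 1.0 // e.g., 0.5 s, 1 s, or 3 s per the examples above
    private var lastAttention: Date = Date()

    mutating func update(gazeIsOnElement: Bool, now: Date = Date()) -> Double {
        if gazeIsOnElement { lastAttention = now }
        let unattended = now.timeIntervalSince(lastAttention)
        // Fade out only after the attention-free interval exceeds the threshold.
        return unattended > dwellThreshold ? deemphasizedOpacity : emphasizedOpacity
    }
}
```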
In some embodiments, in response to detecting an event occurrence, the computer system displays a third set of one or more elements of the first user interface (e.g., 738, 746, and/or 750) (e.g., a mode control affordance, a photo well, and/or a video status indicator; in some embodiments, the third set of one or more elements is the same as the first set of one or more elements and/or the second set of one or more elements) with an increased degree of visual prominence (e.g., as illustrated in FIGS. 7AA1, 7AA4, 7AA6, and/or 7AA9) (e.g., unfading, unblurring, saturating, decreasing a degree of transparency, and/or otherwise visually emphasizing the third set of one or more elements of the first user interface). Displaying portions of the media capture user interface with a low degree of transparency (e.g., with relatively high visual prominence) in response to predetermined events provides improved control of media capture, e.g., using the user interface. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or captured with unintended settings, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, when a media capture application (e.g., the media capture application including the first user interface) is launched (e.g., opened or initially displayed after being hidden or closed), the third set of one or more elements of the first user interface are displayed with the increased degree of visual prominence before a degree of visual prominence of the third set of one or more elements of the first user interface decreases automatically to a lower degree of visual prominence (e.g., fading, blurring, desaturating, increasing a degree of transparency, and/or otherwise visually deemphasizing the third set of one or more elements of the first user interface) (in some embodiments, the third set of one or more elements includes the mode control affordance and photo well). Displaying portions of the media capture user interface with a low degree of transparency when a media capture application is initially launched provides improved control of media capture, e.g., using the user interface of the media capture application. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or captured with unintended settings, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the event occurrence includes a switch of (e.g., the user toggles) a current capture mode (e.g., a switch from photo to video mode or vice versa) between a third capture mode (in some embodiments, a photo capture mode; in some embodiments, the third capture mode is the same as the first capture mode) and a fourth capture mode that is different from the third capture mode (e.g., as illustrated in FIG. 7AA4) (in some embodiments, a video capture mode; in some embodiments, the fourth capture mode is the same as the second capture mode) (in some embodiments, the third set of one or more elements includes the mode control affordance). Displaying portions of the media capture user interface with a low degree of transparency when the user switches capture mode provides improved visual feedback on a state of the computer system when relevant without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or captured with unintended settings, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the event occurrence includes a media capture event (e.g., as illustrated in FIGS. 7AA6 and/or 7AA9) (e.g., the user starts, continues, and/or completes capturing a video and/or photo) (in some embodiments, the third set of one or more elements includes the video status indicator; in some embodiments, the video status indicator includes the current duration and/or elapsed time of video capture; in some embodiments, the video status indicator remains displayed with a high degree of visual prominence (e.g., opaque) for the duration of video capture). Displaying portions of the media capture user interface with a low degree of transparency when the user starts, continues, or completes video capture provides improved control of media capture, e.g., using the user interface of the media capture application. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or captured with unintended settings, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the computer system displays a user interface object (e.g., 746) (e.g., a mode control affordance, such as a button, switch, and/or slider) that indicates a current media capture mode (e.g., the photo and/or video mode) (e.g., as illustrated in FIGS. 7AA1-7AA5 and/or 7AA9). In some embodiments, while capturing the first media content (e.g., while capturing video), the computer system displays status information about a media capture operation (e.g., 750) (e.g., a video status indicator, such as a button, icon, and/or text) (in some embodiments, the video status indicator includes the current duration and/or elapsed time of video capture) at a location in the first user interface that was previously occupied by the user interface object that indicated the current media capture mode (e.g., as illustrated in FIGS. 7AA6-7AA8) (e.g., ceasing display of the user interface object and/or replacing the mode control affordance with the video status indicator while actively recording video) (in some embodiments, the location is not a central location of the camera preview, e.g., the location is in a lower and/or upper region of the camera preview). Replacing a mode control affordance with a video status indicator while capturing video provides improved control of media capture using the user interface and provides improved visual feedback on a state of the computer system without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, while capturing the first media content (e.g., while capturing video), the computer system displays an indication of (e.g., 750) (e.g., a video status indicator, such as a button, icon, and/or text) a status of capturing the first media content (in some embodiments, the video status indicator includes the current duration and/or elapsed time of video capture) (in some embodiments, the second region is different from the respective region and/or is not a central region of the camera preview, e.g., a lower and/or upper region of the camera preview; in some embodiments, the second region is the same as the first region). In some embodiments, while displaying the indication of the status of capturing the first media content, the computer system detects an input directed toward the indication of the status of capturing the first media content (e.g., as illustrated in FIG. 7AA8) (in some embodiments, input directed toward the indication of the status of capturing the first media content includes a respective user input (e.g., an activation or selection input that is used to trigger operations at the device when the input is detected) detected while the user's attention is directed to the indication of the status of capturing the first media content; in some embodiments, the respective input includes a press of a hardware button; in some embodiments, the respective input includes a gesture input, such as an air gesture (e.g., an air pinch); in some embodiments, the respective input includes a touch and/or tap input; in some embodiments, the respective input includes a speech input; in some embodiments, the respective input does not include location-based inputs, such as touch inputs on a touch-sensitive display or mouse clicks), and, in response to detecting the input directed toward the indication of the status of capturing the first media content, the computer system ceases capturing the first media content (e.g., as illustrated in FIG. 7AA9) (in some embodiments, the video recording stops and the capture of the first media content is completed; in some embodiments, the video recording pauses, and the user can re-start capture of the first media content, e.g., by selecting the video status indicator a second time). Displaying a video status indicator that functions as a pause/stop button provides improved control of video media capture and provides improved visual feedback on a state of the computer system without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment and/or captured unintentionally, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
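For illustration, a minimal Swift sketch (hypothetical types; the elapsed-time label format is an assumption) of a video status indicator that displays capture status and ends capture when selected:

```swift
// Minimal sketch (hypothetical model): while video is recording, the status
// indicator occupies the mode control's slot and ends capture when selected.
enum RecordingState { case recording(elapsed: Double), stopped }

struct VideoStatusIndicator {
    var state: RecordingState = .stopped

    /// Text shown in place of the mode control affordance during capture.
    var label: String {
        switch state {
        case .recording(let elapsed):
            let total = Int(elapsed)
            let seconds = total % 60
            // Assumed "minutes:seconds" format for the elapsed-time readout.
            return "\(total / 60):" + (seconds < 10 ? "0" : "") + "\(seconds)"
        case .stopped:
            return ""
        }
    }

    /// Selecting the indicator ends the ongoing capture.
    mutating func select() {
        if case .recording = state { state = .stopped }
    }
}
```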
In some embodiments, aspects/operations of methods 800, 1000, and 1200 may be interchanged, substituted, and/or added between these methods. For example, the user interface displayed in method 800 can be the same as the user interface displayed in method 1000, and media capture can be initiated according to method 800 before, during, or after displaying the indicator representing the orientation of the field-of-view of the camera according to method 1000. For example, the camera viewfinder displayed in method 800 can be the same as the capture preview for spatial media displayed in method 1200, and media capture can be initiated according to method 800 before, during, or after displaying the prompt to change a distance between the subject and the cameras according to method 1200. For brevity, these details are not repeated here.
As illustrated in
At
In response to detecting the selection input (e.g., gaze input 906 and/or tap input 908), at
At
In response to detecting rotation 918A, computer system 700 moves and/or rotates a first set of elements of media capture user interface 710, including options affordance 720, first status indicator 722A, second status indicator 722B, and captured media icon 738, to align the first set of elements with orientation 904 (e.g., computer system 700 displays options affordance 720, first status indicator 722A, second status indicator 722B, and captured media icon 738 as environment-locked virtual objects). However, computer system 700 does not move and/or rotate a second set of elements of media capture user interface 710, including camera viewfinder 712, border 714, darkened area 716, and shutter affordance 718, to align with orientation 904, and accordingly, the second set of elements remains aligned with orientation 902 (e.g., computer system 700 displays camera viewfinder 712, border 714, darkened area 716, and shutter affordance 718 as viewpoint-locked virtual objects with respect to the viewpoint of first camera 704A). As device 702 rotates as illustrated in
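As a non-limiting sketch in Swift (element names and the sign convention are assumptions), the two element sets can be modeled by whether each element counter-rotates with the environment or stays fixed to the viewpoint:

```swift
// Minimal sketch (hypothetical names): during device rotation, environment-locked
// elements counter-rotate to stay aligned with the environment, while
// viewpoint-locked elements keep their on-display orientation.
enum LockBehavior { case viewpointLocked, environmentLocked }

struct UIElement {
    var name: String
    var lock: LockBehavior
}

/// Returns the on-display roll (in degrees) for an element given the device's roll
/// relative to the environment. Positive values are counterclockwise.
func displayRoll(of element: UIElement, deviceRollDegrees: Double) -> Double {
    switch element.lock {
    case .viewpointLocked:
        return 0                  // stays fixed relative to the display/viewpoint
    case .environmentLocked:
        return -deviceRollDegrees // counter-rotates to stay aligned with the environment
    }
}

// Example grouping, mirroring the two sets described above.
let firstSet = [UIElement(name: "options affordance", lock: .environmentLocked),
                UIElement(name: "status indicators", lock: .environmentLocked)]
let secondSet = [UIElement(name: "camera viewfinder", lock: .viewpointLocked),
                 UIElement(name: "shutter affordance", lock: .viewpointLocked)]
```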
At
In some embodiments, the techniques and user interface(s) described in FIG. 9D1 are provided by one or more of the devices described in
At FIG. 9D2 (e.g., as in FIG. 9D1 following rotation 918B), the difference between orientation X902 of HMD X700 (represented by the solid x-y axis labeled X′-Y′) and orientation X904 of the environment (represented by the dotted x-y axis labeled X-Y) exceeds the first threshold (e.g., the tilt of HMD X700 exceeds 2°, 3°, 4°, 5°, or 10°). When HMD X700 is worn by a user in a head-mounted position, orientation X902 represents the orientation of the user's head and/or field-of-view. In response to the difference between orientation X902 and orientation X904 exceeding the first threshold, HMD X700 displays level indicator X920, indicating that HMD X700 is not currently level with respect to the horizon of the environment and that media captured at the current orientation X902 will appear tilted. Level indicator X920 is a broken line intersecting with shutter affordance X718, such that inner portion X920A falls inside the concentric rings of shutter affordance X718 and outer portion X920B falls outside the concentric rings of shutter affordance X718 on either side. HMD X700 displays inner portion X920A as a viewpoint-locked virtual object, which includes aligning inner portion X920A parallel to the x-axis of orientation X902. HMD X700 displays outer portion X920B oriented based on the horizon of the environment, aligning outer portion X920B parallel to the x-axis of orientation X904 (e.g., outer portion X920B appears “level” with respect to the horizon of the environment rather than to the user's field-of-view).
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
Referring back to FIG. 9D1, computer system 700 detects rotation 918C of device 702, which is a further counterclockwise rotation. At
At
At
FIG. 9G1 illustrates an expanded view of level indicator 920 as orientation 902 of device 702 approaches orientation 904 of the environment (e.g., as orientation 902 approaches a level orientation), in accordance with some embodiments. As shown in the top drawing of FIG. 9G1, before the difference between orientation 902 and orientation 904 falls below the second threshold (e.g., 0.1°, 0.25°, 0.5°, or 1° tilt), computer system 700 displays inner portion 920A with one color (e.g., white, or another color) and the outer portion 920B with a different color (e.g., yellow, or another color). Additionally, inner portion 920A is displayed with a width less than the outer diameter of shutter affordance 718, such that, when the portions first align (e.g., as shown in the middle drawing of FIG. 9G1), gaps remain visible at the break points of the broken line.
As shown in the middle and bottom drawings of FIG. 9G1, in response to detecting that the difference between orientation 902 and orientation 904 has fallen below the second threshold (e.g., 0.1°, 0.25°, 0.5°, or 1° tilt), computer system 700 changes the appearance of level indicator 920 to cause the gaps between inner portion 920A and outer portion 920B to close, such that level indicator 920 appears as an unbroken line. Specifically, computer system 700 animates the two sides of outer portion 920B shifting inwards, towards the center of shutter affordance 718, without changing size (e.g., without stretching or expanding), and animates inner portion 920A expanding (e.g., stretching) outwards, away from the center of shutter affordance 718, until the edges of inner portion 920A and outer portion 920B meet (e.g., as shown in the bottom drawing of FIG. 9G1). Additionally, computer system 700 changes the color of inner portion 920A to match the color of outer portion 920B so that level indicator 920 appears as a single color.
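For illustration, a minimal Swift sketch (the snap threshold and gap width are assumed values, chosen from the example ranges above) of the appearance change that closes the level indicator into a single-colored, unbroken line once the remaining tilt is small enough:

```swift
// Minimal sketch (hypothetical values): when the remaining tilt falls below the
// snap threshold, the broken level indicator closes into one line and takes a single color.
struct LevelIndicatorStyle {
    var gapWidth: Double        // gap between inner and outer portions, in points
    var innerColorIsOuter: Bool // true when both portions share the outer portion's color
}

func levelIndicatorStyle(tiltDegrees: Double,
                         snapThresholdDegrees: Double = 0.5, // e.g., 0.1° to 1° per the examples
                         restingGap: Double = 6) -> LevelIndicatorStyle {
    if abs(tiltDegrees) < snapThresholdDegrees {
        // Outer segments shift inward and the inner segment stretches outward
        // until the gaps close; both portions then share one color.
        return LevelIndicatorStyle(gapWidth: 0, innerColorIsOuter: true)
    }
    return LevelIndicatorStyle(gapWidth: restingGap, innerColorIsOuter: false)
}
```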
After temporarily changing the color of level indicator 920, at
As described above with respect to
Additional descriptions regarding
The computer system displays (1002), via the display generation component (e.g., 708), a first user interface (e.g., 710 and/or X710) (e.g., a camera/capture UI) that includes a camera preview (e.g., 712 and/or X712) (e.g., a viewfinder/camera preview object, such as an object framing/encompassing a region for media capture; in some embodiments, overlaying at least a portion of an environment via a transparent display, pass-through camera data and/or virtual content (in some embodiments, a physical environment; in some embodiments, a virtual environment; in some embodiments, a mixed-reality environment)) of at least a portion of a field-of-view of the first camera.
The computer system detects (1004) (in some embodiments, while the field-of-view of the camera is oriented in a predetermined (e.g., level) orientation) a change (e.g., 918A, 918B, 918C, 918D, and/or 918E) in an orientation (e.g., 902 and/or X902) (e.g., a change in the tilt level of the camera with respect to the horizon (e.g., an absolute difference in angle measured between the x-axis of the xy-plane of the field-of-view of the camera and a horizon line (e.g., target level)); in some embodiments, the tilt level can be measured using a relative angle between stereo lenses/multiple cameras) of the field-of-view of the first camera (in some embodiments, the field-of-view of the camera is viewpoint-locked (e.g., head-locked, display-locked, and/or device-locked)) with respect to a respective orientation (e.g., 904 and/or X904) (e.g., a level orientation representing the orientation of the environment, such as where the camera is not tilted with respect to the horizon line (e.g., the x-axis of the xy-plane of the field-of-view of the camera is aligned with the horizon line/target level) or with respect to the direction of gravity's pull) (in some embodiments, a predetermined orientation).
The computer system, in response to detecting the change in the orientation (1006) and in accordance with a determination that a first set of criteria are met, displays (1008) (e.g., initially displaying) a first indicator (e.g., 920 and/or X920) representing the orientation (e.g., 902 and/or X902) (e.g., a level indicator, such as a broken line with an inner portion and outer portion(s); in some embodiments, the level indicator indicates both the orientation of the UI and the horizon/level line that the orientation is measured with respect to) of the field-of-view of the first camera, wherein the first set of criteria includes a first criterion that is met when a difference between a current orientation of the field-of-view of the first camera and the respective orientation exceeds a first threshold amount (e.g., when the absolute difference between the x-axis of the plane of the field-of-view of the camera and the horizon line is greater than, e.g., 2°, 3°, and/or 4°, begin displaying the level indicator; in some embodiments, if the level indicator is already being displayed, display continues until the relative angle falls below a different threshold (e.g., 0.5°, 1°, 2°)) (in some embodiments, in accordance with a determination that the first set of criteria are not met (e.g., the camera remains level or close to level after detecting the change in orientation), foregoing displaying the first indicator representing the orientation). Displaying a level indicator when the orientation of a field-of-view of a camera exceeds a first threshold difference from a level orientation (e.g., the level of a horizon line in an environment) provides a user with real-time visual feedback about a state of the computer system. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or mis-captured (e.g., due to misalignment of the system), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently. For example, the initial display of the level indicator indicates to a user that media captured at the instant camera orientation will be visibly tilted with respect to the horizon, allowing the user to adjust the orientation if a level media capture is desired.
In some embodiments, while displaying the first indicator (e.g., 920 and/or X920) representing the orientation of the field-of-view of the first camera (e.g., once the level indicator has been displayed, indicating that the current orientation is not level), the computer system detects a second change (e.g., 918A, 918B, 918C, 918D, and/or 918E) in the orientation (e.g., 902) of the field-of-view of the first camera with respect to the respective orientation (e.g., 904 and/or X904) (e.g., as the user adjusts the orientation of the camera to increase or decrease the tilt). In some embodiments, in response to detecting the second change in the orientation and in accordance with a determination that a second set of criteria are met, the computer system ceases to display the first indicator, wherein the second set of criteria includes a second criterion that is met when the difference between the current orientation of the field-of-view of the first camera and the respective orientation is less than a second threshold amount (e.g., when the absolute difference between the x-axis of the plane of the field-of-view of the camera and the horizon line is less than, e.g., 0.5°, 1°, and/or 2° (e.g., when the camera is level or close to level), stop displaying the indicator) (in some embodiments, the second threshold amount is the first threshold amount; in some embodiments, the second set of criteria are met when the first set of criteria are not met). Ceasing to display the level indicator when the orientation of a field-of-view of a camera is level or close to level provides the user with improved visual feedback about the orientation of the field-of-view of the camera. For example, ceasing to display the level indicator indicates to the user that the user has corrected the tilt of the camera, and that media captured at the instant camera orientation will not be significantly visibly tilted with respect to the horizon. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed and/or captured in a manner (e.g., due to misalignment of the system during capture) that can negatively affect later viewing of the media (e.g., by impacting spatial media playback and/or causing user discomfort when viewed via an HMD), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the second threshold amount is different from the first threshold amount (e.g., as illustrated in FIGS. 9D1, 9D2, and 9F) (e.g., displaying the level indicator at 5° and ceasing to display it at 1°, displaying the level indicator at 5° and ceasing to display it at 2°, displaying the level indicator at 4° and ceasing to display it at 2°, and/or displaying the level indicator at 1° and ceasing to display it at 0.5°; in some embodiments, the second threshold amount is less than the first threshold amount). Ceasing to display the level indicator at a different threshold orientation than the threshold orientation at which the level indicator is initially displayed provides improved visual feedback by preventing the level indicator from flickering (e.g., bouncing) on and off as very small changes in the orientation of the camera are detected, e.g., due to slight movement of the user's body. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed and/or captured in a manner (e.g., due to misalignment of the system during capture) that can negatively affect later viewing of the media (e.g., by impacting spatial media playback and/or causing user discomfort when viewed via an HMD), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
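As an illustration of the two-threshold behavior described above, a minimal Swift sketch with assumed threshold values (chosen from the example ranges) that shows and hides the level indicator with hysteresis:

```swift
// Minimal sketch (assumed thresholds): showing the level indicator with hysteresis,
// so it appears above one tilt threshold and disappears only below a smaller one,
// preventing flicker near the boundary.
struct LevelIndicatorVisibility {
    var showThresholdDegrees: Double = 5.0 // e.g., 2° to 10° per the examples above
    var hideThresholdDegrees: Double = 1.0 // e.g., 0.5° to 2° per the examples above
    private(set) var isVisible = false

    mutating func update(tiltDegrees: Double) {
        let tilt = abs(tiltDegrees)
        if !isVisible, tilt > showThresholdDegrees {
            isVisible = true
        } else if isVisible, tilt < hideThresholdDegrees {
            isVisible = false
        }
        // Between the two thresholds the current visibility is kept.
    }
}
```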
In some embodiments, the first user interface (e.g., 710 and/or X710) includes a first visual indication (e.g., 712, X712, 716, and/or X716) (e.g., a transition or gradient to darkening and/or blurring) along a first edge of the camera preview (e.g., the border/frame of the capture region; in some embodiments, along all edges of the capture preview). In some embodiments, the first visual indication modifies (e.g., vignettes (e.g., blurs and/or darkens)) a visual appearance of a second portion of a representation of the field-of-view of the first camera that underlays the first visual indication. In some embodiments, the first visual indication is viewpoint-locked (in some embodiments, the camera preview (e.g., viewfinder region) is viewpoint-locked, and the first visual indication frames the camera preview). Displaying a viewpoint-locked visual indication along the edge of the camera preview provides a user with improved visual feedback about a state of the computer system (e.g., with respect to the current framing of media capture). Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed and/or captured in a manner (e.g., due to misalignment of the system during capture) that can negatively affect later viewing of the media (e.g., by impacting spatial media playback and/or causing user discomfort when viewed via an HMD), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, a first portion (e.g., 920A and/or X920A) (e.g., the inner portion of a broken line) of the first indicator (e.g., 920 and/or X920) representing the orientation of the field-of-view of the first camera is displayed inside a first region of the first user interface (e.g., 718 and/or X718) (e.g., the area falling within the capture affordance rings), and a second portion (e.g., 920B and/or X920B) of the first indicator (e.g., the outer portion(s) of the broken line) representing the orientation of the field-of-view of the first camera is displayed outside the first region of the first user interface (e.g., the first indicator intersects the capture affordance rings). Displaying the level indicator intersecting with a particular region of the user interface, such as a capture affordance, provides improved visual feedback on a state of the computer system (e.g., with respect to the orientation/alignment of media capture) without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment or due to the user not seeing the visual feedback, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the first portion (e.g., 920A and/or X920A) (e.g., the inner portion of the broken line) of the first indicator is displayed with an orientation that is maintained at a fixed orientation relative to a viewpoint of the user (e.g., 902 and/or X902) (e.g., displayed at an orientation aligned with (e.g., parallel or perpendicular to) an orthogonal axis (e.g., an x-axis, or a y-axis) of the field-of-view of the first camera; in some embodiments, for a wearable device, the first portion of the first indicator is level to the user's head). Displaying a portion of the level indicator in alignment with the camera field-of-view (e.g., level with the camera/device/head) provides a user with improved visual feedback about a state of the computer system (e.g., with respect to the current/real-time orientation/alignment of media capture). Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed and/or captured in a manner (e.g., due to misalignment of the system during capture) that can negatively affect later viewing of the media (e.g., by impacting spatial media playback and/or causing user discomfort when viewed via an HMD), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the second portion (e.g., 920B and/or X920B) of the first indicator (e.g., the outer portion(s) of the broken line) is displayed with an orientation that is maintained at a fixed orientation relative to one or more portions of a three-dimensional environment (e.g., 904 and/or X904) (e.g., displayed at an orientation aligned with (e.g., parallel or perpendicular to) an orthogonal axis (e.g., an x-axis (such as a horizon line) or a y-axis (such as the direction of gravity's pull)) of the respective (e.g., target) orientation (e.g., the orientation representing the orientation of the environment (in some embodiments, a physical environment; in some embodiments, a virtual environment; in some embodiments, a mixed-reality environment); in some embodiments, the second orthogonal axis of the predetermined orientation corresponds to the first orthogonal axis of the field-of-view of the first camera, so the level indicator compares, e.g., the x-axes of both the camera and the environment)). Displaying a portion of the level indicator in alignment with the environment (e.g., level with the true horizon, perpendicular to the direction of gravity) provides a user with improved visual feedback about a state of the computer system (e.g., with respect to the current/real-time orientation/alignment of media capture). Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed and/or captured in a manner (e.g., due to misalignment of the system during capture) that can negatively affect later viewing of the media (e.g., by impacting spatial media playback and/or causing user discomfort when viewed via an HMD), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
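For illustration, a minimal Swift sketch (names are hypothetical) of how the display-space orientation of each portion, and the angle between them, can be derived from the camera's roll relative to the horizon:

```swift
// Minimal sketch (hypothetical names): the two portions of the level indicator are
// oriented in display space from one quantity, the camera's roll relative to the horizon.
struct LevelIndicatorOrientation {
    /// Inner portion: fixed relative to the viewpoint, so its display roll is zero.
    var innerDisplayRollDegrees: Double { 0 }

    /// Outer portion: fixed relative to the environment, so it counter-rotates by the camera roll.
    func outerDisplayRollDegrees(cameraRollDegrees: Double) -> Double {
        return -cameraRollDegrees
    }

    /// The angle between the two portions equals the current tilt magnitude.
    func angleBetweenPortions(cameraRollDegrees: Double) -> Double {
        return abs(cameraRollDegrees)
    }
}
```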
In some embodiments, after displaying the first indicator representing the orientation of the field-of-view of the first camera (e.g., once the level indicator has been displayed, indicating that the current orientation is not level), the computer system detects a third change (e.g., 918A, 918B, 918C, 918D, and/or 918E) in the orientation of the field-of-view of the first camera with respect to the respective orientation (e.g., as the user adjusts the orientation of the camera to increase or decrease the tilt). In some embodiments, in response to detecting the third change in the orientation and in accordance with a determination that the difference between the current orientation of the field-of-view of the first camera and the respective orientation has increased (e.g., the camera has been tilted further away from level), the computer system increases (e.g., rotating the outer portion relative to the inner portion so the outer portion remains aligned with the respective orientation (e.g., level to the horizon)) an angle between the first portion and the second portion of the first indicator (e.g., as illustrated in FIGS. 9D1-9E). Increasing an angle between the two portions of the indicator as the camera orientation departs further from the target orientation provides a user with improved visual feedback about a state of the computer system (e.g., with respect to the current/real-time orientation/alignment of media capture). Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed and/or captured in a manner (e.g., due to misalignment of the system during capture) that can negatively affect later viewing of the media (e.g., by impacting spatial media playback and/or causing user discomfort when viewed via an HMD), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
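As an illustration of the behavior described above, the following is a minimal sketch, in Swift, of how the two portions of the level indicator could be posed: the inner portion is held fixed relative to the camera viewport while the outer portion is counter-rotated by the camera roll so it stays aligned with the environment, so the separation between the portions grows as the roll error increases. The type and function names, and the use of a signed roll value in degrees, are illustrative assumptions rather than details from this disclosure.

```swift
import Foundation

struct LevelIndicatorPose {
    let innerRotationDegrees: Double   // inner portion, fixed relative to the camera viewport
    let outerRotationDegrees: Double   // outer portion, counter-rotated to stay level with the environment
    var separationDegrees: Double { abs(innerRotationDegrees - outerRotationDegrees) }
}

// Hypothetical: cameraRollDegrees is the signed difference between the camera's
// x-axis and the horizon (positive when tilted clockwise).
func levelIndicatorPose(cameraRollDegrees: Double) -> LevelIndicatorPose {
    // Inner portion: drawn level with the camera/head, so no rotation in viewport space.
    let inner = 0.0
    // Outer portion: counter-rotated by the camera roll so it remains aligned with the horizon.
    let outer = -cameraRollDegrees
    return LevelIndicatorPose(innerRotationDegrees: inner, outerRotationDegrees: outer)
}

// Example: a 5° roll yields a 5° separation between the inner and outer portions.
print(levelIndicatorPose(cameraRollDegrees: 5.0).separationDegrees)   // 5.0
```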
In some embodiments, while displaying the first indicator and in accordance with a determination that the first set of criteria are met (e.g., when the absolute difference between the x-axis of the plane of the field-of-view of the camera and the horizon line is greater than, e.g., 2°, 3°, and/or 4°), the computer system displays the first portion (e.g., the inner portion of the broken line) of the first indicator with a first color (e.g., white, or another color), and displays the second portion (e.g., the outer portion of the broken line) of the first indicator with a second color (e.g., yellow, or another color) different from the first color (e.g., as illustrated at the top and middle of FIG. 9G1). In some embodiments, in accordance with a determination that a third set of criteria are met, the computer system displays both the first portion and the second portion with a third color (e.g., as illustrated at the bottom of FIG. 9G1) (e.g., display all of the broken line portions with the same color; in some embodiments, the third color is the same as the first color (e.g., the completed line turns white); in some embodiments, the third color is the same as the second color (e.g., the completed line turns yellow); in some embodiments, the third color is a different color from both the first color and the second color (e.g., the completed line turns red)), wherein the third set of criteria includes a third criterion that is met when the difference between the current orientation of the field-of-view of the first camera and the respective orientation is less than a third threshold amount (e.g., when the absolute difference between the x-axis of the plane of the field-of-view of the camera and the horizon line is less than, e.g., 0.05°, 0.25°, and/or 1° (e.g., when the camera is level or close to level)) (in some embodiments, the third set of criteria is different from the second set of criteria, the third criterion is different from the second criterion, and/or the third threshold amount is different from the second threshold amount; in some embodiments, the third threshold amount is less than the second threshold amount; in some embodiments, the third set of criteria is the same as the second set of criteria, the third criterion is the same as the second criterion, and/or the third threshold amount is the same as the second threshold amount) (in some embodiments, displaying the broken line portions with the same color is performed prior to ceasing to display the level indicator). Displaying the portions of the indicator with different colors when the orientation of a field-of-view of a camera exceeds a first threshold difference from a level orientation (e.g., the level of a horizon line in an environment) and displaying the portions of the indicator with the same color when the orientation has been leveled out provides a user with real-time visual feedback about a state of the computer system. 
Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed and/or captured in a manner (e.g., due to misalignment of the system) that can negatively affect later viewing of the media (e.g., by impacting spatial media playback and/or causing user discomfort when viewed via an HMD), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
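A minimal sketch of the color behavior described above, assuming example threshold values of the kind mentioned (roughly 2° to show distinct colors and roughly 0.25° to treat the camera as level) and assuming white and yellow as the first and second colors; all names and values are illustrative.

```swift
import Foundation

enum IndicatorColor { case white, yellow }

struct IndicatorColors {
    let inner: IndicatorColor
    let outer: IndicatorColor
}

// Returns the colors for the two portions, or nil to indicate "leave the current
// colors unchanged" while the roll error sits between the two thresholds.
func indicatorColors(rollErrorDegrees: Double,
                     offLevelThreshold: Double = 2.0,
                     levelThreshold: Double = 0.25) -> IndicatorColors? {
    let error = abs(rollErrorDegrees)
    if error > offLevelThreshold {
        // Off level: the inner and outer portions are shown in different colors.
        return IndicatorColors(inner: .white, outer: .yellow)
    }
    if error < levelThreshold {
        // Level or nearly level: both portions take the same color before the
        // indicator is eventually dismissed.
        return IndicatorColors(inner: .white, outer: .white)
    }
    return nil
}
```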
In some embodiments, the first region of the first user interface is a circular region with a first diameter. In some embodiments, while displaying the first indicator and in accordance with a determination that the first set of criteria are met (e.g., when the absolute difference between the x-axis of the plane of the field-of-view of the camera and the horizon line is greater than, e.g., 2°, 3°, and/or 4°), the computer system displays the first portion (e.g., the inner portion of the broken line) of the first indicator with a width smaller than the first diameter (e.g., the inner portion of the broken line does not extend all the way to the edges of the capture affordance region, leaving a gap between the edges of the portion and the outer edges of the capture affordance), and displays the second portion (e.g., the outer portion of the broken line) of the first indicator as two or more separate elements displayed on different sides of the first portion (e.g., a third portion (e.g., the left-hand portion of the broken line), displayed on a first side of the first region, and a fourth portion (e.g., the right-hand portion of the broken line), displayed on a second side of the first region opposite the first side), wherein an inner edge of a first element of the two or more elements and an inner edge of a second element of the two or more elements are spaced apart from the first portion of the first indicator (e.g., the third portion and the fourth portion touch an outer boundary of the first region) (e.g., as illustrated at the top and middle of FIG. 9G1). In some embodiments, in accordance with a determination that a fourth set of criteria are met, the computer system displays the inner edge of the first element and the inner edge of the second element shifting toward the first portion (e.g., moving to touch the first portion, such as by shifting into the first region to touch the first portion) (e.g., as illustrated at the bottom of FIG. 9G1) (e.g., the broken line portions animate to snap (e.g., by moving and/or expanding) together to create a completed line; in some embodiments, the inner portion expands in width (e.g., changes in size to stretch outwards towards the edges of the capture affordance); in some embodiments, the outer portions of the broken line shift inward without expanding/changing size; in some embodiments, the outer portions of the broken line expand in width (e.g., change in size to stretch inwards towards the edges of the inner portion)), wherein the fourth set of criteria includes a fourth criterion that is met when the difference between the current orientation of the field-of-view of the first camera and the respective orientation is less than a fourth threshold amount (e.g., when the absolute difference between the x-axis of the plane of the field-of-view of the camera and the horizon line is less than, e.g., 0.05°, 0.25°, and/or 1° (e.g., when the camera is level or close to level)) (in some embodiments, the fourth set of criteria is the same as the third set of criteria, the fourth criterion is the same as the third criterion, and/or the fourth threshold amount is the same as the third threshold amount) (in some embodiments, displaying the broken line portions with the same color is performed prior to ceasing to display the level indicator). 
Displaying the indicator with a gap between the inner and outer portions (e.g., as a broken line with gaps at the “break” points) when the orientation of a field-of-view of a camera exceeds a first threshold difference from a level orientation (e.g., the level of a horizon line in an environment) and displaying the portions of the indicator “snapping” together (e.g., to create a completed line) when the orientation has been leveled out provides a user with real-time visual feedback about a state of the computer system. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed and/or captured in a manner (e.g., due to misalignment of the system) that can negatively affect later viewing of the media (e.g., by impacting spatial media playback and/or causing user discomfort when viewed via an HMD), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
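A small sketch of the snapping behavior described above, under the assumption that the outer elements' inner edges are held a fixed gap away from the inner portion until the roll error drops below a level threshold, at which point the gap collapses to zero; the gap size and threshold are illustrative.

```swift
import Foundation

// Returns the gap, in points, between the inner edges of the outer elements and
// the ends of the inner portion: a fixed resting gap while off level, collapsing
// to zero (the "snap" into a continuous line) once the roll error is small enough.
func outerElementGap(rollErrorDegrees: Double,
                     restingGap: Double = 6.0,
                     levelThreshold: Double = 0.25) -> Double {
    abs(rollErrorDegrees) < levelThreshold ? 0.0 : restingGap
}
```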
In some embodiments, displaying the first user interface includes displaying a media capture selectable interface object (e.g., 718 and/or X718) (e.g., concentric rings displayed at the center of the capture preview region), and displaying the first indicator representing the orientation of the field-of-view of the first camera is performed while displaying the media capture selectable interface object (e.g., the level indicator is displayed concurrently with the media capture affordance). Displaying the level indicator concurrently with a capture affordance provides the user with improved visual feedback on a state of the computer system, for example, indicating both the orientation and the capture state of a camera system. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed and/or captured in a manner (e.g., due to misalignment of the system during capture) that can negatively affect later viewing of the media (e.g., by impacting spatial media playback and/or causing user discomfort when viewed via an HMD), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the first indicator (e.g., 920 and/or X920) representing the orientation of the field-of-view of the first camera is visually incorporated into (e.g., displayed as a part of; in some embodiments, the level indicator intersects or overlaps the capture affordance) the media capture selectable interface object (e.g., 718 and/or X718). Incorporating the level indicator into the media capture affordance provides improved visual feedback on a state of the computer system (e.g., with respect to the orientation/alignment of media capture) without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment or due to the user not seeing the visual feedback, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, at least a portion (e.g., 920A and/or X920A) of the first indicator (e.g., 920 and/or X920) overlaps (e.g., intersects with, runs through) at least a portion of the media capture selectable interface object (e.g., 718 and/or X718) (e.g., the level indicator is a broken line, where the inner portion intersecting the media capture affordance moves independently of the two outer portions that don't intersect the media capture affordance). Incorporating the level indicator into the media capture affordance provides improved visual feedback on a state of the computer system (e.g., with respect to the orientation/alignment of media capture) without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment or due to the user not seeing the visual feedback, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the computer system detects a respective input (e.g., 926) (in some embodiments, a user input for media capture, such as a hardware button press or an air pinch gesture; in some embodiments, a gaze input, such as user attention focused on the center of the capture preview) (in some embodiments, the respective input is the same respective input as described with respect to method 800). In some embodiments, the computer system changes one or more visual features of the media capture selectable interface object (e.g., 718 and/or X718) (e.g., changing the opacity of the concentric rings and/or changing the size of the concentric rings (e.g., squeezing together)) based on the respective input (e.g., as illustrated in
In some embodiments, the computer system displays, via the display generation component, a representation (e.g., 738) of first media content captured by the first camera, and the representation is displayed at an orientation aligned with (e.g., parallel or perpendicular to) a first orthogonal axis (e.g., an x-axis (such as a horizon line) or a y-axis (such as the direction of gravity's pull)) of the respective (e.g., target) orientation (e.g., 904 and/or X904) (e.g., the orientation representing the orientation of the environment (in some embodiments, a physical environment; in some embodiments, a virtual environment; in some embodiments, a mixed-reality environment)). Displaying the photo well icon aligned with the environment provides improved visual feedback on a state of the computer system to the user (e.g., with respect to the current orientation/alignment of media capture with respect to the environment). Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed and/or captured in a manner (e.g., due to misalignment of the system during capture) that can negatively affect later viewing of the media (e.g., by impacting spatial media playback and/or causing user discomfort when viewed via an HMD), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the change in the orientation (e.g., 902 and/or X902) of the field-of-view of the first camera with respect to the respective orientation (e.g., 904 and/or X904) is caused by (in some embodiments, the camera orientation is head- or body-locked, e.g., for a wearable device; in some embodiments, the camera is physically controlled by the user, e.g., a device being held by a user) a change in an orientation of at least a portion of a body of a user of the computer system (e.g., the user's head, arms, hands, and/or entire body).
In some embodiments, the computer system is further in communication with a second camera (e.g., 704B) that is spaced apart from the first camera (e.g., 704A) (in some embodiments, the computer system is in communication with two spaced-apart cameras substantially facing the same direction; in some embodiments, a camera array for spatial media capture). In some embodiments, the first set of criteria is met when an orientation of a line formed between the first camera and the second camera is substantially parallel to (e.g., within 0.5°, 1°, or 2° of) a second orthogonal axis (e.g., an x-axis (such as a horizon line); in some embodiments, the second orthogonal axis is the same as the first orthogonal axis) of the respective (e.g., target) orientation (e.g., 904 and/or X904) (e.g., the orientation representing the orientation of the environment (in some embodiments, a physical environment; in some embodiments, a virtual environment; in some embodiments, a mixed-reality environment)).
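One way the criterion above could be evaluated is sketched below: the roll of the line formed between the two cameras is estimated in a gravity-aligned frame and compared against a small tolerance (the text mentions 0.5°, 1°, or 2° as examples). The coordinate convention and names are assumptions.

```swift
import Foundation

struct Point3D { let x, y, z: Double }   // gravity-aligned frame: x horizontal, y vertical

// Roll of the inter-camera baseline relative to the horizontal axis, in degrees.
func baselineRollDegrees(firstCamera: Point3D, secondCamera: Point3D) -> Double {
    let dx = secondCamera.x - firstCamera.x
    let dy = secondCamera.y - firstCamera.y
    let angle = atan2(dy, dx) * 180.0 / .pi
    // Fold to (-90°, 90°] so the left/right ordering of the cameras does not matter.
    if angle > 90 { return angle - 180 }
    if angle < -90 { return angle + 180 }
    return angle
}

// The baseline is "substantially parallel" to the horizon when its roll is
// within a small tolerance of zero.
func baselineIsSubstantiallyLevel(firstCamera: Point3D,
                                  secondCamera: Point3D,
                                  toleranceDegrees: Double = 1.0) -> Bool {
    abs(baselineRollDegrees(firstCamera: firstCamera, secondCamera: secondCamera)) <= toleranceDegrees
}
```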
In some embodiments, the first user interface includes an options affordance (e.g., 720 and/or X720) (e.g., a button with “ . . . ” at the bottom of the camera UI), which, when selected, causes display of one or more media capture option selectable interface objects (e.g., 912A and/or 912B) (e.g., level indicator, animated photo mode, timer delay, and/or flash). Providing an options affordance for accessing additional camera options provides improved control of media capture and improved visual feedback on the current state of media capture options. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or mis-captured (e.g., due to inappropriate or unwanted camera settings), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the first set of criteria includes a second criterion that is met when a first media capture setting (e.g., the level indicator setting), controllable by a first media capture option selectable interface object (e.g., 912B) of the one or more media capture option selectable interface objects, is in a first state (e.g., if the level indicator is turned on). Providing a media capture option affordance for toggling the level indicator provides improved control of media capture. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed and/or captured in a manner (e.g., due to misalignment of the system during capture) that can negatively affect later viewing of the media (e.g., by impacting spatial media playback and/or causing user discomfort when viewed via an HMD), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the one or more media capture option selectable interface objects includes a photo mode selectable interface object (e.g., 912A) for toggling (e.g., enabling or disabling) a multi-frame photo capture mode (e.g., a dynamic photo mode; in some embodiments, when the multi-frame photo capture mode is enabled, capture of photo media includes capturing multiple frames, and when the multi-frame photo capture mode is disabled, capture of photo media includes capturing only a single frame). Providing a media capture option affordance for toggling the multi-frame photo capture mode provides improved control of media capture. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or mis-captured (e.g., due to inappropriate or unwanted camera settings), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the first user interface includes a first status indicator (e.g., 722A, X722A, 722B and/or X722B) that indicates a current state (e.g., the current state of the options (e.g., enabled/disabled)) of a second media capture setting (e.g., a media capture setting and/or parameter; in some embodiments, the second media capture setting is the same as the first media capture setting and/or the multi-frame photo capture mode) that corresponds to (e.g., that can be configured by) a second media capture option selectable interface object of the one or more media capture option selectable interface objects (in some embodiments, the second media capture option selectable interface object is the same as the first media capture option selectable interface object and/or the photo mode selectable interface object). Providing a status indicator provides improved feedback on a state of the computer system (e.g., with respect to the current state of capture settings), for example, indicating to the user whether the level indicator isn't being displayed because the camera is level or because the level indicator is disabled. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or mis-captured (e.g., due to inappropriate or unwanted camera settings), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, aspects/operations of methods 800, 1000, and 1200 may be interchanged, substituted, and/or added between these methods. For example, the user interface displayed in method 1000 can be the same as the user interface displayed in method 800, and the indicator representing the orientation of the field-of-view of the camera can be displayed according to method 1000 before, during, or after initiating media capture according to method 800. For example, the camera preview displayed in method 1000 can be the same as the capture preview for spatial media displayed in method 1200, and the indicator representing the orientation of the field-of-view of the camera(s) can be displayed according to method 1000 before, concurrently with, or after displaying the prompt to change a distance between the subject and the cameras according to method 1200. For brevity, these details are not repeated here.
FIGS. 11A1-11H illustrate exemplary methods for displaying a camera preview for spatial media capture with prompts to improve capture quality.
As illustrated in FIG. 11A1, computer system 700 is displaying, via display 708 of device 702, media capture user interface 710 (e.g., as described above with respect to
When user 1104 holds device 702 as illustrated in FIG. 11A1, the respective fields-of-view of first camera 704A and second camera 704B each include at least part of subject 1106 (a table with a plant on top), and do not include subject 1108 (a lamp). Accordingly, media capture user interface 710 includes representation 1106A of subject 1106 (e.g., within the representation of the field-of-view of first camera 704A and/or the field-of-view of second camera 704B overlaid by media capture user interface 710).
At FIG. 11A1, computer system 700 determines that the current positioning of subject 1106 within the fields-of-view of first camera 704A and second camera 704B would result in a spatial media capture with below-threshold quality, as subject 1106 is too close to first camera 704A and second camera 704B. At the current distance from device 702, the capture of subject 1106 in the fields-of-view of first camera 704A and second camera 704B would not overlap sufficiently to produce an appearance/illusion of depth with a threshold level of quality in the spatial media capture. In some embodiments where computer system 700 is implemented using a head-mounted device, the distance between device 702 and subject 1106 is approximately the same as the distance between the user's eyes and subject 1106, but while the view of subject 1106 may appear fine to the user (e.g., the user is able to focus on subject 1106), the viewpoints of first camera 704A and second camera 704B may result in a lower-quality spatial media capture, as the field-of-view and relative positioning of the cameras may not exactly match those of the user's eyes.
In response to determining that the current positioning of subject 1106 within the fields-of-view of first camera 704A and second camera 704B would result in a spatial media capture with below-threshold quality, computer system 700 prompts user 1104 to change positioning to address the quality issue with the spatial media capture. In particular, computer system 700 displays text prompt 1110, which includes the text “move farther away.” Additionally, computer system 700 changes the appearance of shutter affordance 718 to obscure (e.g., blur and/or darken) the portion of the representation of the field-of-view of first camera 704A and/or the field-of-view of second camera 704B currently overlaid (e.g., at least semi-translucently) by shutter affordance 718. At FIG. 11A1, computer system 700 obscures the overlaid portion to a first extent.
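The following sketch illustrates, with assumed camera parameters (a 65 mm baseline and a 70° horizontal field-of-view) and an assumed overlap threshold, one way the "too close" determination and the "move farther away" prompt described above could be modeled; it is not taken from this disclosure.

```swift
import Foundation

// Illustrative geometry: two cameras separated by a baseline each see a horizontal
// footprint of width 2·d·tan(hfov/2) at distance d. The fraction of that footprint
// shared by both cameras shrinks as the subject gets closer, which is one way to
// model why a nearby subject can fall below a quality threshold for spatial capture.
func stereoOverlapFraction(subjectDistance d: Double,
                           baseline: Double,
                           horizontalFOVDegrees: Double) -> Double {
    let halfAngle = horizontalFOVDegrees * .pi / 360.0
    let footprint = 2.0 * d * tan(halfAngle)
    guard footprint > 0 else { return 0 }
    return max(0.0, (footprint - baseline) / footprint)
}

// Hypothetical prompt decision: if the overlap falls below a chosen threshold,
// prompt the user to move farther away, as with text prompt 1110.
func promptForSubject(atDistance d: Double) -> String? {
    let overlap = stereoOverlapFraction(subjectDistance: d,
                                        baseline: 0.065,          // assumed 65 mm baseline
                                        horizontalFOVDegrees: 70)  // assumed FOV
    return overlap < 0.9 ? "Move farther away" : nil
}

// Example: a subject 0.3 m away yields roughly 85% overlap (prompt shown), while a
// subject 1 m away yields roughly 95% overlap (no prompt).
```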
In some embodiments, the techniques and user interface(s) described in FIG. 11A1 are provided by one or more of the devices described in
As illustrated in FIG. 11A2, HMD X700 is displaying, via display module X702, media capture user interface X710 (e.g., as described above with respect to
When user X1104 wears HMD X700 as illustrated in FIG. 11A2 (e.g., in a head-mounted position), the respective fields-of-view of the at least two cameras each include at least part of subject X1106 (a table with a plant on top), and do not include subject X1108 (a lamp). Accordingly, media capture user interface X710 includes representation X1106A of subject X1106 (e.g., within the representation of the fields-of-view of the at least two cameras overlaid by media capture user interface X710).
At FIG. 11A2, HMD X700 determines that the current positioning of subject X1106 within the fields-of-view of the at least two cameras would result in a spatial media capture with below-threshold quality, as subject X1106 is too close to the at least two cameras. At the current distance from HMD X700, the capture of subject X1106 in the fields-of-view of the at least two cameras would not overlap sufficiently to produce an appearance/illusion of depth with a threshold level of quality in the spatial media capture. In some embodiments, the distance between HMD X700 and subject X1106 is approximately the same as the distance between the user's eyes and subject X1106, but while the view of subject X1106 may appear fine to the user (e.g., the user is able to focus on subject X1106), the viewpoints of the at least two cameras may result in a lower-quality spatial media capture, as the field-of-view and relative positioning of the cameras may not exactly match those of the user's eyes.
In response to determining that the current positioning of subject X1106 within the fields-of-view of the at least two cameras would result in a spatial media capture with below-threshold quality, HMD X700 prompts user X1104 to change positioning to address the quality issue with the spatial media capture. In particular, HMD X700 displays text prompt X1110, which includes the text “move farther away.” Additionally, HMD X700 changes the appearance of shutter affordance X718 to obscure (e.g., blur and/or darken) the portion of the representation of the field-of-view of the at least two cameras currently overlaid (e.g., at least semi-translucently) by shutter affordance X718. At FIG. 11A2, HMD X700 obscures the overlaid portion to a first extent.
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
In response to determining that the positioning of subject 1106 within the fields-of-view of first camera 704A and second camera 704B following movement 1112A would still result in a spatial media capture with below-threshold quality, computer system 700 continues to display text prompt 1110 including the text “move farther away.” Additionally, computer system 700 continues to obscure the portion of the representation of the field-of-view of first camera 704A and/or the field-of-view of second camera 704B currently overlaid by shutter affordance 718. However, as movement 1112A increased the distance to subject 1106 at
Computer system 700 detects movement 1112B of device 702, as user 1104 moves further backwards, away from subject 1106 in environment 1100. At
In response to determining that the current positioning of subject 1106 within the fields-of-view of first camera 704A and second camera 704B would result in a spatial media capture that meets the threshold quality, at
Computer system 700 detects movement 1112C of device 702, as user 1104 rotates device 702 to point away from subject 1106 and towards subject 1108. At
In response to determining that the current positioning of subject 1108 within the fields-of-view of first camera 704A and second camera 704B would result in a spatial media capture with below-threshold quality, computer system 700 again prompts user 1104 to change positioning to address the quality issue with the spatial media capture, similarly to the prompting described with respect to FIGS. 11A1-11A2. In
Computer system 700 detects movement 1112D of device 702, as user 1104 moves forward, towards subject 1108 in environment 1100. At
As described above with respect to
Additional descriptions regarding FIGS. 11A1-11H are provided below in reference to method 1200 described with respect to
The computer system displays (1202), via the display generation component (e.g., 708 and/or X702), a capture preview (e.g., 712 and/or X712) (e.g., a camera/capture preview UI; in some embodiments, overlaying a field of view of an environment, such as a transparent display, pass-through camera data and/or virtual content; in some embodiments, a physical environment; in some embodiments, a virtual environment; in some embodiments, a mixed-reality environment) for spatial media capture (e.g., a single media capture made using the combined data captured by the first camera (e.g., from the first perspective/capturing the first field of view) and the second camera (e.g., from the second perspective/capturing the second field of view)), wherein a capture input detected while the capture preview is displayed will cause the computer system to capture media from the first camera (e.g., 704A) and the second camera (e.g., 704B) to generate a spatial media item that includes one or more images for a right eye and one or more images for a left eye that when viewed concurrently create an illusion of a spatial representation of a field-of-view of the plurality of cameras.
While displaying the capture preview for spatial media capture, the computer system detects (1204) a location of a subject (e.g., 1106, X1106, 1108 and/or X1108) (e.g., an element of the environment; in some embodiments, the closest visible element of the environment to the plane of the capture) in the field-of-view of the plurality of cameras.
In response to detecting the location of the subject in the field-of-view of the plurality of cameras (1206), in accordance with a determination that the subject location relative to the field-of-view of the plurality of cameras does not meet criteria for capturing spatial media with a threshold level of quality (e.g., if the distance between the cameras and an element of the environment would adversely affect the quality of a current spatial media capture; e.g., if the cameras are too close to an element of the environment, the capture of the element by each camera will differ too greatly to provide a realistic illusion/appearance of depth, and if the cameras are too far from the element, the capture of the element by each camera will differ too little to provide a realistic illusion/appearance of depth), the computer system displays (1208), via the display generation component, a prompt (e.g., 1110, X1110, and/or 1114) (e.g., text, graphics, or another displayed output including instructions to the user) to change a distance between the subject and the plurality of cameras (e.g., “move farther away,” or “move closer,”; in some embodiments, the prompt includes an indication of the quality improvement criteria (e.g., “move farther away to improve spatial media capture quality”)). In some embodiments, if an orientation of the plurality of cameras relative to the subject would adversely affect the quality of a current spatial media capture (e.g., as an off-axis spatial media capture would require the user to tilt their head to the angle of the capture axis to avoid visual discomfort), the computer system displays a prompt to change an orientation of the plurality of cameras relative to the subject (e.g., such that media is captured with a level horizon so that the spatial media can be viewed comfortably with a level horizon). Displaying a prompt to change a distance between a subject and the camera provides a user with real-time, improved visual feedback about a state of the computer system. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or mis-captured (e.g., due to the subject being too close to or too far from the camera for effective spatial media capture), which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently. For example, the display of the prompt indicates to the user that the current (e.g., real-time) positioning of the subject with respect to the camera may adversely affect spatial media capture quality, and may assist the user in changing the positioning for improved spatial media capture quality.
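A minimal sketch of the determination described above, with an assumed acceptable distance range standing in for the criteria for capturing spatial media with a threshold level of quality; outside the range the corresponding prompt is returned, inside it no prompt is shown. The range and names are illustrative.

```swift
import Foundation

enum DistancePrompt: String {
    case moveFartherAway = "Move farther away"
    case moveCloser = "Move closer"
}

// Assumed acceptable range of subject distances (meters) for spatial capture quality.
func distancePrompt(subjectDistance: Double,
                    acceptableRange: ClosedRange<Double> = 1.0...8.0) -> DistancePrompt? {
    if subjectDistance < acceptableRange.lowerBound { return .moveFartherAway }
    if subjectDistance > acceptableRange.upperBound { return .moveCloser }
    // Within the range: the criteria are met and no prompt is displayed.
    return nil
}
```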
In some embodiments, the prompt (e.g., 1110, X1110, and/or 1114) to change the distance between the subject and the plurality of cameras includes text (in some embodiments, including a written instruction, such as “move farther away” or “move closer”; in some embodiments, including other text, such as an explanation for the prompt (e.g., “move farther away to improve spatial media capture quality”)). Displaying a text prompt provides the user with improved visual feedback on a state of the computer system, for example, indicating that the current positioning of the camera will adversely affect media capture quality. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or mis-captured, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, in response to detecting the location of the subject (e.g., 1106, X1106, 1108, and/or X1108) in the field-of-view of the plurality of cameras, in accordance with a determination that the subject location relative to the field-of-view of the plurality of cameras meets the criteria for capturing spatial media with the threshold level of quality, the computer system foregoes displaying the prompt to change the distance between the subject and the plurality of cameras (e.g., as illustrated in
In some embodiments, the capture preview for spatial media capture includes a media capture selectable interface object (e.g., 718 and/or X718) (e.g., concentric rings at the center of the capture preview), and displaying the prompt to change the distance between the subject and the plurality of cameras includes making a first change to one or more visual features of the media capture selectable interface object (e.g., as illustrated in FIGS. 11A1-11C and 11E-11F) (e.g., color, opacity, and/or size). Prompting the change in distance by altering the appearance of the capture affordance provides the user with intuitive, improved visual feedback on a state of the computer system (e.g., with respect to current spatial media capture quality implications) without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment or due to the user not seeing the visual feedback, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the media capture selectable interface object (e.g., 718 and/or X718) is displayed in a central region (e.g., a region that includes a center) of the capture preview for spatial media capture. Displaying the capture affordance at the center of the capture preview provides the user with intuitive, improved visual feedback on a state of the computer system (e.g., with respect to current spatial media capture quality implications) without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment or due to the user not seeing the visual feedback, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, making the first change to the one or more visual features of the media capture selectable interface object (e.g., 718 and/or X718) includes changing a first portion (e.g., the concentric rings) of the media capture selectable interface object from a first color to a second color (e.g., as illustrated in
In some embodiments, while the media capture selectable interface object (e.g., 718) is displayed with the second color, the computer system changes the first portion of the media capture selectable interface object from the second color to the first color over a period of time (e.g., gradually fading back to the original color after providing the feedback; in some embodiments, changing back to the default color when the camera is an appropriate distance from the subject). Reverting a change of the color of the capture affordance provides the user with intuitive, improved visual feedback on a state of the computer system (e.g., indicating that the prompt to change distance has been successfully completed) without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment or due to the user not seeing the visual feedback, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, prior to displaying the prompt to change the distance between the subject and the plurality of cameras, a first portion of the media capture selectable object (e.g., 718 and/or X718) is partially transparent such that a first portion of the camera preview for the spatial media capture is visible through the first portion of the media capture selectable interface object (e.g., as illustrated in
In some embodiments, causing the first portion of the camera preview to be at least partially obscured includes blurring the first portion of the camera preview (in some embodiments, the blurring is applied gradually). Prompting the change in distance by blurring the portion of the camera preview inside the capture affordance provides intuitive, improved visual feedback on a state of the computer system (e.g., with respect to current spatial media capture quality implications) without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment or due to the user not seeing the feedback, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, causing the first portion of the camera preview to be at least partially obscured includes darkening the first portion of the camera preview (in some embodiments, the darkening is applied gradually). Prompting the change in distance by darkening the portion of the camera preview inside the capture affordance provides intuitive, improved visual feedback on a state of the computer system (e.g., with respect to current spatial media capture quality implications) without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment or due to the user not seeing the feedback, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, after causing the first portion of the camera preview to be at least partially obscured (e.g., as illustrated in FIG. 11A1-11A2), the computer system detects a change (e.g., 1112A) to the location of the subject in the field-of-view of the plurality of cameras (e.g., detecting movement of the camera with respect to the elements of the environment). In some embodiments, in response to detecting the change to the location of the subject in the field-of-view of the plurality of cameras, the computer system makes a second change to the one or more visual features of the media capture selectable interface object (e.g., as illustrated in
In some embodiments, making the second change to the one or more visual features of the media capture selectable interface object (e.g., 718 and/or X718) is based on a direction of the change to the location of the subject (e.g., if the change to the location moved the camera closer to a target distance range, obscuring the camera preview less and if the change to the location moved the camera farther from the target distance range, obscuring the camera preview more). Changing the appearance of the capture affordance based on the direction of the change of subject location provides intuitive, improved visual feedback on a state of the computer system (e.g., with respect to current spatial media capture quality implications) without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment or due to the user not seeing the feedback, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, making the second change to the one or more visual features of the media capture selectable interface object (e.g., 718 and/or X718) is based on a magnitude of the change to the location of the subject (e.g., if the change to the location moved the camera only slightly away from or towards a target distance range (e.g., a relatively small magnitude of change), the change (e.g., increase or decrease, respectively) to the obscuring is also slight; and if the change to the location moved the camera more significantly away from or towards the target distance range, the change to the obscuring is more significant). Changing the appearance of the capture affordance based on the magnitude of the change of subject location provides intuitive, improved visual feedback on a state of the computer system (e.g., with respect to current spatial media capture quality implications) without excessively obscuring the field-of-view. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed due to an element of the UI obscuring the environment or due to the user not seeing the feedback, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
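Combining the direction- and magnitude-based behavior described above, the following sketch obscures the shutter-affordance region in proportion to how far the subject distance sits outside an assumed target range, so that movement toward the range reduces the obscuring and movement away increases it by an amount tied to the size of the move; the range and scaling values are illustrative.

```swift
import Foundation

// Returns an obscuring extent in [0, maxExtent]: zero inside the target range,
// growing linearly with the distance outside it (capped at maxExtent).
func obscuringExtent(subjectDistance: Double,
                     targetRange: ClosedRange<Double> = 1.0...8.0,
                     maxExtent: Double = 1.0,
                     rampMeters: Double = 0.5) -> Double {
    let shortfall: Double
    if subjectDistance < targetRange.lowerBound {
        shortfall = targetRange.lowerBound - subjectDistance
    } else if subjectDistance > targetRange.upperBound {
        shortfall = subjectDistance - targetRange.upperBound
    } else {
        shortfall = 0
    }
    return min(maxExtent, shortfall / rampMeters * maxExtent)
}

// Example: a subject at 0.6 m gives an extent of 0.8; stepping back so the subject
// is 0.8 m away reduces the extent to 0.4, and a larger step produces a larger change.
```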
In some embodiments, the prompt to change the distance between the subject and the plurality of cameras includes a prompt to reduce the distance between the subject and the plurality of cameras (e.g., 1114) (e.g., “move closer,” or “move forward”). Displaying a prompt to reduce the subject distance provides the user with improved visual feedback on a state of the computer system, for example, indicating how to change the current positioning of the camera to improve media capture quality. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or mis-captured, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the prompt to change the distance between the subject and the plurality of cameras includes a prompt (e.g., 1110 and/or X1110) to increase the distance between the subject and the plurality of cameras (e.g., “move farther away,” or “back up”). Displaying a prompt to increase the subject distance provides the user with improved visual feedback on a state of the computer system, for example, indicating how to change the current positioning of the camera to improve media capture quality. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or mis-captured, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the computer system detects a movement (e.g., 1112A, 1112B, 1112C, and/or 1112D) of a user (e.g., 1104 and/or X1104) of the computer system in a physical environment (e.g., 1100). In some embodiments, detecting the location of the subject (e.g., 1106, X1106, 1108, and/or X1108) in the field-of-view of the plurality of cameras is performed in response to detecting the movement of the user (e.g., when the user moves in or around a physical environment, checking the location of the subject (e.g., the current closest subject) and, if appropriate, displaying the prompt). Displaying a prompt based on the physical movement of the user provides the user with improved visual feedback on a state of the computer system, for example, indicating if and/or how the movement of the user affected the capture quality. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or mis-captured, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the computer system detects a change to the field-of-view of the plurality of cameras. In some embodiments, detecting the location of the subject (e.g., 1106, X1106, 1108, and/or X1108) in the field-of-view of the plurality of cameras is performed in response to detecting the change to the field-of-view of the plurality of cameras (e.g., when the camera viewpoint changes, checking the location of the subject (e.g., the current closest subject) and, if appropriate, displaying the prompt). Displaying a prompt based on the movement of the camera viewpoint provides the user with improved visual feedback on a state of the computer system, for example, indicating how the new capture framing affects the capture quality. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or mis-captured, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
In some embodiments, the first camera (e.g., 704A) and second camera (e.g., 704B) generate the spatial media item by capturing at least content within a first set of one or more planes of capture (e.g., planes that are substantially perpendicular to the principal axes of the cameras and/or parallel to outward facing lenses of the cameras) that are at least partially within a field-of-view of the first camera and a field-of-view of the second camera. In some embodiments, the first set of one or more planes of capture are planes at which spatial media content can be generated with the threshold level of quality upon capture (e.g., planes of capture at a distance from the first and second cameras that allows for capture with the threshold level of quality) (e.g., such that content within the first set of one or more planes of capture when the media is captured can be presented with the illusion of a spatial representation of a threshold level of quality). In some embodiments, content that is within the field-of-view of the first camera and/or the field-of-view of the second camera but not within the first set of one or more planes of capture will not be captured with the threshold level of quality. In some embodiments, when detecting the location of the subject, a plurality of objects (e.g., persons and/or inanimate objects) are within the field-of-view of the first camera and/or the second camera, and the plurality of objects includes a first object that is an object of the plurality of objects that is closest in distance to the first set of one or more planes of capture. In some embodiments, the subject (e.g., 1106, X1106, 1108, and/or X1108) is the first object (e.g., the subject is selected based on the object that is closest to the first set of one or more planes of capture). Detecting the location of the subject as the closest object in the current field-of-view to a plane of a combined field-of-view of the cameras provides improved control of spatial media capture, for example, by ensuring that the closest object is a sufficient distance from the cameras. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are missed or mis-captured, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently.
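A sketch of the subject selection described above, modeling the first set of one or more planes of capture as a depth interval along the optical axis and choosing the in-view object whose depth is closest to that interval; the interval model and names are assumptions, not details from this disclosure.

```swift
import Foundation

struct DetectedObject {
    let identifier: String
    let depthMeters: Double   // distance from the cameras along the optical axis
}

// Distance from an object's depth to the modeled set of capture planes:
// zero if the object lies within the interval, otherwise the gap to the nearest bound.
func distanceToCapturePlanes(_ depth: Double, planes: ClosedRange<Double>) -> Double {
    if planes.contains(depth) { return 0 }
    return depth < planes.lowerBound ? planes.lowerBound - depth : depth - planes.upperBound
}

// The subject is the detected object closest to the capture planes.
func selectSubject(from objects: [DetectedObject],
                   capturePlanes: ClosedRange<Double> = 1.0...8.0) -> DetectedObject? {
    objects.min { lhs, rhs in
        distanceToCapturePlanes(lhs.depthMeters, planes: capturePlanes)
            < distanceToCapturePlanes(rhs.depthMeters, planes: capturePlanes)
    }
}
```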
In some embodiments, aspects/operations of methods 800, 1000, and 1200 may be interchanged, substituted, and/or added between these methods. For example, the user interface displayed in method 1200 can be the same as the user interface displayed in method 800, and the prompt to change the distance between the subject and the plurality of cameras can be displayed according to method 1200 before, during, or after initiating media capture according to method 800. For example, the camera preview displayed in method 1000 can be the same as the capture preview for spatial media displayed in method 1200, and the indicator representing the orientation of the field-of-view of the camera(s) can be displayed according to method 1000 before, concurrently with, or after displaying the prompt to change a distance between the subject and the cameras according to method 1200. For brevity, these details are not repeated here.
In
Schematics (II)-(IV) illustrate translation and rotation movements of first camera 704A and second camera 704B as user 1104 holds device 702 facing, and generally centered on, subject 1106 (e.g., a table with a plant on top, as discussed with respect to FIGS. 11A1-11H). As illustrated in schematic (II) (e.g., a side view of user 1104 holding device 702 facing subject 1106), movements of device 702 along the optical axis of first camera 704A and second camera 704B (e.g., forward and backward with respect to subject 1106) are translation movements along the z-axis (e.g., longitudinal translations), movements of device 702 up and down with respect to subject 1106 are translation movements along the y-axis (e.g., vertical translations), and rotations of device 702 around the x-axis (e.g., to pan up and down with respect to subject 1106) are pitch rotations. As illustrated in schematic (III) (e.g., a top-down view of user 1104 holding device 702 facing subject 1106), movements of device 702 forward and backward with respect to subject 1106 are translation movements along the z-axis, movements of device 702 left and right with respect to subject 1106 are translation movements along the x-axis (e.g., transverse or horizontal translations), and rotations of device 702 around the y-axis (e.g., to pan left and right with respect to subject 1106) are yaw rotations. As illustrated in schematic (IV) (e.g., a view from behind user 1104 holding device 702 facing subject 1106 (not pictured)), movements of device 702 left and right with respect to subject 1106 are translation movements along the x-axis, movements of device 702 up and down with respect to subject 1106 are translation movements along the y-axis, and rotations of device 702 around the z-axis (e.g., while remaining facing subject 1106) are tilt rotations.
As illustrated in
In some embodiments, the techniques and user interface(s) described in
FIG. 13E2 illustrate how the fields-of-view of one or more cameras of HMD X700 (e.g., such as first camera 704A and second camera 704B), and thus, of camera viewfinder X712, change with respect to the view illustrated in
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
As illustrated in schematics (II)-(IV) of
For example, the overall magnitude of the detected movement is based on the combined (e.g., normalized) magnitude of acceleration of any vertical translation component, any horizontal translation component, any pitch rotation component, and/or any yaw rotation component included in the detected movement and/or the combined (e.g., normalized) magnitude of velocity of any vertical translation component, any horizontal translation component, any pitch rotation component, and/or any yaw rotation component included in the detected movement. For example, the acceleration and/or velocity of the various components may include average values (e.g., for a particular sampling period), maximum values (e.g., for a particular sampling period), and/or instantaneous values. For example, the magnitudes of acceleration and/or velocity of the various components may be normalized to a consistent frame of reference, such as the plane of display 708. For example, the movement criteria may include a combined minimum linear acceleration of 1.5 m/s2, a combined minimum linear velocity of 1 m/s, a combined minimum angular acceleration of 45°/s2, or a combined minimum angular velocity of 25°/s.
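A minimal sketch of the combined-movement check described above is provided below in Swift; the component and function names are hypothetical, while the example thresholds (1.5 m/s2, 1 m/s, 45°/s2, and 25°/s) are the ones given in the preceding paragraph.

```swift
import Foundation

// Minimal sketch of combining translation and rotation components into overall
// magnitudes and testing them against the example movement criteria above.
struct MovementSample {
    // Translation components normalized to the plane of the display (m/s, m/s2).
    let horizontalVelocity: Double, verticalVelocity: Double
    let horizontalAcceleration: Double, verticalAcceleration: Double
    // Rotation components (°/s, °/s2); tilt and longitudinal motion are excluded.
    let pitchVelocity: Double, yawVelocity: Double
    let pitchAcceleration: Double, yawAcceleration: Double
}

func meetsMovementCriteria(_ s: MovementSample) -> Bool {
    // Combine the per-axis components into overall magnitudes.
    let linearVelocity = hypot(s.horizontalVelocity, s.verticalVelocity)
    let linearAcceleration = hypot(s.horizontalAcceleration, s.verticalAcceleration)
    let angularVelocity = hypot(s.pitchVelocity, s.yawVelocity)
    let angularAcceleration = hypot(s.pitchAcceleration, s.yawAcceleration)

    // Example thresholds from the description; exceeding any one of them
    // satisfies the movement criteria in this sketch.
    return linearAcceleration >= 1.5
        || linearVelocity >= 1.0
        || angularAcceleration >= 45.0
        || angularVelocity >= 25.0
}
```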
As illustrated in schematics (ii)-(iv) of
In some embodiments, computer system 700 updates the appearance of the region of shutter affordance 718 by displaying arc indicator 1312. In some embodiments, as illustrated at the top of
In some embodiments, computer system 700 updates the appearance of the region of shutter affordance 718 by animating video indicator 1304. Computer system 700 animates video indicator 1304 distorting, moving towards, and “splashing” the left side of shutter affordance 718 as shown by the progression of video indicators 1304A-1304E at the bottom of
As illustrated in schematics (II)-(IV) of
As illustrated in schematics (II)-(IV) of
As illustrated in schematics (II)-(IV) of
In
In
In response to detecting movement 1322D, where velocity v falls below the third linear velocity threshold v3 but still exceeds the initial linear velocity threshold v1 (e.g., the threshold at which arc indicator 1312 was initially displayed) (e.g., v1<v<v3), computer system 700 ceases displaying text notice 1324, but continues displaying arc indicator 1312 on the right side of shutter affordance 718. In response to detecting movement 1322E, where velocity v falls below the initial linear velocity threshold v1 (e.g., the threshold at which the detected movements first met the minimum movement criteria and computer system 700 initially began to update the appearance of the region of shutter affordance 718 as described above) but still exceeds a fourth, lower velocity threshold v4 (e.g., v4<v<v1), computer system 700 updates the appearance of the region of shutter affordance 718 as described above, for example, continuing to display arc indicator 1312 on the right side of shutter affordance 718 and decreasing the opacity and/or the arc length of arc indicator 1312. Finally, in response to detecting movement 1322F, where velocity v falls below the fourth, lower velocity threshold v4, computer system 700 ceases displaying arc indicator 1312 and ceases updating the appearance of the region of shutter affordance 718, such that the region of shutter affordance 718 again appears as illustrated in
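The threshold behavior described above (with v4 < v1 < v3) can be sketched as a simple hysteresis state machine; the following Swift example uses hypothetical threshold values and names and is illustrative only.

```swift
import Foundation

// Minimal sketch of the velocity-threshold ladder described above (v4 < v1 < v3):
// the arc indicator appears once velocity exceeds v1, the text notice appears
// above v3, and the arc is only removed once velocity falls back below v4.
struct MovementFeedbackState {
    var showsArcIndicator = false
    var showsTextNotice = false

    private let v1 = 1.0   // initial threshold at which the arc indicator appears
    private let v3 = 2.5   // higher threshold at which the text notice appears
    private let v4 = 0.5   // lower threshold at which the arc indicator is removed

    mutating func update(forVelocity v: Double) {
        if showsArcIndicator {
            // Hysteresis: keep the arc until velocity drops below v4, not v1.
            showsArcIndicator = v >= v4
        } else {
            showsArcIndicator = v >= v1
        }
        showsTextNotice = v >= v3
    }
}
```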
As illustrated in
As illustrated in
Additional descriptions regarding
The computer system (e.g., 101, 1-100, 1-200, 3-100, 6-100, 6-200, 6-300, 6-400, 11.1.2-100, 700, X700, and/or 702) captures (1402) video media using the one or more cameras (e.g., 704A and/or 704B) (in some embodiments, while displaying a camera user interface with a camera preview, as described with respect to
The computer system, in response to detecting (1408) the movement of the one or more cameras and in accordance with a determination that the movement of the one or more cameras meets a set of one or more movement criteria (in some embodiments, exceeding initial velocity and/or acceleration threshold(s) and/or falling below a maximum velocity and/or acceleration threshold(s)), displays (1410), via the display generation component, a movement of (e.g., animating) a visual indicator (e.g., 1304, X1304, and/or 1312) (e.g., as illustrated in
Displaying (1410) the movement of the visual indicator includes, in accordance with a determination that the movement of the one or more cameras is a movement in a first direction of camera movement (in some embodiments, movement of the one or more cameras includes movement (e.g., velocity and/or acceleration) in a respective direction and/or with a respective magnitude; in some embodiments, the first direction is a combined—(e.g., combining the directions of multiple detected movement components, such as linear/cartesian and/or angular/rotational velocities and/or accelerations) and/or normalized (e.g., with respect to a particular frame of reference, such as the plane of the display) overall direction), displaying (1412) the visual indicator (e.g., 1304, X1304, and/or 1312) moving, relative to the displayed reference object (e.g., 718 and/or X718), in a first direction of indicator movement (e.g., animating movement of the visual indicator in the first direction of indicator movement (in some embodiments, as the direction of movement changes, displaying the visual indicator (e.g., the arc) appearing in a first position with respect to the reference object and/or rotating around and/or within the reference object opposite to the direction of movement; in some embodiments, as the direction and/or magnitude of movement changes, displaying the visual indicator (e.g., the square stop icon) moving with simulated (e.g., based on the movement of the one or more cameras) physics (e.g., fluid mechanics) within the non-inertial frame of reference of the reference object (e.g., distorting and “splashing” against the side of the shutter affordance)) and/or ceasing display of the indicator at a first location and displaying the indicator at a second location, wherein the second location is separated from the first location in the first direction of indicator movement), and, in accordance with a determination that the movement of the one or more cameras is a movement in a second direction of camera movement that is different from the first direction of camera movement (in some embodiments, movement of the one or more cameras includes movement (e.g., velocity and/or acceleration) in a respective direction and/or with a respective magnitude; in some embodiments, the second direction is a combined—(e.g., combining the directions of multiple detected movement components, such as linear/cartesian and/or angular/rotational velocities and/or accelerations) and/or normalized (e.g., with respect to a particular frame of reference, such as the plane of the display) overall direction), displaying (1414) the visual indicator moving, relative to the displayed reference object, in a second direction of indicator movement that is different from the first direction of indicator movement (in some embodiments, the first direction of indicator movement is the same as the first direction of camera movement; in some embodiments, the first direction of indicator movement is different from (e.g., opposite to) the first direction of camera movement. 
in some embodiments, the second direction of indicator movement is the same as the second direction of camera movement; in some embodiments, the second direction of indicator movement is different from (e.g., opposite to) the second direction of camera movement; in some embodiments, the first direction of indicator movement is based on the first direction of camera movement (in some embodiments, the indicator moves in a direction such that it is positioned opposite to and/or in the direction of camera movement (e.g., the visual indicator rotates orthogonally to the direction of camera movement); in some embodiments, the indicator moves according to simulated physics based on the direction of camera movement and/or the displayed reference object (e.g., the visual indicator reacts to the camera movement by moving, splashing, and/or bouncing with simulated physics))). (In some embodiments, as the direction of movement changes differently, displaying the arc appearing in a different position with respect to the reference object and/or rotating differently around and/or within the reference object opposite to the direction of movement; in some embodiments, as the direction and/or magnitude of movement changes differently, displaying the square stop icon moving with simulated (e.g., based on the different movement of the one or more cameras) physics (e.g., fluid mechanics) within the non-inertial frame of reference of the reference object (e.g., distorting and “splashing” against the side of the shutter affordance) (e.g., the movement of the visual indicator is based on the movement of the one or more cameras (in some embodiments, the movement of the visual indicator is based on a change in the movement of the one or more cameras (e.g., acceleration, deceleration, and/or change in direction); in some embodiments, the movement of the visual indicator indicates direction and/or magnitude of the movement of the one or more cameras; in some embodiments, displaying the visual indicator with respective visual characteristics (e.g., size, shape, color, and/or opacity), wherein the respective visual characteristics are based on the movement of the one or more cameras (in some embodiments, distorting the square “stop” icon for a fluid (“splash”) animation; in some embodiments, increasing/decreasing the size and/or opacity of the arc as movement of the cameras increases/decreases))) (in some embodiments, in accordance with a determination that the movement of the one or more cameras does not meet the set of one or more movement criteria, foregoing displaying the movement of the visual indicator relative to the displayed reference object).
Displaying a visual indication moving in a particular direction based on the movement of one or more cameras used for capturing video media provides improved visual feedback about a state of the computer system, which assists the user with composing media capture events and reduces the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement). Doing so also enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently. For example, the visual indication intuitively indicates the direction of movement of the one or more cameras to the user, allowing the user to adjust the video capture to avoid visually uncomfortable and/or unwanted movement of the viewpoint.
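As an illustrative sketch of how the indicator direction could be derived from the camera-movement direction, the following Swift example (with hypothetical names) computes an angle around the shutter affordance from the combined in-plane movement; whether the arc is positioned in the direction of movement or opposite to it is a design choice left open by the description.

```swift
import Foundation

// Minimal sketch of positioning the arc indicator around the shutter affordance
// based on the combined direction of camera movement in the plane of the display.
struct CameraMovement {
    let horizontal: Double   // combined rightward movement component
    let vertical: Double     // combined upward movement component
}

// Angle (radians) around the circular shutter affordance at which to draw the arc,
// measured counterclockwise from the affordance's right side. This sketch places
// the arc in the direction of movement; placing it opposite would negate the angle.
func arcAngle(for movement: CameraMovement) -> Double {
    atan2(movement.vertical, movement.horizontal)
}

// Example: panning right with a slight upward drift places the arc near the
// right side of the shutter affordance.
let angle = arcAngle(for: CameraMovement(horizontal: 1.0, vertical: 0.2))
```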
In some embodiments, displaying the movement of the visual indicator relative to the displayed reference object includes: in accordance with a determination that the movement of the one or more cameras is a movement of a first magnitude (in some embodiments, the movement of the first magnitude includes velocity and/or acceleration of respective magnitudes; in some embodiments, the first magnitude is a combined (e.g., combining the magnitudes of multiple detected movement components, such as linear/cartesian and/or angular/rotational velocities and/or accelerations) and/or normalized (e.g., with respect to a particular frame of reference, such as the plane of the display) overall magnitude), displaying the visual indicator moving a first distance (in some embodiments, a linear distance, such as the distance traveled by the visual indicator (e.g., the square stop icon) as it moves from the center of the reference object towards the edge; in some embodiments, an angular distance, such as the distance traveled by the visual indicator (e.g., the arc) around the displayed reference object) relative to the displayed reference object; and in accordance with a determination that the movement of the one or more cameras is a movement of a second magnitude that is different from the first magnitude (in some embodiments, movement of the second magnitude includes velocity and/or acceleration of respective magnitudes; in some embodiments, the first magnitude is a combined (e.g., combining the magnitudes of multiple detected movement components, such as linear/cartesian and/or angular/rotational velocity and/or acceleration) and/or normalized (e.g., with respect to a particular frame of reference, such as the plane of the display) overall magnitude), displaying the visual indicator moving a second distance that is different from the first distance (in some embodiments, a linear distance, such as the distance traveled by the visual indicator (e.g., the square stop icon) as it moves from the center of the reference object towards the edge; in some embodiments, an angular distance, such as the distance traveled by the visual indicator (e.g., the arc) around the displayed reference object) relative to the displayed reference object (e.g., a magnitude of the movement of the visual indicator relative to the displayed reference object is based on a magnitude of the movement of the one or more cameras; in some embodiments, the first magnitude is larger than the second magnitude, and the first distance is greater than the second distance; in some embodiments, the first magnitude is smaller than the second magnitude, and the first distance is lesser than the second distance (e.g., the visual indicator moves more when the cameras move more)). Displaying a visual indication moving a particular distance based on the movement of one or more cameras used for capturing video media provides improved visual feedback about a state of the computer system, which assists the user with composing media capture events and reduces the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement). 
Doing so also enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently. For example, the visual indication intuitively indicates the magnitude of movement of the one or more cameras to the user, allowing the user to adjust the video capture to avoid visually uncomfortable and/or unwanted movement of the viewpoint.
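A minimal sketch of the magnitude-to-distance mapping described above follows; the specific clamp range and maximum arc length are hypothetical values chosen for illustration.

```swift
import Foundation

// Minimal sketch: larger combined camera movement sweeps the arc indicator a
// longer angular distance around the reference object, clamped between the
// magnitude at which the indicator first appears and a maximum magnitude.
func indicatorArcLength(forMagnitude magnitude: Double,
                        minimum: Double = 1.0,
                        maximum: Double = 4.0,
                        maxArcLength: Double = .pi / 2) -> Double {
    let clamped = min(max(magnitude, minimum), maximum)
    let fraction = (clamped - minimum) / (maximum - minimum)
    return fraction * maxArcLength   // larger movements sweep a longer arc
}
```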
In some embodiments, the set of one or more movement criteria includes a first criterion that is met when a magnitude (in some embodiments, one or more magnitudes, such as a magnitude of acceleration and/or a magnitude of velocity; in some embodiments, the magnitude is a combined (e.g., combining the magnitudes of multiple detected movement components, such as linear/cartesian and/or angular/rotational movement) and/or normalized (e.g., with respect to a particular frame of reference, such as the plane of the display) overall magnitude) of the movement of the one or more cameras exceeds a first threshold magnitude (in some embodiments, one or more first threshold magnitudes (e.g., a velocity threshold, an acceleration threshold, a linear threshold and/or a rotational threshold)) (in some embodiments, in response to detecting the movement of the one or more cameras and in accordance with a determination that the magnitude of the movement of the one or more cameras does not exceed the first threshold magnitude, foregoing displaying the movement of the visual indicator). Displaying a visual indication moving when the movement of one or more cameras used for capturing video media exceeds a threshold magnitude provides improved visual feedback about a state of the computer system, assisting the user with composing media capture events, and reducing the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement). Doing so also enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently. For example, the displayed movement of the visual indication intuitively alerts the user to when the camera movements may be visually uncomfortable and/or excessive, allowing the user to adjust by slowing down the camera movements.
In some embodiments, the set of one or more movement criteria includes a second criterion that is met when an amount of change (in some embodiments, one or more amounts of change, such as a change in velocity (e.g., an acceleration) and/or a change in acceleration (e.g., from a previously-detected movement of the one or more cameras); in some embodiments, an amount of change of combined and/or normalized velocity and/or acceleration components) of the movement of the one or more cameras exceeds a first threshold amount of change (in some embodiments, one or more first threshold amounts (e.g., a threshold change in velocity, a threshold change in acceleration, a threshold change of linear movement and/or a threshold change of rotational movement)) (in some embodiments, in response to detecting the movement of the one or more cameras and in accordance with a determination that the amount of change of the movement of the one or more cameras does not exceed the first threshold amount, foregoing displaying the movement of the visual indicator). Displaying a visual indication moving when the rate of change of the movement of one or more cameras used for capturing video media exceeds a threshold amount of change provides improved visual feedback about a state of the computer system, assisting the user with composing media capture events, and reducing the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement). Doing so also enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently. For example, the displayed movement of the visual indication intuitively alerts the user to when the camera movements may be visually uncomfortable and/or excessive, allowing the user to adjust by changing the movement of the camera more gradually.
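The two criteria described above can be sketched together as follows; whether the set of movement criteria requires one or both criteria to be satisfied is left open by the description, and the sketch below (with hypothetical thresholds) treats either as sufficient.

```swift
import Foundation

// Minimal sketch of a two-part movement criterion: the combined magnitude of the
// current movement exceeds a first threshold magnitude, or the amount of change
// relative to the previously detected movement exceeds a first threshold amount
// of change. Threshold values and names are hypothetical.
struct MovementCriteria {
    let magnitudeThreshold: Double   // e.g., a combined velocity magnitude threshold
    let changeThreshold: Double      // e.g., a threshold change in combined velocity

    func isMet(currentMagnitude: Double, previousMagnitude: Double) -> Bool {
        let change = abs(currentMagnitude - previousMagnitude)
        return currentMagnitude > magnitudeThreshold || change > changeThreshold
    }
}
```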
In some embodiments, the computer system displays (in some embodiments, while capturing the video media (e.g., the visual indicator (e.g., the square stop icon) is displayed even when movement of the one or more cameras is not detected; in some embodiments, in response to detecting the movement of the one or more cameras (e.g., the visual indicator (e.g., the arc) appears when indicating movement)) the visual indicator within the displayed reference object (e.g., as illustrated in the top and bottom portion of
In some embodiments, the visual indicator is included in a selectable user interface object (e.g., 718 and/or X718) (in some embodiments, the selectable user interface object is also the reference object) displayed via the display generation component (e.g., a stop button/affordance; in some embodiments, the selectable user interface object is displayed while capturing the video media) that, when selected (e.g., as described with respect to
In some embodiments, the computer system, while capturing the video media, detects an air gesture input (e.g., 1328B) (in some embodiments, a pinch air gesture; in some embodiments, another air gesture), and in response to detecting the air gesture input and in accordance with a determination that a gaze of a user of the computer system (e.g., 732) is directed to the selectable user interface object when the air gesture input is detected (e.g., in response to detecting an air gesture selecting the selectable user interface object), ceases capture of the video media (e.g., as illustrated in
In some embodiments, the computer system, while capturing the video media, detects a user input (e.g., 1328A, 1328B, and/or 1328C) selecting (e.g., a user input (e.g., a tap, touch, gesture, and/or click) directed to the selectable user interface object or another user input (e.g., an air gesture, a speech input, hardware button input, or other user input corresponding to a request to stop recording)) the selectable user interface object (e.g., as illustrated in
In some embodiments, displaying the movement of the visual indicator (e.g., 1304, X1304, and/or 1312) includes displaying the visual indicator moving according to simulated (in some embodiments, simulated based on the movement of the one or more cameras and/or the displayed reference object) physics (e.g., as illustrated in
In some embodiments, displaying the movement of the visual indicator includes in accordance with a determination the movement of the one or more cameras is a movement of a first magnitude (in some embodiments, movement of the first type includes movement (e.g., velocity and/or acceleration) with a respective magnitude; in some embodiments, the first magnitude is a combined (e.g., combining the magnitudes of multiple detected movement components, such as linear/cartesian and/or angular/rotational movement) and/or normalized (e.g., with respect to a particular frame of reference, such as the plane of the display) overall magnitude), distorting a spatial property (e.g., size, location, and/or shape) of the visual indicator (e.g., 1304, X1304, and/or 1312) a first amount, and in accordance with a determination the movement of the one or more cameras is a movement of a second magnitude that is different from the first magnitude (in some embodiments, movement of the first type includes movement (e.g., velocity and/or acceleration) with a respective magnitude; in some embodiments, the first magnitude is a combined (e.g., combining the magnitudes of multiple detected movement components, such as linear/cartesian and/or angular/rotational movement) and/or normalized (e.g., with respect to a particular frame of reference, such as the plane of the display) overall magnitude), distorting a spatial property (e.g., size, location, and/or shape) of the visual indicator a second amount that is different from the first amount of distortion of the spatial property of the visual indicator (e.g., the visual indicator is distorted based on a magnitude of the movement of the one or more cameras (in some embodiments, the distortion is according to simulated physics, such as fluid mechanics where the visual indicator is modeled as a liquid droplet or other distortable object); in some embodiments, the first magnitude is larger than the second magnitude, and the first amount is greater than the second amount; in some embodiments, the first magnitude is smaller than the second magnitude, and the first amount is lesser than the second amount (e.g., the more the cameras move, the more the visual indicator distorts)). Distorting the visual indicator based on the magnitude of camera movement provides improved visual feedback about a state of the computer system, assisting the user with composing media capture events, and reducing the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement). For example, the distortion of the visual indicator allows the user to intuitively determine the magnitude of the movement of the visual indicator and to adjust the movement of the cameras accordingly to avoid visually uncomfortable and/or unwanted camera movement in the captured video.
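A minimal sketch of the simulated-physics distortion described above follows; it models the indicator as a damped spring so that larger camera movements displace and distort it more, with hypothetical tuning constants.

```swift
import Foundation

// Minimal sketch: the square stop icon is modeled as a damped spring inside the
// shutter affordance, so camera movement pushes it toward one side, and larger
// movements both displace and distort it more.
struct SplashIndicatorModel {
    var offset = 0.0                 // displacement from the affordance center
    var velocity = 0.0
    let stiffness = 40.0             // spring constant pulling back toward center
    let damping = 8.0

    // Advance the simulation by dt seconds under the given camera acceleration
    // (projected into the plane of the display).
    mutating func step(cameraAcceleration: Double, dt: Double) {
        let force = -stiffness * offset - damping * velocity - cameraAcceleration
        velocity += force * dt
        offset += velocity * dt
    }

    // Larger displacement distorts the icon more (e.g., squashing against the
    // side of the shutter affordance), up to a maximum amount.
    var distortionAmount: Double { min(abs(offset), 1.0) }
}
```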
In some embodiments, the first direction of indicator movement indicates the first direction of camera movement (in some embodiments, the first direction of indicator movement is the same as the first direction of camera movement; in some embodiments, the first direction of indicator movement is opposite the first direction of camera movement; in some embodiments, the first direction of indicator movement is based on the first direction of camera movement), wherein the first direction of camera movement includes (e.g., combines and/or normalizes) a first set of one or more directions corresponding to a plurality of components of the movement of the one or more cameras (in some embodiments, the first direction includes (e.g., combines) the directions of velocity and/or acceleration components; in some embodiments, the first direction includes (e.g., combines) the directions of pitch rotation, yaw rotation, horizontal translation, and/or vertical translation components; in some embodiments, the first direction does not include the directions of tilt rotation and/or longitudinal translation components), and the second direction of indicator movement indicates the second direction of camera movement (in some embodiments, the second direction of indicator movement is the same as the second direction of camera movement; in some embodiments, the second direction of indicator movement is opposite the second direction of camera movement; in some embodiments, the second direction of indicator movement is based on the second direction of camera movement), wherein the second direction of camera movement includes (e.g., combines and/or normalizes) a second set of one or more directions corresponding to the plurality of components of the movement of the one or more cameras (in some embodiments, the second direction includes (e.g., combines) the directions of velocity and/or acceleration components; in some embodiments, the second direction includes (e.g., combines) the directions of pitch rotation, yaw rotation, horizontal translation, and/or vertical translation components; in some embodiments, the second direction does not include the directions of tilt rotation and/or longitudinal translation components). Displaying a visual indication moving in one direction based on the directions of multiple components of the movement of the one or more cameras provides improved visual feedback about a state of the computer system without cluttering the user interface, which assists the user with composing media capture events and reduces the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement). For example, the user can simultaneously monitor movement in multiple dimensions and/or around multiple axes and adjust accordingly.
In some embodiments, the plurality of components of the movement of the one or more cameras (e.g., the components represented in the first and/or second direction of camera movement) includes one or more translation (e.g., cartesian velocity and/or acceleration) components (e.g., 1310B, 1314A, 1316A, 1316B, 1320A, 1320B, and/or 1322A-1322F) (in some embodiments, arising from horizontal and/or vertical translation movements of the one or more cameras). In some embodiments, the plurality of components of the movement of the one or more cameras (e.g., the components represented in the first and/or second direction of camera movement) includes one or more rotation (e.g., angular velocity and/or acceleration) components (e.g., 1310A, 1314B, 1318A, and/or 1318B) (in some embodiments, arising from pitch and/or yaw rotation movements of the one or more cameras). In some embodiments, one or more components of the movement of the one or more cameras (e.g., 1326A and/or 1326B) are not included in the plurality of components of the movement of the one or more cameras (e.g., as illustrated in
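The component selection described above can be sketched as follows; the sketch (with a hypothetical rotation-to-translation scale factor) combines horizontal translation, vertical translation, pitch, and yaw while ignoring longitudinal translation and tilt.

```swift
import Foundation

// Minimal sketch of combining the movement components that drive the indicator.
// Per the description, pitch rotation, yaw rotation, horizontal translation, and
// vertical translation contribute, while tilt rotation and longitudinal (z-axis)
// translation do not.
struct SixDoFMovement {
    let x, y, z: Double              // translations (x: horizontal, y: vertical, z: longitudinal)
    let pitch, yaw, tilt: Double     // rotations about the x-, y-, and z-axes
}

func indicatorDirection(for m: SixDoFMovement, rotationScale: Double = 0.5) -> (dx: Double, dy: Double) {
    // Yaw pans the view left/right like a horizontal translation; pitch pans it
    // up/down like a vertical translation. Longitudinal translation and tilt are ignored.
    let dx = m.x + rotationScale * m.yaw
    let dy = m.y + rotationScale * m.pitch
    return (dx, dy)
}
```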
In some embodiments, capturing the video media using the one or more cameras includes generating a first video component corresponding to a viewpoint of a right eye (e.g., using a first camera) and generating a second video component that is different from the first video component corresponding to a viewpoint of a left eye (e.g., using a second camera different from the first camera), wherein concurrently viewing the first video component and the second video component creates an illusion of a three-dimensional representation of the video media (e.g., viewing different images with the left and right eye creates the illusion of depth by simulating the parallax effect of binocular vision) (e.g., capturing the video media includes capturing spatial video media). Displaying a visual indication moving in a particular direction based on the movement of one or more cameras used for capturing spatial video media provides improved visual feedback about a state of the computer system, which assists the user with composing media capture events and reduces the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement). Doing so also enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently. For example, the movement of the visual indication alerts the user to movements that may adversely impact the quality of spatial media capture, which may be relatively small compared to movements that impact the quality of non-spatial media.
In some embodiments, the computer system, prior to capturing the video media, displays a level indicator (e.g., 920 and/or X920), wherein the level indicator indicates an orientation (e.g., a tilt orientation) of the one or more cameras relative to a respective (e.g., target) orientation (e.g., as illustrated in
In some embodiments, the computer system, while displaying the visual indicator (e.g., 1304, X1304, and/or 1312) and in response to detecting the movement of the one or more cameras and in accordance with a determination that a magnitude of the movement of the one or more cameras (in some embodiments, a combined (e.g., combining the magnitudes of multiple detected movement components, such as linear/cartesian and/or angular/rotational velocities and/or accelerations) and/or normalized (e.g., with respect to a particular frame of reference, such as the plane of the display) overall magnitude) exceeds a notification threshold (in some embodiments, a set of one or more magnitude thresholds (e.g., for velocity and/or acceleration, and/or for linear and/or rotational movement), in some embodiments, the notification threshold is a higher magnitude than the magnitude threshold at which movement of the visual indicator is initially displayed), displays a text notice (e.g., 1324) (e.g., as illustrated in
In some embodiments, the computer system, while displaying the text notice (e.g., 1324), detects, via the one or more sensors, a second movement of the one or more cameras (e.g., 1322C-1322F), and in response to detecting the second movement of the one or more cameras and in accordance with a determination that a magnitude of the second movement of the one or more cameras does not exceed a notice-maintenance threshold (in some embodiments, a set of one or more magnitude thresholds (e.g., for velocity and/or acceleration, and/or for linear and/or rotational movement); in some embodiments, the notice-maintenance threshold is the same as the notification threshold; in some embodiments, the notice-maintenance threshold is different than the notification threshold), ceases displaying the text notice (e.g., as illustrated in
In some embodiments, the notice-maintenance threshold is a lower magnitude threshold than the notification threshold (e.g., as illustrated in
In some embodiments, aspects/operations of methods 800, 1000, 1200, 1400, and 1600 may be interchanged, substituted, and/or added between these methods. For example, the user interfaces displayed in methods 800, 1000, and/or 1200 can be used to control media capture and/or to provide feedback on capture orientation and capture distance before, during, and/or after the video media capture performed in method 1400. For example, the video captured according to method 1400 can be played back according to method 1600. For brevity, these details are not repeated here.
In some embodiments, video media item 1502 may be a photo media capture of limited duration that includes content from before and/or after the capture input is detected (e.g., before and/or after an air pinch gesture is released, an air tap gesture is detected, a button press is detected or released), such as a brief animated photo where several frames are captured when a photo is taken, creating a “live” effect). In some embodiments, each of the several frames captured when the brief animated photo is taken (e.g., before and/or after the input requesting capture of the photo was detected) includes stereoscopic depth information, for example, a first frame component for the viewer's right eye and a second frame component for the viewer's left eye. Like other video media, a brief animated photo can be played back (e.g., as a brief animation, a loop, and/or a “bouncing” or “reversing” loop) or viewed as a still preview (e.g., including the first frame component and the second frame component for a single key frame).
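A minimal sketch of how frames from before the capture input could be retained for such a brief animated photo follows; the ring-buffer approach, durations, and types shown are assumptions for illustration.

```swift
import Foundation

// Minimal sketch: frames are continuously written into a rolling pre-roll buffer,
// and on capture the buffered pre-roll is combined with frames recorded for a
// short period afterwards to form the brief animated photo.
struct StereoFrame {
    let timestamp: TimeInterval
    let leftEyeComponent: Data       // frame component for the viewer's left eye
    let rightEyeComponent: Data      // frame component for the viewer's right eye
}

struct PreRollBuffer {
    private var frames: [StereoFrame] = []
    let preRollDuration: TimeInterval = 1.5   // hypothetical pre-roll window

    mutating func append(_ frame: StereoFrame) {
        frames.append(frame)
        // Drop frames older than the pre-roll window.
        frames.removeAll { frame.timestamp - $0.timestamp > preRollDuration }
    }

    // On capture, the pre-roll plus subsequently recorded frames form the
    // brief animated photo.
    func framesForCapture(appending postCapture: [StereoFrame]) -> [StereoFrame] {
        frames + postCapture
    }
}
```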
As illustrated in
Movement profile 1500 further includes category profile 1500C. As illustrated in
As illustrated in
At
In some embodiments, computer system 700 changes the border and framing settings gradually. For example, when transitioning from playback of video segment 1502A to video segment 1502B (e.g., during a 0.1, 0.5, and/or 1 second transition period overlapping with the end of window t1 and/or the beginning of window t2), computer system 700 may crop video media item 1502 progressively smaller until reaching the second size and/or display border effect 1508 gradually expanding up to the second width (in some embodiments, with a progressively larger blur radius up to the blur radius for movement level 2).
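A minimal sketch of this gradual transition follows; it linearly interpolates the crop scale and border blur over a transition period, with hypothetical parameter values.

```swift
import Foundation

// Minimal sketch: the crop size and border blur are interpolated over a short
// transition period rather than switching abruptly between movement levels.
// The description gives 0.1, 0.5, or 1 second as example transition periods.
struct FramingSettings {
    var cropScale: Double            // 1.0 = full size; smaller values crop more aggressively
    var borderBlurRadius: Double
}

func interpolatedFraming(from start: FramingSettings,
                         to end: FramingSettings,
                         elapsed: TimeInterval,
                         transitionDuration: TimeInterval = 0.5) -> FramingSettings {
    let t = min(max(elapsed / transitionDuration, 0), 1)
    return FramingSettings(
        cropScale: start.cropScale + (end.cropScale - start.cropScale) * t,
        borderBlurRadius: start.borderBlurRadius + (end.borderBlurRadius - start.borderBlurRadius) * t
    )
}
```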
At
At FIG. 15F1, computer system 700 continues playback of video media item 1502 with video segment 1502D. As discussed with respect to
In some embodiments, the techniques and user interface(s) described in FIG. 15F1 are provided by one or more of the devices described in
At FIG. 15F2, HMD X700 provides playback of video media item X1502 with video segment X1502D. As discussed with respect to
Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in
At
At
At
At
In some embodiments, as illustrated in
In some embodiments, as illustrated in
Additional descriptions regarding
The computer system, while playback of a video media item (e.g., 1502 and/or X1502) (in some embodiments, a spatial media video including a first video component corresponding to a viewpoint of a right eye and a second video component, different from the first video component, corresponding to a viewpoint of a left eye such that concurrently viewing the first video component and the second video component creates an illusion of a three-dimensional representation of the video media (e.g., viewing different images with the left and right eye creates the illusion of depth by simulating parallax of the image contents)) is ongoing (1602), wherein playback of the video media item includes displaying the video media item concurrently with a border region (e.g., 1506A-1506C, X1506B, 1508, and/or X1508) that is outside of the video media item, changes (1604) a visual prominence (in some embodiments, a size of the crop of the video media item; in some embodiments, a width of the border region; in some embodiments, a blur radius of the border region; in some embodiments, an opacity of the border region; in some embodiments, making the change at the start of playback, e.g., before the video media item itself is output (in some embodiments, but after playback has been requested by a user), such that the video media item is initially displayed with the changed visual prominence; in some embodiments, while the video media item itself is being output (e.g., making “live” changes to the visual prominence during playback)) of the video media item relative to the border region based on a representation(e.g., 1500) (in some embodiments, the representation of the movement includes a category or level corresponding to the amount of movement (e.g., the category or level corresponding to a relative or absolute range of movement); in some embodiments, the representation of the movement includes and/or is based on one or more magnitudes (e.g., a magnitude of velocity and/or a magnitude of acceleration) and/or one or more directions (e.g., a magnitude of velocity and/or a magnitude of acceleration) (in some embodiments, the movement is represented as a vector); in some embodiments, the representation of the movement includes and/or is based on one or more movement components (e.g., linear and/or angular velocity and/or acceleration can be normalized and combined to determine combined/net magnitude(s) and/or direction(s)) of movement (e.g., a translation (e.g., x, y, and/or z cartesian movement) and/or a rotation (e.g., a movement around an axis (e.g., yaw, pitch, and/or roll)); in some embodiments, the movement includes (e.g., is based on) velocity and/or acceleration (e.g., instantaneous, average, and/or maximum velocity and/or acceleration for one or more sampling periods)) of a viewpoint (e.g., a detected viewpoint from which the video media was captured or an estimated viewpoint from which the video media was captured) corresponding to the video media item that occurred while the video media item was being captured (e.g., detected or estimated movement of the detected or estimated viewpoint) (e.g., a detected (e.g., by one or more motion sensors while capturing the video media item) and/or perceived (e.g., apparent) camera movement (in some embodiments, the perceived camera movement is a movement of a virtual camera (e.g., a “camera” capturing in and “moving” around a virtual environment); in some embodiments, the perceived camera movement is determined based on the visual content of the video media item (e.g., 
using image processing techniques (e.g., estimating the camera movement based on, e.g., motion blur, visual distortion, estimated dimensions of visual content, and/or camera metadata)); in some embodiments, the perceived camera movement is based on one or more characteristics of playback)).
Changing the visual prominence of the video media item relative to the border region based on the representation of the movement of the viewpoint corresponding to the video media item includes: in accordance with a determination that the movement of the viewpoint corresponding to the video media item corresponds to a first amount of movement (e.g., a magnitude of movement, a rate of change of the movement, a direction of the movement, and/or a change in direction of the movement; in some embodiments, the movement of the viewpoint corresponding to the video media item is obtained and/or determined from the representation of the movement of the viewpoint corresponding to the video media; in some embodiments, the movement corresponds to a first amount of movement if a magnitude of the movement falls within a first range (in some embodiments, the first range is one of a plurality of ranges representing different “levels” of movement; in some embodiments, the plurality of ranges are predetermined ranges; in some embodiments, the plurality of ranges are determined at least in part based on the overall movement range of the video media item); in some embodiments, the movement corresponds to a first amount of movement if it is preceded by and/or followed by a different movement of the viewpoint of the video media item, wherein the magnitude of the different movement falls within a respective range (e.g., the movement can be characterized with respect to other movements of the viewpoint of the video media item)), changing the visual prominence of the video media item relative to the border region to a first level of relative visual prominence (e.g., a first crop size, border width, border blur radius, and/or border opacity; in some embodiments, a first state of a plurality of states (e.g., corresponding to the plurality of movement ranges)), and in accordance with a determination that the movement of the viewpoint corresponding to the video media item corresponds to a second amount of movement different from the first amount of movement (in some embodiments, the movement corresponds to a second amount of movement if a magnitude of the movement falls within a second, different range; in some embodiments, the movement corresponds to a second amount of movement if it is preceded by and/or followed by a different movement of the viewpoint of the video media item, wherein the magnitude of the different movement falls within a respective range (e.g., the movement can be characterized with respect to other movements of the viewpoint of the video media item)), changing the visual prominence of the video media item relative to the border region to a second level of relative visual prominence that is different from the first level of relative visual prominence (e.g., a second crop size, border width, border blur radius, and/or border opacity; in some embodiments, a second state of a plurality of states (e.g., corresponding to the plurality of movement ranges)) (e.g., as illustrated in
In some embodiments, the first amount of movement is a larger amount of movement than the second amount of movement (e.g., the first amount of movement represents a greater overall magnitude of and/or a greater overall rate of change in the apparent (e.g., detected and/or estimated) movement (e.g., velocity and/or acceleration of one or more movement components) of the viewpoint of the video media than the second amount of movement), and the video media is displayed less prominently (in some embodiments, cropped to a smaller size; in some embodiments, with a wider border; in some embodiments, with a higher blur radius applied to the border region) relative to the border region at the first level of visual prominence than at the second level of visual prominence. Automatically decreasing the visual prominence of the video media relative to the border region in response to larger apparent movements of the viewpoint of the video media and increasing the visual prominence of the video media relative to the border region in response to smaller apparent movements of the viewpoint of the video media provides improved control of media playback and improved ergonomics of media playback devices without cluttering the user interface with additional displayed controls or requiring additional user inputs for adjusting visual prominence which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently. For example, displaying the video media with relatively less visual prominence in response to more intense apparent camera movement reduces viewing discomfort due to the more intense camera movement, while displaying the video media with relatively more visual prominence in response to less intense apparent camera movement enhances the playback of the video media when the apparent camera movement is less likely to cause viewing discomfort.
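As a non-limiting sketch of this mapping, the following Swift example assigns presentation values (crop scale, border width, and border blur) to discrete movement levels; the level boundaries and values are hypothetical.

```swift
import Foundation

// Minimal sketch of mapping an amount of viewpoint movement to a level of
// relative visual prominence for playback: larger movement shrinks the video
// crop and widens/blurs the border region.
enum MovementLevel: Int {
    case low = 0, medium, high
}

struct ProminenceSettings {
    let cropScale: Double            // fraction of the full playback size
    let borderWidth: Double          // width of the border region around the video
    let borderBlurRadius: Double
}

func prominence(for level: MovementLevel) -> ProminenceSettings {
    switch level {
    case .low:    return ProminenceSettings(cropScale: 1.00, borderWidth: 0,  borderBlurRadius: 0)
    case .medium: return ProminenceSettings(cropScale: 0.90, borderWidth: 24, borderBlurRadius: 8)
    case .high:   return ProminenceSettings(cropScale: 0.75, borderWidth: 60, borderBlurRadius: 20)
    }
}
```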
In some embodiments, the border region includes a passthrough region (e.g., 1506A, 1506B, 1506C, and/or X1506B) that includes a representation of a physical environment of a user (in some embodiments, passthrough video of the physical environment; in some embodiments, optical passthrough (e.g., via transparent or semi-transparent regions of a display); in some embodiments, the passthrough representation is visible at some levels of visual prominence and not visible at others (e.g., the video can be full-screen)). Displaying the video media with a border region that includes environmental passthrough content (e.g., at some levels of reduced visual prominence of the video media) provides improved ergonomics of media playback. For example, the representation of the physical environment orients the user within the physical environment while viewing the video media.
In some embodiments, the computer system detects a user input (e.g., a tap, touch, click, gesture, air gesture, speech input, and/or hardware button input) requesting playback of the video media item and in response to detecting the user input, initiates playback of the video media item.
In some embodiments, the video media item is stored in association with (e.g., includes and/or points to metadata augmenting the image and audio data of the video media item) the representation (e.g., 1500 and/or X1500) of the movement of the viewpoint corresponding to the video media (e.g., changing the visual prominence of the video media item relative to the border region based on metadata including and/or based on the velocity and/or acceleration of perceived camera movements in the video media; in some embodiments, the information is determined by the computer system (in some embodiments, the computer system analyzes the video media item (in some embodiments, metadata of the video media item; in some embodiments, the video (e.g., image and audio) data itself) to determine the movement of the viewpoint; in some embodiments, the computer system was used to capture the video); in some embodiments, the information is received by the computer system along with the video media item). Storing the video media item along with movement information provides improved control of media playback and improved ergonomics of media playback devices. For example, the movement information stored in association with the video media item allows the system to quickly and efficiently adjust the visual prominence of media playback.
In some embodiments, the representation (e.g., 1500 and/or X1500) of the movement of the viewpoint corresponding to the video media includes movement information (e.g., changing the visual prominence of the video media item relative to the border region based on velocity and/or acceleration data corresponding to movement of the viewpoint corresponding to the video media) captured (in some embodiments, detected using one or more motion sensors of the camera system; in some embodiments, recorded information on camera movement (e.g., for a virtual camera or electronically-controlled camera movements)) when the video media item was captured (e.g., concurrently with and/or in association with recording (e.g., filming, rendering, editing, and/or compiling) the video media item). Using movement information captured concurrently with the video media item provides improved control of media playback and improved ergonomics of media playback devices. For example, movement information captured with the video media item allows the system to quickly and efficiently adjust the visual prominence of media playback based on actual camera movements.
In some embodiments, the representation (e.g., 1500 and/or X1500) of the movement of the viewpoint corresponding to the video media includes (e.g., changing the visual prominence of the video media item relative to the border region based on) movement information determined (e.g., using video processing techniques (e.g., estimating the camera movement based on, e.g., motion blur (e.g., determining a greater amount of movement when more motion blur is present than when less motion blur is present), visual distortion (e.g., determining a greater amount of movement when subject matter detected in the video grows, translates, or distorts at a faster rate), estimated dimensions of visual content (e.g., estimating position, velocity, and/or acceleration based on, e.g., an estimation of how long it would take to pan over, zoom into, and/or rotate around visual content of particular dimensions), and/or other video metadata)) after the video media item was captured (e.g., based on the video media item (e.g., based on the image and/or audio data)). Using movement information determined after the video media item provides improved control of media playback and improved ergonomics of media playback devices. For example, movement information calculated or derived from the video media item allows the system to quickly and efficiently adjust the visual prominence of media playback based on the actual contents of the video media item.
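A minimal sketch of a movement representation that could be stored with, or derived for, a video media item follows; the sample structure and the captured/estimated distinction mirror the two embodiments above, with hypothetical names.

```swift
import Foundation

// Minimal sketch of a movement representation associated with a video media item:
// a series of timestamped viewpoint-movement samples that playback can consult.
// The stored case would come from motion sensors at capture time; the derived
// case would be estimated after capture (e.g., from motion blur or frame-to-frame
// distortion).
struct ViewpointMovementSample {
    let time: TimeInterval           // offset from the start of the video
    let linearVelocity: Double       // combined translation magnitude
    let angularVelocity: Double      // combined rotation magnitude
}

enum MovementRepresentation {
    case capturedMetadata([ViewpointMovementSample])      // recorded when the video was captured
    case estimatedFromContent([ViewpointMovementSample])  // derived afterwards by video analysis

    var samples: [ViewpointMovementSample] {
        switch self {
        case .capturedMetadata(let s), .estimatedFromContent(let s): return s
        }
    }
}
```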
In some embodiments, changing the visual prominence of the video media item (e.g., 1502 and/or X1502) relative to the border region (e.g., 1506A-1506C, X1506B, 1508, and/or X1508) based on the representation of the movement of the viewpoint corresponding to the video media item includes: in accordance with a determination that the movement of the viewpoint corresponding to the video item corresponds to the first amount of movement, changing the visual prominence of the video media item relative to the border region by a first amount of change (e.g., increasing or decreasing the crop size, border width, border blur radius, and/or border opacity by a first amount), and in accordance with a determination that the movement of the viewpoint corresponding to the video item corresponds to the second amount of movement, changing the visual prominence of the video media item relative to the border region by a second amount of change that is different from the first amount of change (e.g., increasing or decreasing the crop size, border width, border blur radius, and/or border opacity by a second amount). Changing the visual prominence of the video media relative to the border region based on apparent movement of the viewpoint of the video media provides improved control of media playback and improved ergonomics of media playback devices without cluttering the user interface with additional displayed controls or requiring additional user inputs for adjusting visual prominence which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently. For example, automatically adjusting the visual prominence of the video media provides a more physically comfortable viewing experience without needing to display controls for playback settings or requiring the user to manually input adjustments before and during playback.
In some embodiments, the first amount of movement is a larger amount of movement than the second amount of movement (e.g., the first amount of movement represents a greater overall magnitude of the apparent (e.g., detected and/or estimated) movement (e.g., velocity and/or acceleration of one or more movement components) of the viewpoint of the video media than the second amount of movement), changing the visual prominence of the video media (e.g., 1502 and/or X1502) item relative to the border region (e.g., 1506A-1506C, X1506B, 1508, and/or X1508) to the first level of visual prominence includes displaying the border region occupying a first area (e.g., a particular border width, border area, and/or border dimensions; in some embodiments, cropping the video media such that the border region outside of the video media is a first size), and changing the visual prominence of the video media item relative to the border region to the second level of visual prominence includes displaying the border region occupying a second area that is smaller than the first area (in some embodiments, cropping the video media such that the border region outside of the video media is a second size smaller than the first size) (e.g., the visual prominence of the video media item relative to the border region is higher when the border region is smaller and lower when the border region is larger, so visual prominence of the media item relative to the border region is decreased for larger amounts of movement). Automatically increasing the size of the border region in response to larger apparent movements of the viewpoint of the video media and decreasing the size of the border region in response to smaller apparent movements of the viewpoint of the video media provides improved control of media playback and improved ergonomics of media playback devices without cluttering the user interface with additional displayed controls or requiring additional user inputs for adjusting visual prominence which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently. For example, increasing the size of the border in response to more intense apparent camera movement reduces viewing discomfort due to the more intense camera movement, while decreasing the size of the border in response to less intense apparent camera movement improves visibility of and increases immersion in the video media.
In some embodiments, the first amount of movement is a smaller amount of movement than the second amount of movement (e.g., the first amount of movement represents a smaller overall magnitude of the apparent (e.g., detected and/or estimated) movement (e.g., velocity and/or acceleration of one or more movement components) of the viewpoint of the video media than the second amount of movement), changing the visual prominence of the video media item (e.g., 1502 and/or X1502) relative to the border region (e.g., 1506A-1506C, X1506B, 1508, and/or X1508) to the first level of visual prominence includes displaying the border region occupying a third area (e.g., a particular border width, border area, and/or border dimensions; in some embodiments, cropping the video media such that the border region outside of the video media is a third size); and changing the visual prominence of the video media item relative to the border region to the second level of visual prominence includes displaying the border region occupying a fourth area that is larger than the third area (in some embodiments, cropping the video media such that the border region outside of the video media is a fourth size larger than the third size) (e.g., the visual prominence of the video media item relative to the border region is higher when the border region is smaller and lower when the border region is larger, so visual prominence of the media item relative to the border region is increased for smaller amounts of movement). Automatically decreasing the size of the border region in response to smaller apparent movements of the viewpoint of the video media provides improved control of media playback and improved ergonomics of media playback devices without cluttering the user interface with additional displayed controls or requiring additional user inputs for adjusting visual prominence which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently. For example, increasing the size of the border in response to more intense apparent camera movement reduces viewing discomfort due to the more intense camera movement, while decreasing the size of the border in response to less intense apparent camera movement improves visibility of and increases immersion in the video media.
In some embodiments, the computer system, while playback of the video media item is ongoing, changes a visual characteristic (e.g., a blur width, a blur radius, and/or an opacity) of the border region over a period of time as the video plays (e.g., as illustrated in
In some embodiments, changing the visual prominence of the video media item relative to the border region to the first level of relative visual prominence includes displaying the video media item at a first visual scale (e.g., a standard, default, and/or full-screen scale) and occupying a first area (e.g., cropping the video media item to a first size without changing the scaling of the video image contents), and changing the visual prominence of the video media item relative to the border region to the second level of relative visual prominence includes displaying the video media item at the first visual scale and occupying a second area that is a different size than the first area (e.g., as illustrated in
In some embodiments, changing the visual prominence of the video media item relative to the border region based on the representation of the movement of the viewpoint corresponding to the video media item includes, in accordance with a determination that a movement of a viewpoint corresponding to an upcoming segment of the video media item (e.g., upcoming/future apparent camera movement in a segment of the video media item that is yet to be played back/is later than the current playback position) corresponds to a third amount of movement (in some embodiments, if a magnitude of the upcoming movement falls within a particular range; in some embodiments, if the magnitude of the upcoming movement represents an increase of at least a particular amount from the magnitude of current and/or past movement), changing the visual prominence of the video media item relative to the border region to a third level of relative visual prominence during playback of a portion of the video media item that is before the upcoming segment of the video media item (e.g., as illustrated in
In some embodiments, the determination that the movement of the viewpoint corresponding to the video media item corresponds to the first amount of movement includes a determination that a magnitude (e.g., a magnitude of velocity and/or a magnitude of acceleration of one or more movement components (e.g., a translation (e.g., x, y, and/or z cartesian movement) and/or a rotation (e.g., a movement around an axis (e.g., yaw, pitch, and/or roll)); in some embodiments, the magnitudes of the one or more movement components are normalized and/or combined; in some embodiments, the magnitudes of velocity and acceleration are normalized and/or combined; in some embodiments, instantaneous, average, and/or maximum magnitude for one or more sampling periods) of the movement of the viewpoint corresponding to the video media item corresponds to a first magnitude of movement (in some embodiments, a first range of magnitudes, such as apparent velocity of greater than 10 m/s and/or 25°/s and/or apparent acceleration of greater than 25 m/s2 and/or 35°/s2), and the determination that the movement of the viewpoint corresponding to the video media item corresponds to the second amount of movement includes a determination that the magnitude of the movement of the viewpoint corresponding to the video media item corresponds to a second magnitude of movement different than the first magnitude of movement (in some embodiments, a second range of magnitudes, such as apparent velocity of less than 10 m/s and/or 25°/s and/or apparent acceleration of less than 25 m/s2 and/or 35°/s2) (e.g., the visual prominence of the video media item is changed (e.g., to the first level or the second level) based on the amount of movement). Changing the visual prominence of the video media relative to the border region based on the magnitude of apparent movement of the viewpoint of the video media provides improved control of media playback and improved ergonomics of media playback devices without cluttering the user interface with additional displayed controls or requiring additional user inputs for adjusting visual prominence which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently. For example, displaying the video media with relatively less visual prominence in response to more intense apparent camera movement reduces viewing discomfort due to the more intense camera movement, while displaying the video media with relatively more visual prominence in response to less intense apparent camera movement enhances the playback of the video media when the apparent camera movement is less likely to cause viewing discomfort.
In some embodiments, the determination that the movement of the viewpoint corresponding to the video media item corresponds to the first amount of movement includes a determination that a rate of change in a magnitude (e.g., a rate of change of velocity (e.g., acceleration) and/or a rate of change of acceleration of one or more movement components (e.g., a translation (e.g., x, y, and/or z cartesian movement) and/or a rotation (e.g., a movement around an axis (e.g., yaw, pitch, and/or roll)); in some embodiments, instantaneous, average, and/or maximum rates of change for one or more sampling periods) of the movement of the viewpoint corresponding to the video media item corresponds to a first rate of change (in some embodiments, a first range of rates of change, such as apparent acceleration of greater than 25 m/s2 and/or 35°/s2 and/or a change in acceleration of greater than 100 m/s2 and/or 90°/s2 during a 1-second video segment), and the determination that the movement of the viewpoint corresponding to the video media item corresponds to the second amount of movement includes a determination that the rate of change in a magnitude of the movement of the viewpoint corresponding to the video media item corresponds to a second rate of change different than the first rate of change (e.g., the visual prominence of the video media item is changed (e.g., to the first level or the second level) based on the amount of movement) (in some embodiments, a second range of rates of change, such as apparent acceleration of less than 25 m/s2 and/or 35°/s2 and/or a change in acceleration of less than 100 m/s2 and/or 90°/s2 during a 1-second video segment). Changing the visual prominence of the video media relative to the border region based on the magnitude of apparent movement of the viewpoint of the video media provides improved control of media playback and improved ergonomics of media playback devices without cluttering the user interface with additional displayed controls or requiring additional user inputs for adjusting visual prominence which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently. For example, displaying the video media with relatively less visual prominence in response to rapidly changing apparent camera movement reduces viewing discomfort due to the more intense camera movement, while displaying the video media with relatively more visual prominence in response to relatively stable/unchanging apparent camera movement enhances the playback of the video media when the apparent camera movement is less likely to cause viewing discomfort.
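As a non-limiting illustration, the following Swift sketch classifies a motion sample against the illustrative velocity and acceleration thresholds mentioned above (10 m/s, 25°/s, 25 m/s2, 35°/s2); the type names and the simple per-sample comparison are hypothetical, and a real implementation would likely normalize, combine, and smooth components over a sampling window as described above.

```swift
/// Hypothetical per-frame motion sample for the viewpoint of the video media.
struct ViewpointMotionSample {
    var linearVelocity: Double        // m/s
    var angularVelocity: Double       // degrees/s
    var linearAcceleration: Double    // m/s^2
    var angularAcceleration: Double   // degrees/s^2
}

enum MovementLevel { case low, high }

/// Classifies a sample against the illustrative thresholds from the text
/// (10 m/s, 25 deg/s, 25 m/s^2, 35 deg/s^2).
func movementLevel(for sample: ViewpointMotionSample) -> MovementLevel {
    let exceedsVelocity = sample.linearVelocity > 10 || sample.angularVelocity > 25
    let exceedsAcceleration = sample.linearAcceleration > 25 || sample.angularAcceleration > 35
    return (exceedsVelocity || exceedsAcceleration) ? .high : .low
}

let sample = ViewpointMotionSample(linearVelocity: 0.4, angularVelocity: 40,
                                   linearAcceleration: 2, angularAcceleration: 5)
print(movementLevel(for: sample))  // high, because angular velocity exceeds 25 deg/s
```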
In some embodiments, changing the visual prominence of the video media item relative to the border region based on the representation of the movement of the viewpoint corresponding to the video media item includes changing a size of a respective area occupied by the video media item (e.g., cropping the video media item to a particular size based on the representation of the movement of the viewpoint corresponding to the video media item; in some embodiments, without changing the scaling of the video image contents; in some embodiments, cropping the video media item to a smaller size (e.g., thereby reducing the relative visual prominence) in response to more movement and cropping the video media item to a larger size (e.g., thereby increasing the relative visual prominence) in response to less movement). Changing the visual prominence of the video media item relative to the border region by changing the size of the video media item provides improved ergonomics of media playback devices. For example, cropping the video media item can improve viewing comfort by reducing the proportion of the viewer's field-of-view occupied by the video media item and/or increasing the visibility of an environment to ground or orient the viewer.
In some embodiments, displaying the video media item concurrently with a border region that is outside of the video media item includes applying a blurring effect to a respective display region (e.g., 1508) (e.g., blurring at least a portion of the video media item and/or the border region; in some embodiments, the portion is a border area (e.g., an area where the video media item meets the border region; e.g., the blurring effect creates a soft/feathered edge to the video media item)), and changing the visual prominence of the video media item relative to the border region based on the representation of the movement of the viewpoint corresponding to the video media item includes changing a blur radius of the blurring effect (e.g., changing an intensity/extent of blurring based on the representation of the movement of the viewpoint corresponding to the video media item; in some embodiments, blurring the edges more (e.g., thereby reducing the relative visual prominence) in response to more movement and blurring the edges less (e.g., thereby increasing the relative visual prominence) in response to less movement; in some embodiments, blurring the edges less in response to more movement (e.g., to improve visibility of video media cropped to a smaller size)). Changing a blur radius of a blurring effect while changing the visual prominence of the video media item relative to the border region provides improved ergonomics of media playback devices. For example, the blurring effect can affect the visual prominence of the video media item and/or the display region and/or improve the appearance of the video media item and/or the display region as changes are made to other visual characteristics (e.g., the relative sizing).
In some embodiments, displaying the video media item concurrently with a border region that is outside of the video media item includes applying a feathering effect to a respective display region (e.g., 1508 and/or X1508) (e.g., blurring at least a portion of the video media item and/or the border region; in some embodiments, the portion is a border area (e.g., an area where the video media item meets the border region; e.g., the blurring effect creates a soft/feathered edge to the video media item)), and changing the visual prominence of the video media item relative to the border region based on the representation of the movement of the viewpoint corresponding to the video media item includes changing a feather radius of the feathering effect (e.g., changing an intensity/extent of feathering based on the representation of the movement of the viewpoint corresponding to the video media item; in some embodiments, feathering the edges more (e.g., thereby reducing the relative visual prominence) in response to more movement and feathering the edges less (e.g., thereby increasing the relative visual prominence) in response to less movement; in some embodiments, feathering the edges less in response to more movement (e.g., to improve visibility of video media cropped to a smaller size)). Changing a feathering radius of a feathering effect while changing the visual prominence of the video media item relative to the border region provides improved ergonomics of media playback devices. For example, the feathering effect can affect the visual prominence of the video media item and/or the display region and/or improve the appearance of the video media item and/or the display region as changes are made to other visual characteristics (e.g., the relative sizing).
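As a non-limiting illustration, the following Swift sketch scales an edge blur/feather radius with an apparent-movement score; the direction of the relationship is configurable because, as noted above, some embodiments soften edges more with more movement while others soften them less. The type name EdgeSoftening and the numeric values are hypothetical.

```swift
/// Hypothetical sketch: scale the edge blur/feather radius of the border region
/// with the apparent movement of the viewpoint.
struct EdgeSoftening {
    var minimumRadius: Double = 4.0    // points
    var maximumRadius: Double = 24.0   // points
    var increasesWithMovement: Bool = true

    func radius(forMovementScore score: Double) -> Double {
        let clamped = min(max(score, 0.0), 1.0)
        let t = increasesWithMovement ? clamped : (1.0 - clamped)
        return minimumRadius + t * (maximumRadius - minimumRadius)
    }
}

let softening = EdgeSoftening()
print(softening.radius(forMovementScore: 0.75))  // 19.0 points with the defaults above
```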
In some embodiments, changing the visual prominence of the video media item relative to the border region based on the representation of the movement of the viewpoint corresponding to the video media item includes changing a visibility of a representation (e.g., 1506A, 1506B, 1506C, and/or X1506B) (e.g., displayed image and/or video content, passthrough video, and/or optical passthrough (e.g., via transparent or semi-transparent regions of a display)) of an XR environment (e.g., 1506) (in some embodiments, the XR environment includes a physical environment; in some embodiments, the XR environment includes an environment-locked virtual environment; in some embodiments, the XR environment includes other virtual content (e.g., non-video media UI elements, application content, or other displayed content)) included in the border region (in some embodiments, changing the visibility of the representation of the XR environment includes increasing or decreasing the size of the border region to respectively reveal more or less of the XR environment; in some embodiments, changing the visibility of the representation of the XR environment includes changing a blurring effect applied to at least a portion of the representation; in some embodiments, changing the visibility of the representation of the XR environment includes changing another visual characteristic of the representation). Changing the visual prominence of the video media item relative to the border region by changing the visibility of surrounding (e.g., non-video) content provides improved ergonomics of media playback devices. For example, increasing the visibility of surrounding content can improve viewing comfort by reducing the proportion of the viewer's field-of-view occupied by the video media item and/or by grounding and/or orienting the viewer outside of the frame of reference of the video media item, while decreasing the visibility of surrounding content can enhance playback of video media by increasing the immersive effect of the video media.
In some embodiments, the representation of the XR environment includes a representation (e.g., passthrough video and/or optical passthrough (e.g., via transparent or semi-transparent regions of a display)) of a physical environment (e.g., the user's physical surroundings). Changing the visual prominence of the video media item relative to the border region by changing the visibility of surrounding (e.g., non-video) content provides improved ergonomics of media playback devices. For example, increasing the visibility of surrounding content can improve viewing comfort by reducing the proportion of the viewer's field-of-view occupied by the video media item and/or by grounding and/or orienting the viewer in physical space. In some embodiments, the representation of the XR environment includes a representation (e.g., displayed image and/or video content) of a virtual environment (in some embodiments, an environment-locked virtual environment). Changing the visual prominence of the video media item relative to the border region by changing the visibility of surrounding (e.g., non-video) content provides improved ergonomics of media playback devices. For example, increasing the visibility of surrounding content can improve viewing comfort by reducing the proportion of the viewer's field-of-view occupied by the video media item and/or by grounding and/or orienting the viewer outside the frame of reference of the video media item.
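As a non-limiting illustration, the following Swift sketch maps an apparent-movement score to an opacity for the representation of the surrounding environment shown in the border region, so that more intense apparent motion reveals more of the environment to ground the viewer; the function name and numeric values are hypothetical.

```swift
/// Hypothetical sketch: fade passthrough (or a virtual environment) in the border
/// region based on apparent viewpoint movement. Intense motion reveals more of the
/// surrounding environment; calm motion dims it to preserve immersion.
func borderEnvironmentOpacity(movementScore: Double,
                              minimumOpacity: Double = 0.0,
                              maximumOpacity: Double = 0.8) -> Double {
    let clamped = min(max(movementScore, 0.0), 1.0)
    return minimumOpacity + clamped * (maximumOpacity - minimumOpacity)
}

print(borderEnvironmentOpacity(movementScore: 0.5))  // 0.4 with the defaults above
```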
In some embodiments, changing a visibility of the representation of the XR environment includes changing a darkness level of the border region (e.g., as illustrated in
In some embodiments, the video media item includes a first video component corresponding to a viewpoint of a right eye and a second video component that is different from the first video component corresponding to a viewpoint of a left eye, wherein concurrently viewing the first video component and the second video component creates an illusion of a three-dimensional representation of the video media item (e.g., viewing different images with the left and right eye creates the illusion of depth by simulating the parallax effect of binocular vision) (e.g., the video media includes spatial video media). Changing the visual prominence of spatial video media relative to the border region based on apparent movement of the viewpoint of the spatial video media provides improved control of spatial media playback and improved ergonomics of spatial media playback devices without cluttering the user interface with additional displayed controls or requiring additional user inputs for adjusting visual prominence which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently. For example, as movements in the viewpoint of spatial video media can greatly impact viewing comfort, automatically adjusting the visual prominence of the spatial video media provides a more physically comfortable viewing experience without needing to display controls for playback settings or requiring the user to manually input adjustments before and during playback.
In some embodiments, the computer system, while playback of a second video media item is ongoing, wherein the second video media item does not include two or more video components that, when viewed concurrently, create an illusion of a three-dimensional representation of the second video media item (e.g., while playing non-spatial video media), foregoes changing the visual prominence of the second video media item relative to the border region (e.g., non-spatial video media content is played at a consistent level of visual prominence relative to the border region (e.g., even when the non-spatial video media content includes a significant amount of apparent camera movement; in some embodiments, non-spatial video media content does not include a representation of the movement of the viewpoint corresponding to the video media item, and thus, changes are not made to the visual prominence of the non-spatial video media content based on a representation of movement). Conditionally changing visual prominence of spatial video media relative to the border region based on apparent movement of the viewpoint of the spatial video media provides improved control of spatial media playback without cluttering the user interface with additional displayed controls, which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently. For example, dynamic changes to the visual prominence of media playback are automatically enabled or disabled based on the type of the video, without the user needing to remember to change the setting or to provide additional inputs.
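As a non-limiting illustration, the following Swift sketch gates the dynamic prominence adjustment on whether a media item carries two per-eye video components (i.e., is spatial); the VideoMediaItem model, the prominence scale, and the 0.5 reduction factor are hypothetical.

```swift
/// Hypothetical media-item model: spatial items carry two per-eye video tracks.
struct VideoMediaItem {
    var leftEyeTrack: String?
    var rightEyeTrack: String?
    var isSpatial: Bool { leftEyeTrack != nil && rightEyeTrack != nil }
}

/// Dynamic prominence changes are applied only for spatial items; non-spatial
/// items play back at a constant prominence.
func prominence(for item: VideoMediaItem, movementScore: Double) -> Double {
    guard item.isSpatial else { return 1.0 }            // constant prominence
    return 1.0 - 0.5 * min(max(movementScore, 0), 1)    // reduce prominence with movement
}

let flat = VideoMediaItem(leftEyeTrack: "video", rightEyeTrack: nil)
let spatial = VideoMediaItem(leftEyeTrack: "left", rightEyeTrack: "right")
print(prominence(for: flat, movementScore: 0.8))     // 1.0, unchanged for non-spatial media
print(prominence(for: spatial, movementScore: 0.8))  // 0.6, reduced during intense movement
```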
In some embodiments, playback of the video media item includes displaying the video media item as a virtual object (in some embodiments, a viewpoint-locked virtual object; in some embodiments, an environment-locked virtual object) in an XR environment (e.g., as illustrated in
In some embodiments, changing the visual prominence of the video media item (e.g., 1502 and/or X1502) relative to the border region (e.g., 1506A-1506C, X1506B, 1508, and/or X1508) based on the representation (e.g., 1500 and/or X1500) of the movement of the viewpoint corresponding to the video media item (e.g., as illustrated in
In some embodiments, changing the visual prominence of the video media item (e.g., 1502 and/or X1502) relative to the border region (e.g., 1506A-1506C, X1506B, 1508, and/or X1508) based on the representation (e.g., 1500 and/or X1500) of the movement of the viewpoint corresponding to the video media item (e.g., as illustrated in
In some embodiments, aspects/operations of methods 800, 1000, 1200, 1400, and 1600 may be interchanged, substituted, and/or added between these methods. For example, the video media being played back in method 1600 may be video media captured using the user interfaces and indicators described with respect to methods 800, 1000, 1200, and/or 1400. For brevity, these details are not repeated here.
As illustrated in
Upon initiating the capture of the spatial video media at
HMD X700 initially displays stability indicator 1704 at a location (e.g., of display module X702) representing anchor location 1706 in the environment. For example, HMD X700 displays stability indicator 1704 overlaying (e.g., via passthrough video and/or optical passthrough) anchor location 1706 in the environment, and/or renders stability indicator 1704 as a virtual object at a virtual location (e.g., of a three-dimensional XR environment) that corresponds to anchor location 1706. In some embodiments, HMD X700 displays stability indicator 1704 to appear at a particular depth in front of a user (e.g., by displaying stability indicator 1704 differently for the user's left and right eyes). For example, for comfortable viewing of stability indicator 1704, HMD X700 may display stability indicator 1704 and/or other elements of media capture interface X710 in a plane at a predetermined depth (e.g., 1 meter) from the user's eyes or in a plane at a depth that dynamically updates based on the convergence point of the user's eyes (e.g., detected using one or more sensors, such as X704). Although stability indicator 1704 is displayed along with the environment being captured in the ongoing spatial video capture, stability indicator 1704 (e.g., like other elements of media capture interface X710) does not appear in the captured media itself.
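As a non-limiting illustration of the stereo geometry behind displaying stability indicator 1704 at a particular depth, the following Swift sketch computes the per-eye convergence angle for a plane a given distance in front of the user; the 63 mm interpupillary distance and the function name are hypothetical, and actual per-eye rendering would be handled by the display pipeline.

```swift
import Foundation

/// Hypothetical sketch: to make a displayed element appear d meters away, each
/// eye's image of the element is shifted toward the nose by the half-vergence
/// angle atan((ipd / 2) / d).
func perEyeConvergenceAngleDegrees(interpupillaryDistance ipd: Double = 0.063,
                                   depth d: Double = 1.0) -> Double {
    return atan((ipd / 2.0) / d) * 180.0 / .pi
}

print(perEyeConvergenceAngleDegrees())          // ≈ 1.8° per eye at 1 m
print(perEyeConvergenceAngleDegrees(depth: 2))  // smaller angle at greater depth
```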
Anchor location 1706 serves as a reference or target point for HMD X700 to define “stable” or “low-motion” spatial video capture, e.g., video capture with only minimal yaw rotation, pitch rotation, vertical translation, and horizontal translation movement of the viewpoint (e.g., actual and/or apparent camera motion). In some embodiments, HMD X700 can determine how the viewpoint of the spatial video capture changes with respect to the initial viewpoint of the environment seen in
At
In response to detecting movement 1710, HMD X700 changes the appearance of stability indicator 1704 to indicate anchor location 1706 and the initial viewpoint of the spatial media capture (e.g., the viewpoint illustrated in
In some embodiments, the location where stability indicator 1704 is displayed is determined based on one or more simulated physical properties of stability indicator 1704, anchor location 1706, viewpoint location 1708, and/or the XR environment. In some embodiments, simulating the one or more simulated physical properties includes simulating stability indicator 1704 pulling away from anchor location 1706 with some inertia, momentum, resistance, and/or friction. In some embodiments, simulating the one or more simulated physical properties includes modeling stability indicator 1704 as a virtual object with one or more simulated forces pulling stability indicator 1704 towards anchor location 1706 and/or one or more simulated forces pulling stability indicator 1704 towards viewpoint location 1708, such as gravitational forces, magnetic forces, electrostatic forces, and/or spring forces. In some embodiments, in response to further changes in the current viewpoint that reverse the previous changes (e.g., viewpoint movements that move the spatial video capture closer to alignment with the initial viewpoint), HMD X700 also reverses some or all of the previous changes to stability indicator 1704 (e.g., moving stability indicator 1704 back into alignment with anchor location 1706).
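As a non-limiting illustration, the following Swift sketch approximates the pulling-away-with-resistance behavior by displaying the stability indicator only a fraction of the way from anchor location 1706 toward viewpoint location 1708 and capping its excursion at about 1.6° (the offset noted below with respect to boundary indicator 1718); the follow fraction and the AngularOffset type are hypothetical.

```swift
/// Hypothetical 2D angular offset (yaw, pitch) in degrees.
struct AngularOffset {
    var yaw: Double
    var pitch: Double
}

/// Minimal sketch: the stability indicator moves only a fraction of the
/// viewpoint's displacement from the anchor, as if tethered by a spring with
/// friction, and never strays beyond a maximum excursion.
func stabilityIndicatorOffset(viewpointOffsetFromAnchor offset: AngularOffset,
                              followFraction: Double = 0.2,
                              maximumOffsetDegrees: Double = 1.6) -> AngularOffset {
    var yaw = offset.yaw * followFraction
    var pitch = offset.pitch * followFraction
    let magnitude = (yaw * yaw + pitch * pitch).squareRoot()
    if magnitude > maximumOffsetDegrees {
        let scale = maximumOffsetDegrees / magnitude
        yaw *= scale
        pitch *= scale
    }
    return AngularOffset(yaw: yaw, pitch: pitch)
}

print(stabilityIndicatorOffset(viewpointOffsetFromAnchor: AngularOffset(yaw: 6, pitch: 0)).yaw)   // ≈ 1.2
print(stabilityIndicatorOffset(viewpointOffsetFromAnchor: AngularOffset(yaw: 12, pitch: 0)).yaw)  // ≈ 1.6, capped
```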
At
As illustrated in
At
In some embodiments, the location where alignment indicator 1714 is displayed is determined based on one or more simulated physical properties of alignment indicator 1714, anchor location 1706, viewpoint location 1708, and/or the XR environment. In some embodiments, simulating the one or more simulated physical properties includes simulating alignment indicator 1714 following the movement of viewpoint location 1708 with some inertia, momentum, resistance and/or friction. In some embodiments, simulating the one or more simulated physical properties includes modeling alignment indicator 1714 as a virtual object with one or more simulated forces pulling alignment indicator 1714 towards viewpoint location 1708 and one or more simulated forces pulling alignment indicator 1714 towards anchor location 1706 and displaying alignment indicator 1714 at the equilibrium point. In some embodiments, the simulated forces may include displacement-dependent forces, such as gravitational forces, magnetic forces, electrostatic forces, and/or spring forces. For example, simulating the one or more simulated physical properties may include modeling a spring force between viewpoint location 1708 and alignment indicator 1714 and a gravitational force between anchor location 1706 and alignment indicator 1714. Thus, as viewpoint location 1708 moves further away from anchor location 1706, the spring force on alignment indicator 1714 (e.g., at its initial/previous location) increases, pulling alignment indicator 1714 closer to viewpoint location 1708 until the decreasing spring force equalizes with the decreasing gravitational force pulling alignment indicator 1714 in the direction of anchor location 1706. In some embodiments, HMD X700 may simulate the forces acting on alignment indicator 1714 in other ways, for instance, modeling simulated mass or simulated spring constants as functions of displacement (e.g., increasing the “mass” of viewpoint location 1708 and decreasing the “mass” of anchor location 1706 as displacement between the two increases).
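As a non-limiting illustration, the following one-dimensional Swift sketch places the alignment indicator at the equilibrium of two opposing pulls, with the gravitational pull toward anchor location 1706 approximated by a second linear spring so the equilibrium point has a closed form; the stiffness values are hypothetical.

```swift
/// Minimal 1D sketch of the equilibrium placement described above.
/// x = 0 is the anchor location and x = displacement is the current viewpoint location.
func alignmentIndicatorPosition(viewpointDisplacement displacement: Double,
                                stiffnessTowardViewpoint kViewpoint: Double = 3.0,
                                stiffnessTowardAnchor kAnchor: Double = 1.0) -> Double {
    // Forces balance where kViewpoint * (displacement - x) == kAnchor * x.
    return displacement * kViewpoint / (kViewpoint + kAnchor)
}

// As the viewpoint moves farther from the anchor, the indicator settles
// proportionally closer to the viewpoint than to the anchor.
print(alignmentIndicatorPosition(viewpointDisplacement: 2.0))  // 1.5° from the anchor
print(alignmentIndicatorPosition(viewpointDisplacement: 8.0))  // 6.0° from the anchor
```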
At
As illustrated in
HMD X700 displays boundary indicator 1718 at a location centered around stability indicator 1704 (e.g., about 1.6° away from anchor location 1706) and with dimensions that visually indicate a “high motion” boundary to the user (e.g., 9°, 10°, and/or 12°). For example, if the movement of the viewpoint during capture has moved the current viewpoint (e.g., represented by viewpoint location 1708) out of alignment with the initial viewpoint (e.g., represented by anchor location 1706) by more than 10°, the ongoing spatial video capture is classified as an “unstable” or “high motion” video capture. Accordingly, to indicate the “high motion” boundary, boundary indicator 1718 is displayed with an initial radius that approximates the radius of the “high motion” boundary (e.g., 7.5°, 8.5°, and/or 10°). For example, for a “high motion” boundary radius of 10° from anchor location 1706, boundary indicator 1718 is displayed with a radius of approximately 8°, such that the point of boundary indicator 1718 farthest from anchor location 1706 falls at the edge of the “high motion” boundary radius (e.g., 9.6° away from anchor location 1706). Like stability indicator 1704 and alignment indicator 1714, although boundary indicator 1718 is displayed along with the environment being captured in the ongoing spatial video capture, boundary indicator 1718 does not appear in the captured media itself.
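As a non-limiting illustration, the following Swift sketch derives the ring radius of boundary indicator 1718 from the 10° high-motion threshold and the roughly 1.6° maximum offset of stability indicator 1704, so that the farthest point of the ring stays at or inside the threshold; the 0.4° margin is hypothetical.

```swift
/// Sketch of the boundary-indicator sizing: the ring is centered on the stability
/// indicator (which can sit up to some maximum offset from the anchor), so its
/// radius is reduced from the high-motion threshold to keep the ring's farthest
/// point at or inside that threshold.
func boundaryIndicatorRadius(highMotionThresholdDegrees: Double = 10.0,
                             maximumStabilityIndicatorOffsetDegrees: Double = 1.6,
                             marginDegrees: Double = 0.4) -> Double {
    // Farthest point from the anchor = indicator offset + ring radius, so:
    return highMotionThresholdDegrees - maximumStabilityIndicatorOffsetDegrees - marginDegrees
}

// With a 10° threshold and a 1.6° indicator offset, an ≈8° radius keeps the
// farthest point of the ring at about 9.6° from the anchor.
print(boundaryIndicatorRadius())  // ≈ 8.0
```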
At
In response to movement 1720, HMD X700 moves boundary indicator 1718, for example, so that it remains centered around stability indicator 1704 (e.g., boundary indicator 1718 is “locked” to stability indicator 1704). As illustrated in
At
As illustrated in
At
At
At
In some embodiments, in response to detecting movement 1724, HMD X700 initially moves stability indicator 1704 and boundary indicator 1718, as described above. Additionally, HMD X700 initially decreases the dimensions of boundary indicator 1718, for example, “snapping back” to a radius of 8°. Then, as the “high motion” boundary of 10° has been crossed, HMD X700 stops displaying stability indicator 1704 and boundary indicator 1718, as illustrated in
HMD X700 continues displaying alignment indicator 1714 during high motion video capture. In particular, as illustrated at
At
At
Once the spatial video capture satisfies the stability criteria and HMD X700 re-establishes a stable viewpoint, in response to further viewpoint movement, HMD X700 displays stability guidance in the manner described above with respect to stability indicator 1704, alignment indicator 1714, and boundary indicator 1718, but the displacement considered when moving and changing the stability guidance indicators is the displacement between anchor location 1736 (e.g., the current reference point) and viewpoint location 1708 instead of the displacement between anchor location 1706 (e.g., the former reference point) and viewpoint location 1708.
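As a non-limiting illustration, the following Swift sketch tracks displacement against the most recently established anchor and re-anchors once the viewpoint has stayed within a small displacement for a dwell time; the 1° threshold and 2-second dwell are hypothetical stand-ins for the stability criteria described above.

```swift
/// Minimal sketch of the re-anchoring behavior: stability guidance always
/// measures displacement against the current reference point, and a new anchor
/// is established once the capture has been stable for long enough (assumed
/// criteria for illustration only).
struct StabilityGuidance {
    var anchorYawPitch: (yaw: Double, pitch: Double)
    var stableDuration: Double = 0.0
    let reanchorAfterSeconds: Double = 2.0    // illustrative value
    let stableThresholdDegrees: Double = 1.0  // illustrative value

    mutating func update(viewpointYawPitch current: (yaw: Double, pitch: Double),
                         deltaTime: Double) -> Double {
        let dyaw = current.yaw - anchorYawPitch.yaw
        let dpitch = current.pitch - anchorYawPitch.pitch
        let displacement = (dyaw * dyaw + dpitch * dpitch).squareRoot()
        if displacement < stableThresholdDegrees {
            stableDuration += deltaTime
            if stableDuration >= reanchorAfterSeconds {
                // Re-establish the anchor at the current (stable) viewpoint.
                anchorYawPitch = current
                stableDuration = 0.0
            }
        } else {
            stableDuration = 0.0
        }
        return displacement  // displacement against the current reference point
    }
}

var guidance = StabilityGuidance(anchorYawPitch: (yaw: 0, pitch: 0))
// Holding roughly still for two seconds re-anchors the guidance at the new viewpoint.
for _ in 0..<4 {
    _ = guidance.update(viewpointYawPitch: (yaw: 0.5, pitch: 0.2), deltaTime: 0.5)
}
print(guidance.anchorYawPitch)  // (yaw: 0.5, pitch: 0.2)
```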
In particular, at
At
At
At
As a result of movement 1748, the displacement between anchor location 1736 and viewpoint location 1708 falls below the third alignment threshold (e.g., 1°). In some embodiments, as the displacement between anchor location 1736 and viewpoint location 1708 falls from the first alignment threshold (e.g., 2°) where alignment indicator 1740 is initially displayed to the third alignment threshold, HMD X700 displays alignment indicator 1740 at a location between stability indicator 1734 and viewpoint location 1708 (e.g., based on the one or more simulated physical properties). Then, when the displacement between anchor location 1736 and viewpoint location 1708 falls below the third alignment threshold, HMD X700 moves alignment indicator 1740 to “snap” to the location of stability indicator 1734, indicating that the movement of the viewpoint has substantially realigned the current viewpoint with the target viewpoint. As illustrated in
At
Additional descriptions regarding
The computer system (e.g., 101, 1-100, 1-200, 3-100, 6-100, 6-200, 6-300, 6-400, 11.1.2-100, 700, X700, and/or 702), while capturing (1802) spatial video media of an environment (e.g., a physical and/or virtual environment) using the one or more cameras (in some embodiments, while displaying a camera user interface with a camera preview, as described above (e.g., with respect to
The computer system, while displaying (1806) the virtual indicator element while the environment is visible via the display generation component, detects (1808) a first change in a viewpoint from which the spatial video media is being captured (e.g., 1710, 1712, 1716, 1720, 1722, 1724, 1726, 1728, 1738, 1744, and/or 1748) (e.g., movement (e.g., rotation and/or translation) of the one or more cameras with respect to the environment; in some embodiments, detected using one or more sensors (e.g., depth sensors, location sensors, and/or other sensors) of the computer system; in some embodiments, where media is being captured with an HMD, due to movement of the user's head, neck, and body; in some embodiments, the first change in the viewpoint is a translation or rotation in one or more directions (e.g., only horizontal, vertical, pitch, and/or yaw movements are considered) and not in one or more other directions (e.g., longitudinal and/or tilt movements are not considered)).
The computer system, in response to detecting (1810) the first change in the viewpoint from which the spatial video media is being captured, changes an appearance of the virtual indicator element (e.g., 1704 and/or 1734) (in some embodiments, a change in color, transparency, size, and/or display location (in some embodiments, the display location changes relative to the environment visible via the display generation component; in some embodiments, the display location changes relative to the display generation component (e.g., to remain fully or partially environment-locked)) to indicate the respective viewpoint (e.g., the starting and/or stable viewpoint) corresponding to the spatial video media (e.g., as illustrated in
Displaying a virtual indicator element (e.g., anchor indicator) with an appearance that updates in response to changes to the viewpoint of an ongoing spatial media capture in order to indicate an anchor location in the environment being captured provides improved visual feedback about a state of the computer system, which assists the user with composing media capture events and reduces the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement). Doing so also enhances the operability of the system and makes the user-system interface more efficient (e.g., by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system) which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently. For example, indicating a steady target location (e.g., anchor location) with the virtual indicator element intuitively prompts and guides a user to capture steady spatial media without visually uncomfortable and/or unwanted movement of the viewpoint.
In some embodiments, changing the appearance of the virtual indicator element to indicate the respective viewpoint corresponding to the spatial video media includes changing the appearance of the virtual indicator element relative to the environment visible via the display generation component (e.g., as illustrated in
In some embodiments, displaying the virtual indicator element includes displaying the virtual indicator element at a virtual location (e.g., the virtual indicator element is rendered at a particular location (in some embodiments, at the focal/convergence location of the user's viewpoint; in some embodiments, at a predetermined location (e.g., 0.5, 1, or 2 meters away from the user's viewpoint); in some embodiments, in a region where other virtual UI elements are rendered) in the extended reality (XR) environment) in the environment (e.g., displaying the virtual indicator element overlaying a location of the environment that corresponds to the virtual location; in some embodiments, displaying the virtual indicator element as an XR object at the virtual location). In some embodiments, the virtual indicator element is not included in the spatial video media (e.g., as illustrated by spatial media items 1904 and 1904A, discussed below with respect to
In some embodiments, changing the appearance of the virtual indicator element includes moving (e.g., displaying movement of) the virtual indicator element to a respective location (e.g., displaying the virtual indicator element overlaying a location of the environment that corresponds to the respective location; in some embodiments, from a previous location; in some embodiments, from the anchor location (e.g., upon detecting initial movement of the viewpoint)) within an anchor region of the environment (e.g., as illustrated in
In some embodiments, moving the virtual indicator element to the respective location within the anchor region of the environment includes moving the virtual indicator element according to one or more simulated physical properties (e.g., simulated mass, inertia, gravitational attraction, magnetic attraction or repulsion, electrostatic attraction or repulsion, and/or a spring force) (in some embodiments, simulating the virtual indicator element as one or more physical objects reacting to detected change in the viewpoint of the spatial video media (e.g., the movement of the one or more cameras) with respect to the anchor location (e.g., within the frame of the reference of the environment); in some embodiments, determining the respective location within the anchor region based on the simulated physics (e.g., smoothing, reducing, increasing, and/or otherwise adjusting the location(s) determined using the simulated physics); in some embodiments, simulated spring resistance (e.g., simulating spring resistance between the anchor location and the location of the virtual indicator element); in some embodiments, simulated gravity (e.g., simulating the anchor location exerting a gravitational pull on the virtual indicator element)). Moving the virtual indicator element according to simulated physics provides improved visual feedback about a state of the computer system, assists the user with composing media capture events, and reduces the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement). For example, the simulated physics of the virtual indicator element intuitively convey information about the anchor location and/or the first change in the viewpoint, allowing the user to quickly adjust capture to avoid visually uncomfortable and/or unwanted camera movement in the captured media.
In some embodiments, the first change in the viewpoint from which the spatial video media is being captured includes a viewpoint movement (e.g., 1710, 1712, 1716, 1720, 1722, 1724, 1726, 1728, 1738, 1744, and/or 1748) (e.g., translation and/or rotation; e.g., of the one or more cameras) of a first distance (in some embodiments, an overall a magnitude of displacement normalized to a single frame of reference; in some embodiments, a first angular distance; in some embodiments, a first linear distance) in a first direction (in some embodiments, in one or more directions (e.g., the first direction is an overall direction of movement normalized to a single frame of reference)), and moving the virtual indicator element to the respective location within the anchor region of the environment includes moving the virtual indicator element a second distance in the first direction (e.g., the movement of the virtual indicator element follows the movement of the viewpoint), wherein the second distance is shorter than the first distance (e.g., as illustrated in
In some embodiments, the virtual indicator element is environment-locked (in some embodiments, changing the appearance of the virtual indicator element includes moving the virtual indicator element (e.g., with respect to the display generation component) to be environment-locked (e.g., appearing located at the anchor location in the three-dimensional environment, such that, as the user's viewpoint shifts, the display location of the virtual indicator element shifts with respect to the viewport through which the environment is visible to appear as though it remains at the anchor location in the three-dimensional environment)). Moving the virtual indicator element to appear environment-locked at the anchor location provides improved visual feedback about a state of the computer system, assists the user with composing media capture events, and reduces the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement). For example, the movement of the virtual indicator element intuitively conveys information about the anchor location, guiding and allowing the user to quickly adjust capture to move back to the steady location and avoid visually uncomfortable and/or unwanted camera movement in the captured media.
In some embodiments, the computer system, while displaying the virtual indicator element while the environment is visible, via the display generation component, detects a second change (in some embodiments, the second change is the same as the first change) in the viewpoint from which the spatial video media is being captured (e.g., 1712, 1716, 1720, 1722, 1724, 1726, 1728, 1738, 1744, and/or 1748) (e.g., movement (e.g., rotation and/or translation) of the one or more cameras with respect to the environment; in some embodiments, where media is being captured with an HMD, due to movement of the user's head, neck, and body; in some embodiments, the second change in the viewpoint is a translation or rotation in one or more directions (e.g., only horizontal, vertical, pitch, and/or yaw movements are considered) and not in one or more other directions (e.g., longitudinal and/or tilt movements are not considered)). In some embodiments, in response to detecting the second change in the viewpoint from which the spatial video media is being captured and in accordance with a determination that the viewpoint from which the spatial video media is being captured satisfies a first set of one or more alignment criteria (e.g., criteria defining an initial (e.g., low) level of misalignment from an established stable/target viewpoint at which to initially display the alignment indicator; in some embodiments, when the second change results in the current viewpoint differing from the respective viewpoint (e.g., the viewpoint represented by the anchor location and/or indicated by the virtual indicator element) by at least a threshold amount; in some embodiments, when the second change results in “high movement” of the current viewpoint (e.g., the amount and/or rate of change of the location of the current viewpoint exceeds one or more thresholds)), the computer system displays (e.g., initially displaying), via the display generation component, a virtual alignment element (e.g., 1714 and/or 1740) (e.g., an icon or glyph, such as crosshairs, a dot, and/or a small shape; in some embodiments, the alignment element is displayed with a second color (e.g., white and/or another color) different from the color of the indicator element) while the environment is visible via the display generation component, wherein the virtual alignment element indicates a current location in the environment (e.g., 1708) that represents the viewpoint from which the spatial video media is being captured (e.g., as illustrated in FIGS. 17D-17L and 17N-17Q) (in some embodiments, a reference point in the portion of the environment currently included in the spatial media capture; in some embodiments the current location is a center point of the current viewpoint).
In some embodiments, while displaying the virtual alignment element (e.g., 1714 and/or 1740) while the environment is visible via the display generation component, the computer system detects a third change (in some embodiments, the third change is the same as the first change) in the viewpoint from which the spatial video media is being captured (e.g., 1716, 1720, 1722, 1724, 1726, 1728, 1730, 1744, and/or 1748) (e.g., movement (e.g., rotation and/or translation) of the one or more cameras with respect to the environment; in some embodiments, where media is being captured with an HMD, due to movement of the user's head, neck, and body; in some embodiments, the third change in the viewpoint is a translation or rotation in one or more directions (e.g., only horizontal, vertical, pitch, and/or yaw movements are considered) and not in one or more other directions (e.g., longitudinal and/or tilt movements are not considered)), and in response to detecting the third change in the viewpoint from which the spatial video media is being captured, the computer system moves (e.g., displaying movement of) the virtual alignment element to indicate the viewpoint (e.g., the current viewpoint), wherein moving the virtual alignment element is based on the third change in the viewpoint from which the spatial video media is being captured (e.g., as illustrated in
In some embodiments, the first set of one or more alignment criteria includes a first criterion that is satisfied when the current location in the environment (e.g., 1708) that represents the viewpoint from which the spatial video media is being captured (e.g., the current viewpoint; e.g., as a result of the second change in the viewpoint) is at least a first threshold distance (in some embodiments, a minimum angular distance (e.g., 2°, 3°, and/or 5° yaw and/or pitch rotation); in some embodiments, a minimum cartesian distance (e.g., 3, 5, or 10 cm vertical or horizontal translation) from the anchor location (e.g., 1706 and/or 1736). (E.g., the first criterion is satisfied when the third change in the current viewpoint moves the viewport by more than a threshold amount.) (In some embodiments, the first criterion is satisfied when the current location in the environment that represents the viewpoint from which the spatial video media is being captured is at least a first threshold distance from the displayed location of the indicator element.) Conditionally displaying the virtual alignment element (e.g., alignment indicator) in response to movement of the viewpoint that exceeds a particular threshold provides improved visual feedback about a state of the computer system without cluttering the user interface, which assists the user with composing media capture events and reduces the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement). For example, the initial appearance of the virtual alignment element indicates to the user that the viewpoint of the spatial media capture is not steady, as the viewpoint has departed from its previous position by more than a certain amount, allowing the user to monitor and adjust the movement of the viewpoint.
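As a non-limiting illustration, the following Swift sketch applies the appearance and hide thresholds described above and below as a hysteresis band, using the illustrative 2° appearance threshold and a 1° threshold for snapping back and hiding; in some embodiments those two lower thresholds differ, so the values here are hypothetical.

```swift
/// Minimal sketch: the alignment element appears once the viewpoint drifts at
/// least ~2° from the anchor, and snaps back to the stability indicator and is
/// hidden again once the drift falls below ~1°. The gap between the two values
/// is a hysteresis band that keeps the element from flickering near the threshold.
func alignmentElementVisible(wasVisible: Bool,
                             displacementDegrees: Double,
                             appearThresholdDegrees: Double = 2.0,
                             hideThresholdDegrees: Double = 1.0) -> Bool {
    if wasVisible {
        return displacementDegrees >= hideThresholdDegrees
    } else {
        return displacementDegrees >= appearThresholdDegrees
    }
}

print(alignmentElementVisible(wasVisible: false, displacementDegrees: 1.5))  // false, not yet past 2°
print(alignmentElementVisible(wasVisible: true, displacementDegrees: 1.5))   // true, still above the 1° hide threshold
```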
In some embodiments, the computer system, while displaying the virtual alignment element (e.g., 1714 and/or 1740) while the environment is visible via the display generation component, detects a fourth change (e.g., 1716, 1720, 1722, 1724, 1726, 1728, 1730, 1744, and/or 1748) (in some embodiments, the fourth change is the same as the first change and/or the third change) in the viewpoint from which the spatial video media is being captured. In some embodiments, in response to detecting the fourth change in the viewpoint from which the spatial video media is being captured and in accordance with a determination that the viewpoint from which the spatial video media is being captured (e.g., the current viewpoint) satisfies a second set of alignment criteria (e.g., criteria defining a minimal misalignment from an established stable/target viewpoint where the viewpoint is considered substantially aligned (e.g., a misalignment margin of error) at which to hide the alignment indicator), the computer system ceases displaying the virtual alignment element (e.g., as illustrated in
In some embodiments, the third change in the viewpoint from which the spatial video media is being captured includes a viewpoint movement (e.g., translation and/or rotation; e.g., of the one or more cameras) of a third distance (in some embodiments, an overall a magnitude of displacement normalized to a single frame of reference; in some embodiments, a third angular distance; in some embodiments, a third linear distance in some embodiments, the third distance is the same as the first distance and/or the second distance) (in some embodiments, the movement is a movement in a respective direction), and moving the virtual alignment element based on the third change in the viewpoint from which the spatial video media is being captured includes, in accordance with a determination that the third change in the viewpoint from which the spatial video media is being captured satisfies a third set of one or more alignment criteria (e.g., criteria defining a state of viewpoint movement during which an anchor location remains established as the reference/target point for determining misalignment (e.g., a “high-motion” boundary has not yet been broken) and the alignment indicator is being displayed (e.g., the viewpoint has not yet substantially re-aligned with the stable/target viewpoint); in some embodiments, the third set of one or more alignment criteria includes a criterion that is satisfied when third change increases a distance (e.g., an angular and/or linear displacement) between the current location representing the viewpoint from which the spatial video media is being captured and the anchor location representing the respective viewpoint (e.g., the movement is a movement further away from alignment with the anchor location)), moving the virtual alignment element a fourth distance (in some embodiments, the fourth distance is the same as the first distance and/or the second distance), wherein the fourth distance is shorter than the third distance (e.g., as illustrated in
In some embodiments, moving the virtual alignment element based on the third change in the viewpoint from which the spatial video media is being captured includes moving the virtual alignment element according to one or more simulated physical properties (e.g., simulated mass, inertia, gravitational attraction, magnetic attraction or repulsion, electrostatic attraction or repulsion, and/or a spring force (in some embodiments, simulating the virtual alignment element as one or more physical objects reacting to detected change in the viewpoint of the spatial video media (e.g., the movement of the one or more cameras) with respect to the anchor location (e.g., within the frame of the reference of the environment); in some embodiments, determining the fourth path based on the simulated physics; in some embodiments, the simulated physics includes simulating at least one force acting on the virtual alignment element that changes based on the virtual alignment element's distance from another simulated object (in some embodiments, the at least one force includes simulated spring resistance (e.g., simulating spring resistance between the anchor location and the location of the virtual alignment element and/or a location representing the current viewpoint); in some embodiments, the at least one force includes simulated gravity (e.g., simulating the anchor location and/or a location representing the current viewpoint exerting a gravitational pull on the virtual alignment element))). Moving the virtual alignment element according to simulated physics provides improved visual feedback about a state of the computer system, assists the user with composing media capture events, and reduces the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement). For example, the simulated physics of the virtual alignment element intuitively convey information about the change in the viewpoint and/or the anchor location, allowing the user to quickly adjust capture to avoid visually uncomfortable and/or unwanted camera movement in the captured media.
In some embodiments, the computer system, after moving the virtual alignment element according to the one or more simulated physical properties (in some embodiments, and without detecting further changes in the viewpoint), displays the virtual alignment element at a first location (in some embodiments, displaying an animation of the element shifting and/or moving to the first location), wherein the first location is closer to the anchor location than the current location in the environment that represents the viewpoint from which the spatial video media is being captured (e.g., as illustrated in
In some embodiments, displaying the virtual indicator element includes displaying the virtual indicator element in a first color (e.g., yellow and/or another color) and displaying the virtual alignment element includes displaying the virtual alignment element in a second color different from the first color (e.g., white and/or another color; e.g., the virtual indicator element and the virtual alignment element are visibly distinguishable by color). Displaying the virtual indicator element and the virtual alignment element in different colors provides improved visual feedback about a state of the computer system without cluttering the user interface. For example, by visually distinguishing the virtual indicator element and the virtual alignment element with color, the user can quickly and intuitively monitor both the anchor location and the movement of the viewpoint without the need for additional UI elements such as text labels.
In some embodiments, moving the virtual alignment element based on the third change in the viewpoint from which the spatial video media is being captured includes, while displaying the virtual indicator element at an indicator location (in some embodiments, displaying movement of the virtual indicator element to the indicator location in response to detecting the third change) in the environment (e.g., indicating the anchor location; in some embodiments, the indicator location is the anchor location; in some embodiments, the indicator location is the same as the virtual location and/or the respective location; in some embodiments, the indicator location is different from the virtual location and/or the respective location) and in accordance with a determination that the viewpoint from which the spatial video media is being captured (e.g., the current viewpoint) satisfies a fourth set of alignment criteria (e.g., criteria defining a minimal misalignment from an established stable/target viewpoint where the viewpoint is considered substantially aligned (e.g., a misalignment margin of error) at which to hide the alignment indicator), wherein the fourth set of alignment criteria includes a criterion that is satisfied when the current location in the environment that represents the viewpoint from which the spatial video media is being captured (e.g., the current viewpoint) is less than a third threshold distance (in some embodiments, a threshold angular distance (e.g., 1.5°, 2°, and/or 3.5° yaw and/or pitch rotation); in some embodiments, a threshold cartesian distance (e.g., 1.5, 2.5, or 4 cm vertical or horizontal translation; in some embodiments, the third threshold distance is different from the second threshold distance (e.g., the alignment element snaps back to the indicator element before reaching the displacement where the alignment indicator disappears); in some embodiments, the third threshold distance is the same as the second threshold distance (e.g., the alignment element snaps back to the indicator element and then disappears at the same point); in some embodiments, the third threshold distance is the same as the first threshold distance (e.g., the alignment element snaps back to the indicator element within the displacement where it originally appeared)) from the anchor location, moving (e.g., displaying movement of) the virtual alignment element to the indicator location (e.g., as illustrated in
In some embodiments, the computer system, while displaying the virtual indicator element while the environment is visible via the display generation component, detects a fifth change (in some embodiments, the fifth change is the same as the first change, the second change, and/or the third change) in the viewpoint from which the spatial video media is being captured (e.g., 1716, 1720, 1722, 1724, 1726, 1728, 1730, 1744, and/or 1748) (e.g., movement (e.g., rotation and/or translation) of the one or more cameras with respect to the environment; in some embodiments, where media is being captured with an HMD, due to movement of the user's head, neck, and body; in some embodiments, the change in the viewpoint is a translation or rotation in one or more directions (e.g., only horizontal, vertical, pitch, and/or yaw movements are considered) and not in one or more other directions (e.g., longitudinal and/or tilt movements are not considered)). In some embodiments, the computer system, in response to detecting the fifth change in the viewpoint from which the spatial video media is being captured and in accordance with a determination that the viewpoint from which the spatial video media is being captured (e.g., the current viewpoint) satisfies a fifth set of alignment criteria (e.g., criteria defining a medium-high level (e.g., nearing, but still under the high-motion boundary threshold) of misalignment from an established stable/target viewpoint at which to initially display the boundary indicator), displays a virtual boundary element (e.g., 1718 and/or 1742) (e.g., a circle and/or other framing element; in some embodiments, the virtual boundary element frames a region that initially includes the virtual alignment element; in some embodiments, the virtual boundary element frames a region that includes the virtual indicator element (in some embodiments, the virtual boundary element is centered around the virtual indicator element); in some embodiments, the virtual boundary element frames a region that includes the anchor location) visually representing a predetermined (e.g., a respective, boundary, and/or maximum) threshold distance from the anchor location (in some embodiments, a threshold distance from the anchor location that, when exceeded, classifies the spatial video capture as a high-motion video capture; in some embodiments, a threshold angular distance (e.g., 8°, 10°, and/or 12° yaw and/or pitch rotation); in some embodiments, a threshold cartesian distance (e.g., 15, 17, or 20 cm vertical or horizontal translation)) (in some embodiments, a dimension (e.g., a radius, length, and/or width) of the virtual boundary element approximates the predetermined threshold distance (e.g., if the predetermined threshold distance from the anchor location is 10°, the virtual boundary element is displayed as a circle with a radius, such that, when centered on the virtual indicator element, the point of the virtual boundary element furthest from the anchor location is 10° from the anchor location (e.g., an 8°-10° radius, depending on the maximum distance the virtual indicator element can move from the anchor location)) while the environment is visible via the display generation component (e.g., as illustrated in
In some embodiments, the computer system, while displaying the virtual boundary element visually representing the predetermined threshold distance from the anchor location (in some embodiments, while the viewpoint from which the spatial media is being captured satisfies the fourth set of one or more alignment criteria) while the environment is visible via the display generation component, detects a sixth change in the viewpoint from which the spatial video media is being captured (e.g., 1720, 1722, 1724, 1726, and/or 1728) (in some embodiments, the sixth change is the same change in the viewpoint as the first change and/or the third change), and in response to detecting the sixth change, the computer system moves (e.g., displaying movement of) the virtual indicator element (e.g., 1704 and/or 1734) along a first path (e.g., the anchor indicator moves in response to changes (e.g., movements) in the viewpoint) and moves the virtual boundary element (e.g., 1718 and/or 1742) along the first path (e.g., moving the boundary indicator along with the anchor indicator in response to changes in the viewpoint; in some embodiments, the virtual boundary element remains centered around the virtual indicator element). Moving the virtual boundary element along with the virtual indicator element in response to movement of the viewpoint provides improved visual feedback about a state of the computer system, assists the user with composing media capture events, and reduces the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement). For example, the movement of the virtual boundary element intuitively conveys information about both the anchor location and the outer bounds of “low” or “steady” movement, allowing the user to quickly adjust capture to avoid visually uncomfortable and/or unwanted camera movement in the captured media, while also conveying some “forgiveness” of slight viewpoint movements.
In some embodiments, the computer system, while displaying the virtual alignment element and the virtual boundary element, detects a seventh change in the viewpoint from which the spatial video media is being captured (e.g., 1720, 1722, 1724, 1726, and/or 1728) (in some embodiments, the seventh change is the same change in the viewpoint as the first change, the third change, and/or the sixth change). In some embodiments, in response to detecting the seventh change and in accordance with a determination that the viewpoint satisfies a set of one or more increase criteria (in some embodiments, the set of one or more increase criteria includes a criterion that is satisfied when the distance between the current viewpoint location and the anchor location still falls within a particular range (e.g., the range where the virtual boundary element is displayed, e.g., between the predetermined threshold distance and the fourth threshold distance)), wherein the set of one or more increase criteria includes a criterion that is satisfied when a distance between the current location in the environment that represents the viewpoint from which the spatial video media is being captured and the anchor location increases as a result of the seventh change in the viewpoint (e.g., as the current viewpoint moves further out of alignment with the initial/steady viewpoint), the computer system increases an opacity of the virtual alignment element at a first rate and increases an opacity of the virtual boundary element at a second rate (e.g., as illustrated in
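A minimal sketch of the two-rate opacity ramp described above might look like the following; the rate constants and function name are hypothetical:

// Hypothetical sketch: when misalignment increases, raise the opacity of the
// alignment element at a first rate and of the boundary element at a second,
// different rate.
func updatedOpacities(misalignmentDegrees: Double,
                      previousMisalignmentDegrees: Double,
                      alignmentOpacity: Double,
                      boundaryOpacity: Double) -> (alignment: Double, boundary: Double) {
    let increase = misalignmentDegrees - previousMisalignmentDegrees
    // Increase criteria not met if the distance to the anchor did not grow.
    guard increase > 0 else { return (alignmentOpacity, boundaryOpacity) }
    let firstRate = 0.20     // opacity gained per degree of additional misalignment
    let secondRate = 0.10
    return (min(1.0, alignmentOpacity + firstRate * increase),
            min(1.0, boundaryOpacity + secondRate * increase))
}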
In some embodiments, the computer system, while displaying the virtual boundary element visually representing the predetermined threshold distance from the anchor location (in some embodiments, while the viewpoint from which the spatial media is being captured satisfies the fourth set of one or more alignment criteria) while the environment is visible via the display generation component, detects an eighth change in the viewpoint from which the spatial video media is being captured (e.g., 1720, 1722, 1724, 1726, and/or 1728) (in some embodiments, the eighth change is the same change in the viewpoint as the first change, the third change, the sixth change, and/or the seventh change), and in response to the eighth change in the viewpoint from which the spatial video media is being captured and in accordance with a determination that a distance between the current location in the environment that represents the viewpoint from which the spatial video media is being captured and the anchor location increases to within a boundary distance range (e.g., 0.5°, 1°, or 2°-wide range; e.g., a 1 cm, 2 cm, or 4 cm-wide range) as a result of the eighth change in the viewpoint (e.g., as the current viewpoint moves further out of alignment with the initial/steady viewpoint and approaches the edge of the boundary), the computer system increases (e.g., displaying an increase of) a size (e.g., a dimension (e.g., a radius, length, and/or width) of the boundary element) of the virtual boundary element from a first size to a second size (e.g., as illustrated in
In some embodiments, the computer system, while displaying the virtual boundary element at the second size, detects a ninth change in the viewpoint from which the spatial video media is being captured (e.g., 1726 and/or 1728) (in some embodiments, the ninth change is the same change in the viewpoint as the first change, the third change, the sixth change, the seventh change, and/or the eighth change), and in response to the ninth change in the viewpoint from which the spatial video media is being captured and in accordance with a determination that a distance between the current location in the environment that represents the viewpoint from which the spatial video media is being captured and the anchor location decreases to below the boundary distance range as a result of the ninth change in the viewpoint (e.g., as the current viewpoint moves closer to alignment with the initial/steady viewpoint from the edge of the boundary), the computer system decreases (e.g., displaying a decrease of) the size (e.g., a dimension (e.g., a radius, length, and/or width) of the boundary element) of the virtual boundary element from the second size to the first size (e.g., as illustrated in
In some embodiments, the computer system, while displaying the virtual boundary element visually representing the predetermined threshold distance from the anchor location while the environment is visible via the display generation component, detects a tenth change in the viewpoint from which the spatial video media is being captured (e.g., 1720, 1722, 1724, 1726, and/or 1728) (in some embodiments, the tenth change is the same change in the viewpoint as the first change, the third change, the sixth change, the seventh change, the eighth change, and/or the ninth change), and in response to the tenth change in the viewpoint from which the spatial video media is being captured and in accordance with a determination that a distance between the current location in the environment that represents the viewpoint from which the spatial video media is being captured and the anchor location exceeds the predetermined threshold distance as a result of the tenth change in the viewpoint (e.g., as the current viewpoint moves further out of alignment with the initial/steady viewpoint and approaches the edge of the boundary), the computer system decreases (e.g., displaying a decrease of) the size (e.g., a dimension (e.g., a radius, length, and/or width) of the boundary element) of the virtual boundary element (e.g., as illustrated in
In some embodiments, the computer system, while displaying the virtual boundary element visually representing the predetermined threshold distance from the anchor location (in some embodiments, while the viewpoint from which the spatial media is being captured satisfies the fourth set of one or more alignment criteria) while the environment is visible via the display generation component, detects an eleventh change in the viewpoint from which the spatial video media is being captured (e.g., 1720, 1722, 1724, 1726, and/or 1728) (in some embodiments, the eleventh change is the same change in the viewpoint as the first change, the third change, the sixth change, the seventh change, the eighth change, the ninth change, and/or the tenth change), and in response to the eleventh change in the viewpoint from which the spatial video media is being captured and in accordance with a determination that a distance between the current location in the environment that represents the viewpoint from which the spatial video media is being captured and the anchor location exceeds the predetermined threshold distance as a result of the eleventh change in the viewpoint (e.g., as the current viewpoint moves further out of alignment with the initial/steady viewpoint and approaches the edge of the boundary), the computer system ceases displaying the virtual boundary element (e.g., as illustrated in
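Taken together, the boundary-element behaviors above (appear near the boundary, enlarge slightly within a narrow band at its edge, shrink back, and disappear once the threshold is exceeded) could be summarized as a single mapping from misalignment to appearance. The following sketch is illustrative only; all names and numbers are assumptions:

// Hypothetical sketch: visibility and size of the virtual boundary element
// as a function of the current misalignment from the anchor location.
struct BoundaryAppearance {
    var isVisible: Bool
    var radiusDegrees: Double
}

func boundaryAppearance(misalignmentDegrees: Double) -> BoundaryAppearance {
    let displayThreshold = 6.0     // fifth set of alignment criteria (initial display)
    let hideThreshold = 10.0       // predetermined (high-motion) threshold
    let bandWidth = 1.0            // boundary distance range near the edge
    let firstRadius = 8.0
    let secondRadius = 9.0         // slightly enlarged size within the band

    if misalignmentDegrees < displayThreshold || misalignmentDegrees > hideThreshold {
        // Not yet near the boundary, or the threshold has been exceeded
        // (in which case the element shrinks and/or ceases to be displayed).
        return BoundaryAppearance(isVisible: false, radiusDegrees: firstRadius)
    }
    let nearEdge = misalignmentDegrees > hideThreshold - bandWidth
    return BoundaryAppearance(isVisible: true,
                              radiusDegrees: nearEdge ? secondRadius : firstRadius)
}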
In some embodiments, prior to detecting the third change in the viewpoint from which the spatial video media is being captured, the virtual alignment element is displayed at a first display location relative to the viewpoint from which the spatial video media is being captured (in some embodiments, relative to a display of the display generation component; in some embodiments, relative to a viewport through which the environment is visible), and moving the virtual alignment element based on the third change in the viewpoint from which the spatial video media is being captured includes, in accordance with a determination that the third change in the viewpoint meets a set of one or more movement criteria, displaying the virtual alignment element at a second display location that, relative to the viewpoint from which the spatial video media is being captured, is different than the first display location (e.g., displaying the virtual alignment element in a non-viewpoint locked state), wherein the set of one or more movement criteria include a criterion that is met when the third change in the viewpoint from which the spatial video media is being captured includes a movement in at least one direction of a plurality of directions (e.g., as illustrated in
In some embodiments, displaying the virtual indicator element (e.g., 1704 and/or 1734) includes positioning the virtual indicator element within a virtual plane in the environment (e.g., the virtual indicator element is rendered/displayed at a particular depth/distance away from a user in the extended reality (XR) environment (e.g., appearing at a particular focal plane); in some embodiments, other virtual UI elements (e.g., the alignment indicator, the boundary indicator, and/or other media capture UI elements) are also rendered within the virtual plane), wherein the virtual plane in the environment is spaced at least a threshold depth away from (in some embodiments, in front of) a user (e.g., the virtual plane including the virtual indicator appears spaced apart from the user's viewpoint; e.g., at least 10 cm (e.g., allowing a typical user's eyes to converge on the virtual plane), 50 cm, and/or 1 m away). Displaying the virtual indicator element in a virtual plane spaced at least a minimum depth apart from the user's viewpoint provides improved ergonomics of media capture devices, for example, by allowing the user to comfortably view the virtual indicator element without double vision, eye strain, blurring, and/or other visual degradation. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement), for example, due to difficulty seeing the virtual indicator or difficulty adjusting focus between the virtual indicator element, other UI elements, and/or other XR content.
In some embodiments, the computer system detects (in some embodiments, when initially displaying the virtual indicator element and/or other UI elements; in some embodiments, periodically (e.g., at a sampling rate) while displaying the virtual indicator element and/or other UI elements) a gaze of the user (e.g., 732 and/or X732) (e.g., the current gaze of the user) and positions (in some embodiments, when initially displaying the virtual indicator element and/or other UI elements; in some embodiments, periodically (e.g., at a display refresh rate) while displaying the virtual indicator element and/or other UI elements) the virtual plane based on the gaze of the user (in some embodiments, the virtual plane is selected based on the convergence point of the user's gaze (e.g., at a depth away from the user that falls at or near the convergence point); in some embodiments, the virtual plane is selected based on the direction of the user's gaze (e.g., the virtual plane is selected to be in front of the user and/or within a certain region of the viewport through which the environment is visible)). (In some embodiments, changing the appearance of the virtual indicator element (e.g., in response to the first change in the viewpoint) includes detecting the gaze of the user and updating the selection of the virtual plane (e.g., to maintain consistent display of the virtual indicator element in the user's view).) In some embodiments, the virtual plane is selected to be perpendicular or substantially perpendicular to a direction of the gaze of the user. Displaying the virtual indicator element in a virtual plane selected based on the user's gaze provides improved ergonomics of media capture devices, for example, by allowing the user to comfortably view the virtual indicator element without double vision, eye strain, blurring, and/or other visual degradation. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement), for example, due to difficulty seeing the virtual indicator or difficulty adjusting focus between the virtual indicator element, other UI elements, and/or other XR content.
In some embodiments, positioning the virtual plane based on the gaze of the user includes determining a convergence location (e.g., a current convergence point and/or focal plane) of the gaze of the user (e.g., a virtual location at which the user's right eye sightline and left eye sightline intersect), wherein the virtual plane includes the convergence location (e.g., the virtual indicator element (in some embodiments, and other UI elements) are displayed where the user's eyes are focusing). (In some embodiments, positioning the virtual plane based on the gaze of the user includes: in accordance with a determination that the convergence location of the gaze of the user is a first distance from the user's eyes, positioning the virtual plane at a depth of the first distance (e.g., remaining perpendicular or substantially perpendicular to a direction of the gaze of the user); and in accordance with a determination that the convergence location of the gaze of the user is a second distance from the user's eyes, positioning the virtual plane at a depth of the second distance (e.g., remaining perpendicular or substantially perpendicular to a direction of the gaze of the user).) Displaying the virtual indicator element in a virtual plane selected based on the convergence of the user's gaze provides improved ergonomics of media capture devices, for example, by allowing the user to comfortably view the virtual indicator element without double vision, eye strain, blurring, and/or other visual degradation. Doing so also assists the user with composing media capture events and reduces the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement), for example, due to difficulty seeing the virtual indicator or difficulty adjusting focus between the virtual indicator element, other UI elements, and/or other XR content.
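The depth selection described in the preceding paragraphs could be approximated as follows; the minimum depth and the names are hypothetical:

// Hypothetical sketch: place the virtual plane at the depth of the user's
// gaze convergence point, but never closer than a minimum comfortable depth.
let minimumPlaneDepthMeters = 0.5       // e.g., at least 10 cm, 50 cm, or 1 m

func planeDepthMeters(gazeConvergenceDepthMeters: Double) -> Double {
    // Keeping the plane at or near the focal depth, subject to the minimum,
    // helps avoid double vision and eye strain.
    return max(gazeConvergenceDepthMeters, minimumPlaneDepthMeters)
}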
In some embodiments, changing the appearance of the virtual indicator element (e.g., 1704 and/or 1734) to indicate the respective viewpoint corresponding to the spatial video media includes, in accordance with a determination that a distance between a current location in the environment (e.g., 1708) that represents the viewpoint from which the spatial video media is being captured (in some embodiments, a reference point in the portion of the environment currently included in the spatial media capture; in some embodiments the current location is a center point of the current viewpoint; in some embodiments, the current location in the environment that represents the viewpoint from which the spatial video media is being captured is the same as the current location in the environment that represents the viewpoint from which the spatial video media is being captured) and the anchor location (e.g., 1706 and/or 1736) exceeds a second predetermined (e.g., a respective, boundary, and/or maximum) threshold distance (in some embodiments, a threshold distance from the anchor location that, when exceeded, classifies the spatial video capture as a high-motion video capture; in some embodiments, a threshold angular distance (e.g., 8°, 10°, and/or 12° yaw and/or pitch rotation); in some embodiments, a threshold cartesian distance (e.g., 15, 17, or 20 cm vertical or horizontal translation); in some embodiments, the second predetermined threshold distance is the same as the predetermined threshold distance), ceasing displaying the virtual indicator element (e.g., as illustrated in
In some embodiments, changing the appearance of the virtual indicator element to indicate the respective viewpoint corresponding to the spatial video media includes, in accordance with a determination that a distance between the current location in the environment (e.g., 1708) that represents the viewpoint from which the spatial video media is being captured (in some embodiments, a reference point in the portion of the environment currently included in the spatial media capture; in some embodiments the current location is a center point of the current viewpoint; in some embodiments, the current location in the environment that represents the viewpoint from which the spatial video media is being captured is the same as the current location in the environment that represents the viewpoint from which the spatial video media is being captured) and the anchor location (e.g., 1706 and/or 1736) increases to at least a fifth threshold distance (e.g., as a result of the first change in the viewpoint), decreasing an opacity of the virtual indicator element (e.g., as illustrated in
In some embodiments, the computer system, after ceasing displaying the virtual indicator element and in accordance with a determination that the viewpoint from which the spatial video media is being captured meets a set of one or more stability criteria, wherein the set of one or more stability criteria includes a criterion that is met when movement of the viewpoint from which the spatial video media is being captured remains below a movement threshold (e.g., one or more maximum velocities and/or accelerations of viewpoint movement) for at least a threshold duration of time (e.g., the criterion is met when angular velocity remains below 0.2°/s, the angular acceleration remains below 0.5°/s², the linear velocity remains below 0.3 m/s, and/or the linear acceleration remains below 0.6 m/s² for at least 2, 3, or 5 seconds), displays, via the display generation component, a second virtual indicator element (e.g., 1704 and/or 1734) (e.g., an icon or glyph, such as crosshairs, a dot, and/or a small shape; in some embodiments, the indicator is displayed with a first color (e.g., yellow and/or another color); in some embodiments, the indicator is fully environment-locked (e.g., the displayed location of the indicator is locked at the anchor location); in some embodiments, the indicator is partially environment-locked (e.g., the displayed location of the indicator moves with respect to the anchor location); in some embodiments, the second virtual indicator element is the same as, has the same appearance as, and/or behaves the same way as the original virtual indicator element (e.g., the virtual indicator element is respawned or redisplayed), e.g., as described above with respect to
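As a rough sketch of the stability criteria used to redisplay the indicator, one could require every recent motion sample to stay under the velocity and acceleration limits for the full duration; the sample type, limits, and function name below are illustrative assumptions:

// Hypothetical sketch: the viewpoint is considered stable when all samples in
// a recent window stay below the movement thresholds for the required duration.
struct MotionSample {
    let angularVelocity: Double        // degrees per second
    let angularAcceleration: Double    // degrees per second squared
    let linearVelocity: Double         // meters per second
    let linearAcceleration: Double     // meters per second squared
    let timestamp: Double              // seconds
}

func meetsStabilityCriteria(_ samples: [MotionSample], requiredDuration: Double = 3.0) -> Bool {
    guard let first = samples.first, let last = samples.last,
          last.timestamp - first.timestamp >= requiredDuration else { return false }
    return samples.allSatisfy { sample in
        sample.angularVelocity < 0.2 && sample.angularAcceleration < 0.5 &&
        sample.linearVelocity < 0.3 && sample.linearAcceleration < 0.6
    }
}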
In some embodiments, the computer system, after ceasing displaying the virtual indicator element and in accordance with a determination that the viewpoint from which the spatial video media is being captured does not meet the set of one or more stability criteria, foregoes displaying the second virtual indicator element (e.g., as illustrated in
In some embodiments, the computer system, while displaying the virtual indicator element (e.g., 1704 and/or 1734) while the environment is visible via the display generation component, displays (e.g., initially displaying), via the display generation component, a second virtual alignment element (e.g., 1714 and/or 1740) (e.g., an icon or glyph, such as crosshairs, a dot, and/or a small shape; in some embodiments, the alignment element is displayed with a second color (e.g., white and/or another color) different from the color of the indicator element; in some embodiments, the second virtual alignment element is the same as, has the same appearance as, and/or behaves the same way as the virtual indicator element described above (e.g., the second virtual alignment element indicates the current location in the environment that represents the viewpoint from which the spatial video media is being captured)) while the environment is visible via the display generation component, and after ceasing displaying the virtual indicator element, the computer system continues displaying the second virtual alignment element (e.g., as illustrated in
In some embodiments, changing the appearance of the virtual indicator element to indicate the respective viewpoint corresponding to the spatial video media includes, in accordance with a determination that a distance between a current location in the environment that represents the viewpoint from which the spatial video media is being captured (in some embodiments, a reference point in the portion of the environment currently included in the spatial media capture; in some embodiments the current location is a center point of the current viewpoint; in some embodiments, the current location in the environment that represents the viewpoint from which the spatial video media is being captured is the same as the current location in the environment that represents the viewpoint from which the spatial video media is being captured) and the anchor location falls below a maintenance threshold distance (in some embodiments, an alignment margin of error (e.g., when movement remains within the minimum threshold distance from the anchor location, the viewpoint is considered substantially aligned); in some embodiments, a minimum angular distance (e.g., 1°, 1.5°, and/or 3° yaw and/or pitch rotation); in some embodiments, a minimum cartesian distance (e.g., 1, 2, or 5 cm vertical or horizontal translation); in some embodiments, the minimum threshold distance is the same as the second threshold distance), ceasing displaying the virtual indicator element. Ceasing displaying the virtual indicator element when the viewpoint has moved substantially into alignment with the initial/steady viewpoint provides improved visual feedback about a state of the computer system without cluttering the user interface, which assists the user with composing media capture events and reduces the risk that transient media capture opportunities are mis-captured. For example, the disappearance of the virtual indicator element intuitively indicates to the user that the user has successfully steadied the viewpoint, while also avoiding unnecessarily cluttering the capture UI during steady capture.
In some embodiments, changing the appearance of the virtual indicator element to indicate the respective viewpoint corresponding to the spatial video media includes, in accordance with a determination that a distance between the current location in the environment that represents the viewpoint from which the spatial video media is being captured (in some embodiments, a reference point in the portion of the environment currently included in the spatial media capture; in some embodiments the current location is a center point of the current viewpoint; in some embodiments, the current location in the environment that represents the viewpoint from which the spatial video media is being captured is the same as the current location in the environment that represents the viewpoint from which the spatial video media is being captured and/or the second location in the environment that represents the viewpoint from which the spatial video media is being captured) and the anchor location decreases to below a sixth threshold distance (e.g., as a result of the first change in the viewpoint), decreasing an opacity of the virtual indicator element (e.g., the visual indicator fades as it approaches the threshold where it disappears), wherein the sixth threshold distance is more than the maintenance threshold distance. Decreasing the opacity of the virtual indicator element provides improved visual feedback about a state of the computer system, assists the user with composing media capture events, and reduces the risk that transient media capture opportunities are mis-captured (e.g., due to visually uncomfortable and/or unwanted camera movement). For example, fading the indicator element intuitively indicates to the user that the current media capture is approaching alignment with the initial/steady viewpoint.
In some embodiments, the computer system, while capturing the spatial video media of the environment and in accordance with a determination that the viewpoint from which the spatial video media is being captured satisfies a sixth set of alignment criteria (e.g., criteria defining a minimal misalignment from an established stable/target viewpoint at which the viewpoint is considered substantially aligned (e.g., a misalignment margin of error); in some embodiments, the sixth set of alignment criteria includes a criterion that is satisfied when a distance between a current location representing the current viewpoint and the anchor location falls below an alignment threshold distance (in some embodiments, an alignment margin of error (e.g., when movement remains within the minimum threshold distance from the anchor location, the viewpoint is considered substantially aligned); in some embodiments, a minimum angular distance (e.g., 1°, 1.5°, and/or 3° yaw and/or pitch rotation); in some embodiments, a minimum cartesian distance (e.g., 1, 2, or 5 cm vertical or horizontal translation); in some embodiments, the alignment threshold distance is the same as the second threshold distance and/or the minimum threshold distance); in some embodiments, the sixth set of alignment criteria includes a criterion that is satisfied when the movement of the viewpoint has stabilized (e.g., remains below one or more threshold velocities and/or accelerations for at least a respective duration)), displays a graphical alignment indication (e.g., as illustrated in
In some embodiments, the computer system, after capturing the spatial video media of the environment, displays a playable representation of the spatial video media (e.g., as illustrated in
In some embodiments, aspects/operations of methods 800, 1000, 1200, 1400, 1600, 1800, and 2000 may be interchanged, substituted, and/or added between these methods. For example, capturing spatial video media according to method 1800 may implement the user interfaces and indicators described with respect to methods 800, 1000, 1200, and/or 1400, and the spatial video media captured according to method 1800 may be viewed using the user interfaces and techniques described with respect to methods 1600 and 2000. For brevity, these details are not repeated here.
At
For example, as illustrated in
In some embodiments, spatial media item 1904 may be a brief animated photo (e.g., a photo with a “live” effect) where each of the several frames captured when the photo is taken (e.g., before and/or after the input requesting capture of the photo was detected) includes stereoscopic depth information, for example, a first frame component for the viewer's right eye and a second frame component for the viewer's left eye. Like video media, a brief animated photo can be played back (e.g., as a brief animation, a loop, and/or a “bouncing” or “reversing” loop) or viewed as a still preview (e.g., including the first frame component and the second frame component for a single key frame).
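A brief animated spatial photo of this kind could be modeled, very roughly, as a sequence of stereo frame pairs plus a designated key frame; the types and names below are illustrative assumptions rather than part of the disclosure:

// Hypothetical sketch of a "live" spatial photo: each frame has a left-eye
// and a right-eye component, and one frame serves as the still preview.
struct StereoFrame {
    let leftEyeImage: [UInt8]      // encoded image data presented to the left eye
    let rightEyeImage: [UInt8]     // encoded image data presented to the right eye
}

struct SpatialLivePhoto {
    let frames: [StereoFrame]      // frames captured before and/or after the shutter input
    let keyFrameIndex: Int         // frame used when shown as a still preview

    var stillPreview: StereoFrame {
        return frames[keyFrameIndex]
    }
}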
HMD X700 displays media viewer user interface 1902 overlaying XR environment 1908 (e.g., a physical environment and/or an environment-locked virtual environment), such that portions of XR environment 1908 that are not currently overlaid and/or are semi-transparently overlaid by any of the elements of media viewer user interface 1902 remain visible to the user via display module X702 of HMD X700, for example, as the rendered output of the virtual environment and/or optical and/or video passthrough of the physical environment.
View 1906 of spatial media item 1904 is an un-expanded media viewing option (e.g., format). In particular, as illustrated in
At
At
At
At
At
At
At
In some embodiments, HMD X700 displays view 1938 (e.g., the expanded view) of spatial media item 1904 with a second set of three-dimensional effects, which may include additional or alternative three-dimensional effects than the first set of three-dimensional effects applied while displaying view 1906 of spatial media item 1904. For example, as with the first set of three-dimensional effects, displaying view 1938 may include concurrently displaying a first view component for the user's right eye and a second, different view component for the user's left eye, creating an appearance/illusion of depth. For example, the first and second view components may include the different components of spatial media item 1904 such that the contents of spatial media item 1904 appear three-dimensional. As another example, the first and second view components may include different views of view 1938 such that view 1938 appears as a different virtual object than view 1906, for instance, modeling view 1938 as a curved scrim (e.g., panorama), a hemisphere, or a sphere around the user in the XR environment.
At
As illustrated in
In response to input 1952, at
At
At
At
At
Based on the stability characteristics of media item 1964, such as movements of the capture viewpoint detected during capture (e.g., while capturing as described above with respect to
Additional descriptions regarding
The computer system (e.g., 101, 1-100, 1-200, 3-100, 6-100, 6-200, 6-300, 6-400, 11.1.2-100, 700, X700, and/or 702), while displaying (2002) (in some embodiments, in a media viewing user interface (e.g., a gallery and/or library UI); in some embodiments, in a media playback user interface), via the display generation component (e.g., 1-102, 1-120a, 1-120b, 11.1.1-104a, 11.1.1-104b, 1-108, 1-122a, 1-122b, 1-202, 1-306, 1-308, 1-320, 1-322a, 1-322b, 1-406, 1-402, 1-421, 3-108, 6-334, 11.3.2-100, 11.3.2-104, 11.3.2-200, 11.3.2-204, 708, and/or X702), a representation (e.g., 1906) of a spatial media item (e.g., 1904, 1904A, and/or 1964) (in some embodiments, a thumbnail or preview of the media item; in some embodiments, while media playback of the media item is ongoing; in some embodiments, while media playback of the media item is not ongoing (e.g., paused/stopped)), wherein the spatial media item includes a first component corresponding to a viewpoint of a right eye and a second component, different from the first component, corresponding to a viewpoint of a left eye that when viewed concurrently create an illusion of a spatial representation (e.g., spatial video; e.g., concurrently viewing the first video component and the second video component creates an illusion of a three-dimensional representation of the video media; e.g., viewing different components with the left and right eye creates the illusion of depth by simulating parallax of the media contents) and in accordance with a determination (e.g., based on the contents of the media item (e.g., the video data) and/or metadata associated with the media item; in some embodiments, based on the amount of movement (e.g., translation, rotation, velocity, and/or acceleration) of a viewpoint of the media item and/or apparent camera movement in the media item) (in some embodiments, the apparent camera movement includes movement of the physical camera(s) used to capture the media item; in some embodiments, the apparent camera movement is a movement of a virtual camera (e.g., a “camera” capturing in and “moving” around a virtual environment))) that the spatial media item meets a set of one or more stability criteria (2004) (e.g., the media item has a more stable viewpoint during capture (e.g., relatively little apparent camera movement during capture) compared to a media item that does not meet the criterion; in some embodiments, the stability criteria include at least one criterion that is met when the apparent camera movement present in the media item does not exceed a threshold (e.g., overall angular movement of a radius less than 10°; angular velocity of less than 0.2 degrees/second; angular acceleration of less than 1 degree/second2, and/or another threshold movement amount)), displays (2006) a spatial viewing indicator (e.g., 1918A and/or 1928) (e.g., an icon, affordance, and/or other user interface element indicating an option (e.g., that is affected when the indicator is selected) of an alternative spatial viewing mode for the media item (e.g., an expanded (e.g., full screen and/or immersive) viewing mode, a viewing mode with particular three-dimensional output effect(s) applied, a viewing mode without attenuation effects applied, and/or another viewing mode)) with a first appearance (e.g., an appearance indicating that the alternative spatial viewing mode is available, recommended, and/or appropriate for the media item) concurrently with the representation of the spatial media item (e.g., as illustrated in
The computer system, while displaying (2002) the representation (e.g., 1906) of the spatial media item (e.g., 1904, 1904A, and/or 1964) and in accordance with a determination that the spatial media item does not meet the set of one or more stability criteria (e.g., the media item has a less stable viewpoint during capture (e.g., relatively more apparent camera movement during capture) compared to a media item that does meet the criterion; in some embodiments, the media item does not meet the set of one or more stability criteria when the apparent camera movement present in the media item exceeds one or more thresholds), forgoes displaying the spatial viewing indicator with the first appearance (in some embodiments, forgoing displaying the spatial viewing indicator (e.g., hiding the option of the alternative spatial viewing mode); in some embodiments, displaying the spatial viewing indicator with a different appearance (e.g., an appearance indicating that an alternative spatial viewing mode is not available, recommended, and/or appropriate for the media item, such as including a warning icon or glyph and/or visually deemphasizing (e.g., graying out, minimizing, and/or increasing the transparency) the appearance)) concurrently with the representation of the spatial media item (e.g., as illustrated in
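The stability-dependent presentation of the spatial viewing indicator described in the two preceding paragraphs could be summarized as below; the enum, its cases, and the parameter names are hypothetical labels for the behaviors described:

// Hypothetical sketch: choose how (or whether) to show the spatial viewing
// indicator based on whether the media item meets the stability criteria.
enum SpatialIndicatorPresentation {
    case firstAppearance      // expanded spatial viewing is recommended/available
    case secondAppearance     // deemphasized and/or warning appearance
    case hidden               // indicator not shown with the representation
}

func indicatorPresentation(meetsStabilityCriteria: Bool,
                           hideWhenUnstable: Bool) -> SpatialIndicatorPresentation {
    if meetsStabilityCriteria { return .firstAppearance }
    // For unstable items, some embodiments deemphasize the indicator and
    // others omit it from the default view entirely.
    return hideWhenUnstable ? .hidden : .secondAppearance
}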
In some embodiments, the computer system, prior to displaying the representation of the spatial media item, receives a request to display the representation of the spatial media item (e.g., 1752 in
In some embodiments, the spatial viewing indicator (e.g., 1918A and/or 1928), when selected (e.g., via 1930, 1936, and/or 1958) (e.g., via a touch input, a gesture input, an air gesture input, a gaze input, and/or a button press input, such as a pinch air gesture input detected while a gaze input is directed to the spatial viewing indicator), causes the computer system to initiate providing an expanded representation (e.g., 1938) of the spatial media item (e.g., 1904, 1904A, and/or 1964) (in some embodiments, initiating displaying the expanded representation of the spatial media item; in some embodiments, initiating providing the expanded view includes performing a preliminary expansion step prior to displaying the spatial media item with the expanded view, such as providing a confirmation request or warning before proceeding with expanding the view), wherein a size of the expanded representation of the spatial media item (e.g., with respect to a viewport through which the environment is visible; in some embodiments, with respect to the size of the display (e.g., expanding to a “full screen” or “maximized” view of the spatial media item); in some embodiments, in embodiments using an HMD, expanding display of the spatial media item towards and/or beyond (e.g., a “frameless” or “immersive” view) the peripheries of the viewport through which the environment is visible)) exceeds a size of the representation of the spatial media item (e.g., the initially-displayed (e.g., default, un-expanded) representation of the spatial media item; e.g., with respect to the viewport through which the environment is visible) (e.g., as illustrated in
In some embodiments, the computer system, while displaying the representation (e.g., 1906) of the spatial media item, provides a first three-dimensional effect (e.g., by adjusting display of the representation of the spatial media item to create the first three-dimensional effect) for the spatial media item (e.g., 1904, 1904A, and/or 1964) (in some embodiments, the first three-dimensional effect includes displaying the spatial media item as a first virtual object in a three-dimensional XR environment (e.g., a virtual display of a particular size and shape that can, e.g., cast shadow, cast light, and/or interact with the XR environment in other ways); in some embodiments, the first three-dimensional effect includes outputting both the first component and the second component to create the illusion of spatial representation; in some embodiments, the first three-dimensional effect includes outputting spatial audio for the spatial media item (e.g., audio including at least a first channel for a left ear of the user and a second, different channel for a right ear of the user, using binaural hearing to create the illusion of sound emanating from a particular location in three-dimensional space)). In some embodiments, the computer system, while displaying the expanded representation (e.g., 1938) of the spatial media item (in some embodiments, in response to detecting a selection of the spatial viewing indicator and/or another input (e.g., an additional confirmation input)), provides a second three-dimensional effect for the spatial media item (e.g., 1904, 1904A, and/or 1964), wherein the second three-dimensional effect is different from the first three-dimensional effect (in some embodiments, the second three-dimensional effect includes displaying the spatial media item as a different virtual object in a three-dimensional XR environment (e.g., a “frameless” or “immersive” object that extends into and beyond the peripheries of the viewport through which the environment is visible, such as a curved scrim, dome, or globe, that the user can view from different viewpoints); in some embodiments, the second three-dimensional effect includes rendering and/or recreating portions of the spatial media item in three-dimensions; in some embodiments, the second three-dimensional effect also includes one or more three-dimensional effects included in the first three-dimensional effect, such as outputting both the first component and the second component of the spatial media item and/or outputting spatial audio). Providing the spatial media item with different three-dimensional effects in the expanded view and in the default (e.g., un-expanded) view provides improved control of media playback and improved ergonomics of media playback devices. For example, the three-dimensional effects used for the spatial media item are automatically changed based on whether the spatial media item is likely to cause physical discomfort when viewed with particular three-dimensional effects. Doing so also makes the user-system interface more efficient by helping the user to provide proper inputs and reducing user mistakes when operating/interacting with the system which, additionally, reduces power usage and improves battery life of the system by enabling the user to use the system more quickly and efficiently, for example, by automatically changing the three-dimensional effects without requiring extraneous inputs to select and apply the three-dimensional effects.
In some embodiments, the computer system detects an input (e.g., 1930 and/or 1958) selecting the spatial viewing indicator (e.g., via a touch input, a gesture input, an air gesture input, a gaze input, and/or a button press input, such as a pinch air gesture input detected while a gaze input is directed to the spatial viewing indicator; e.g., an input requesting the expanded view of the spatial media item), and in response to detecting the input selecting the spatial viewing indicator, initiates providing the expanded representation (e.g., 1938) of the spatial media item (e.g., 1904, 1904A, and/or 1964). In some embodiments, initiating providing the expanded representation of the spatial media item includes, in accordance with a determination that the spatial media item does not meet the set of one or more stability criteria, displaying, via the display generation component, an expansion notification interface (e.g., 1934) (in some embodiments, while displaying the representation of the spatial media item; in some embodiments, overlaying the representation of the spatial media item), wherein the expansion notification interface indicates that the spatial media item does not meet the set of one or more stability criteria (e.g., as illustrated in
In some embodiments, the expansion notification interface (e.g., 1934) includes a selectable confirmation object (e.g., 1934A) (e.g., a confirmation affordance) that, when selected, causes the computer system to initiate displaying the expanded representation of the spatial media item (e.g., proceeds with expanding the view), and a selectable cancellation object (e.g., 1934B) (e.g., a cancel affordance) that, when selected, causes the computer system to forego displaying the expanded representation of the spatial media item (in some embodiments, and causes display of the notification interface to cease; in some embodiments, and resumes displaying the representation of the spatial media item (e.g., the default/un-expanded view)). Providing a notification interface with the option to confirm or cancel expanding the view of the spatial media item provides improved control of media playback and improved ergonomics of media playback devices, for example, by providing the user with additional information and the opportunity to cancel expanding the view of spatial media items that are likely to cause physical discomfort in the expanded state.
In some embodiments, the computer system, in accordance with the determination that the spatial media item does not meet the set of one or more stability criteria (e.g., the media item has a less stable viewpoint during capture (e.g., relatively more apparent camera movement during capture) compared to a media item that does meet the stability criteria; in some embodiments, the media item does not meet the set of one or more stability criteria when the apparent camera movement present in the media item exceeds one or more thresholds), displays the spatial viewing indicator (e.g., 1918A and/or 1928) with a second appearance, different from the first appearance (e.g., an appearance indicating that the alternative spatial viewing mode is not recommended and/or appropriate for the media item; in some embodiments, the second appearance visually deemphasizes the spatial viewing indicator in comparison to the first appearance; in some embodiments, the second appearance includes an additional warning indicator), concurrently with the representation of the spatial media item (e.g., as illustrated in
In some embodiments, displaying the spatial viewing indicator (e.g., 1918A and/or 1928) with the first appearance includes displaying the spatial viewing indicator at a first contrast level (e.g., as illustrated in
In some embodiments, displaying the spatial viewing indicator with the second appearance includes displaying the spatial viewing indicator with a warning icon (e.g., 1920) (e.g., a warning glyph, such as an exclamation point, a stop sign, a caution triangle, a no/prohibited symbol, an X, a minus, a red symbol, a yellow symbol, and/or another symbol, icon, or text element) (e.g., as illustrated in
In some embodiments, forgoing displaying the spatial viewing indicator with the first appearance (e.g., in accordance with a determination that the spatial media item does not meet the set of one or more stability criteria) includes forgoing displaying the spatial viewing indicator concurrently with the representation of the spatial media item (e.g., as illustrated in
In some embodiments, the computer system, after forgoing displaying the spatial viewing indicator concurrently with the representation of the spatial media item and while displaying the representation of the spatial media item without the spatial viewing indicator, displays, via the display generation component, a selectable menu user interface object (e.g., 1910) (e.g., a menu affordance, such as an icon, text, and/or other user interface object). In some embodiments, the computer system detects an input (e.g., 1910) selecting the selectable menu user interface object (e.g., via a touch input, a gesture input, an air gesture input, a gaze input, and/or a button press input, such as a pinch air gesture input detected while a gaze input is directed to the menu affordance), and in response to detecting a selection of the selectable menu user interface object, the computer system displays, via the display generation component, an option menu interface (e.g., 1918) (in some embodiments, concurrently with the representation of the spatial media item (e.g., overlaying, next to, and/or near the spatial media item)), wherein displaying the option menu interface includes displaying a second spatial viewing indicator (e.g., 1918A) (e.g., a spatial viewing indicator menu option; in some embodiments, the second spatial viewing indicator includes one or more of the properties discussed herein with respect to the spatial viewing indicator (e.g., in some embodiments, in accordance with a determination that the spatial media item meets the set of one or more stability criteria, displaying the second spatial viewing indicator with the first appearance in the options menu; in some embodiments, in accordance with a determination that the spatial media item does not meet the set of one or more stability criteria, displaying the second spatial viewing indicator with the second appearance in the options menu; in some embodiments, the second spatial viewing indicator, when selected, causes the computer system to initiate providing the expanded representation of the spatial media item); e.g., the second spatial viewing indicator is hidden in the options menu when the spatial viewing indicator is not displayed). Providing the spatial viewing indicator in a separate options menu provides improved feedback on a state of the computer system, improved control of media playback, and improved ergonomics of media playback devices. For example, the options menu provides users with the option to explicitly request the spatial viewing mode even for media items that are more likely to cause physical discomfort when viewed in the spatial viewing mode while still conveying when the spatial viewing mode is not recommended and/or appropriate based on the stability characteristics (e.g., when the spatial viewing indicator is not being displayed with the representation of the spatial media otherwise).
In some embodiments, the computer system detects a request (e.g., 1942, 1952, and/or 1954) (e.g., a touch input, a gesture input, an air gesture input, a gaze input, and/or a button press input; in some embodiments, while displaying the representation of the spatial media item (e.g., using an edit function embedded in the spatial media viewer); in some embodiments, via a separate editing application, service, or user interface) to edit the spatial media item (in some embodiments, the request to edit the spatial media item includes one or more inputs for adding visual effects, applying post-processing effects (e.g., automatic video stabilization, color correction, smoothing, and/or other post-processing techniques), cutting, splicing, cropping, reordering, slowing down, speeding up, and/or other media editing actions) (in some embodiments, while editing the spatial media item, providing an indication of portions of the spatial media item that do not meet the stability criteria (e.g., flagging and/or highlighting high-motion portions of the spatial media item for cutting/cropping)), and in response to detecting the request to edit the spatial media item, the computer system generates a modified version of the spatial media item (e.g., 1904A) (in some embodiments, saving, outputting, and/or compiling the edited spatial media item; in some embodiments, as a new spatial media item (e.g., a new file or object); in some embodiments, updating the original spatial media item itself). In some embodiments, the computer system, while displaying (in some embodiments, in a media viewing user interface (e.g., a gallery and/or library UI); in some embodiments, in a media playback user interface), via the display generation component, a representation (e.g., 1906) of the modified version of the spatial media item (e.g., 1904A) (in some embodiments, a thumbnail or preview of the edited media item; in some embodiments, while media playback of the edited media item is ongoing; in some embodiments, while media playback of the edited media item is not ongoing (e.g., paused/stopped)) and in accordance with a determination that the modified version of the spatial media item meets the set of one or more stability criteria (in some embodiments, even if the (unmodified) spatial media item does not meet the set of one or more stability criteria (e.g., the editing performed on the spatial media item removed, stabilized, or otherwise corrected the portions of the spatial media item that caused the spatial media item to not meet the set of one or more stability criteria)), displays the spatial viewing indicator with the first appearance concurrently with the representation of the modified version of the spatial media item (e.g., as illustrated in
In some embodiments, the request to edit the spatial media item includes a request (e.g., 1952) to remove a portion of the spatial media item (e.g., as illustrated in
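The re-evaluation after an edit could be sketched as follows: drop the motion data for the removed time range, then apply the same stability test to what remains. All names and the velocity limit below are illustrative:

// Hypothetical sketch: removing a high-motion portion of the item can allow
// the remaining portion to meet the stability criteria.
struct MotionSegment {
    let startTime: Double             // seconds into the media item
    let endTime: Double
    let peakAngularVelocity: Double   // degrees per second within this segment
}

func segmentsMeetStabilityCriteria(_ segments: [MotionSegment],
                                   maxAngularVelocity: Double = 30.0) -> Bool {
    return segments.allSatisfy { $0.peakAngularVelocity <= maxAngularVelocity }
}

func removing(_ range: ClosedRange<Double>, from segments: [MotionSegment]) -> [MotionSegment] {
    // Keep only the motion segments that lie outside the removed time range.
    return segments.filter { $0.endTime < range.lowerBound || $0.startTime > range.upperBound }
}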
In some embodiments, the computer system, while displaying, via the display generation component, the representation of the spatial media item, detects an input (e.g., 1922, 1926, and/or 1960) requesting to play the spatial media item. In some embodiments, the computer system, in response to detecting the input requesting to play the spatial media item and in accordance with a determination that the spatial media item does not meet the set of one or more stability criteria, displays a playback notification interface (e.g., 1924) (in some embodiments, while displaying the representation of the spatial media item; in some embodiments, overlaying the representation of the spatial media item), wherein the playback notification interface indicates that the spatial media item does not meet the set of one or more stability criteria (in some embodiments, a warning indicating to the user that the stability characteristics of the spatial video media are likely to cause physical discomfort during playback) (in some embodiments, the notification interface is displayed prior to playing the spatial media item; in some embodiments, the computer system only proceeds with playing of the spatial media item if further confirmation is received from the user (e.g., by selecting a confirmation affordance of the notification interface)) (e.g., as illustrated in
In some embodiments, the computer system, while playing the spatial media item, modifies playback of the spatial media item to reduce an appearance of movement of a viewpoint corresponding to the spatial media item while the spatial media item was being captured (e.g., reducing and/or attenuating perceived/apparent camera movement present in the spatial media item; in some embodiments, automatically applying digital stabilization techniques to the playback; in some embodiments, changing the visual prominence of the video media item relative to a border region (e.g., as described with respect to
In some embodiments, the computer system, while displaying the representation of the spatial media item, detects an input (e.g., 1962) requesting to view a second spatial media item different from the spatial media item (e.g., via a touch input, a gesture input, an air gesture input, a gaze input, and/or a button press input; in some embodiments, selecting an affordance for navigating to another media item (in some embodiments, a “next” or “previous” button; in some embodiments, a thumbnail in a media carousel and/or other media library representation); in some embodiments, detecting a gesture or air gesture (e.g., a swipe forward or backwards motion) for navigating to another media item), and in response to detecting the input requesting to view the second spatial media item, the computer system ceases displaying the representation of the spatial media item and displays, via the display generation component, a representation (e.g., 1906) (in some embodiments, a thumbnail or preview of the second media item; in some embodiments, while media playback of the second media item is ongoing; in some embodiments, while media playback of the second media item is not ongoing (e.g., paused/stopped)) of the second spatial media item (e.g., 1964) (in some embodiments, while displaying the representation of the second spatial media item, in accordance with a determination that the second spatial media item meets the stability criteria, displaying the spatial viewing indicator with the first appearance concurrently with the representation of the second spatial media item, and in accordance with a determination that the second spatial media item does not meet the stability criteria, forgoing displaying the spatial viewing indicator with the first appearance concurrently with the representation of the second spatial media item) (e.g., as illustrated in
In some embodiments, the set of one or more stability criteria includes a criterion that is met when movement of a viewpoint corresponding to the spatial media item while the spatial media item was being captured (e.g., detected (e.g., by one or more motion sensors during capture of the spatial media item) and/or estimated movement of the viewpoint of the spatial media (e.g., of one or more cameras used to capture the spatial media item)) does not exceed (e.g., at any point in the playback time of the spatial media item) a threshold displacement (e.g., a maximum movement amount, e.g., as described with respect to
In some embodiments, the set of one or more stability criteria includes a criterion that is met when movement of a viewpoint corresponding to the spatial media item while the spatial media item was being captured (e.g., detected (e.g., by one or more motion sensors during capture of the spatial media item) and/or estimated movement of the viewpoint of the spatial media (e.g., of one or more cameras used to capture the spatial media item)) does not exceed (in some embodiments, at any point in the playback time of the spatial media item; in some embodiments, on average for a particular sampling period (e.g., 0.5, 1, and/or 3 seconds)) a threshold rate of change (e.g., a maximum magnitude of velocity and/or acceleration; e.g., maximum linear acceleration of 1.5 m/s², 2 m/s², and/or 2.5 m/s²; e.g., a maximum linear velocity of 1 m/s, 1.5 m/s, and/or 2 m/s; e.g., a maximum angular acceleration of 45°/s², 50°/s², and/or 55°/s²; e.g., a maximum angular velocity of 25°/s, 30°/s, and/or 35°/s; in some embodiments, a combined (e.g., normalized) magnitude of velocity and/or acceleration of any vertical translation component, any horizontal translation component, any pitch rotation component, and/or any yaw rotation component included in the viewpoint movement). (In some embodiments, for spatial media captured as described with respect to
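The displacement and rate-of-change criteria from the two preceding paragraphs could be combined into a single check along the following lines; the thresholds are illustrative examples drawn loosely from the ranges mentioned above, and the type and function names are hypothetical:

// Hypothetical sketch: a media item meets the stability criteria when every
// viewpoint sample stays under both the displacement and rate-of-change limits.
struct ViewpointSample {
    let angularOffsetDegrees: Double     // displacement from the anchor viewpoint
    let angularVelocity: Double          // degrees per second
    let linearAcceleration: Double       // meters per second squared
}

func mediaItemMeetsStabilityCriteria(_ samples: [ViewpointSample]) -> Bool {
    let maxDisplacementDegrees = 10.0
    let maxAngularVelocity = 30.0
    let maxLinearAcceleration = 2.0
    return samples.allSatisfy { sample in
        sample.angularOffsetDegrees <= maxDisplacementDegrees &&
        sample.angularVelocity <= maxAngularVelocity &&
        sample.linearAcceleration <= maxLinearAcceleration
    }
}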
In some embodiments, aspects/operations of methods 800, 1000, 1200, 1400, 1600, 1800, and 2000 may be interchanged, substituted, and/or added between these methods. For example, the spatial media being viewed in method 2000 may be media captured using the user interfaces and indicators described with respect to methods 800, 1000, 1200, 1400, and/or 1800, and the video playback interfaces and techniques described with respect to method 1600 may be applied while viewing spatial media according to method 2000. For brevity, these details are not repeated here.
In
As mentioned above, in the depicted embodiments, computer system 2100 includes a plurality of non-spatial media capture modes. Photo capture user interface 2108 includes option 2112a that is selectable to switch to a non-spatial cinematic video capture mode (e.g., for capturing non-spatial cinematic videos); option 2112b that is selectable to switch to a non-spatial video capture mode (e.g., for capturing non-spatial videos); option 2112c that is selectable to switch to a non-spatial photo capture mode (e.g., for capturing non-spatial still images) (and which is currently selected in
As also mentioned above, in the depicted embodiments, computer system 2100 also includes at least one spatial media capture mode for capturing spatial media. However, in
At
At
In some embodiments, as illustrated in FIG. 21C1, computer system 2100 displays spatial capture icon 2112d1 within user interface 2108 to allow a user to switch to the spatial media capture mode for capturing spatial media. Spatial capture icon 2112d1 is initially displayed with a first appearance (e.g., displaying a headset icon with a strikethrough line, and/or a grayed-out and/or otherwise visually deemphasized appearance) indicating that computer system 2100 is not in a spatial media (e.g., photo and/or video) capture mode. At FIG. 21C1, computer system 2100 detects user input 2130b1 (e.g., a tap input, an air gesture, or a mouse click input) corresponding to selection of spatial capture icon 2112d1. The response of computer system 2100 to user input 2130b1 will be described below with respect to
At
At
Spatial media capture user interface 2132 includes spatial video shutter button 2134 and spatial photo shutter button 2136. Spatial video shutter button 2134 (when enabled and/or selectable) is selectable to initiate and/or stop capture of a spatial video, and spatial photo shutter button 2136 (when enabled and/or selectable) is selectable to capture a spatial photo (e.g., still image, or a live image that includes some content before and/or after a request to capture the spatial photo was received). Spatial media capture user interface 2132 includes preview region 2110, which provides a preview of visual content that is being received and/or detected via one or more cameras of computer system 2100 (and visual content that would be captured and saved as spatial media (e.g., a spatial media file) if the user selects shutter button 2134 and/or shutter button 2136).
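As a rough illustration of such an interface, the SwiftUI sketch below shows a preview region together with separate spatial video and spatial photo shutter controls. The view name, property names, and layout are hypothetical and are not drawn from the disclosure.

```swift
import SwiftUI

// Hypothetical sketch of a spatial media capture interface: a camera preview
// region with separate spatial video and spatial photo shutter buttons.
struct SpatialCaptureView: View {
    @State private var isRecordingSpatialVideo = false
    var preview: Image                        // stand-in for the live camera preview region
    var onToggleSpatialVideo: () -> Void      // start/stop spatial video capture
    var onCaptureSpatialPhoto: () -> Void     // capture a spatial still image

    var body: some View {
        VStack(spacing: 16) {
            preview                           // analogous to preview region 2110
                .resizable()
                .scaledToFit()
            HStack(spacing: 40) {
                Button(isRecordingSpatialVideo ? "Stop Spatial Video" : "Record Spatial Video") {
                    isRecordingSpatialVideo.toggle()
                    onToggleSpatialVideo()
                }
                Button("Capture Spatial Photo", action: onCaptureSpatialPhoto)
            }
        }
        .padding()
    }
}
```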
In
As mentioned above, in some embodiments, capture of spatial media comprises concurrently capturing different visual content from two different cameras such that first visual content of the spatial media, captured with a first camera, corresponds to the viewpoint of a right eye of a user (and will be presented to the right eye of the user), and second visual content of the spatial media, captured with a second camera, corresponds to the viewpoint of a left eye of a user (and will be presented to the left eye of the user). In some embodiments, in order to correctly capture spatial media in this manner, it is beneficial for the first camera and the second camera to be horizontally aligned such that the first camera and the second camera are oriented in a manner that is consistent with the typical orientation of human eyes. As such, in some embodiments, the target orientation for computer system 2100 to capture spatial media is an orientation in which a first camera and a second camera are horizontally aligned.
At
At
At
At
At
At
At
At
At
In some embodiments, each representation 2170a-2170l is selectable to display a larger version of the corresponding media item, and/or initiate playback of the corresponding media item. However, as discussed above, in order for spatial media items to be viewed in a spatial manner (e.g., to give the illusion of depth and/or simulating parallax), different visual content must be displayed concurrently to the right and left eyes of a user. In some embodiments, computer system 2100 is not configured and/or is not able to display spatial media items in this spatial manner (e.g., because computer system 2100 has only one display and/or does not have separate displays for concurrently displaying different content to the right eye and left eye of a user). As such, in some embodiments, when spatial media is displayed and/or played on computer system 2100, only one of the two visual components of the spatial media is displayed (e.g., only the first visual component corresponding to the right eye of a user or only the second visual component corresponding to the left eye of the user). However, when spatial media is displayed on a head-mounted device or other device that is configured to display the spatial media in the spatial manner, first visual content of the spatial media item is displayed to the right eye of the user while second visual content different from the first visual content is concurrently displayed to the left eye of the user.
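A minimal sketch of this display behavior follows, assuming hypothetical types for the per-eye components and for the device's display capability; it simply selects which component(s) to present.

```swift
import Foundation

// Hypothetical model of a spatial media item with per-eye visual components.
struct SpatialMediaItem {
    let rightEyeContentURL: URL   // first visual component (right eye)
    let leftEyeContentURL: URL    // second visual component (left eye)
}

enum DisplayCapability {
    case singleDisplay   // e.g., a device with one screen and no per-eye displays
    case stereoscopic    // e.g., a head-mounted device with separate per-eye displays
}

// Mirrors the behavior described above: a single-display device shows only one of the
// two components, while a stereoscopic device presents both components concurrently.
func componentsToDisplay(for item: SpatialMediaItem,
                         capability: DisplayCapability) -> [URL] {
    switch capability {
    case .singleDisplay:
        return [item.rightEyeContentURL]                          // one component only
    case .stereoscopic:
        return [item.rightEyeContentURL, item.leftEyeContentURL]  // one per eye
    }
}
```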
At
Additional descriptions regarding
In some embodiments, while the computer system (e.g., 2100) is in a spatial media capture mode (2202) (e.g.,
In some embodiments, while the computer system is in the spatial media capture mode (e.g., while spatial media capture option 2112d is selected) and while the computer system is not capturing spatial media (e.g.,
In some embodiments, while the computer system is in the spatial media capture mode (e.g.,
In some embodiments, outputting the first prompt comprises displaying, via the one or more display generation components (e.g., 2102), a first visual prompt (e.g., 2140) that prompts the user to rotate the computer system into the threshold range of orientations (e.g., a text prompt that provides text instructions to rotate the computer system and/or a graphic prompt that instructs the user to rotate the computer system). Displaying a visual prompt that prompts the user to rotate the computer system into the threshold range of orientations when the computer system is outside of the threshold range of orientations enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
In some embodiments, outputting the first prompt comprises outputting a first audio prompt (e.g., 2142) that prompts the user to rotate the computer system into the threshold range of orientations (e.g., one or more sounds and/or one or more spoken or verbal instructions that instruct the user to rotate the computer system). Outputting an audio prompt that prompts the user to rotate the computer system into the threshold range of orientations when the computer system is outside of the threshold range of orientations enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
In some embodiments, while the computer system (e.g., 2100) is in the spatial media capture mode (e.g., while option 2112d is selected) and while the computer system is not capturing spatial media (e.g.,
In some embodiments, the determination that the orientation of the computer system (e.g., 2100) is outside of the threshold range of orientations comprises a determination that a first axis (e.g., 2148a, 2148b, and/or 2148c) defined by a first camera (e.g., 2146a, 2146b, and/or 2146c) of the one or more cameras and a second camera (e.g., 2146a, 2146b, and/or 2146c) of the one or more cameras different from the first camera is rotated by greater than a threshold number of degrees (e.g., greater than 5 degrees, greater than 10 degrees, greater than 15 degrees, greater than 20 degrees, or greater than 45 degrees) relative to a target axis (e.g., 2146) (e.g., a target axis that corresponds to the horizon, and/or a target axis that is perpendicular to the direction of gravity). In some embodiments, the first camera and the second camera are used to capture spatial media, the first camera is used to capture the first visual component corresponding to the viewpoint of the right eye, and the second camera is used to capture the second visual component that corresponds to the viewpoint of the left eye. Horizontally aligning the first camera and the second camera allows for correct capture of spatial media, as this ensures that the first camera (which, in some embodiments, corresponds to the viewpoint of the right eye) and the second camera (which, in some embodiments, corresponds to the viewpoint of the left eye) are aligned in a manner that is consistent with the typical orientation of human eyes. Accordingly, outputting a prompt that prompts the user to rotate the computer system so that the first camera and the second camera are substantially horizontally aligned ensures that spatial media is correctly captured, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
In some embodiments, the computer system comprises a first side (e.g., 2150a and/or 2150b) and a second side (e.g., 2150c and/or 2150d) that is perpendicular to the first side; the first side defines a first device axis; the second side defines a second device axis; the first side is longer than the second side; and the first axis corresponds to (e.g., is parallel to and/or aligns with) the first device axis (e.g., the target orientation for the device is when the device is oriented in a landscape orientation). Outputting a prompt that prompts the user to rotate the computer system so that the first camera and the second camera are substantially horizontally aligned ensures that spatial media is correctly captured, which enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
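One way such an orientation determination could be sketched is shown below: it measures the tilt of the inter-camera axis relative to the horizon (a plane perpendicular to gravity) and compares it against a threshold number of degrees. The 10-degree cutoff is an assumed value within the example range above, and the type and method names are hypothetical.

```swift
import Foundation
import simd

// Hypothetical orientation check: the axis through the two capture cameras should be
// roughly horizontal (perpendicular to gravity), which for an elongated device means
// holding it in a landscape orientation.
struct SpatialCaptureOrientationCheck {
    var maxTiltDegrees = 10.0   // assumed threshold; the text lists 5-45 degrees as examples

    // `cameraAxis` is a unit vector along the line joining the first and second cameras;
    // `gravity` is a unit vector pointing toward the ground.
    func isWithinThresholdRange(cameraAxis: simd_double3, gravity: simd_double3) -> Bool {
        // For a horizontal camera axis the dot product with gravity is near zero,
        // so the tilt away from the horizon is near zero degrees.
        let dot = abs(simd_dot(simd_normalize(cameraAxis), simd_normalize(gravity)))
        let tiltDegrees = asin(min(1.0, dot)) * 180.0 / Double.pi
        return tiltDegrees <= maxTiltDegrees
    }
}

// If this check fails while the system is in the spatial capture mode and is not capturing,
// the system could output a visual and/or audio prompt to rotate the device.
```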
In some embodiments, while the computer system is in the spatial media capture mode (e.g., while option 2112d is selected) and while the computer system is not capturing spatial media (e.g.,
In some embodiments, while the computer system is in the spatial media capture mode (e.g., while option 2112d is selected) and while the computer system is not capturing spatial media: the computer system displays, via the one or more display generation components (e.g., 2102), a spatial video capture affordance (e.g., 2134) that is selectable to capture spatial video media (e.g., a video that includes a first visual component corresponding to the viewpoint of the right eye and a second visual component different from the first visual component that corresponds to the viewpoint of a left eye that, when viewed concurrently, create an illusion of a spatial representation of captured visual content). While displaying the spatial video capture affordance (e.g., 2134), the computer system receives (e.g., via one or more input devices of the computer system) one or more user inputs (e.g., 2154a) (e.g., one or more touch inputs, one or more gesture inputs, one or more mechanical inputs, one or more presses of one or more buttons, and/or one or more rotations of a rotatable input mechanism) corresponding to selection of the spatial video capture affordance. In response to receiving the one or more user inputs (e.g., 2154a) corresponding to selection of the spatial video capture affordance (e.g., 2134), the computer system captures a first spatial video (e.g.,
In some embodiments, while displaying the spatial video capture affordance (e.g., 2134) (e.g., while the computer system is in the spatial media capture mode and while the computer system is not capturing spatial media), the computer system displays, via the one or more display generation components (e.g., 2102), a spatial photo capture affordance (e.g., 2136) that is selectable to capture spatial photo media (e.g., a spatial still image and/or one or more spatial images) (e.g., one or more images that include a first visual component corresponding to the viewpoint of the right eye and a second visual component different from the first visual component that corresponds to the viewpoint of a left eye that, when viewed concurrently, create an illusion of a spatial representation of captured visual content). While displaying the spatial photo capture affordance (e.g., 2136), the computer system receives (e.g., via one or more input devices of the computer system) one or more user inputs (e.g., 2154b) (e.g., one or more touch inputs, one or more gesture inputs, one or more mechanical inputs, one or more presses of one or more buttons, and/or one or more rotations of a rotatable input mechanism) corresponding to selection of the spatial photo capture affordance. In response to receiving the one or more user inputs (e.g., 2154b) corresponding to selection of the spatial photo capture affordance (e.g., 2136), the computer system captures a first spatial photo (e.g., stores data pertaining to a first spatial photo in non-volatile memory; and/or stores (e.g., in non-volatile memory of the computer system) first data (e.g., a first still image) captured by a first camera (e.g., 2146a, 2146b, and/or 2146c) of the one or more cameras as the first visual component corresponding to the viewpoint of a right eye, and second data (e.g., a second still image) captured by a second camera (e.g., 2146a, 2146b, and/or 2146c) of the one or more cameras as the second visual component corresponding to the viewpoint of the left eye). Providing a spatial photo capture affordance that is selectable to capture spatial photo media enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
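As a hedged sketch of pairing the two capture streams into a single spatial item, the following assumes a hypothetical stereo-camera interface; it simply stores the first camera's output as the right-eye component and the second camera's output as the left-eye component.

```swift
import Foundation

// Hypothetical sketch of pairing frames from two cameras into a spatial photo:
// the first camera supplies the right-eye component and the second camera
// supplies the left-eye component. The protocol and its method are assumptions.
protocol StereoCameraPair {
    // Concurrently captures one still image from each camera.
    func captureStills() -> (rightEye: Data, leftEye: Data)
}

struct SpatialPhoto {
    let rightEyeImage: Data   // first visual component, presented to the right eye
    let leftEyeImage: Data    // second visual component, presented to the left eye
}

func captureSpatialPhoto(using cameras: StereoCameraPair) -> SpatialPhoto {
    let stills = cameras.captureStills()
    return SpatialPhoto(rightEyeImage: stills.rightEye, leftEyeImage: stills.leftEye)
}
```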
In some embodiments, while capturing (e.g., while recording) a second spatial video (e.g.,
In some embodiments, while the computer system is in the spatial media capture mode (e.g., while option 2112d is selected) (e.g., while the computer system is not capturing spatial media, while the computer system is capturing spatial media, and/or regardless of whether the computer system is capturing spatial media): in accordance with a determination that first error criteria corresponding to a first error are met, wherein the first error pertains to capture of spatial media (e.g., the first error will result in erroneous capture of spatial media and/or poor quality capture of spatial media), the computer system outputs an error correction prompt (e.g., 2156, 2158, and/or 2160) (e.g., a visual prompt, an audio prompt, and/or a haptic prompt) that prompts the user to perform one or more actions to correct the first error. In some embodiments, while the computer system is in the spatial media capture mode: in accordance with a determination that the first error criteria are not met, the computer system forgoes outputting the error correction prompt (e.g.,
In some embodiments, the determination that first error criteria are met comprises a determination that less than a threshold amount of light is detected by the computer system (e.g.,
In some embodiments, the determination that first error criteria are met comprises a determination that at least a portion of the computer system (e.g., at least a first respective camera of the one or more cameras) is less than a threshold distance (e.g., less than six inches, less than one foot, less than two feet, less than three feet, less than five feet, or less than eight feet) from one or more detected subjects (e.g., one or more detected objects) being captured by at least some of the one or more cameras (e.g.,
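The error criteria described in the preceding paragraphs might be checked along the lines of the following sketch; the lux and distance thresholds are illustrative assumptions within the example ranges above, and the type and function names are hypothetical.

```swift
import Foundation

// Hypothetical capture-condition checks corresponding to the error criteria above;
// the lux and distance thresholds are illustrative assumptions.
struct CaptureConditions {
    var ambientLightLux: Double        // estimated scene illumination
    var nearestSubjectMeters: Double   // distance to the closest detected subject
}

enum SpatialCaptureError {
    case tooDark
    case subjectTooClose
}

func errorCorrectionPrompt(for conditions: CaptureConditions,
                           minimumLux: Double = 10,              // assumed low-light threshold
                           minimumSubjectMeters: Double = 0.6) -> SpatialCaptureError? {
    if conditions.ambientLightLux < minimumLux {
        return .tooDark            // prompt the user to move to a brighter environment
    }
    if conditions.nearestSubjectMeters < minimumSubjectMeters {
        return .subjectTooClose    // prompt the user to move farther from the subject
    }
    return nil                     // error criteria not met; no prompt is output
}
```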
In some embodiments, the computer system (e.g., 2100) is a head-mounted device (e.g., 101, 1-100, 1-200, 1-302, 1-406, 3-100, 6-100, 6-200, 6-300, 6-400, 11.1.2-100, and/or X700) that is configured to be worn on the head of a user (e.g., a head-mounted device that includes a first display generation component (and/or a first set of display generation components) corresponding to and/or that is configured to be displayed to a right eye of the user (e.g., that is configured to display the first visual component of spatial media without displaying the second visual component of spatial media); and a second display generation component (and/or a second set of display generation components) corresponding to and/or that is configured to be displayed to the left eye of the user (e.g., that is configured to display the second visual component of spatial media without displaying the first visual component of spatial media)). Allowing a user to capture spatial media on a head-mounted device enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
In some embodiments, the computer system (e.g., 2100) is not a head-mounted device (e.g., the computer system is a stand-alone digital camera, a smartphone, tablet, or other multifunction electronic device that is not designed to be worn on a head of a user). In some embodiments, the computer system does not comprise separate display generation components for the left eye of the user and the right eye of the user. As such, in some embodiments, the computer system is not able to separately display the first and second visual components of spatial media to the left and right eyes of the user in order to create the illusion of spatial representation of the spatial media. In some embodiments, the computer system is able to capture spatial media with separate visual components for the left and right eyes of the user so that the spatial media can be viewed in this manner on a different device (e.g., a head-mounted device) that includes separate display generation components for the left and right eyes of a user. Allowing a user to capture spatial media on a non-head-mounted device (e.g., for display at a later time on a head-mounted device) enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
In some embodiments, aspects/operations of methods 800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, and/or 2300 may be interchanged, substituted, and/or added between these methods. For example, the spatial media being viewed in method 2000 may be media captured using the user interfaces and indicators described with respect to methods 800, 1000, 1200, 1400, 1800, 2200, and/or 2300, and the video playback interfaces and techniques described with respect to method 1600 may be applied while viewing spatial media according to method 2000. For brevity, these details are not repeated here.
In some embodiments, the computer system (e.g., 2100) displays (2302), via the one or more display generation components (e.g., 2102), a first user interface (e.g., 2108, 2109, and/or 2132) corresponding to a camera application of the computer system (e.g., a camera application that is installed on and/or that is being run on the computer system) (e.g., a first user interface that corresponds to capture and/or recording of visual media using at least some of the one or more cameras; and/or a first user interface that displays a preview (e.g., 2110) of visual content that is within a viewpoint of at least some of the one or more cameras). While displaying the first user interface (e.g., 2108, 2109, and/or 2132) corresponding to the camera application of the computer system (2304): in accordance with a determination that the computer system is associated with a head-mounted device (e.g., 101, 1-100, 1-200, 1-302, 1-406, 3-100, 6-100, 6-200, 6-300, 6-400, 11.1.2-100, and/or X700) separate from the computer system (e.g., 2100) (2306) (e.g., the computer system and the head-mounted device are logged into the same user account, the computer system and the head-mounted device correspond to the same user, and/or the computer system and the head-mounted device correspond to the same user account), the computer system provides (2308) (e.g., displaying (e.g., within the first user interface), providing access to, and/or otherwise making available for selection) a spatial media capture mode option (e.g., 2112d) corresponding to a spatial media capture mode (in some embodiments, a spatial media capture mode of a plurality of different media capture modes (e.g., 2112a-2112e) (e.g., a plurality of different media capture modes includes a plurality of non-spatial media capture modes (e.g., a non-spatial still image capture mode, a non-spatial video capture mode, a non-spatial panoramic still image capture mode, a non-spatial time-lapse video capture mode, and/or a non-spatial slow motion video capture mode)) for capturing spatial media (e.g., corresponds to capturing and/or recording spatial media) that includes a first visual component corresponding to a viewpoint of a right eye (e.g., a first still image component that corresponds to an image from a viewpoint of the right eye and/or a first video component that corresponds to a sequence of images from a viewpoint of the right eye) and a second visual component different from the first visual component and that corresponds to a viewpoint of a left eye (e.g., a second still image component that corresponds to an image from a viewpoint of the left eye and/or a second video component that corresponds to a sequence of images from a viewpoint of the left eye) that, when viewed concurrently, create an illusion of a spatial representation of captured visual content (e.g., concurrently viewing the first visual component and the second visual component creates an illusion of a three-dimensional representation of the media; e.g., viewing different images with the left and right eye creates the illusion of depth by simulating parallax of the image contents) (in some embodiments, the first visual component is captured by a first camera (e.g., 2146a, 2146b, and/or 2146c) of the one or more cameras, and the second visual component is captured by a second camera (e.g., 2146a, 2146b, and/or 2146c) of the one or more cameras different from the first camera; in some embodiments, the first visual component is captured by the first camera while the second visual component 
is concurrently captured by the second camera); and in accordance with a determination that the computer system (e.g., 2100) is not associated with a head-mounted device separate from the computer system (2310) (e.g., the computer system is not logged into the same user account as a head-mounted device, and/or a user and/or user account that corresponds to the computer system is not associated with and/or does not correspond to a head-mounted device), the computer system forgoes providing (2312) (e.g., forgoing displaying, forgoing providing access to, and/or forgoing making available) the spatial media capture mode option (e.g., 2112d) (e.g.,
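A minimal sketch of gating the spatial capture mode option on head-mounted-device association follows. The account model and field names are hypothetical; the check simply asks whether the signed-in account has at least one registered head-mounted device, mirroring the same-user-account association described above.

```swift
import Foundation

// Hypothetical check mirroring the condition above: the spatial capture mode option is
// offered only when the computer system is associated with a head-mounted device,
// here modeled as a user account with at least one registered head-mounted device.
struct UserAccount {
    let identifier: String
    let registeredHeadMountedDevices: [String]
}

func shouldOfferSpatialCaptureMode(account: UserAccount?) -> Bool {
    guard let account else { return false }           // no signed-in account: forgo the option
    return !account.registeredHeadMountedDevices.isEmpty
}

// Example: an account with a paired headset enables the option.
let account = UserAccount(identifier: "user@example.com",
                          registeredHeadMountedDevices: ["headset-1"])
let offersSpatialMode = shouldOfferSpatialCaptureMode(account: account)   // true
```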
In some embodiments, while displaying the first user interface (e.g., 2108 and/or 2109) corresponding to the camera application of the computer system, the computer system receives (e.g., via one or more input devices of the computer system) one or more user inputs (e.g., 2129a, 2129b, 2130a, and/or 2130b) (e.g., one or more touch inputs, one or more gesture inputs, one or more button presses, and/or one or more rotations of a rotatable input mechanism) corresponding to a user request to change a current media capture mode (e.g., a user request to transition from the current media capture mode to a different media capture mode) (e.g., a spatial media capture mode; a non-spatial media capture mode; a non-spatial still image capture mode, a non-spatial video capture mode, a non-spatial panoramic still image capture mode, a non-spatial time-lapse video capture mode, and/or a non-spatial slow motion video capture mode). In response to receiving the one or more user inputs corresponding to the user request to change the current media capture mode: in accordance with a determination that the spatial media capture mode option is enabled (e.g.,
In response to receiving the one or more user inputs (e.g., 2129a, 2129b, 2130a, and/or 2130b) corresponding to the user request to change the current media capture mode: in accordance with a determination that the spatial media capture mode option is not enabled (e.g.,
In some embodiments, the one or more user inputs corresponding to the user request to change the current media capture mode includes one or more tap inputs (e.g., 2129b and/or 2130b) (e.g., one or more tap inputs on a touch-sensitive surface; and/or one or more tap inputs on a spatial media capture mode affordance). In some embodiments, the one or more user inputs corresponding to the user request to change the current media capture mode includes one or more air gestures (e.g., one or more air tap gestures). In some embodiments, an air gesture is a gesture that is detected without the user touching (or independently of) an input element that is part of a device and is based on detected motion of a portion (e.g., the head, one or more arms, one or more hands, one or more fingers, and/or one or more legs) of the user's body through the air including motion of the user's body relative to an absolute reference (e.g., an angle of the user's arm relative to the ground or a distance of the user's hand relative to the ground), relative to another portion of the user's body (e.g., movement of a hand of the user relative to a shoulder of the user, movement of one hand of the user relative to another hand of the user, and/or movement of a finger of the user relative to another finger or portion of a hand of the user), and/or absolute motion of a portion of the user's body (e.g., a tap gesture that includes movement of a hand in a predetermined pose by a predetermined amount and/or speed, or a shake gesture that includes a predetermined speed or amount of rotation of a portion of the user's body). Allowing a user to transition between different media capture modes with a tap input enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
In some embodiments, the one or more user inputs corresponding to the user request to change the current media capture mode includes one or more swipe inputs (e.g., 2129a and/or 2130a) (e.g., one or more touch inputs on a touch-sensitive surface that include movement in at least a first direction (e.g., up, down, left, and/or right)). In some embodiments, the one or more user inputs corresponding to the user request to change the current media capture mode includes one or more air gestures (e.g., one or more air swipe gestures (e.g., one or more movements of the user's hand and/or finger(s) that include movement in at least a first direction)). Allowing a user to transition between different media capture modes with a swipe input enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
In some embodiments, the camera application includes a plurality of media capture modes (e.g., 2112a, 2112b, 2112c, 2112d, 2112e, and/or 2112f), including the spatial media capture mode (e.g., 2112d) and a non-spatial still image capture mode (e.g., 2112c) (e.g., a mode in which the computer system captures and/or is configured to capture non-spatial still images); the plurality of media capture modes are arranged in a defined order, and the spatial media capture mode is adjacent to (e.g., immediately adjacent to) the non-spatial still image capture mode within the defined order (e.g., option 2112d is adjacent to option 2112c). Allowing a user to transition between different media capture modes with a swipe input enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
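The ordered arrangement of capture modes could be modeled as in the sketch below, where a single swipe from the non-spatial photo mode reaches the adjacent spatial mode. The particular mode list (including the placeholder portrait and panorama entries) is assumed for illustration only.

```swift
import Foundation

// Hypothetical ordered list of capture modes; the spatial mode sits immediately
// adjacent to the non-spatial photo mode, so one swipe from photo reaches spatial.
// The additional mode names are placeholders, not taken from the disclosure.
enum CaptureMode: CaseIterable {
    case cinematic, video, photo, spatial, portrait, panorama
}

func mode(adjacentTo current: CaptureMode, swipeForward: Bool) -> CaptureMode {
    let modes = Array(CaptureMode.allCases)
    guard let index = modes.firstIndex(of: current) else { return current }
    let next = swipeForward ? index + 1 : index - 1
    guard modes.indices.contains(next) else { return current }   // no wrap-around assumed
    return modes[next]
}

// A forward swipe from the photo mode lands on the spatial mode (when it is enabled).
let newMode = mode(adjacentTo: .photo, swipeForward: true)   // .spatial
```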
In some embodiments, the determination that the computer system (e.g., 2100) is associated with a head-mounted device (e.g., 101, 1-100, 1-200, 1-302, 1-406, 3-100, 6-100, 6-200, 6-300, 6-400, 11.1.2-100, and/or X700) comprises a determination that the computer system and the head-mounted device are both associated with a same user account (e.g., the computer system and the head-mounted device are logged into the same user account). In some embodiments, the determination that the computer system is not associated with a head-mounted device comprises a determination that the computer system is not associated with the same user account as a head-mounted device (e.g., as any head-mounted device). In some embodiments, the determination that the computer system is not associated with a head-mounted device comprises a determination that the computer system is associated with a first user account, and the first user account is not associated with a head-mounted device (e.g., is not associated with any head-mounted device) (e.g., no head-mounted device is logged into the first user account). Providing the spatial media capture mode option when the computer system is associated with a head-mounted device, and forgoing providing the spatial media capture mode option when the computer system is not associated with a head-mounted device, enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
In some embodiments, while the computer system (e.g., 2100) is not associated with a head-mounted device (e.g., 101, 1-100, 1-200, 1-302, 1-406, 3-100, 6-100, 6-200, 6-300, 6-400, 11.1.2-100, and/or X700) separate from the computer system: the computer system receives (e.g., via one or more input devices of the computer system) one or more user inputs (e.g., 2128) (e.g., one or more touch inputs, one or more gesture inputs, one or more button presses, and/or one or more rotations of a rotatable input mechanism) corresponding to a user request to enable the spatial media capture mode option. In response to receiving the one or more user inputs corresponding to the user request to enable the spatial media capture mode option, the computer system enables the spatial media capture mode option (e.g., making the spatial media capture mode option available to the user within the first user interface and/or within the camera application) (e.g., transitioning the spatial media capture mode option from a disabled state to an enabled state) (e.g.,
In some embodiments, the camera application includes a plurality of media capture modes (e.g., 2112a-2112f), including the spatial media capture mode (e.g., 2112d) and a first media capture mode (e.g., 2112a, 2112b, 2112c, 2112e, and/or 2112f) different from the spatial media capture mode (e.g., a non-spatial still image capture mode, a non-spatial video capture mode, a non-spatial panoramic still image capture mode, a non-spatial time-lapse video capture mode, and/or a non-spatial slow motion video capture mode); and the first user interface (e.g., 2108) corresponds to the first media capture mode (e.g., 2112c and/or the non-spatial photo capture mode) (e.g., the first user interface is indicative of the computer system being in the first media capture mode; and/or the computer system is configured to capture media items corresponding to the first media capture mode while displaying the first user interface). While displaying the first user interface (e.g., 2108) that corresponds to the first media capture mode, and while the computer system is in the first media capture mode, the computer system receives (e.g., via one or more input devices of the computer system) one or more user inputs (e.g., 2130a and/or 2130b) (e.g., one or more touch inputs, one or more gesture inputs, one or more button presses, and/or one or more rotations of a rotatable input mechanism) corresponding to a user request to transition from the first media capture mode to the spatial media capture mode. In response to receiving the one or more user inputs corresponding to the user request to transition from the first media capture mode to the spatial media capture mode: the computer system transitions from the first media capture mode to the spatial media capture mode (e.g.,
In some embodiments, the camera application includes a plurality of media capture modes (e.g., 2112a-2112f), including the spatial media capture mode and a first media capture mode different from the spatial media capture mode (e.g., a non-spatial still image capture mode, a non-spatial video capture mode, a non-spatial panoramic still image capture mode, a non-spatial time-lapse video capture mode, and/or a non-spatial slow motion video capture mode); while the computer system is in the first media capture mode (e.g.,
In some embodiments, the one or more camera options includes a camera flip option (e.g., 2116) for switching from a first set of cameras (e.g., one or more cameras) (e.g., a front facing camera and/or a rear facing camera) (e.g., 2106; and/or 2146a-2146c) of the one or more cameras to a second set of cameras (e.g., one or more cameras) (e.g., 2106; and/or 2146a-2146c) (e.g., a front facing camera and/or a rear facing camera) of the one or more cameras, wherein the second set of cameras is different from the first set of cameras. In some embodiments, the first media capture mode corresponds to the first user interface (e.g., 2108 and/or 2109) (e.g., the first user interface is indicative of the computer system being in the first media capture mode); and the spatial media capture mode corresponds to a spatial media capture user interface (e.g., 2132) different from the first user interface. In some embodiments, the first user interface (e.g., 2108 and/or 2109) includes a camera flip affordance (e.g., 2116) that is selectable to switch from a first set of cameras (e.g., one or more cameras) (e.g., 2106; and/or 2146a-2146c) (e.g., a front facing camera and/or a rear facing camera) of the one or more cameras to a second set of cameras (e.g., one or more cameras) (e.g., 2106; and/or 2146a-2146c) (e.g., a front facing camera and/or a rear facing camera) of the one or more cameras. In some embodiments, the spatial media capture user interface (e.g., 2132) does not include the camera flip affordance (e.g., 2116) (e.g., in some embodiments, a user is prohibited from and/or prevented from switching between different sets of cameras). Disabling one or more camera options when the computer system is in the spatial media capture mode enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
In some embodiments, the one or more camera options includes a flash option (e.g., 2118a) for selectively enabling and/or disabling a flash feature. In some embodiments, the first media capture mode corresponds to the first user interface (e.g., 2108 and/or 2109) (e.g., the first user interface is indicative of the computer system being in the first media capture mode); and the spatial media capture mode corresponds to a spatial media capture user interface (e.g., 2132) different from the first user interface. In some embodiments, the first user interface (e.g., 2108 and/or 2109) includes a flash affordance (e.g., 2118a) that is selectable to selectively enable and/or disable a flash feature. In some embodiments, when the flash feature is enabled, the computer system and/or one or more camera systems of the computer system emit light and/or a flash when capturing a media item (e.g., a photo and/or video (e.g., spatial and/or non-spatial)); and when the flash feature is disabled, the computer system and/or the one or more camera systems do not emit light and/or do not emit a flash when capturing a media item. In some embodiments, the spatial media capture user interface (e.g., 2132) does not include the flash affordance (e.g., 2118a) (e.g., in some embodiments, a user is prohibited from and/or prevented from enabling and/or disabling the flash feature). Disabling one or more camera options when the computer system is in the spatial media capture mode enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
In some embodiments, the one or more camera options includes a frame rate option (e.g., 2123b) for modifying a media capture frame rate. In some embodiments, the first media capture mode corresponds to the first user interface (e.g., 2109) (e.g., the first user interface is indicative of the computer system being in the first media capture mode); and the spatial media capture mode corresponds to a spatial media capture user interface (e.g., 2132) different from the first user interface. In some embodiments, the first user interface includes a frame rate affordance (e.g., 2123b) that is selectable to modify a media capture frame rate (e.g., switch between a plurality of frame rate options (e.g., 24 frames per second, 30 frames per second, 48 frames per second, 50 frames per second, 60 frames per second, and/or 120 frames per second)). In some embodiments, the spatial media capture user interface (e.g., 2132) does not include the frame rate affordance (e.g., in some embodiments, a user is prohibited from and/or prevented from modifying the media capture frame rate). Disabling one or more camera options when the computer system is in the spatial media capture mode enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
In some embodiments, the one or more camera options includes a zoom option (e.g., 2122, 2122a, 2122b, and/or 2122c) for modifying a media capture zoom level. In some embodiments, the first media capture mode corresponds to the first user interface (e.g., 2108 and/or 2109) (e.g., the first user interface is indicative of the computer system being in the first media capture mode); and the spatial media capture mode corresponds to a spatial media capture user interface (e.g., 2132) different from the first user interface. In some embodiments, the first user interface includes a zoom affordance (e.g., 2122, 2122a, 2122b, and/or 2122c) that is selectable to modify a media capture zoom level (e.g., in various embodiments, modifying a digital zoom level, modifying an optical zoom level, and/or switching between a plurality of different cameras that have different zoom levels). In some embodiments, the spatial media capture user interface (e.g., 2132) does not include the zoom affordance (e.g., in some embodiments, a user is prohibited from and/or prevented from modifying the media capture zoom level) (e.g., in some embodiments, user interface 2132 does not include zoom level affordance(s) 2122, 2122a, 2122b, and/or 2122c). Disabling one or more camera options when the computer system is in the spatial media capture mode enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
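Taken together, the option-disabling behavior described in the preceding paragraphs could be summarized as in the following sketch. The option names mirror the flip, flash, frame-rate, and zoom options discussed above; returning an empty list for the spatial mode is an assumption of this illustration.

```swift
import Foundation

// Hypothetical summary of the option-availability behavior: the flip, flash,
// frame-rate, and zoom options are withheld in the spatial media capture mode.
enum CameraOption: CaseIterable {
    case cameraFlip, flash, frameRate, zoom
}

func availableCameraOptions(isSpatialCaptureMode: Bool) -> [CameraOption] {
    // In this sketch the spatial capture mode exposes none of these options;
    // other capture modes expose all of them.
    isSpatialCaptureMode ? [] : Array(CameraOption.allCases)
}
```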
In some embodiments, the computer system displays, via the one or more display generation components (e.g., 2102), a media library user interface (e.g., 2166 and/or 2178) that includes representations of a plurality of media items (e.g., 2170a-2170l) (e.g., a plurality of media items that are stored on the computer system; a plurality of media items that correspond to the computer system; and/or a plurality of media items that correspond to a user account associated with the computer system), including displaying, within the media library user interface, a representation of a first respective media item, wherein: in accordance with a determination that the first respective media item is spatial media (e.g., 2170a, 2170e, 2170g, and/or 2170l) (e.g., that includes a first visual component corresponding to a viewpoint of a right eye (e.g., a first still image component that corresponds to an image from a viewpoint of the right eye and/or a first video component that corresponds to a sequence of images from a viewpoint of the right eye) and a second visual component different from the first visual component and that corresponds to a viewpoint of a left eye (e.g., a second still image component that corresponds to an image from a viewpoint of the left eye and/or a second video component that corresponds to a sequence of images from a viewpoint of the left eye) that, when viewed concurrently, create an illusion of a spatial representation of captured visual content (e.g., concurrently viewing the first visual component and the second visual component creates an illusion of a three-dimensional representation of the media; e.g., viewing different images with the left and right eye creates the illusion of depth by simulating parallax of the image contents) (in some embodiments, the first visual component is captured by a first camera of the one or more cameras, and the second visual component is captured by a second camera of the one or more cameras different from the first camera; in some embodiments, the first visual component is captured by the first camera while the second visual component is concurrently captured by the second camera)), the representation of the first respective media item is displayed in a first manner (e.g., the representation of the first respective media item is displayed with a first set of visual characteristics and/or displayed at a first display location); and in accordance with a determination that the first respective media item is not spatial media (e.g., 2170b, 2170c, 2170d, 2170f, 2170h, 2170i, 2170j, and/or 2170k) (e.g., is non-spatial media; is a media item that does not include a first visual component corresponding to a viewpoint of a right eye and a second visual component corresponding to a viewpoint of a left eye; is a media item that does not include a first visual component that is to be displayed to the right eye of a user while a second visual component different from the first visual component is concurrently displayed to the left eye of the user; and/or is a media item that was captured by a single camera), the representation of the first respective media item is displayed in a second manner different from the first manner (e.g., the representation of the first respective media item is displayed with a second set of visual characteristics and/or displayed at a second display location). 
In some embodiments, displaying the media library user interface (e.g., 2166) comprises concurrently displaying representations of one or more spatial media items displayed in the first manner and representations of one or more non-spatial media items displayed in the second manner. Visually distinguishing spatial media items and non-spatial media items within a media library enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
In some embodiments, displaying the representation of the first respective media item in the first manner includes displaying the representation of the first respective media item with a first visual badge (e.g., 2174a, 2174b, 2174g and/or 2174l) (e.g., an icon, graphic, and/or visual indicator) indicative of the first respective media item being spatial media; and displaying the representation of the first respective media item in the second manner includes displaying the representation of the first respective media item without the first visual badge (e.g., representations 2170b, 2170c, 2170d, 2170f, 2170h, 2170i, 2170j, and 2170k displayed without badges in
In some embodiments, displaying the representation of the first respective media item in the first manner includes displaying the representation of the first respective media item within a first media collection (e.g., a first storage location and/or a first folder) (e.g., 2180i) indicative of the first respective media item being spatial media, where the first media collection does not include non-spatial media items; and displaying the representation of the first respective media item in the second manner includes displaying the representation of the first respective media item in a second media collection (e.g., 2180a-2180h) (e.g., a second storage location and/or a second folder) different from the first media collection that includes non-spatial media items that are not included in the first media collection (e.g., a second storage location indicative of the first respective media item not being spatial media and/or a second media storage location that includes both spatial media items and non-spatial media items). In some embodiments, the second media collection includes one or more non-spatial media items and does not include spatial media items. In some embodiments, the second media collection includes one or more non-spatial media items and one or more spatial media items. Visually distinguishing spatial media items and non-spatial media items within a media library enhances the operability of the system and makes the user-system interface more efficient (e.g., by preventing erroneous inputs, and helping the user to provide proper inputs and reducing errors) which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the system more quickly and efficiently.
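A brief sketch of these presentation rules follows; the collection names and types are hypothetical, and the badge/collection choice simply branches on whether the item is spatial.

```swift
import Foundation

// Hypothetical media-library presentation rules: a spatial item is shown with a
// badge and grouped into a spatial-only collection; a non-spatial item is shown
// without the badge in a general collection. Collection names are placeholders.
struct MediaItem {
    let name: String
    let isSpatial: Bool
}

struct LibraryPresentation {
    let showsSpatialBadge: Bool
    let collection: String
}

func presentation(for item: MediaItem) -> LibraryPresentation {
    item.isSpatial
        ? LibraryPresentation(showsSpatialBadge: true, collection: "Spatial")      // first manner
        : LibraryPresentation(showsSpatialBadge: false, collection: "All Items")   // second manner
}
```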
In some embodiments, aspects/operations of methods 800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, and/or 2300 may be interchanged, substituted, and/or added between these methods. For example, the spatial media being viewed in method 2000 may be media captured using the user interfaces and indicators described with respect to methods 800, 1000, 1200, 1400, 1800, 2200, and/or 2300, and the video playback interfaces and techniques described with respect to method 1600 may be applied while viewing spatial media according to method 2000. For brevity, these details are not repeated here.
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best use the invention and various described embodiments with various modifications as are suited to the particular use contemplated.
As described above, one aspect of the present technology is the gathering and use of data available from various sources to improve XR experiences of users. The present disclosure contemplates that in some instances, this gathered data may include personal information data that uniquely identifies or can be used to contact or locate a specific person. Such personal information data can include demographic data, location-based data, telephone numbers, email addresses, twitter IDs, home addresses, data or records relating to a user's health or level of fitness (e.g., vital signs measurements, medication information, exercise information), date of birth, or any other identifying or personal information.
The present disclosure recognizes that the use of such personal information data, in the present technology, can be used to the benefit of users. For example, the personal information data can be used to improve an XR experience of a user. Further, other uses for personal information data that benefit the user are also contemplated by the present disclosure. For instance, health and fitness data may be used to provide insights into a user's general wellness, or may be used as positive feedback to individuals using technology to pursue wellness goals.
The present disclosure contemplates that the entities responsible for the collection, analysis, disclosure, transfer, storage, or other use of such personal information data will comply with well-established privacy policies and/or privacy practices. In particular, such entities should implement and consistently use privacy policies and practices that are generally recognized as meeting or exceeding industry or governmental requirements for maintaining personal information data private and secure. Such policies should be easily accessible by users, and should be updated as the collection and/or use of data changes. Personal information from users should be collected for legitimate and reasonable uses of the entity and not shared or sold outside of those legitimate uses. Further, such collection/sharing should occur after receiving the informed consent of the users. Additionally, such entities should consider taking any needed steps for safeguarding and securing access to such personal information data and ensuring that others with access to the personal information data adhere to their privacy policies and procedures. Further, such entities can subject themselves to evaluation by third parties to certify their adherence to widely accepted privacy policies and practices. In addition, policies and practices should be adapted for the particular types of personal information data being collected and/or accessed and adapted to applicable laws and standards, including jurisdiction-specific considerations. For instance, in the US, collection of or access to certain health data may be governed by federal and/or state laws, such as the Health Insurance Portability and Accountability Act (HIPAA); whereas health data in other countries may be subject to other regulations and policies and should be handled accordingly. Hence different privacy practices should be maintained for different personal data types in each country.
Despite the foregoing, the present disclosure also contemplates embodiments in which users selectively block the use of, or access to, personal information data. That is, the present disclosure contemplates that hardware and/or software elements can be provided to prevent or block access to such personal information data. For example, in the case of XR experiences, the present technology can be configured to allow users to select to “opt in” or “opt out” of participation in the collection of personal information data during registration for services or anytime thereafter. In another example, users can select not to provide data for customization of services. In yet another example, users can select to limit the length of time data is maintained or entirely prohibit the development of a customized service. In addition to providing “opt in” and “opt out” options, the present disclosure contemplates providing notifications relating to the access or use of personal information. For instance, a user may be notified upon downloading an app that their personal information data will be accessed and then reminded again just before personal information data is accessed by the app.
Moreover, it is the intent of the present disclosure that personal information data should be managed and handled in a way to minimize risks of unintentional or unauthorized access or use. Risk can be minimized by limiting the collection of data and deleting data once it is no longer needed. In addition, and when applicable, including in certain health-related applications, data de-identification can be used to protect a user's privacy. De-identification may be facilitated, when appropriate, by removing specific identifiers (e.g., date of birth), controlling the amount or specificity of data stored (e.g., collecting location data at a city level rather than at an address level), controlling how data is stored (e.g., aggregating data across users), and/or other methods.
Therefore, although the present disclosure broadly covers use of personal information data to implement one or more various disclosed embodiments, the present disclosure also contemplates that the various embodiments can also be implemented without the need for accessing such personal information data. That is, the various embodiments of the present technology are not rendered inoperable due to the lack of all or a portion of such personal information data. For example, an XR experience can be generated by inferring preferences based on non-personal information data or a bare minimum amount of personal information, such as the content being requested by the device associated with a user, other non-personal information available to the service, or publicly available information.
This application claims priority to U.S. Provisional Patent Application Ser. No. 63/453,708, entitled “DEVICES, METHODS, AND GRAPHICAL USER INTERFACES FOR CAPTURING MEDIA WITH A CAMERA APPLICATION,” filed on Mar. 21, 2023, and claims priority to U.S. Provisional Patent Application Ser. No. 63/470,878, entitled “DEVICES, METHODS, AND GRAPHICAL USER INTERFACES FOR CAPTURING MEDIA WITH A CAMERA APPLICATION,” filed on Jun. 3, 2023, and claims priority to U.S. Provisional Patent Application Ser. No. 63/528,409, entitled “DEVICES, METHODS, AND GRAPHICAL USER INTERFACES FOR CAPTURING MEDIA WITH A CAMERA APPLICATION,” filed on Jul. 23, 2023, and claims priority to U.S. Provisional Patent Application Ser. No. 63/537,801, entitled “DEVICES, METHODS, AND GRAPHICAL USER INTERFACES FOR CAPTURING MEDIA WITH A CAMERA APPLICATION,” filed on Sep. 11, 2023, and claims priority to U.S. Provisional Patent Application Ser. No. 63/548,166, entitled “DEVICES, METHODS, AND GRAPHICAL USER INTERFACES FOR CAPTURING MEDIA WITH A CAMERA APPLICATION,” filed on Nov. 10, 2023. The contents of each application are hereby incorporated by reference in their entirety.
Number | Date | Country
--- | --- | ---
63548166 | Nov 2023 | US
63537801 | Sep 2023 | US
63528409 | Jul 2023 | US
63470878 | Jun 2023 | US
63453708 | Mar 2023 | US