The present disclosure generally relates to electronic devices that use sensors to provide views during communication sessions, including views that include representations of one or more of the environments of the electronic devices participating in the sessions.
Various techniques are used to present communication sessions such as video conferences, interactive gaming sessions, and other interactive social experiences. For example, the participants may see realistic or unrealistic representations of the users (e.g., avatars) participating in the sessions. However, there is a need to provide a representation of at least a portion of a sender's environment (e.g., background) to give some context of where the sender is calling from.
Various implementations disclosed herein include devices, systems, and methods that provide a view of a three-dimensional (3D) environment (e.g., a viewer's room) with a portal providing views of a representation of another user's environment (e.g., a sender's room) and a representation of that user (e.g., an avatar). Rather than providing fully immersive views, the representation may be displayed within a relatively small viewing portal at a fixed position within the larger 3D environment. Providing at least a portion of a representation of a sender's background within a portal may give some context of where the sender is calling from. However, head-mounted devices (HMDs) may be limited in updating a view of a user and the user's background because external-facing cameras may not be able to capture the background environment during a communication session from the perspective of a typical video chat session (e.g., a camera positioned one to two meters in front of the user to capture both the user and the live background data). Thus, when both users are wearing HMDs during a communication session and there is a desire to provide the actual (“live”) background, or at least a representation of the background, the system may utilize images of the environment, or at least a portion of the environment, captured before the communication session, or the system may hallucinate (i.e., synthesize) content for any gaps. Then, based on the sender's position with respect to the background and/or the viewer's viewpoint position, the background may be displayed and updated accordingly.
In some implementations, the background data may be provided by the sender's device capturing sensor data of his or her environment, potentially filling in data gaps (e.g., hallucinating content), and may be provided to the viewer's device using parameters (e.g., blurring, not depicting other people, providing a limited field of view (e.g., a 180° FOV), using updating criteria based on changes/new content, etc.). The processing of the background data to determine the portal content (e.g., blurring, not depicting other people, providing a limited field of view, using updating criteria based on changes/new content, etc.) may be performed at the sender's device, the viewer's device, or a combination thereof. For example, the sender's device may limit the amount of portal content sent to the viewer's device such that the content may be blurred or may provide a limited view of the background.
In some implementations, the portal may provide multi-directional views (e.g., viewpoint dependent) of the other environment that change as the viewer moves relative to the portal. The portal may present a portal view of received 180° stereo image/video background content on a plane/surface displayed in a 3D space (e.g., VR or MR). In some implementations, during capture of the sender's background, the sender's device may update a low frequency screenshot (RGB image), provide a current depth map, and optionally include additional metadata such as head orientation/pose. The sender's device and/or the viewer's device may be able to fill gaps/holes in the background (e.g., due to occlusions during a room scan, portions of the room that were not scanned, etc.) and periodically provide updates to the viewer. For example, as the sender moves about his or her environment and provides additional views from which the sender's electronic device can capture additional sensor data, the background data may be updated for the portal content.
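For purposes of illustration only, the following Python sketch shows one way such a low frequency background update could be structured on the sender side, assuming an RGB screenshot, an aligned depth map, and a head pose are available from the device's sensors. The names (e.g., BackgroundUpdate, fill_gaps, maybe_send_update), the placeholder gap fill, and the five-second minimum interval are assumptions made for this example rather than a description of any particular implementation.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

import numpy as np


@dataclass
class BackgroundUpdate:
    """One low-frequency background update sent from the sender to the viewer."""
    rgb_image: np.ndarray   # H x W x 3 screenshot of the sender's background
    depth_map: np.ndarray   # H x W depth values aligned with the RGB image
    head_pose: np.ndarray   # 4 x 4 device/head pose, included as optional metadata
    timestamp: float = field(default_factory=time.time)


def fill_gaps(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Fill unscanned/occluded pixels (depth == 0) with a simple placeholder color.

    A real system might instead use previously captured scan data or a learned
    inpainting model, as discussed above; this flat fill is only a stand-in.
    """
    mask = depth <= 0.0
    filled = rgb.copy()
    filled[mask] = rgb[~mask].mean(axis=0) if (~mask).any() else 127
    return filled


def maybe_send_update(capture_rgb, capture_depth, capture_pose, send,
                      last_sent: Optional[BackgroundUpdate],
                      min_interval_s: float = 5.0) -> BackgroundUpdate:
    """Capture and send a new background update only at a low frequency."""
    now = time.time()
    if last_sent is not None and now - last_sent.timestamp < min_interval_s:
        return last_sent  # too soon; keep using the previously sent background
    rgb, depth, pose = capture_rgb(), capture_depth(), capture_pose()
    update = BackgroundUpdate(fill_gaps(rgb, depth), depth, pose)
    send(update)
    return update
```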
In some implementations, specific features of the portal content may be limited in the amount of data provided, e.g., providing a viewpoint-dependent view, privacy features to blur portions of the background, masking out people or other moving objects in the background, and the like. Thus, user privacy may be preserved by only providing some of the user's background information, e.g., blurring portions of, or all of, the background environment, not depicting other people or other objects in the background, providing a limited field of view (e.g., a 180° FOV), using updating criteria based on changes and/or new content in the background environment, and the like.
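For purposes of illustration only, the following Python sketch shows one way such privacy limits might be applied to a background panorama before it is shared. The function names, the cheap block-average blur, and the assumption that a person mask and per-column yaw angles are supplied by other parts of the pipeline are illustrative assumptions.

```python
import numpy as np


def box_blur(img: np.ndarray, k: int = 8) -> np.ndarray:
    """Very cheap blur: average k x k blocks of an H x W x 3 image and upsample back."""
    h, w = (img.shape[0] // k) * k, (img.shape[1] // k) * k
    small = img[:h, :w].reshape(h // k, k, w // k, k, 3).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, k, axis=0), k, axis=1)


def apply_privacy_filters(rgb: np.ndarray,
                          person_mask: np.ndarray,
                          yaw_per_column_deg: np.ndarray,
                          fov_limit_deg: float = 180.0) -> np.ndarray:
    """Mask out people and blur everything outside a limited horizontal FOV.

    rgb                : H x W x 3 panorama of the sender's background.
    person_mask        : H x W boolean mask of detected people (from any detector).
    yaw_per_column_deg : per-column horizontal angle relative to the sender's forward direction.
    """
    # Crop to a multiple of the blur block size used below.
    h, w = (rgb.shape[0] // 8) * 8, (rgb.shape[1] // 8) * 8
    out = rgb[:h, :w].astype(np.float32)
    blurred = box_blur(out)

    # Blur columns that fall outside the shared field of view (e.g., behind the sender).
    outside_fov = np.abs(yaw_per_column_deg[:w]) > fov_limit_deg / 2.0
    out[:, outside_fov] = blurred[:, outside_fov]

    # Replace detected people with blurred background so they are not depicted.
    people = person_mask[:h, :w]
    out[people] = blurred[people]
    return out.astype(np.uint8)
```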
In general, one innovative aspect of the subject matter described in this specification can be embodied in methods, at a first electronic device including one or more processors, that include the actions of presenting a view of a first 3D environment, obtaining data representing a second 3D environment, the data representing the second 3D environment based at least in part on sensor data captured in a physical environment of a second electronic device, determining portal content based on the data representing the second 3D environment and a viewpoint within the first 3D environment, and displaying, in the view of the first 3D environment, a portal with the portal content, wherein the portal content depicts a portion of the second 3D environment viewed through the portal from the viewpoint.
These and other embodiments can each optionally include one or more of the following features.
In some aspects, the portal content is based on synthesizing data representing a portion of the second 3D environment not represented in sensor data captured in the physical environment of the second electronic device.
In some aspects, the portal content is updated based on detecting a change in the second 3D environment or in the physical environment of the second electronic device.
In some aspects, obtaining the data representing the second 3D environment includes obtaining a parameter associated with the data representing the second 3D environment.
In some aspects, the parameter identifies a field of view or an orientation of the second 3D environment, and wherein determining portal content is further based on the parameter.
In some aspects, determining portal content includes blurring some of the portion of the second 3D environment based on the identified field of view or the orientation of the second 3D environment.
In some aspects, the method further includes the actions of obtaining data representing a user of the second electronic device, wherein determining the portal content is further based on the data representing the user of the second electronic device, and wherein the portal content depicts the representation of the user of the second electronic device in front of the portion of the second 3D environment. In some aspects, determining portal content includes blurring the portion of the second 3D environment behind the representation of the user of the second electronic device. In some aspects, the data representing the second 3D environment depicts less than a 360-degree view of the second 3D environment. In some aspects, the data representing the second 3D environment depicts a 360-degree view of the second 3D environment.
In some aspects, the method further includes the actions of determining a position at which to display the portal within the view of the first 3D environment based on the viewpoint. In some aspects, the method further includes the actions of changing the portal content based on changes to the viewpoint within the first 3D environment.
In some aspects, displaying, in the view of the first 3D environment, the portal with the portal content is based on determining a positional relationship of the viewpoint relative to the portal. In some aspects, a position of the portal within the first 3D environment is constant as the viewpoint changes within the first 3D environment. In some aspects, a position of the portal within the first 3D environment changes based on changes to the viewpoint within the first 3D environment.
In some aspects, the data representing the second 3D environment includes a stereoscopic image pair including left eye content corresponding to a left eye viewpoint and right eye content corresponding to a right eye viewpoint. In some aspects, the data representing the second 3D environment includes a 180-degree stereo image. In some aspects, the data representing the second 3D environment includes two-dimensional (2D) image data and depth data.
In some aspects, determining portal content includes rendering at least a portion of the data representing the second 3D environment on at least a portion of a sphere. In some aspects, the data representing the second 3D environment includes a three-dimensional (3D) model. In some aspects, the data representing the second 3D environment is obtained during a communication session between the first electronic device and a second electronic device.
In some aspects, the sensor data captured in the physical environment of the second electronic device is obtained by one or more sensors of the second electronic device. In some aspects, the first 3D environment is an extended reality (XR) environment. In some aspects, the first electronic device or the second electronic device includes a head-mounted device (HMD).
In general, one innovative aspect of the subject matter described in this specification can be embodied in methods, at a first electronic device including one or more processors and one or more sensors, that include the actions of obtaining sensor data captured via the one or more sensors in a physical environment associated with the first electronic device, determining data representing a first three-dimensional (3D) environment, wherein the data representing the first 3D environment is generated based at least in part on the sensor data and a parameter identifying an orientation or a field of view of the first electronic device, and providing the data representing the first 3D environment to a second electronic device.
These and other embodiments can each optionally include one or more of the following features.
In some aspects, determining data representing the first 3D environment includes synthesizing data representing a portion of the first 3D environment not represented in sensor data captured in the physical environment of the first electronic device.
In some aspects, synthesizing data is performed based on detecting a change in a position of the first electronic device within the first 3D environment. In some aspects, synthesizing data is performed based on identifying another portion of the first 3D environment that is not represented in the sensor data.
In some aspects, the data representing the first 3D environment is updated based on detecting a change in the first 3D environment. In some aspects, the data representing the first 3D environment is updated based on detecting a change in the physical environment of the second electronic device. In some aspects, the data representing the first 3D environment is updated based on detecting that a change in a position of the first electronic device exceeds a threshold.
In some aspects, the method further includes the actions of determining, based on the data representing the first 3D environment, a first lighting condition associated with an area of the first 3D environment, and updating the data representing the first 3D environment for the area associated with the first lighting condition in the first 3D environment.
In some aspects, determining the data representing the first 3D environment includes determining a coverage of a background associated with the physical environment of the first electronic device based on the sensor data, and in response to determining that the coverage of the background captured of the physical environment is below a threshold amount, providing synthesized data as the data representing the first 3D environment.
In some aspects, a blurring effect is applied by the first electronic device to at least a portion of the data representing the first 3D environment provided to the second electronic device. In some aspects, the parameter identifying the orientation or the field of view of the first electronic device is based on determining a pose of the first electronic device. In some aspects, the method further includes the actions of providing data representing a user of the first electronic device to the second electronic device. In some aspects, the data representing the user of the first electronic device is provided with a frequency higher than the data representing the first 3D environment.
In some aspects, the second electronic device is configured to display a view of a portion of the data representing the first 3D environment within a portal within a view of a second 3D environment. In some aspects, the first electronic device and the second electronic device are operatively communicating during a communication session. In some aspects, the first electronic device or the second electronic device includes an HMD.
In accordance with some implementations, a device includes one or more processors, a non-transitory memory, and one or more programs; the one or more programs are stored in the non-transitory memory and configured to be executed by the one or more processors and the one or more programs include instructions for performing or causing performance of any of the methods described herein. In accordance with some implementations, a non-transitory computer readable storage medium has stored therein instructions, which, when executed by one or more processors of a device, cause the device to perform or cause performance of any of the methods described herein. In accordance with some implementations, a device includes: one or more processors, a non-transitory memory, and means for performing or causing performance of any of the methods described herein.
So that the present disclosure can be understood by those of ordinary skill in the art, a more detailed description may be had by reference to aspects of some illustrative implementations, some of which are shown in the accompanying drawings.
In accordance with common practice the various features illustrated in the drawings may not be drawn to scale. Accordingly, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. In addition, some of the drawings may not depict all of the components of a given system, method or device. Finally, like reference numerals may be used to denote like features throughout the specification and figures.
Numerous details are described in order to provide a thorough understanding of the example implementations shown in the drawings. However, the drawings merely show some example aspects of the present disclosure and are therefore not to be considered limiting. Those of ordinary skill in the art will appreciate that other effective aspects and/or variants do not include all of the specific details described herein. Moreover, well-known systems, methods, components, devices and circuits have not been described in exhaustive detail so as not to obscure more pertinent aspects of the example implementations described herein.
In some implementations, views of an extended reality (XR) environment may be provided to one or more participants (e.g., user 102 and/or other participants not shown) via electronic devices 105 (e.g., a wearable device such as an HMD) and/or 110 (e.g., a handheld device such as a mobile device, a tablet computing device, a laptop computer, etc.). Such an XR environment may include views of a 3D environment that is generated based on camera images and/or depth camera images of the physical environment 100 as well as a representation of user 102 based on camera images and/or depth camera images of the user 102. Such an XR environment may include virtual content that is positioned at 3D locations relative to a 3D coordinate system (e.g., a 3D space) associated with the XR environment, which may correspond to a 3D coordinate system of the physical environment 100.
In some implementations, video (e.g., pass-through video depicting a physical environment) is received from an image sensor of a device (e.g., device 105 or device 110) and used to present the XR environment. In other implementations, optical see-through may be used to present the XR environment by overlaying virtual content on a view of the physical environment seen through a translucent or transparent display. In some implementations, a 3D representation of a virtual environment is aligned with a 3D coordinate system of the physical environment. A sizing of the 3D representation of the virtual environment may be generated based on, inter alia, a scale of the physical environment or a positioning of an open space, floor, wall, etc. such that the 3D representation is configured to align with corresponding features of the physical environment. In some implementations, a viewpoint within the 3D coordinate system may be determined based on a position of the electronic device within the physical environment. The viewpoint may be determined based on, inter alia, image data, depth sensor data, motion sensor data, etc., which may be retrieved via a visual inertial odometry (VIO) system, a simultaneous localization and mapping (SLAM) system, etc.
Environment representation 210 illustrates an example representation of physical environment 100 from viewpoint 214 corresponding to the perspective of the electronic devices 105/110 as depicted by location indicator 212. Environment representation 210 includes appearance and/or location/position information as indicated by object 222 (e.g., wall hanging 150), object 224 (e.g., plant 125), and object 226 (e.g., desk 120). Additionally, environment representation 210 identifies the appearance and/or location of user 102, as illustrated by representation 220. In some implementations, environment representation 210 may include representations of environment 100 that were generated using scene sensor data that was previously captured (e.g., for portions of physical environment 100 behind user 102) as well as a representation of user 102 using current sensor data. In these implementations, representations for portions of physical environment 100 in front of user 102 may not be included in environment representation 210 (e.g., object 226 representing desk 120 may not be included).
As shown in
In some implementations, views of an XR environment may be provided to one or more participants (e.g., user 302 and/or other participants not shown, such as user 102) via electronic devices 305, e.g., a wearable device such as an HMD, and/or a handheld device such as a mobile device, a tablet computing device, a laptop computer, etc. (e.g., device 110). Such an XR environment may include views of a 3D environment that is generated based on camera images and/or depth camera images of the physical environment 300 as well as a representation of user 302 based on camera images and/or depth camera images of the user 302. Such an XR environment may include virtual content that is positioned at 3D locations relative to a 3D coordinate system (e.g., a 3D space) associated with the XR environment, which may correspond to a 3D coordinate system of the physical environment 300.
In some implementations, video (e.g., pass-through video depicting a physical environment) is received from an image sensor of a device (e.g., device 305) and used to present the XR environment. In other implementations, optical see-through may be used to present the XR environment by overlaying virtual content on a view of the physical environment seen through a translucent or transparent display. In some implementations, a 3D representation of a virtual environment is aligned with a 3D coordinate system of the physical environment. A sizing of the 3D representation of the virtual environment may be generated based on, inter alia, a scale of the physical environment or a positioning of an open space, floor, wall, etc. such that the 3D representation is configured to align with corresponding features of the physical environment. In some implementations, a viewpoint within the 3D coordinate system may be determined based on a position of the electronic device within the physical environment. The viewpoint may be determined based on, inter alia, image data, depth sensor data, motion sensor data, etc., which may be retrieved via a visual inertial odometry (VIO) system, a simultaneous localization and mapping (SLAM) system, etc.
The electronic device 305 provides views of the physical environment 300 that include depictions of the 3D environment 400 from a viewpoint 420 (e.g., also referred to herein as a viewer position), which in this example is determined based on the position of the electronic device 305 in the physical environment 300 (e.g., a viewpoint of the user 302, also referred to herein as the “viewer's position” or “viewer's viewpoint”). Thus, as the user 302 moves with the electronic device 305 (e.g., an HMD) relative to the physical environment 300, the viewpoint 420 corresponding to the position of the electronic device 305 is moved relative to the 3D environment 400. The view of the 3D environment provided by the electronic device changes based on changes to the viewpoint 420 relative to the 3D environment 400. In some implementations, the 3D environment 400 does not include representations of the physical environment 300, for example, including only virtual content corresponding to a virtual reality environment.
In the examples of
In this example, the background portion 535 of the user interface 530 is flat. In this example, the background portion 535 includes all aspects of the user interface 530 being displayed except for the icon 532 and scroll bar 550. Displaying a background portion of a user interface of an operating system or application as a flat surface may provide various advantages. Doing so may provide an easy-to-understand and easy-to-use portion of an XR environment for accessing the user interface of the application. In some implementations, a shape of the user interface (e.g., portal 485) may be curved, such as a half-sphere, to provide a different sense of depth for the content within the user interface as it is being presented within a view of a 3D environment. In some implementations, multiple user interfaces (e.g., corresponding to multiple, different applications) are presented sequentially and/or simultaneously within an XR environment, e.g., within one or more colliders or other such components.
In some implementations, the positions and/or orientations of such one or more user interfaces may be determined to facilitate visibility and/or use. The one or more user interfaces may be at fixed positions and orientations within the 3D environment. In such cases, user movements would not affect the position or orientation of the user interfaces within the 3D environment.
The position of the user interface within the 3D environment may be based on determining a distance of the user interface from the user (e.g., from an initial or current user position). The position and/or distance from the user may be determined based on various criteria including, but not limited to, criteria that account for application type, application functionality, content type, content/text size, environment type, environment size, environment complexity, environment lighting, presence of others in the environment, use of the application or content by multiple users, user preferences, user input, and numerous other factors.
In this example, a portion of the physical environment 300 of
For the viewpoint 720 of
In
In
In
In
The process flow 1100 obtains sensor data 1102 from a first physical environment (e.g., device 105 obtaining sensor data of physical environment 100). The sensor data 1102 may include image data, depth data, positional information, and the like. For example, sensors on a device (e.g., cameras, an IMU, etc. on device 105, 110, 305, etc.) can capture information about the position, location, motion, pose, etc., of the head and/or body of a user and the environment.
In an example implementation, a portal content generation system for the process 1100 may include a portal content instruction set 1120 and a combined representation instruction set 1140. The portal content instruction set 1120 may include one or more modules that may then be used to analyze the sensor data 1102. The portal content instruction set 1120 may include a motion module 1122 for determining motion trajectory data from motion sensor(s) for one or more objects. The portal content instruction set 1120 may include a localization module 1124 that is configured with instructions executable by a processor to obtain sensor data (e.g., RGB data, depth data, etc.) and track a location of a moving device (e.g., device 105, 305, etc.) in a 3D coordinate system using one or more techniques (e.g., tracking as a user moves around in a 3D environment to determine a particular viewpoint as discussed herein). The portal content instruction set 1120 may include an object detection module 1126 that can analyze RGB images from a light intensity camera and/or a sparse depth map from a depth camera (e.g., time-of-flight sensor) and other sources of physical environment information (e.g., camera positioning information from a camera's SLAM system, VIO, or the like, such as position sensors) to identify objects (e.g., people, pets, etc.) in the sequence of light intensity images. In some implementations, the object detection module 1126 uses machine learning for object identification. In some implementations, the machine learning model is a neural network (e.g., an artificial neural network), decision tree, support vector machine, Bayesian network, or the like. For example, the object detection module 1126 may use an object detection neural network unit to identify objects and/or an object classification neural network to classify each type of object.
The portal content instruction set 1120 may further include an occlusion module 1128 for detecting occlusions in the object model. For example, if a viewpoint changes for the viewer, and an occlusion is detected, the system may then determine to hallucinate any gaps of data that may be missing based on the detected occlusions between one or more objects. For example, an initial room scan may not acquire image data of the area behind the desk 120 of
The portal content instruction set 1120 may further include a privacy module 1130 that may be based on one or more user settings and/or default system settings that control the amount of blurring or masking applied to particular areas of the background data to be shown to another user during a communication session. For example, based on a threshold distance setting, only a particular radial distance around the user may be displayed within the portal content (e.g., a five-foot radius), and the remaining portion of the background data would be blurred. Additionally, all of the background data may be blurred for privacy purposes. Additionally (or alternatively), identified objects that show personal identifying information may be modified. For example, as illustrated in
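For purposes of illustration only, the following Python sketch shows one way a threshold-distance privacy setting like the one described above might be applied, assuming a per-pixel distance-from-sender map is available. The function name, the 1.5 meter (roughly five-foot) default radius, and the flat mean-color stand-in for a real blur are illustrative assumptions.

```python
from typing import Optional

import numpy as np


def radial_privacy_blur(rgb: np.ndarray,
                        depth_from_user: np.ndarray,
                        radius_m: float = 1.5,
                        blurred_rgb: Optional[np.ndarray] = None) -> np.ndarray:
    """Keep background within radius_m of the sender sharp; obscure everything farther away.

    rgb             : H x W x 3 background image.
    depth_from_user : H x W distance of each background pixel from the sender, in meters.
    blurred_rgb     : optional pre-blurred copy of rgb; a flat mean color is used otherwise.
    """
    out = rgb.copy()
    far = depth_from_user > radius_m
    if not far.any():
        return out
    if blurred_rgb is None:
        # Stand-in for a real blur: replace far pixels with their average color.
        mean_color = rgb[far].mean(axis=0).astype(rgb.dtype)
        blurred_rgb = np.broadcast_to(mean_color, rgb.shape)
    out[far] = blurred_rgb[far]
    return out
```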
The portal content instruction set 1120 may further include an environment representation module 1132 and/or a user representation module 1134 for generating data to be used for the representations of the background data and user representations as described herein.
The portal content instruction set 1120, utilizing the one or more modules, generates and sends portal content data 1136 to a combined representation instruction set 1140 that is configured to generate the combined representation 1150 (e.g., a virtual portal positioned within a view of a 3D environment, such as an XR environment). In some implementations, portal content data 1136 may include an environment representation similar or identical to environment representation 210, described above.
The combined representation instruction set 1140 may obtain sensor data 1104 from a viewer's environment (e.g., device 305 obtaining sensor data of physical environment 300). The sensor data 1104 may include image data, depth data, positional information, and the like. For example, sensors on a device (e.g., cameras, an IMU, etc. on device 305) can capture information about the position, location, motion, pose, etc., of the head and/or body of a user and the environment. The combined representation instruction set 1140 may include a 3D environment representation module 1142 for generating a view of a representation of a viewer's environment (e.g., optical see-through, pass-through video, or a 3D model of the viewer's environment) based on the obtained sensor data 1104. The combined representation instruction set 1140 may further include a portal content representation module 1144 for generating a view of a portal that includes a representation of a sender's environment based on the obtained portal content data 1136. Thus, the combined representation instruction set 1140 generates the combined representation 1150 based on combining the 3D environment representation of physical environment 300 with the generated portal that includes portal content generated from the sender's physical environment 100, as illustrated in
One or more of the modules included in the portal content instruction set 1120 may be executed at a sender's device, a viewer's device, or a combination thereof. For example, the device 105 (e.g., a sender's device) obtains sensor data 1102 of the physical environment 100 (e.g., a room scan) and sends the sensor data to the device 305 (e.g., a viewer's device) to be analyzed to generate the portal content, and the device 305 would thus update the portal content (e.g., avatar and background data) based on updated sensor data received from the device 105. Additionally (or alternatively), the device 105 (e.g., a sender's device) obtains sensor data 1102 of the physical environment 100 (e.g., a room scan) and also analyzes the sensor data to generate the portal content, and then sends the portal content to the device 305 to be displayed and viewed. Additionally (or alternatively), the analysis and the different decision points of when to hallucinate new content, blur out one or more features, etc., may be performed by both the sender's device and the viewer's device.
In an exemplary implementation, the method 1200 is performed at a first electronic device having a processor. In particular, the following blocks are performed at a viewer's device (e.g., an HMD), such as device 305, and provide a view of a 3D environment (e.g., a viewer's room) with a portal providing views of another user's (e.g., a sender's) background environment. The portal may provide a multi-directional view (e.g., viewpoint dependent) of the other environment that changes as the viewer moves relative to the portal. The background data is provided by the other user's device capturing sensor data of the other user's environment, potentially filling data gaps/hallucinating content, and may be provided to the viewer's device using parameters (e.g., blurring, not depicting other people, providing a limited field of view (e.g., a 180° FOV), using updating criteria based on changes/new content, etc.).
At block 1210, the method 1200 presents a view of a first three-dimensional (3D) environment. For example, as illustrated in
At block 1220, the method 1200 obtains data representing a second 3D environment based at least in part on sensor data captured in a physical environment of a second electronic device. For example, as illustrated in
In some implementations, the data representing the second 3D environment depicts a 360-degree view of the second 3D environment. In some implementations, the data representing the second 3D environment depicts less than a 360-degree view of the second 3D environment (e.g., a 180-degree FOV).
In some implementations, the data representing the second 3D environment includes a stereoscopic image pair including left eye content corresponding to a left eye viewpoint and right eye content corresponding to a right eye viewpoint. For example, the data representing the second 3D environment may include a 180-degree stereo image, and/or spherical maps or equirectangular projections. Additionally, or alternatively, in some implementations, the data representing the second 3D environment includes two-dimensional (2D) image data and depth data (e.g., a 2D image and depth data/height map). In some implementations, the data representing the second 3D environment includes a 3D model (e.g., a 3D mesh representing the background environment).
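For purposes of illustration only, the following Python sketch outlines a container for the alternative background representations described above (a stereo image pair, a 2D image with depth, or a 3D mesh). The class and field names are illustrative assumptions, not a defined data format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np


@dataclass
class EnvironmentData:
    """Illustrative container for the data representing the second 3D environment.

    Exactly one of the optional fields is expected to be populated in practice.
    """
    # 180-degree stereo pair: left-eye and right-eye equirectangular images.
    stereo_pair: Optional[Tuple[np.ndarray, np.ndarray]] = None
    # 2D color image with an aligned depth/height map.
    image_and_depth: Optional[Tuple[np.ndarray, np.ndarray]] = None
    # 3D mesh: (N x 3 vertices, M x 3 triangle indices).
    mesh: Optional[Tuple[np.ndarray, np.ndarray]] = None
    # Horizontal coverage of the data, e.g., 180 or 360 degrees.
    coverage_deg: float = 180.0
```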
In some implementations, the data representing the second 3D environment is obtained during a communication session between the first electronic device and a second electronic device. For example, as illustrated in
At block 1230, the method 1200 determines portal content based on the data representing the second 3D environment and a viewpoint within the first 3D environment (e.g., a viewer's viewpoint of the portal). For example, as illustrated in
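For purposes of illustration only, the following Python sketch shows one way viewpoint-dependent portal content might be determined for a single portal pixel by sampling a received 180° equirectangular background along the ray from the viewer's viewpoint through the portal surface. The coordinate conventions, the equirectangular mapping, and the function name are illustrative assumptions.

```python
import numpy as np


def sample_portal_pixel(viewpoint: np.ndarray,
                        portal_point: np.ndarray,
                        background_180: np.ndarray) -> np.ndarray:
    """Sample the sender's 180-degree background along the ray from the viewer's
    viewpoint through one point on the portal surface.

    viewpoint, portal_point : 3D positions in the viewer's coordinate system,
                              with the portal assumed to face the -Z direction.
    background_180          : H x W x 3 half-panorama of the sender's environment.
    """
    d = portal_point - viewpoint
    d = d / np.linalg.norm(d)

    # Convert the ray direction to yaw/pitch, then to panorama pixel coordinates.
    yaw = np.arctan2(d[0], -d[2])            # within [-pi/2, pi/2] for the 180-degree span
    pitch = np.arcsin(np.clip(d[1], -1, 1))
    h, w, _ = background_180.shape
    u = int((yaw / np.pi + 0.5) * (w - 1))   # map [-pi/2, pi/2] to [0, w-1]
    v = int((0.5 - pitch / np.pi) * (h - 1))
    return background_180[np.clip(v, 0, h - 1), np.clip(u, 0, w - 1)]
```

Repeating this sampling for every pixel of the portal surface yields portal content that changes as the viewpoint moves relative to the portal.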
In some implementations, the portal content is based on synthesizing data representing a portion of the second 3D environment not represented in sensor data captured in the physical environment of the second electronic device. For example, during capture of the sender's background, the sender's device may update a low frequency screenshot (e.g., an RGB image) and a depth map (and maybe some metadata such as head orientation/pose) during a communication session. Additionally, the viewer's device may be able to fill holes (e.g., update an incomplete map via an auto filling neural network) and update the background data. In some implementations, the viewer's device (e.g., device 305) may update the portal content using updating criteria based on viewpoint changes of the user, new content, the sender changing his or her position, new/different background views, detection of objects in motion, and the like.
In some implementations, the portal content is updated based on detecting a change in the second 3D environment or in the physical environment of the second electronic device. For example, the background of the portal content 485 may be updated based on the sender's device or the viewing device detecting new content in the background of the data representing the second 3D environment, the sender or viewer changing his or her position, new/different background views, and the like. In some implementations, determining portal content includes rendering at least a portion of the data representing the second 3D environment on at least a portion of a sphere.
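For purposes of illustration only, the following Python sketch shows one simple updating criterion of the kind described above: the portal background is refreshed only when a sufficiently large fraction of pixels has changed. The thresholds and the function name are illustrative assumptions.

```python
import numpy as np


def should_update_portal(prev_rgb: np.ndarray,
                         new_rgb: np.ndarray,
                         changed_fraction_threshold: float = 0.05,
                         pixel_delta_threshold: int = 20) -> bool:
    """Decide whether new background content warrants a portal update.

    Returns True when more than changed_fraction_threshold of pixels differ
    from the previously used background by at least pixel_delta_threshold.
    """
    diff = np.abs(prev_rgb.astype(np.int16) - new_rgb.astype(np.int16)).max(axis=-1)
    changed_fraction = (diff >= pixel_delta_threshold).mean()
    return changed_fraction >= changed_fraction_threshold
```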
At block 1240, the method 1200 displays a portal with the portal content in the view of the first 3D environment, where the portal content depicts a portion of the second 3D environment viewed through the portal from the viewpoint. For example, as illustrated in
In some implementations, obtaining the data representing the second 3D environment includes obtaining a parameter associated with the data representing the second 3D environment. In some implementations, the parameter identifies a field of view or an orientation of the second 3D environment, and wherein determining portal content is further based on the parameter. For example, the system may update the background data based on viewpoint changes of the viewer, as illustrated in the different viewpoint scenarios of
In some implementations, determining portal content includes blurring some of the portion of the second 3D environment based on the identified field of view or the orientation of the second 3D environment. For example, a privacy blur may be applied to a portion of the background data (e.g., blocking any personal identifying information, such as a picture or photograph of a person or family member). For example, as illustrated in
In some implementations, the method 1200 further includes obtaining data representing a user of the second electronic device, wherein determining the portal content is further based on the data representing the user of the second electronic device, and wherein the portal content depicts the representation of the user of the second electronic device in front of the portion of the second 3D environment (e.g., displaying an avatar with the background). In some implementations, determining portal content includes blurring the portion of the second 3D environment behind the representation of the user of the second electronic device (e.g., applying a slight blur to the entire background behind the avatar).
In some implementations, the method 1200 further includes determining a position at which to display the portal within the view of the first 3D environment based on the viewpoint. In some implementations, the method 1200 further includes changing the portal content based on changes to the viewpoint within the first 3D environment. For example, as illustrated in
In some implementations, displaying, in the view of the first 3D environment, the portal with the portal content is based on determining a positional relationship (e.g., distance, orientation, etc.) of the viewpoint (viewer's head or device) relative to the portal. For example, the positional relationship may be within or outside of a threshold distance from the visual content, within a sphere determined based on the visual content, and the like.
In some implementations, a position of the portal within the first 3D environment is constant as the viewpoint changes within the first 3D environment. For example, as the user (e.g., a viewer, such as user 302) moves around his or her environment (e.g., physical environment 300), the portal 480 stays in a fixed position (e.g., at the same 3D location). Alternatively, in some implementations, a position of the portal within the first 3D environment changes based on changes to the viewpoint within the first 3D environment. For example, as the user (e.g., a viewer, such as user 302) moves around his or her environment (e.g., physical environment 300), the portal 480 may move with the user. For example, the portal may appear to remain at the same viewing distance in front of the user 302, or the portal 480 may remain in the same 3D location but pivot so that it is always facing the user based on the user's viewpoint. Alternatively, in some implementations, the portal may move based on other changes in the environment, such as an interruption event (e.g., another person or other object occluding the view of the portal 480).
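For purposes of illustration only, the following Python sketch contrasts the two behaviors described above: a portal that keeps a constant orientation versus one that stays at the same 3D location but pivots to face the viewer. The function name and the yaw-only (vertical-axis) rotation are illustrative assumptions.

```python
import numpy as np


def portal_orientation(portal_position: np.ndarray,
                       viewpoint: np.ndarray,
                       pivot_toward_viewer: bool) -> float:
    """Return the portal's yaw (rotation about the vertical axis) in radians.

    With pivot_toward_viewer=False the portal keeps a constant orientation;
    otherwise it remains at the same 3D location but turns to face the viewer.
    """
    if not pivot_toward_viewer:
        return 0.0
    to_viewer = viewpoint - portal_position
    return float(np.arctan2(to_viewer[0], to_viewer[2]))
```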
In an exemplary implementation, the method 1300 is performed at a first electronic device having a processor and one or more sensors (e.g., outward facing sensors on the device, such as an HMD). In particular, the following blocks are performed at a sender's device (e.g., an HMD), such as device 105, and provide a sender-side perspective of the process of method 1200. For example, method 1200 provides views of a user's (e.g., sender's) environment to be viewed within a portal within views of a 3D environment (e.g., a viewer's room). The portal may provide a multi-directional view (e.g., viewpoint dependent) of the sender's environment that changes as the sender changes position within his or her environment or the viewer moves relative to the portal. The data representing the user's 3D environment is provided by the user's device capturing sensor data of the user's environment, potentially filling data gaps/hallucinating content, and may be provided to the viewer's device using parameters (e.g., blurring, not depicting other people, providing a limited field of view (e.g., a 180° FOV), using updating criteria based on changes/new content, etc.). During capture of the sender's environment, the sender's device may update a low frequency screenshot (RGB image) and depth map (and metadata such as head orientation/pose), fill holes, and periodically provide updates to the viewer.
At block 1310, the method 1300 obtains sensor data captured via the one or more sensors in a physical environment associated with the first electronic device. The sensor data 1102 may include image data, depth data, positional information, and the like. For example, sensors on a device (e.g., cameras, an IMU, etc. on device 105 or device 110) can capture information about the position, location, motion, pose, etc., of the head and/or body of a user and the environment.
At block 1320, the method 1300 determines data representing a first 3D environment that is generated based at least in part on the sensor data and a parameter identifying an orientation or a field of view of the first electronic device. For example, as illustrated in
In some implementations, data representing a first user of the first electronic device (e.g., an avatar) may also be obtained. In some implementations, the data representing the first 3D environment may include various types of 3D representations that may include, but is not limited to, a 180° or 360° stereo image (e.g., spherical maps, equirectangular projections, etc.), a 2D image and depth data/height map, or a 3D model/mesh. In some implementations, the data representing the first 3D environment may be based on outward facing sensors on the first device/HMD and/or hallucinated content.
At block 1330, the method 1300 provides the data representing the first 3D environment to a second electronic device.
In some implementations, the data representing the first 3D environment is based on synthesizing data representing a portion of the first 3D environment not represented in sensor data captured in the physical environment of the first electronic device. For example, during capture of the sender's environment, the sender's device (e.g., device 105) may update a low frequency screenshot (e.g., an RGB image) and a depth map (and maybe some metadata such as head orientation/pose) during a communication session. Additionally, the sender's device may fill holes (e.g., update an incomplete map via an auto filling neural network) and update the data representing the first 3D environment. In some implementations, the sender's device may update the data representing the first 3D environment using updating criteria based on viewpoint changes of the user, new content, the sender changing his or her position, new/different background views, detection of objects in motion, and the like. In some implementations, the synthesized data is determined based on detecting a change in the viewpoint within the second 3D environment (e.g., hallucinating data based on viewpoint changes of the viewer). In some implementations, the synthesized data is determined based on identifying another portion of the first 3D environment that is not captured by the sensor data (e.g., hallucinating data based on detecting new background not obtained by the sensor data).
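For purposes of illustration only, the following Python sketch shows one way a device might identify which portions of a 2D-image-plus-depth representation become visible, and therefore need previously captured or hallucinated content, when the viewpoint changes: the old depth map is reprojected into the new viewpoint and uncovered pixels are flagged. The shared pinhole intrinsics and the function name are simplifying, illustrative assumptions.

```python
import numpy as np


def disocclusion_mask(depth: np.ndarray,
                      intrinsics: np.ndarray,
                      old_to_new: np.ndarray) -> np.ndarray:
    """Return a boolean mask of pixels in the new view with no source data.

    depth      : H x W depth map captured from the old viewpoint (meters).
    intrinsics : 3 x 3 pinhole camera matrix shared by both views (a simplification).
    old_to_new : 4 x 4 rigid transform from the old viewpoint to the new one.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]

    # Back-project every pixel of the old view into 3D.
    z = depth.ravel()
    x = (xs.ravel() - cx) / fx * z
    y = (ys.ravel() - cy) / fy * z
    pts = np.stack([x, y, z, np.ones_like(z)], axis=0)

    # Transform into the new viewpoint and project back to pixel coordinates.
    p = old_to_new @ pts
    valid = p[2] > 1e-6
    u = np.round(p[0][valid] / p[2][valid] * fx + cx).astype(int)
    v = np.round(p[1][valid] / p[2][valid] * fy + cy).astype(int)

    covered = np.zeros((h, w), dtype=bool)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    covered[v[inside], u[inside]] = True

    # Uncovered pixels are gaps that would need previously captured or synthesized content.
    return ~covered
```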
In some implementations, the portal content is updated based on detecting a change in the first 3D environment or in the physical environment of the second electronic device. For example, the data representing the first 3D environment may be updated based on the sender's device detecting new content in the environment, the sender or viewer changing his or her position, new/different background views, and the like. In some implementations, the data representing the first 3D environment is updated based on detecting a change in the physical environment of the first electronic device. For example, the sender's device may update the data representing the first 3D environment based on new content in the environment, the sender changing his or her position, new/different background views, and the like.
In some implementations, the data representing the first 3D environment is updated based on detecting that a change in a position of the first electronic device exceeds a threshold. For example, the sender's device may update the data representing the first 3D environment (and avatar) in response to the sender's electronic device moving a threshold distance (e.g., moving more than three to five meters, or another configurable threshold distance set by the system and/or the user).
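For purposes of illustration only, the following Python sketch shows a movement-threshold check of the kind described above; the three-meter default corresponds to the example range mentioned and is an illustrative assumption.

```python
import numpy as np


def movement_requires_update(last_update_position: np.ndarray,
                             current_position: np.ndarray,
                             threshold_m: float = 3.0) -> bool:
    """Trigger a new background capture when the sender's device has moved farther
    than threshold_m (e.g., three to five meters) since the last update."""
    return float(np.linalg.norm(current_position - last_update_position)) >= threshold_m
```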
In some implementations, the method 1300 further includes determining, based on the data representing the first 3D environment, a first lighting condition associated with an area of the first 3D environment, and updating the data representing the first 3D environment for the area associated with the first lighting condition in the first 3D environment. For example, as illustrated in
In some implementations, determining the data representing the first 3D environment includes determining a coverage of a background associated with the physical environment of the first electronic device based on the sensor data, and in response to determining that the coverage of the background captured of the physical environment is below a threshold amount, including synthesized background data for the portion of the first 3D environment in the data representing the first 3D environment. For example, the system may be configured to transmit a default background if coverage of the representation is below some threshold amount (e.g., if the room scan covers less than 50% of the current location and current viewpoint of the sender in the physical environment, then the system could display a default background as opposed to hallucinating content to fill in the gaps).
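For purposes of illustration only, the following Python sketch shows a coverage check of the kind described above, falling back to a default backdrop when too little of the sender's current surroundings has been scanned. The constant default backdrop, the 50% threshold, and the function name are illustrative assumptions.

```python
import numpy as np

# Plain placeholder backdrop used when scan coverage is insufficient.
DEFAULT_BACKGROUND = np.full((512, 1024, 3), 180, dtype=np.uint8)


def background_to_send(rgb: np.ndarray,
                       valid_mask: np.ndarray,
                       coverage_threshold: float = 0.5) -> np.ndarray:
    """Send the captured background only if enough of the current view was scanned.

    valid_mask : H x W boolean mask of pixels covered by the room scan from the
                 sender's current location and viewpoint.
    """
    coverage = float(valid_mask.mean())
    if coverage < coverage_threshold:
        # Too little of the environment is known; use a default backdrop rather
        # than hallucinating large gaps.
        return DEFAULT_BACKGROUND
    return rgb
```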
In some implementations, a blurring effect is applied by the first electronic device (e.g., the sender's device, device 105) to at least a portion of the data representing the first 3D environment provided to the second electronic device (e.g., the viewer's device, device 305). For example, the sender may apply a slight blur to the entire representation of the first 3D environment (e.g., in case the blurring is performed by the sender rather than the receiver/viewer).
In some implementations, the second electronic device is configured to display a view of the data representing the first 3D environment within a portal within a view of a second 3D environment (e.g., VR or XR). In some implementations, the first electronic device and the second electronic device are operatively communicating during a communication session. For example, as illustrated in
In some implementations, the one or more communication buses 1404 include circuitry that interconnects and controls communications between system components. In some implementations, the one or more I/O devices and sensors 1406 include at least one of an inertial measurement unit (IMU), an accelerometer, a magnetometer, a gyroscope, a thermometer, one or more physiological sensors (e.g., blood pressure monitor, heart rate monitor, blood oxygen sensor, blood glucose sensor, etc.), one or more microphones, one or more speakers, a haptics engine, one or more depth sensors (e.g., a structured light, a time-of-flight, or the like), and/or the like.
In some implementations, the one or more displays 1412 are configured to present a view of a physical environment or a graphical environment to the user. In some implementations, the one or more displays 1412 correspond to holographic, digital light processing (DLP), liquid-crystal display (LCD), liquid-crystal on silicon (LCoS), organic light-emitting field-effect transistor (OLET), organic light-emitting diode (OLED), surface-conduction electron-emitter display (SED), field-emission display (FED), quantum-dot light-emitting diode (QD-LED), micro-electromechanical system (MEMS), and/or the like display types. In some implementations, the one or more displays 1412 correspond to diffractive, reflective, polarized, holographic, etc. waveguide displays. In one example, the device 10 includes a single display. In another example, the device 10 includes a display for each eye of the user.
In some implementations, the one or more image sensor systems 1414 are configured to obtain image data that corresponds to at least a portion of the physical environment 105. For example, the one or more image sensor systems 1414 include one or more RGB cameras (e.g., with a complementary metal-oxide-semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor), monochrome cameras, IR cameras, depth cameras, event-based cameras, and/or the like. In various implementations, the one or more image sensor systems 1414 further include illumination sources that emit light, such as a flash. In various implementations, the one or more image sensor systems 1414 further include an on-camera image signal processor (ISP) configured to execute a plurality of processing operations on the image data.
The memory 1420 includes high-speed random-access memory, such as DRAM, SRAM, DDR RAM, or other random-access solid-state memory devices. In some implementations, the memory 1420 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 1420 optionally includes one or more storage devices remotely located from the one or more processing units 1402. The memory 1420 includes a non-transitory computer readable storage medium.
In some implementations, the memory 1420 or the non-transitory computer readable storage medium of the memory 1420 stores an optional operating system 1430 and one or more instruction set(s) 1440. The operating system 1430 includes procedures for handling various basic system services and for performing hardware dependent tasks. In some implementations, the instruction set(s) 1440 include executable software defined by binary information stored in the form of electrical charge. In some implementations, the instruction set(s) 1440 are software that is executable by the one or more processing units 1402 to carry out one or more of the techniques described herein.
The instruction set(s) 1440 include a portal content instruction set 1442 to generate portal content data and a representation instruction set 1444 to generate and display representations of a background and/or a user. The instruction set(s) 1440 may be embodied as a single software executable or multiple software executables. In some implementations, the portal content instruction set 1442 and the representation instruction set 1444 are executable by the processing unit(s) 1402 using one or more of the techniques discussed herein or as otherwise may be appropriate. To these ends, in various implementations, these instruction sets include instructions and/or logic therefor, and heuristics and metadata therefor.
Although the instruction set(s) 1440 are shown as residing on a single device, it should be understood that in other implementations, any combination of the elements may be located in separate computing devices. Moreover,
The housing 1501 houses a display 1510 that displays an image, emitting light towards or onto the eye 35 of a user 25. In various implementations, the display 1510 emits the light through an eyepiece having one or more optical elements 1505 that refracts the light emitted by the display 1510, making the display appear to the user 25 to be at a virtual distance farther than the actual distance from the eye to the display 1510. For example, optical element(s) 1505 may include one or more lenses, a waveguide, other diffraction optical elements (DOE), and the like. For the user 25 to be able to focus on the display 1510, in various implementations, the virtual distance is at least greater than a minimum focal distance of the eye (e.g., 7 cm). Further, in order to provide a better user experience, in various implementations, the virtual distance is greater than 1 meter.
The housing 1501 also houses a tracking system including one or more light sources 1522, camera 1524, camera 1532, camera 1534, and a controller 1580. The one or more light sources 1522 emit light onto the eye of the user 25 that reflects as a light pattern (e.g., a circle of glints) that can be detected by the camera 1524. Based on the light pattern, the controller 1580 can determine an eye tracking characteristic of the user 25. For example, the controller 1580 can determine a gaze direction and/or a blinking state (eyes open or eyes closed) of the user 25. As another example, the controller 1580 can determine a pupil center, a pupil size, or a point of regard. Thus, in various implementations, the light is emitted by the one or more light sources 1522, reflects off the eye of the user 25, and is detected by the camera 1524. In various implementations, the light from the eye of the user 25 is reflected off a hot mirror or passed through an eyepiece before reaching the camera 1524.
The display 1510 emits light in a first wavelength range and the one or more light sources 1522 emit light in a second wavelength range. Similarly, the camera 1524 detects light in the second wavelength range. In various implementations, the first wavelength range is a visible wavelength range (e.g., a wavelength range within the visible spectrum of approximately 400-700 nm) and the second wavelength range is a near-infrared wavelength range (e.g., a wavelength range within the near-infrared spectrum of approximately 700-1400 nm).
In various implementations, eye tracking (or, in particular, a determined gaze direction) is used to enable user interaction (e.g., the user 25 selects an option on the display 1510 by looking at it), provide foveated rendering (e.g., present a higher resolution in an area of the display 1510 the user 25 is looking at and a lower resolution elsewhere on the display 1510), or correct distortions (e.g., for images to be provided on the display 1510). In various implementations, the one or more light sources 1522 emit light towards the eye of the user 25 which reflects in the form of a plurality of glints.
In various implementations, the camera 1524 is a frame/shutter-based camera that, at a particular point in time or multiple points in time at a frame rate, generates an image of the eye of the user 25. Each image includes a matrix of pixel values corresponding to pixels of the image which correspond to locations of a matrix of light sensors of the camera. In various implementations, each image is used to measure or track pupil dilation by measuring a change of the pixel intensities associated with one or both of a user's pupils.
In various implementations, the camera 1524 is an event camera including a plurality of light sensors (e.g., a matrix of light sensors) at a plurality of respective locations that, in response to a particular light sensor detecting a change in intensity of light, generates an event message indicating a particular location of the particular light sensor.
In various implementations, the camera 1532 and camera 1534 are frame/shutter-based cameras that, at a particular point in time or multiple points in time at a frame rate, can generate an image of the face of the user 25. For example, camera 1532 captures images of the user's face below the eyes, and camera 1534 captures images of the user's face above the eyes. The images captured by camera 1532 and camera 1534 may include light intensity images (e.g., RGB) and/or depth image data (e.g., Time-of-Flight, infrared, etc.).
It will be appreciated that the implementations described above are cited by way of example, and that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope includes both combinations and sub combinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.
As described above, one aspect of the present technology is the gathering and use of sensor data that may include user data to improve a user's experience of an electronic device. The present disclosure contemplates that in some instances, this gathered data may include personal information data that uniquely identifies a specific person or can be used to identify interests, traits, or tendencies of a specific person. Such personal information data can include movement data, physiological data, demographic data, location-based data, telephone numbers, email addresses, home addresses, device characteristics of personal devices, or any other personal information.
The present disclosure recognizes that the use of such personal information data, in the present technology, can be used to the benefit of users. For example, the personal information data can be used to improve the content viewing experience. Accordingly, use of such personal information data may enable calculated control of the electronic device. Further, other uses for personal information data that benefit the user are also contemplated by the present disclosure.
The present disclosure further contemplates that the entities responsible for the collection, analysis, disclosure, transfer, storage, or other use of such personal information and/or physiological data will comply with well-established privacy policies and/or privacy practices. In particular, such entities should implement and consistently use privacy policies and practices that are generally recognized as meeting or exceeding industry or governmental requirements for maintaining personal information data private and secure. For example, personal information from users should be collected for legitimate and reasonable uses of the entity and not shared or sold outside of those legitimate uses. Further, such collection should occur only after receiving the informed consent of the users. Additionally, such entities would take any needed steps for safeguarding and securing access to such personal information data and ensuring that others with access to the personal information data adhere to their privacy policies and procedures. Further, such entities can subject themselves to evaluation by third parties to certify their adherence to widely accepted privacy policies and practices.
Despite the foregoing, the present disclosure also contemplates implementations in which users selectively block the use of, or access to, personal information data. That is, the present disclosure contemplates that hardware or software elements can be provided to prevent or block access to such personal information data. For example, in the case of user-tailored content delivery services, the present technology can be configured to allow users to select to “opt in” or “opt out” of participation in the collection of personal information data during registration for services. In another example, users can select not to provide personal information data for targeted content delivery services. In yet another example, users can select to not provide personal information, but permit the transfer of anonymous information for the purpose of improving the functioning of the device.
Therefore, although the present disclosure broadly covers use of personal information data to implement one or more various disclosed embodiments, the present disclosure also contemplates that the various embodiments can also be implemented without the need for accessing such personal information data. That is, the various embodiments of the present technology are not rendered inoperable due to the lack of all or a portion of such personal information data. For example, content can be selected and delivered to users by inferring preferences or settings based on non-personal information data or a bare minimum amount of personal information, such as the content being requested by the device associated with a user, other non-personal information available to the content delivery services, or publicly available information.
In some implementations, data is stored using a public/private key system that only allows the owner of the data to decrypt the stored data. In some other implementations, the data may be stored anonymously (e.g., without identifying and/or personal information about the user, such as a legal name, username, time and location data, or the like). In this way, other users, hackers, or third parties cannot determine the identity of the user associated with the stored data. In some implementations, a user may access their stored data from a user device that is different than the one used to upload the stored data. In these instances, the user may be required to provide login credentials to access their stored data.
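As a non-limiting illustration of such a public/private key storage scheme, the following Python sketch uses the third-party cryptography library to encrypt a small data payload with a public key so that only the holder of the corresponding private key can decrypt it. The key size, padding choices, and the assumption that the payload fits within a single RSA-OAEP message are illustrative; a practical system would typically use hybrid encryption for larger records.

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    # Illustrative key pair; in practice the private key stays with the data owner.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    # Anyone holding the public key can store (encrypt) the record ...
    ciphertext = public_key.encrypt(b"small user record", oaep)

    # ... but only the owner of the private key can decrypt it.
    plaintext = private_key.decrypt(ciphertext, oaep)
    assert plaintext == b"small user record"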
Numerous specific details are set forth herein to provide a thorough understanding of the claimed subject matter. However, those skilled in the art will understand that the claimed subject matter may be practiced without these specific details. In other instances, methods, apparatuses, or systems that would be known by one of ordinary skill have not been described in detail so as not to obscure claimed subject matter.
Unless specifically stated otherwise, it is appreciated that throughout this specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” and “identifying” or the like refer to actions or processes of a computing device, such as one or more computers or a similar electronic computing device or devices, that manipulate or transform data represented as physical electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the computing platform.
The system or systems discussed herein are not limited to any particular hardware architecture or configuration. A computing device can include any suitable arrangement of components that provides a result conditioned on one or more inputs. Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computing system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more implementations of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
Implementations of the methods disclosed herein may be performed in the operation of such computing devices. The order of the blocks presented in the examples above can be varied; for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
The use of “adapted to” or “configured to” herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
It will also be understood that, although the terms “first,” “second,” etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first node could be termed a second node, and, similarly, a second node could be termed a first node, without changing the meaning of the description, so long as all occurrences of the “first node” are renamed consistently and all occurrences of the “second node” are renamed consistently. The first node and the second node are both nodes, but they are not the same node.
The terminology used herein is for the purpose of describing particular implementations only and is not intended to be limiting of the claims. As used in the description of the implementations and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used herein, the term “if” may be construed to mean “when” or “upon” or “in response to determining” or “in accordance with a determination” or “in response to detecting,” that a stated condition precedent is true, depending on the context. Similarly, the phrase “if it is determined [that a stated condition precedent is true]” or “if [a stated condition precedent is true]” or “when [a stated condition precedent is true]” may be construed to mean “upon determining” or “in response to determining” or “in accordance with a determination” or “upon detecting” or “in response to detecting” that the stated condition precedent is true, depending on the context.
The foregoing description and summary of the invention are to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined only from the detailed description of illustrative implementations but according to the full breadth permitted by patent laws. It is to be understood that the implementations shown and described herein are only illustrative of the principles of the present invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.
This application claims the benefit of U.S. Provisional Application Ser. No. 63/470,907 filed Jun. 4, 2023, which is incorporated herein in its entirety.
Number | Date | Country
--- | --- | ---
63470907 | Jun 2023 | US