The present disclosure relates to rendering augmented reality (AR) environments, and to associated AR computing servers, such as network servers, and AR display devices, and to related operations for displaying video objects through AR display devices.
Immersive virtual reality (VR) environments have been developed for on-line conferencing in which computer-generated avatars represent the locations of human participants in the meetings. Example software products that provide VR environments for on-line conferencing include MeetinVR, Glue, FrameVR, Engage, BigScreen VR, Mozilla Hubs, AltSpace, Rec Room, Spatial, and Immersed. Example user devices that can display VR environments to participants include the Oculus Quest VR headset, the Oculus Go VR headset, and personal computers and smart phones running various VR applications.
In contrast to VR environments, where human participants only see computer-generated graphical renderings, human participants using augmented reality (AR) environments see a combination of computer-generated graphical renderings overlaid on a view of the physical real world through, e.g., see-through display screens. AR environments are also referred to as mixed reality environments because participants see a blended physical and digitally rendered world. Example user devices that can display AR environments include Google Glass, Microsoft HoloLens, Vuzix, and personal computers and smart phones running various AR applications. There is a need to provide on-line conferencing capabilities in an AR environment.
Some embodiments disclosed herein are directed to an AR computing server that includes a network interface, a processor, and a memory storing instructions executable by the processor to perform operations. The network interface is configured to receive through a network a three-dimensional (3D) video stream from a user device during a conference session. The operations identify a video object captured in the 3D video stream, and determine a pose of the video object captured in the 3D video stream. The operations obtain AR context information from an AR display device indicating how the video object is to be posed relative to a physical object viewable through a see-through display of the AR display device, and adjust pose of the video object captured in the 3D video stream based on the AR context information. The operations output the video object to the see-through display for display.
In some further embodiments, the operation to determine the pose of the video object captured in the 3D video stream includes to determine pose of features of a face captured in the 3D video stream, and the operation to adjust pose of the video object captured in the 3D video stream based on the AR context information includes to rotate and/or translate the features of the face captured in the 3D video stream based on a comparison of the pose of the features of the face to the AR context information indication of how the features of the face are to be posed relative to the physical object viewable through the see-through display of the AR display device.
Some other related embodiments are directed to a corresponding method by an AR computing server. The method includes identifying a video object captured in a 3D video stream received from a user device during a conference session, and determining a pose of the video object. The method obtains AR context information from an AR display device indicating how the video object is to be posed relative to a physical object viewable through a see-through display of the AR display device, and adjusts pose of the video object captured in the 3D video stream based on the AR context information. The method outputs the video object to the see-through display for display.
Some other related embodiments are directed to a corresponding computer program product including a non-transitory computer readable medium storing instructions executable by at least one processor of an AR computing server to perform operations. The operations identify a video object captured in a 3D video stream received from a user device during a conference session, and determine a pose of the video object. The operations obtain AR context information from an AR display device indicating how the video object is to be posed relative to a physical object viewable through a see-through display of the AR display device, and adjust pose of the video object captured in the 3D video stream based on the AR context information. The operations output the video object to the see-through display for display.
Some other related embodiments are directed to a corresponding AR computing server configured to identify a video object captured in a 3D video stream received from a user device during a conference session, and determine a pose of the video object. The AR computing server is further configured to obtain AR context information from an AR display device indicating how the video object is to be posed relative to a physical object viewable through a see-through display of the AR display device, and adjust pose of the video object captured in the 3D video stream based on the AR context information. The AR computing server is further configured to output the video object to the see-through display for display.
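For illustration only, the sequence of operations recited above may be pictured as the following minimal sketch in Python; the data types, function names, and the simplifying assumption that the first detected object in a frame is the participant are hypothetical choices made here for readability, not the disclosed implementation.

```python
# Hypothetical sketch of the operation sequence: identify -> determine pose ->
# obtain AR context -> adjust pose -> output. Not the disclosed implementation.
from dataclasses import dataclass


@dataclass
class Pose:
    position: tuple      # (x, y, z) in the capture coordinate system (assumed)
    orientation: tuple   # (roll, pitch, yaw) in radians (assumed)


@dataclass
class VideoObject:
    pixels: object       # extracted RGB-D patch of, e.g., the remote participant
    pose: Pose           # pose as captured in the 3D video stream


def adjust_pose(obj: VideoObject, target: Pose) -> VideoObject:
    # Re-pose the captured object so it matches the pose requested by the
    # AR context information (here, simply adopting the target pose).
    return VideoObject(pixels=obj.pixels, pose=target)


def handle_frame(frame_objects, ar_context_target: Pose, display_queue):
    # identify the video object; assume the first detected object is the participant
    video_object = frame_objects[0]
    adjusted = adjust_pose(video_object, ar_context_target)
    display_queue.append(adjusted)   # stand-in for output to the see-through display


# usage: one captured object and one target pose received as AR context information
display_queue = []
captured = VideoObject(pixels=None, pose=Pose((0.0, 0.0, 2.0), (0.0, 0.0, 0.0)))
handle_frame([captured], Pose((1.0, 0.0, 1.5), (0.0, 0.5, 0.0)), display_queue)
```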
Some potential advantages of these embodiments are that they enable a human participant during a conference to view, through a see-through display of an AR display device, a video object, such as a video image of another participant, which is displayed with a pose that is determined based on AR context information. The AR computing server can use various characteristics of the AR context information to determine how to pose and scale an image of the video object, such as where to pose a video image of the other participant within a room.
Other AR computing servers, methods, and computer program products according to embodiments will be or become apparent to one with skill in the art upon review of the following drawings and detailed description. It is intended that all such additional AR computing servers, methods, and computer program products be included within this description and protected by the accompanying claims.
Aspects of the present disclosure are illustrated by way of example and are not limited by the accompanying drawings. In the drawings:
Inventive concepts will now be described more fully hereinafter with reference to the accompanying drawings, in which examples of embodiments of inventive concepts are shown. Inventive concepts may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of various present inventive concepts to those skilled in the art. It should also be noted that these embodiments are not mutually exclusive. Components from one embodiment may be tacitly assumed to be present/used in another embodiment.
Embodiments of the present disclosure are directed to providing on-line conferencing capabilities in an AR environment. The AR environment can enable a local participant in a conference to visually experience an immersive presence of a remote participant whose video image is posed relative to real-world physical objects that the local participant views through a see-through display of an AR display device (e.g., AR glasses worn by the local participant).
Referring to
Referring to
Referring to
Referring to
The 3D video stream is provided to the AR computing server 200 for processing via, for example, a radio access network 240 and networks 250 (e.g., private networks and/or public networks such as the Internet). The AR computing server 200 may be an edge computing server, a network computing server, a cloud computing server, etc. which communicates through the networks 250 with the user device 210 and the AR display device 220.
The AR computing server 200 includes at least one processor circuit 204 (referred to herein as “processor”), at least one memory 206 (referred to herein as “memory”), and at least one network interface 202 (referred to herein as “network interface”). Although the network interface 202 is illustrated as a wireless transceiver which communicates with the radio access network (RAN) 240, it may additionally or alternatively be a wired network interface, e.g., Ethernet. The processor 204 may include one or more data processing circuits, such as a general purpose and/or special purpose processor (e.g., microprocessor and/or digital signal processor), which may be collocated or distributed across the networks 250. The processor 204 is operationally connected to these various components. The memory 206, described below as a computer readable medium, stores executable instructions 208 that are executed by the processor 204 to perform operations.
Operations by the AR computing server 200 include identifying 310 a video object captured in the 3D video stream, and determining 312 a pose of the video object captured in the 3D video stream. The identification of the video object and determination of its pose may correspond to identifying presence and pose of various types of real-world physical objects in the 3D video stream. For example, the determination operation 312 may identify the pose of the face, body, and/or features of the face and/or body of the remote participant captured in the 3D video stream, such as by identifying pose of the head, eyes, lips, ears, neck, torso, arms, hands, etc. Additionally or alternatively, the determination operation 312 may identify the pose of furniture objects captured in the 3D video stream, such as a bed, seat, table, floor, etc. in the rooms illustrated in
The operations by the AR computing server 200 further include obtaining 314 AR context information from the AR display device 220 indicating how the video object is to be posed relative to a physical object viewable through a see-through display 234 of the AR display device 220. The operations adjust 316 pose of the video object captured in the 3D video stream based on the AR context information, and output 318 the video object to the see-through display 234 of the AR display device 220 for display. The AR display device 220 is configured to render 322 the video object at a location on the see-through display 234 which is determined based on the adjusted pose (operation 316).
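As a hedged illustration of the rendering step, assuming a simple pinhole projection model with arbitrarily chosen display intrinsics (focal lengths, principal point, and resolution are not specified by the disclosure), the AR display device 220 might map an adjusted 3D pose to a 2D location on the see-through display 234 as follows.

```python
# Illustrative sketch only: project an adjusted 3D position (display coordinates,
# +z forward, image y pointing down) to a 2D pixel location for rendering.
import numpy as np


def project_to_display(point_display_coords: np.ndarray,
                       fx: float = 500.0, fy: float = 500.0,
                       cx: float = 320.0, cy: float = 240.0):
    """Project a 3D point (meters) to pixel coordinates using assumed intrinsics."""
    x, y, z = point_display_coords
    if z <= 0:
        return None  # behind the display; nothing to render
    return (fx * x / z + cx, fy * y / z + cy)


# usage: adjusted pose places the participant 1.5 m ahead of and 0.3 m below the
# display's optical axis
print(project_to_display(np.array([0.0, 0.3, 1.5])))  # -> (320.0, 340.0)
```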
In one embodiment, the AR context information obtained from the AR display device 220 can indicate, for example, pose of a chair, table, floor, etc. on which the video object (e.g., video image of the remote participant in
In one embodiment, the AR context information provided 320 by the AR display device 220 indicates where a user of the AR display device 220 has designated that the video object with the adjusted pose is to be displayed. For example, the user may designate a real-world physical object, such as a seat, table, bed, floor, etc., in a room where the video object is to be displayed and anchored relative to the real-world physical object. Referring to the illustrative example of
In one embodiment, the AR context information can be obtained by determining pose of the physical object in a video stream from the camera 232 of the AR display device 220. The operation by the AR computing server 200 to obtain 314 the AR context information can include to determine a pose of the see-through display 234 of the AR display device 220 relative to the physical object captured in a video stream from a camera 232 of the AR display device 220. The operation to adjust 316 pose of the video object captured in the 3D video stream can include to adjust pose of the video object captured in the 3D video stream based on comparison of the pose of the video object to the pose of the see-through display 234 of the AR display device 220 relative to the physical object captured in a video stream from the camera 232 of the AR display device 220.
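One possible way to realize such a comparison, offered only as a sketch under the assumption that poses are represented as 4x4 homogeneous transforms (the matrix names are illustrative), is to compose the pose of the physical anchor object relative to the see-through display 234 with the desired pose of the video object on that anchor, and then multiply by the inverse of the pose at which the video object was captured in the 3D video stream.

```python
# Illustrative sketch (not the disclosed implementation) of the pose comparison
# using 4x4 homogeneous transforms with NumPy.
import numpy as np


def make_transform(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def adjust_object_pose(T_object_in_stream: np.ndarray,
                       T_anchor_in_display: np.ndarray,
                       T_object_on_anchor: np.ndarray) -> np.ndarray:
    """
    T_object_in_stream : pose of the video object in the 3D video stream (operation 312)
    T_anchor_in_display: pose of the physical anchor object relative to the see-through
                         display, from the AR display device camera (operation 314)
    T_object_on_anchor : how the AR context information says the video object should sit
                         relative to the physical object (e.g., seated on a chair)
    Returns the rotation/translation to apply to the captured object so that it is
    rendered at the desired pose in display coordinates (operation 316).
    """
    desired_in_display = T_anchor_in_display @ T_object_on_anchor
    # correction that maps the captured stream pose onto the desired display pose
    return desired_in_display @ np.linalg.inv(T_object_in_stream)


# usage with identity rotations: object captured 2 m in front of the remote 3D camera,
# chair 1.5 m in front of and 0.5 m below the see-through display
T_obj = make_transform(np.eye(3), np.array([0.0, 0.0, 2.0]))
T_chair = make_transform(np.eye(3), np.array([0.0, -0.5, 1.5]))
T_on_chair = make_transform(np.eye(3), np.array([0.0, 0.45, 0.0]))  # seat-height offset
print(adjust_object_pose(T_obj, T_chair, T_on_chair))
```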
In the example of
In the illustrated example, the display device 230 is part of a mobile electronic device 236 which is releasably held by a head-wearable frame 238 oriented relative to the see-through display screen 234. The display device 230 is arranged to display information that is projected on the see-through display screen 234 for reflection directly or indirectly toward the user's eyes, i.e., while wearing the frame 238. Although not shown, the frame 238 may include intervening mirrors that are positioned between the see-through display screen 234 and the user's eyes and, hence, the light may be reflected directly or indirectly toward the user's eyes.
In some other embodiments, the see-through display is part of the display device 230 which operates to superimpose the adjusted pose video image received from the AR computing server 200 on a video stream of the real-world captured by the camera 232. For example, a user holding the mobile electronic device 236 can view through the display device 230 a video stream from the camera 232 of a room, e.g., including the chair and bed shown in
As used herein, the term “pose” refers to the position and/or the orientation of a video object relative to a defined coordinate system (e.g., a video frame from the 3D camera 212 or the user device 210) or may be relative to another device (e.g., the AR display device 220). A pose may therefore be defined based on only the multidimensional position of one device relative to another device or to a defined coordinate system, only on the multidimensional orientation of the device relative to another device or to a defined coordinate system, or on a combination of the multidimensional position and the multidimensional orientation.
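As a hedged illustration of this definition (the field names and the quaternion encoding are assumptions, not part of the disclosure), a pose may be represented as an optional multidimensional position, an optional multidimensional orientation, and an identifier of the coordinate system in which it is expressed.

```python
# Illustrative encoding of "pose": position and/or orientation relative to a
# reference coordinate system. Field names are assumptions for readability.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Pose:
    position: Optional[Tuple[float, float, float]] = None            # meters, in the reference frame
    orientation: Optional[Tuple[float, float, float, float]] = None  # unit quaternion (w, x, y, z)
    reference: str = "3d_camera_212"  # coordinate system the pose is expressed in


# a pose defined by position only, expressed relative to the AR display device
seat_anchor = Pose(position=(0.0, -0.5, 1.5), reference="ar_display_220")
```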
Referring to
In a further example embodiment, the operation to determine 312 the pose of the video object captured in the 3D video stream can include to determine pose of features of a face captured in the 3D video stream. In the example of
As explained above, the AR context information can be obtained by determining pose of the physical object in a video stream from the camera 232 of the AR display device 220. The AR computing server 200 may be configured to use a context selection rule to automatically select a physical object from among a plurality of physical objects which are captured in a video stream from the camera 232 of the AR display device 220. The operation to determine 312 (
Some illustrative non-limiting examples of context selection rule operations are explained. In one embodiment, the operation by the AR computing server 200 includes to determine that one of the physical objects captured in the video stream from the camera 232 of the AR display device 220 satisfies the context selection rule based on the one of the physical objects having a shape that matches a defined shape of one of: a seat on which the video object captured in the 3D video stream is to be displayed on the see-through display 234 with a pose viewed as appearing to be supported by the seat; a table on which the video object captured in the 3D video stream is to be displayed on the see-through display 234 with a pose viewed as appearing to be supported by the table; and a floor on which the video object captured in the 3D video stream is to be displayed on the see-through display 234 with a pose viewed as appearing to be supported by the floor.
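A minimal sketch of such a context selection rule follows; the dictionary keys, shape labels, and confidence scores are assumptions about what an upstream object detector might provide, and are not part of the disclosure.

```python
# Hedged sketch of a context selection rule: pick the detected physical object
# whose classified shape matches one of the defined support shapes.
SUPPORT_SHAPES = ("seat", "table", "floor")


def select_context_object(detected_objects):
    """
    detected_objects: iterable of dicts like {"shape": "seat", "score": 0.9, ...}
    Returns the object that satisfies the rule, or None if no shape matches.
    """
    candidates = [o for o in detected_objects if o["shape"] in SUPPORT_SHAPES]
    if not candidates:
        return None
    # prefer the most confidently detected matching object
    return max(candidates, key=lambda o: o.get("score", 0.0))


# usage: the seat is selected as the anchor for posing the video object
objects = [{"shape": "lamp", "score": 0.8}, {"shape": "seat", "score": 0.7}]
anchor = select_context_object(objects)
```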
In some further operational embodiments, the AR computing server 200 operates to adjust color and/or shading of the video object in the video stream from the user device 210 based on color and/or shading of the real-world physical object being viewed by the user operating the AR display device 220 in combination with the displayed video object with the adjusted pose. In one embodiment, the operation by the AR computing server 200 includes to adjust color and/or shading of the video object which is output to the see-through display 234 for display, based on color and/or shading of the physical object captured in the video stream from the camera 232 of the AR display device 220.
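One well-known technique that could serve this purpose, offered only as an assumption-laden sketch rather than as the disclosed method, is per-channel mean and standard-deviation matching between the rendered video object and the image region around the physical anchor object captured by the camera 232.

```python
# Illustrative sketch: shift/scale each color channel of the video object so its
# statistics match those of the physical anchor region. Not the disclosed method.
import numpy as np


def match_color_statistics(video_object_rgb: np.ndarray,
                           physical_region_rgb: np.ndarray) -> np.ndarray:
    """Match per-channel mean/std of the object to the physical region."""
    obj = video_object_rgb.astype(np.float32)
    ref = physical_region_rgb.astype(np.float32)
    obj_mean, obj_std = obj.mean(axis=(0, 1)), obj.std(axis=(0, 1)) + 1e-6
    ref_mean, ref_std = ref.mean(axis=(0, 1)), ref.std(axis=(0, 1))
    adjusted = (obj - obj_mean) / obj_std * ref_std + ref_mean
    return np.clip(adjusted, 0, 255).astype(np.uint8)


# usage: tone the participant patch toward the dim region around the chair
participant = np.random.randint(0, 255, (120, 80, 3), dtype=np.uint8)
chair_region = (np.random.rand(60, 60, 3) * 80).astype(np.uint8)
toned = match_color_statistics(participant, chair_region)
```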
As a local participant moves about a room while viewing through the AR display device 220 the video image of a remote participant posed relative to a real-world physical object, the relative positioning between the location of the local participant and the virtual location of the posed video image of the remote participant can result in a substantial range of adjustments being made to the pose (e.g., rotation and translation) and to the scaling of the size of the remote participant's body being viewed. Some poses may result in the upper torso and head of the remote participant being viewed through the AR display device 220, while some other poses may result in only the head or a portion of the head being viewed. Moreover, how much of the remote participant's body is captured in the 3D video stream from the user device 210 may change over time due to, for example, the remote participant moving relative to the camera 212 of the user device 210. To facilitate generation of any desired pose and scaling of the video image of the remote participant, some other operational embodiments of the AR computing server 200 combine a previously stored image of an extended part (e.g., part of the remote participant's body) of an earlier video object with the video object (e.g., the remote participant's head) that is presently captured in the 3D video stream. The extended part may be stored in an image part repository 209 in the memory 206 of the AR computing server 200 as shown in
The AR computing server 200 may extract the video object captured in the 3D video stream from the user device 210 to generate an extracted video stream which is output to the AR display device 220 for display through the see-through display 234. In an illustrative embodiment, the video object is one of a plurality of components of a scene captured in the 3D video stream by the 3D camera 212 of the user device 210. The operation by the AR computing server 200 to adjust 316 pose of the video object captured in the 3D video stream includes to extract the video object from the 3D video stream without the other components of the scene. The operation by the AR computing server 200 to output 318 the video object to the see-through display 234 for display includes to output the extracted video object with the adjusted pose.
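Because the 3D video stream carries depth information, one simple way to illustrate such an extraction (the depth band and array shapes are assumptions chosen for illustration) is to retain only pixels whose depth lies within a range around the participant and to make the remaining pixels transparent.

```python
# Illustrative sketch: depth-based extraction of the video object from the rest
# of the captured scene; the thresholds are assumptions, not disclosed values.
import numpy as np


def extract_foreground(rgb: np.ndarray, depth_m: np.ndarray,
                       near: float = 0.5, far: float = 2.5) -> np.ndarray:
    """Return an RGBA image containing only pixels whose depth lies in [near, far]."""
    mask = (depth_m >= near) & (depth_m <= far)
    rgba = np.zeros((*rgb.shape[:2], 4), dtype=np.uint8)
    rgba[..., :3] = rgb
    rgba[..., 3] = np.where(mask, 255, 0).astype(np.uint8)  # background made transparent
    return rgba


# usage: participant roughly 1.8 m from the 3D camera, back wall at 4 m
rgb = np.zeros((480, 640, 3), dtype=np.uint8)
depth = np.full((480, 640), 4.0, dtype=np.float32)
depth[100:400, 200:440] = 1.8
extracted = extract_foreground(rgb, depth)
```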
Although the AR computing server 200 is illustrated in
In the above description of various embodiments of present inventive concepts, it is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of present inventive concepts. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which present inventive concepts belong. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
When an element is referred to as being “connected”, “coupled”, “responsive”, or variants thereof to another element, it can be directly connected, coupled, or responsive to the other element or intervening elements may be present. In contrast, when an element is referred to as being “directly connected”, “directly coupled”, “directly responsive”, or variants thereof to another element, there are no intervening elements present. Like numbers refer to like elements throughout. Furthermore, “coupled”, “connected”, “responsive”, or variants thereof as used herein may include wirelessly coupled, connected, or responsive. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Well-known functions or constructions may not be described in detail for brevity and/or clarity. The term “and/or” includes any and all combinations of one or more of the associated listed items.
It will be understood that although the terms first, second, third, etc. may be used herein to describe various elements/operations, these elements/operations should not be limited by these terms. These terms are only used to distinguish one element/operation from another element/operation. Thus, a first element/operation in some embodiments could be termed a second element/operation in other embodiments without departing from the teachings of present inventive concepts. The same reference numerals or the same reference designators denote the same or similar elements throughout the specification.
As used herein, the terms “comprise”, “comprising”, “comprises”, “include”, “including”, “includes”, “have”, “has”, “having”, or variants thereof are open-ended, and include one or more stated features, integers, elements, steps, components or functions but do not preclude the presence or addition of one or more other features, integers, elements, steps, components, functions or groups thereof. Furthermore, as used herein, the common abbreviation “e.g.,”, which derives from the Latin phrase “exempli gratia,” may be used to introduce or specify a general example or examples of a previously mentioned item, and is not intended to be limiting of such item. The common abbreviation “i.e.,”, which derives from the Latin phrase “id est,” may be used to specify a particular item from a more general recitation.
Example embodiments are described herein with reference to block diagrams and/or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits. These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).
These computer program instructions may also be stored in a tangible computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the functions/acts specified in the block diagrams and/or flowchart block or blocks. Accordingly, embodiments of present inventive concepts may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.) that runs on a processor such as a digital signal processor, which may collectively be referred to as “circuitry,” “a module” or variants thereof.
It should also be noted that in some alternate implementations, the functions/acts noted in the blocks may occur out of the order noted in the flowcharts. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Moreover, the functionality of a given block of the flowcharts and/or block diagrams may be separated into multiple blocks and/or the functionality of two or more blocks of the flowcharts and/or block diagrams may be at least partially integrated. Finally, other blocks may be added/inserted between the blocks that are illustrated, and/or blocks/operations may be omitted without departing from the scope of inventive concepts. Moreover, although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.
Many variations and modifications can be made to the embodiments without substantially departing from the principles of the present inventive concepts. All such variations and modifications are intended to be included herein within the scope of present inventive concepts. Accordingly, the above disclosed subject matter is to be considered illustrative, and not restrictive, and the appended examples of embodiments are intended to cover all such modifications, enhancements, and other embodiments, which fall within the spirit and scope of present inventive concepts. Thus, to the maximum extent allowed by law, the scope of present inventive concepts is to be determined by the broadest permissible interpretation of the present disclosure including the following examples of embodiments and their equivalents, and shall not be restricted or limited by the foregoing detailed description.