Extended reality (XR) technologies include virtual reality (VR), augmented reality (AR), and mixed reality (MR) technologies, and quite literally extend the reality that users experience. XR technologies may employ head-mountable displays (HMDs). An HMD is a display device that can be worn on the head. In VR technologies, the HMD wearer is immersed in an entirely virtual world, whereas in AR technologies, the HMD wearer's direct or indirect view of the physical, real-world environment is augmented. In MR, or hybrid reality, technologies, the HMD wearer experiences the merging of real and virtual worlds.
As noted in the background, a head-mountable display (HMD) can be employed as an extended reality (XR) technology to extend the reality experienced by the HMD's wearer. An HMD can include one or multiple small display panels in front of the wearer's eyes, as well as various sensors to detect or sense the wearer and/or the wearer's environment. Images on the display panels convincingly immerse the wearer within an XR environment, be it a virtual reality (VR), an augmented reality (AR), a mixed reality (MR), or another type of XR. An HMD can also include one or multiple cameras, which are image-capturing devices that capture still or motion images.
As noted in the background, in VR technologies, the wearer of an HMD is immersed in a virtual world, which may also be referred to as virtual space or a virtual environment. Therefore, the display panels of the HMD display an image of the virtual space to immerse the wearer within the virtual space. In MR, or hybrid reality, by comparison, the HMD wearer experiences the merging of real and virtual worlds. For instance, an object in the wearer's surrounding physical, real-world environment, which may also be referred to as real space, can be reconstructed within the virtual space, and displayed by the display panels of the HMD within the image of the virtual space.
Techniques described herein are accordingly directed to real space object reconstruction within a virtual space image, using a time-of-flight (ToF) camera. The ToF camera acquires a depth image having two-dimensional (2D) pixels on a plane of the depth image. The 2D pixels correspond to projections of three-dimensional (3D) pixels in real space onto the plane. For each 3D pixel, 3D coordinates within a 3D camera coordinate system of the real space are calculated based on the 2D coordinates of the 2D pixel to which the 3D pixel corresponds within a 2D image coordinate system of the plane, the depth image, and camera parameters of the ToF camera. The 3D pixels are then mapped from the real space to a virtual space, and an object within the real space is reconstructed within an image of the virtual space using the 3D pixels as mapped to the virtual space.
The HMD 100 can include an externally exposed ToF camera 108 that captures depth images in front of the HMD 100 and thus in front of the wearer 102 of the HMD 100. There is one ToF camera 108 in the example, but there may be multiple such ToF cameras 108. Further, in the example the ToF camera 108 is depicted on the bottom of the HMD 100, but may instead be externally exposed on the end of the HMD 100 in the interior of which the display panel 106 is located.
The ToF camera 108 is a range-imaging camera employing ToF techniques to resolve the distance between the camera 108 and real space objects external to the camera 108, by measuring the round-trip time of an artificial light signal provided by a laser or a light-emitting diode (LED). In the case of a laser-based ToF camera 108, for instance, the ToF camera may be part of a broader class of light imaging, detection, and ranging (LIDAR) cameras. In scannerless LIDAR cameras, an entire real space scene is captured with each laser pulse, whereas in scanning LIDAR cameras, the scene is captured point-by-point with a scanning laser.
The HMD 100 may also include an externally exposed color camera 110 that captures color images in front of the HMD 100 and thus in front of the wearer 102 of the HMD 100. There is one color camera 110 in the example, but there may be multiple such color cameras 110. Further, in the example the color camera 110 is depicted on the bottom of the HMD 100, but may instead be externally exposed on the end of the HMD 100 in the interior of which the display panel 106 is located.
The cameras 108 and 110 may share the same image plane. A depth image captured by the ToF camera 108 includes 2D pixels on this plane, where each 2D pixel corresponds to a projection of a 3D pixel in real space in front of the camera 108 onto the plane. The value of each 2D pixel is indicative of the depth in real space from the ToF camera 108 to the 3D pixel. By comparison, a color image captured by the color camera 110 includes 2D color pixels on the same plane, where each 2D color pixel corresponds to a 2D pixel of the depth image and thus to a 3D pixel in real space. Each 2D color pixel has a color value indicative of the color of the corresponding 3D pixel in real space. For example, each 2D color pixel may have red, green, and blue values that together define the color of the corresponding 3D pixel in real space.
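By way of a non-limiting illustration, the following Python sketch models this correspondence with two aligned arrays sharing the same image plane; the array names, resolution, and sample values are assumptions introduced here and are not part of the description above.

```python
import numpy as np

# Assumed image-plane resolution shared by the ToF camera and the color camera.
HEIGHT, WIDTH = 480, 640

# Depth image: the value at (v, u) indicates the depth in real space from the
# ToF camera to the 3D pixel that projects onto 2D pixel (u, v) (e.g., meters).
depth_image = np.full((HEIGHT, WIDTH), 2.5, dtype=np.float32)

# Color image on the same plane: the red, green, and blue values at (v, u)
# indicate the color of that same corresponding 3D pixel.
color_image = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)

# Because the cameras share the image plane, the same (u, v) indexes both.
u, v = 320, 240
print("depth at (u, v):", depth_image[v, u])
print("color at (u, v):", color_image[v, u])
```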
Real space is the physical, real-world space in which the wearer 102 is wearing the HMD 100. The real space is a 3D space. The 3D pixels in real space can have 3D (e.g., x, y, and z) coordinates in a 3D camera coordinate system, which is the 3D coordinate system of real space and thus in relation to which the HMD 100 monitors its orientation as the HMD 100 is rotated or otherwise moved by the wearer 102 in real space. By comparison, the 2D pixels of the depth image and the 2D color pixels of the color image can have 2D coordinates (e.g., u and v) in a 2D image coordinate system of the plane of the depth and color images.
Virtual space is the space in which the HMD wearer 102 is immersed via images displayed on the display panel 106. The virtual space is also a 3D space, and can have a 3D virtual space coordinate system to which 3D coordinates in the 3D camera coordinate system can be mapped. When the display panel 106 displays images of the virtual space, the virtual space is transformed into 2D images that, when viewed by the eyes of the HMD wearer 102, effectively simulate the 3D virtual space.
The HMD 100 can include control circuitry 112 (per
Therefore, the reconstructed real space object 202′ is a virtual representation of the real space object 202 within the virtual space 204 in which the wearer 102 is immersed via the HMD 100. For the real space object 202 to be accurately reconstructed within the virtual space 204, the 3D coordinates of the 3D pixels of the object 202 in the real space 200 are determined, such as within the 3D camera coordinate system. The 3D pixels can then be mapped from the real space 200 to the virtual space 204 by transforming their 3D coordinates from the 3D camera coordinate system to the 3D virtual space coordinate system so that the real space object 202 can be reconstructed within the virtual space 204.
The processing includes acquiring a depth image using the ToF camera 108 (304). The processing can also include acquiring a color image corresponding to the depth image (e.g., sharing the same image plane as the depth image) using the color camera 110 (306). For instance, the depth and color images may share the same 2D image coordinate system of their shared image plane. As noted, each 2D pixel of the depth image corresponds to a projection of a 3D pixel in the real space 200 onto the image plane, and has a value indicative of the depth of the 3D pixel from the ToF camera 108. Each 2D color pixel of the color image has a value indicative of the color of a corresponding 3D pixel.
The processing can include selecting 2D pixels of the depth image having values less than a threshold (308). The threshold corresponds to which 3D pixels, and thus which objects, in the real space 200 are to be reconstructed in the virtual space 204. The value of the threshold indicates how close objects have to be to the HMD wearer 102 in the real space 200 to be reconstructed within the virtual space 204. For example, a lower threshold indicates that objects have to be close to the HMD wearer 102 in order to be reconstructed within the virtual space 204, whereas a higher threshold indicates that objects farther from the wearer 102 are also reconstructed.
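By way of illustration only, the selection can be expressed in Python along the following lines; the function name, the NumPy representation of the depth image, and the treatment of zero-valued (no-return) pixels are assumptions made here rather than part of the processing described above.

```python
import numpy as np

def select_near_pixels(depth_image: np.ndarray, threshold: float) -> np.ndarray:
    """Return the (v, u) coordinates of the 2D pixels whose depth values are
    less than the threshold, i.e., whose 3D pixels are close enough to the
    wearer to be reconstructed within the virtual space."""
    # Zero-valued pixels are assumed to indicate no ToF return and are skipped.
    mask = (depth_image > 0) & (depth_image < threshold)
    return np.argwhere(mask)  # one (v, u) row per selected 2D pixel

# Example usage: only objects within 1.5 meters of the wearer are reconstructed.
depth_image = np.random.uniform(0.5, 4.0, size=(480, 640)).astype(np.float32)
selected = select_near_pixels(depth_image, threshold=1.5)
print(selected.shape)
```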
The processing includes calculating, for the 3D pixel corresponding to each selected 2D pixel, the 3D coordinates within the 3D camera coordinate system (310). This calculation is based on the 2D coordinates of the corresponding 2D pixel of the depth image within the 2D image coordinate system of the plane of the depth image. This calculation is further based on the depth image itself (i.e., the value of the 2D pixel in the depth image), and on parameters of the ToF camera 108. The camera parameters can include the focal length of the ToF camera 108 to the plane of the depth image, and the 2D coordinates of the optical center of the camera 108 on the plane within the 2D image coordinate system. The camera parameters can also include the horizontal and vertical fields of view of the ToF camera 108, which together define the maximum area of the real space 200 that the camera 108 can image.
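For illustration, these camera parameters can be gathered into a simple container such as the following Python sketch; the class and field names are assumptions, and the actual parameters may be organized differently.

```python
from dataclasses import dataclass

@dataclass
class ToFCameraParameters:
    """Assumed container for the ToF camera parameters used in the calculation."""
    focal_u: float   # focal length to the image plane, along the u axis (pixels)
    focal_v: float   # focal length to the image plane, along the v axis (pixels)
    center_u: float  # u coordinate of the optical center on the image plane
    center_v: float  # v coordinate of the optical center on the image plane
    fov_horizontal_deg: float  # horizontal field of view (degrees)
    fov_vertical_deg: float    # vertical field of view (degrees)
```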
Per
The 3D pixels 506′, 508′, and 510′ define a local 2D plane 512 having an x axis 520 and a y axis 522. The x depth image gradient of the 3D pixel 506′ along the x axis 520 is ∂Z(u,v)/∂x, where Z(u,v) is the value of the 2D pixel 506 within the depth image 500. The y depth image gradient of the 3D pixel 506′ along the y axis 522 is similarly ∂Z(u,v)/∂y.
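Assuming the depth image Z is stored as a 2D array indexed by (v, u), with the local x axis along the u direction and the local y axis along the v direction, the per-pixel gradients ∂Z/∂x and ∂Z/∂y can be approximated by finite differences, as in this hedged Python sketch:

```python
import numpy as np

def depth_image_gradients(depth_image: np.ndarray):
    """Approximate the x and y depth image gradients, dZ/dx and dZ/dy, at every
    2D pixel using central finite differences."""
    # np.gradient returns the derivative along axis 0 (rows, assumed to be the
    # y/v direction) first, then along axis 1 (columns, assumed x/u direction).
    dz_dy, dz_dx = np.gradient(depth_image.astype(np.float64))
    return dz_dx, dz_dy
```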
The method 400 includes calculating a normal vector for each 3D pixel based on the depth image gradients for the 3D pixel (403). Per
First, the x tangent vector for each 3D pixel is calculated (404), as is the y tangent vector (406). Per
Second, the normal vector for each 3D pixel is calculated as the cross-product of its x and y tangent vectors (408). Per
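Under the common convention that the x tangent vector at a 3D pixel is (1, 0, ∂Z/∂x) and the y tangent vector is (0, 1, ∂Z/∂y), an assumption here since the exact form is given in the figures, the normal vector follows as the cross-product, as in this sketch:

```python
import numpy as np

def normal_from_gradients(dz_dx: float, dz_dy: float) -> np.ndarray:
    """Compute the normal vector at a 3D pixel from its x and y depth image
    gradients, as the cross-product of the x and y tangent vectors."""
    tangent_x = np.array([1.0, 0.0, dz_dx])   # assumed x tangent vector
    tangent_y = np.array([0.0, 1.0, dz_dy])   # assumed y tangent vector
    normal = np.cross(tangent_x, tangent_y)   # equals (-dz_dx, -dz_dy, 1)
    return normal / np.linalg.norm(normal)    # unit-length normal vector
```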
The method 400 then includes calculating the 3D coordinates for each 3D pixel in the 3D camera coordinate system based on the projection matrix and the depth image (410). The projection matrix P is such that P2D = P·P3D, where P2D are the u and v coordinates of a 2D pixel of the depth image 500 within the 2D image coordinate system, and P3D are the x and y coordinates of the corresponding 3D pixel within the 3D camera coordinate system (which are not to be confused with the x and y axes 520 and 522 of the local plane 512 in
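One standard way to write such a projection matrix is the pinhole form below, sketched in Python; this is an assumption about the form of P, which may differ in detail from the matrix described in the figures, and the parameter names follow the camera parameters noted earlier.

```python
import numpy as np

def project_to_image_plane(point_3d: np.ndarray, focal_u: float, focal_v: float,
                           center_u: float, center_v: float) -> np.ndarray:
    """Project a 3D pixel (x, y, z) in the 3D camera coordinate system onto the
    image plane, yielding its (u, v) coordinates, using a pinhole-style P."""
    P = np.array([[focal_u, 0.0,     center_u],
                  [0.0,     focal_v, center_v],
                  [0.0,     0.0,     1.0]])
    uvw = P @ point_3d          # homogeneous image-plane coordinates
    return uvw[:2] / uvw[2]     # (u, v) in the 2D image coordinate system
```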
The method 600 includes calculating the x coordinate of each 3D pixel within the 3D camera coordinate system (602), as well as the y coordinate (604) and the z coordinate (606). Per
The x coordinate 714 of the 3D pixel 506′ within the 3D camera coordinate system is calculated based on the u coordinate of the 2D pixel 506, the focal length 717, the u coordinate of the optical center 710, the horizontal field of view of the ToF camera 108, and the value of the 2D pixel 506 within the depth image 500. The y coordinate 716 of the 3D pixel 506′ within the 3D camera coordinate system is similarly calculated based on the v coordinate of the 2D pixel 506, the focal length 717, the v coordinate of the optical center 710, the vertical field of view of the ToF camera 108, and the value of the 2D pixel 506 within the depth image 500. The z coordinate 712 of the 3D pixel 506′ within the 3D camera coordinate system is calculated as the value of the 2D pixel 506 within the depth image 500, which is the projected value of the depth 726 from the ToF camera 108 to the 3D pixel 506′ onto the z axis 702.
Specifically, the x coordinate 714 can be calculated as x = Depth × sin(tan⁻¹((pu − cu) ÷ Focalu)), and the y coordinate 716 can be calculated as y = Depth × sin(tan⁻¹((pv − cv) ÷ Focalv)). In these equations, Depth is the value of the 2D pixel 506 within the depth image 500 (and thus the depth 726), pu and pv are the u and v coordinates of the 2D pixel 506 within the 2D image coordinate system, cu and cv are the u and v coordinates of the optical center 710 within the 2D image coordinate system, and Focalu and Focalv correspond to the focal length 717 along the u and v axes, respectively. Therefore, pu − cu is the distance 722 and pv − cv is the distance 724 in
However, in some cases, either or both of the x coordinate 714 calculation and the y coordinate 716 calculation can be simplified. Because sin(tan⁻¹(t)) ≈ t when t is small, the calculation of the x coordinate 714 can be simplified as x = Depth × (pu − cu) ÷ Focalu when Focalu is large relative to pu − cu. Similarly, the calculation of the y coordinate 716 can be simplified as y = Depth × (pv − cv) ÷ Focalv when Focalv is large relative to pv − cv.
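A direct transcription of these formulas into Python, including the simplified small-angle form, might look as follows; the function and variable names are assumptions, and the sketch is illustrative rather than the implementation described above.

```python
import math

def back_project(depth: float, p_u: float, p_v: float,
                 c_u: float, c_v: float,
                 focal_u: float, focal_v: float,
                 simplified: bool = False):
    """Compute the (x, y, z) coordinates in the 3D camera coordinate system of
    the 3D pixel corresponding to 2D pixel (p_u, p_v) with the given depth value."""
    if simplified:
        # sin(tan^-1(t)) ~= t for small t, i.e., when the focal length is
        # large relative to (p - c).
        x = depth * (p_u - c_u) / focal_u
        y = depth * (p_v - c_v) / focal_v
    else:
        x = depth * math.sin(math.atan((p_u - c_u) / focal_u))
        y = depth * math.sin(math.atan((p_v - c_v) / focal_v))
    z = depth  # the value of the 2D pixel within the depth image
    return x, y, z
```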
Referring back to
The method 800 then includes mapping the 3D coordinates of the 3D pixel within the 3D ECEF coordinate system to the 3D coordinates within the 3D virtual space coordinate system (804), using a transformation between the former coordinate system and the latter coordinate system. In the method 800, then, the 3D coordinates of a 3D pixel within the 3D camera coordinate system are first mapped to interim 3D coordinates within the 3D ECEF coordinate system, which are then mapped to 3D coordinates within the 3D virtual space coordinate system. This technique may be employed if a direct transformation between the 3D camera coordinate system and the 3D virtual space coordinate system is not available.
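As a sketch under the assumption that both transformations are available as 4×4 homogeneous matrices (the matrix and function names are introduced here for illustration), the two-stage mapping can be chained as follows:

```python
import numpy as np

def map_camera_to_virtual(point_camera: np.ndarray,
                          camera_to_ecef: np.ndarray,
                          ecef_to_virtual: np.ndarray) -> np.ndarray:
    """Map a 3D pixel's coordinates from the 3D camera coordinate system to the
    3D virtual space coordinate system via interim coordinates in the 3D ECEF
    coordinate system, using 4x4 homogeneous transformation matrices."""
    p = np.append(point_camera, 1.0)       # homogeneous (x, y, z, 1)
    p_ecef = camera_to_ecef @ p            # camera coordinates -> interim ECEF
    p_virtual = ecef_to_virtual @ p_ecef   # interim ECEF -> virtual space
    return p_virtual[:3] / p_virtual[3]    # back to (x, y, z)
```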
Referring back to
The method 900 includes displaying each 3D pixel of the object 202 as mapped to the virtual space 204 within the image of the virtual space 204 (904). That is, each 3D pixel of the object 202 is displayed in the virtual space 204 at its 3D coordinates within the 3D virtual space coordinate system. The 3D pixel may be displayed at these 3D coordinates with a value corresponding to its color or texture as was calculated from the color image. If a color image is not acquired using a color camera 110, the 3D pixel may be displayed at these 3D coordinates with a different value, such as to denote that the object 202 is a real space object that has been reconstructed within the virtual space 204.
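For illustration, pairing each mapped 3D pixel with its display value might be sketched as follows in Python; the fallback value used when no color image is available is an assumption, as are the function and parameter names.

```python
from typing import List, Optional, Tuple

import numpy as np

def build_display_points(points_virtual: np.ndarray,
                         colors: Optional[np.ndarray] = None,
                         fallback_rgb: Tuple[int, int, int] = (255, 0, 255)
                         ) -> List[Tuple[Tuple[float, ...], Tuple[int, ...]]]:
    """Pair each 3D pixel, as mapped to the virtual space, with a color value:
    its value from the color image when one was acquired, or otherwise a
    fallback value denoting a reconstructed real space object."""
    display_points = []
    for i, point in enumerate(points_virtual):
        rgb = tuple(int(c) for c in colors[i]) if colors is not None else fallback_rgb
        display_points.append((tuple(float(x) for x in point), rgb))
    return display_points
```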
Techniques have been described for real space object reconstruction within a virtual space 204. The techniques have been described in relation to an HMD 100, but in other implementations can be used in a virtual space 204 that is not experienced using an HMD 100. The techniques specifically employ a ToF camera 108 for such real space object reconstruction within a virtual space 204, using the depth image 500 that can be acquired using a ToF camera 108.