The present technology relates to a head mount display, an information processing apparatus, and an information processing method.
There is a function called video see through (VST) in a virtual reality (VR) device such as a head mount display (HMD) including a camera. Usually, when the HMD is worn, the field of view is blocked by the display and the housing, and a user cannot see the outside state. However, by displaying an image of the outside world captured by the camera on a display included in the HMD, the user can see the outside state while the HMD is worn.
In the VST function, it is physically impossible to completely match the positions of the camera and the user's eyes, and parallax always occurs between the two viewpoints. Therefore, when an image captured by the camera is displayed on the display as it is, a size of an object and binocular parallax are slightly different from the reality, so that spatial discomfort occurs. It is considered that this discomfort hinders interaction with a real object or causes VR sickness.
Therefore, it is considered to solve this problem using a technology called “viewpoint conversion” that reproduces an outside world video viewed from a position of the user's eye on the basis of the outside world video (color information) captured by a VST camera and geometry (three-dimensional topography) information.
The VST camera for viewing the outside world in an HMD having the VST function is usually disposed at a position in front of the HMD and in front of the user's eye due to structural restrictions. Furthermore, in order to minimize the parallax between a camera video and an actual eye position, an image for the left-eye display is usually generated from an image from the left camera, and an image for the right-eye display is usually generated from an image from the right camera.
However, when the image of the VST camera is displayed as it is on the display of the HMD, the displayed video looks as if the user's eyes had moved forward to the camera positions. To avoid this, a viewpoint conversion technology is used: the respective images of the left and right cameras are deformed on the basis of the geometry information of the surrounding environment obtained by a distance measurement sensor, so that each original image approximates the image viewed from the position of the user's eye.
In this case, it is preferable that the original image be captured at a position close to the user's eye, since the difference from the final viewpoint video is then small. Therefore, it is usually considered ideal to place the VST camera at a position that minimizes the distance between the VST camera and the user's eye, that is, directly in front of the user's eye on the line of sight.
However, when the VST camera is disposed in such a manner, there is a problem that an occlusion region due to an occluding object greatly appears. Therefore, in an imaging system including a plurality of physical cameras, there is a technology of generating a video from a virtual camera viewpoint on the basis of camera videos from a plurality of viewpoints (Patent Document 1).
In Patent Document 1, after generating a virtual viewpoint video from a color image and a distance image in a main camera closest to a final virtual camera viewpoint, a virtual viewpoint video for an occlusion region of the main camera is generated on the basis of a color image and a distance image of a sub camera group second closest to the final virtual camera viewpoint. However, this is not sufficient to reduce the occlusion region, which is a problem in the HMD.
The present technology has been made in view of such a problem, and an object thereof is to provide a head mount display, an information processing apparatus, and an information processing method capable of reducing an occlusion region generated in an image displayed on the head mount display having a VST function.
In order to solve the above-described problem, a first technology is a head mount display including: a left display that displays a left-eye display image; a right display that displays a right-eye display image; a housing that supports the left display and the right display so as to be located in front of eyes of a user; and a left camera that captures a left camera image, and a right camera that captures a right camera image, the left camera and the right camera being provided outside the housing, in which an interval between the left camera and the right camera is wider than an interocular distance of the user.
Furthermore, a second technology is an information processing apparatus configured to: perform processing corresponding to a head mount display including a left camera and a left display, and a right camera and a right display; generate a left-eye display image by projecting a left camera image captured by the left camera onto a viewpoint of the left display and sampling a pixel value; and generate a right-eye display image by projecting a right camera image captured by the right camera onto a viewpoint of the right display and sampling a pixel value.
Moreover, a third technology is an information processing method including: performing processing corresponding to a head mount display including a left camera and a left display, and a right camera and a right display; generating a left-eye display image by projecting a left camera image captured by the left camera onto a viewpoint of the left display and sampling a pixel value; and generating a right-eye display image by projecting a right camera image captured by the right camera onto a viewpoint of the right display and sampling a pixel value.
Hereinafter, embodiments of the present technology will be described with reference to the drawings. Note that the description will be made in the following order.
A configuration of an HMD 100 having the VST function will be described with reference to
The HMD 100 is worn by a user. As illustrated in
The color camera 101 includes an imaging element, a signal processing circuit, and the like, and is a camera capable of capturing a color image and a color video of red, green, blue (RGB) or a single color. The color camera 101 includes a left camera 101L that captures an image to be displayed on a left display 108L, and a right camera 101R that captures an image to be displayed on a right display 108R. The left camera 101L and the right camera 101R are provided outside the housing 150 toward a direction of a user's line-of-sight, and capture the outside world in the direction of the user's line-of-sight. In the following description, an image obtained by capturing by the left camera 101L is referred to as a left camera image, and an image obtained by capturing by the right camera 101R is referred to as a right camera image.
The distance measurement sensor 102 is a sensor that measures a distance to a subject and acquires depth information. The distance measurement sensor 102 is provided outside the housing 150 toward the direction of the user's line-of-sight. The distance measurement sensor 102 may be an infrared sensor, an ultrasonic sensor, a color stereo camera, an infrared (IR) stereo camera, or the like. Furthermore, the distance measurement sensor 102 may perform triangulation or the like using one IR camera and structured light. Note that the depth information does not necessarily have to be obtained by stereo, and may be a monocular depth using time of flight (ToF) or motion parallax, a monocular depth using an image plane phase difference, or the like.
The inertial measurement unit 103 includes various sensors that detect sensor information for estimating a posture, inclination, and the like of the HMD 100. The inertial measurement unit 103 is, for example, an inertial measurement unit (IMU), an acceleration sensor, an angular velocity sensor, a gyro sensor, or the like for two or three axis directions.
The image processing unit 104 performs predetermined image processing such as analog/digital (A/D) conversion, white balance adjustment processing, color correction processing, gamma correction processing, Y/C conversion processing, and auto exposure (AE) processing on the image data supplied from the color camera 101. Note that the image processing described here is merely an example, and it is not necessary to perform all of them, and other processing may be further performed.
The position/posture estimation unit 105 estimates a position, posture, and the like of the HMD 100 on the basis of the sensor information supplied from the inertial measurement unit 103. By estimating the position and posture of the HMD 100 by the position/posture estimation unit 105, the position and posture of the head of the user wearing the HMD 100 can also be estimated. Note that the position/posture estimation unit 105 can also estimate the movement, inclination, and the like of the HMD 100. In the following description, the position of the head of the user wearing the HMD 100 is referred to as a self-position, and estimating the position of the head of the user wearing the HMD 100 by the position/posture estimation unit 105 is referred to as self-position estimation.
The information processing apparatus 200 performs processing according to the present technology. The information processing apparatus 200 uses a color image captured by the color camera 101 and a depth image generated from depth information obtained by the distance measurement sensor 102 as inputs, and generates a left-eye display image and a right-eye display image in which an occlusion region caused by an occluding object is compensated. The left-eye display image and the right-eye display image are supplied from the information processing apparatus 200 to the synthesis unit 107. Then, finally, the left-eye display image is displayed on the left display 108L, and the right-eye display image is displayed on the right display 108R. Details of the information processing apparatus 200 will be described later.
Note that the information processing apparatus 200 may be configured as a single apparatus, may operate in the HMD 100, or may operate in an electronic device such as a personal computer, a tablet terminal, or a smartphone connected to the HMD 100. Furthermore, the HMD 100 or the electronic device may execute the function of the information processing apparatus 200 by a program. In a case where the information processing apparatus 200 is realized by the program, the program may be installed in the HMD 100 or the electronic device in advance, or may be distributed by download, a storage medium, or the like and installed by the user himself/herself.
The CG generation unit 106 generates various computer graphic (CG) images to be superimposed on the left-eye display image and the right-eye display image for augmented reality (AR) display and the like.
The synthesis unit 107 synthesizes the CG image generated by the CG generation unit 106 with the left-eye display image and the right-eye display image output from the information processing apparatus 200 to generate an image to be displayed on the display 108.
The display 108 is a liquid crystal display, an organic electroluminescence (EL) display, or the like located in front of the eyes of the user when the HMD 100 is worn. As illustrated in
The image processing unit 104, the position/posture estimation unit 105, the CG generation unit 106, the information processing apparatus 200, and the synthesis unit 107 constitute an HMD processing unit 170. After image processing and self-position estimation are performed by the HMD processing unit 170, only an image subjected to viewpoint conversion, or an image generated by synthesizing the image subjected to viewpoint conversion and the CG, is displayed on the display 108.
The control unit 109 includes a central processing unit (CPU), a random access memory (RAM), a read only memory (ROM), and the like. The CPU controls the entire HMD 100 and each unit by executing various processing according to a program stored in the ROM and issuing commands. Note that the information processing apparatus 200 may be implemented by processing by the control unit 109.
The storage unit 110 is, for example, a mass storage medium such as a hard disk or a flash memory. The storage unit 110 stores various applications operating on the HMD 100, various information used in the HMD 100 and the information processing apparatus 200, and the like.
The interface 111 is an interface with an electronic device such as a personal computer or a game machine, the Internet, or the like. The interface 111 may include a wired or wireless communication interface. More specifically, the wired or wireless communication interface may include cellular communication such as LTE, Wi-Fi, Bluetooth (registered trademark), near field communication (NFC), Ethernet (registered trademark), high-definition multimedia interface (HDMI (registered trademark)), universal serial bus (USB), and the like.
Note that the HMD processing unit 170 illustrated in
Next, the arrangement of the left camera 101L, the right camera 101R, the left display 108L, and the right display 108R in the HMD 100 will be described. As illustrated in
The position of the left display 108L may be considered to be the same as the position of the left eye of the user, which is a virtual viewpoint to be finally synthesized. Thus, the left display viewpoint is the user's left eye viewpoint. Furthermore, the position of the right display 108R may be considered to be the same as the position of the right eye of the user, which is a virtual viewpoint to be finally synthesized. Thus, the right display viewpoint is the user's right eye viewpoint. Therefore, the interval between the left display 108L and the right display 108R is the interocular distance between the left eye and the right eye of the user. The interocular distance is the distance (interpupillary distance) from the center of the pupil of the user's left eye to the center of the pupil of the user's right eye. Furthermore, the interval between the left display 108L and the right display 108R is, for example, a distance between a specific position (center or the like) in the left display 108L and a specific position (center or the like) in the right display 108R.
In the following description, the viewpoint of the left camera 101L is referred to as a left camera viewpoint, and the viewpoint of the right camera 101R is referred to as a right camera viewpoint. Furthermore, the viewpoint of the left display 108L is referred to as a left display viewpoint, and the viewpoint of the right display 108R is referred to as a right display viewpoint. Moreover, the viewpoint of the distance measurement sensor 102 is referred to as a distance measurement sensor viewpoint. The display viewpoint is a virtual viewpoint calibrated to simulate the visual field of the user at the position of the user's eye.
The arrangement of the left camera 101L, the right camera 101R, the left display 108L, and the right display 108R will be described in detail with reference to
Conventionally, as illustrated in a rear view and a top view in
On the other hand, in the present technology, as illustrated in the rear view and the top view of
Statistically, an interocular distance of 72 mm or more covers 99% of men. Furthermore, 95% of men are covered by 70 mm or more, and 99% of men are covered by 72.5 mm or more. Therefore, it is sufficient to assume a maximum interocular distance of about 74 mm, and the left camera 101L and the right camera 101R are only required to be disposed so that the interval between them is 74 mm or more. Note that the interval between the left camera 101L and the right camera 101R and the interocular distance are merely examples, and the present technology is not limited to these values.
As illustrated in the horizontal view, the right camera 101R is provided in front of the right display 108R in the direction of the user's line-of-sight. The relationship between the left camera 101L and the left display 108L is similar.
Note that, in some HMDs 100, the positions of the left display 108L and the right display 108R can be laterally adjusted in accordance with the size of the user's face and the interocular distance. In the case of such an HMD 100, the left camera 101L and the right camera 101R are disposed such that the interval between the left camera 101L and the right camera 101R is wider than the maximum interval between the left display 108L and the right display 108R.
As illustrated in the rear view and the lateral view, the left camera 101L, the right camera 101R, the left display 108L, and the right display 108R are disposed at substantially the same height, similarly to the related art. As illustrated in the lateral view, the interval between the right camera 101R and the right display 108R is, for example, 65.9 mm. The interval between the left camera 101L and the left display 108L is similar.
In the conventional arrangement of the color camera and the display illustrated in
The inside of the solid line extending in a fan shape from the right camera viewpoint is a region where the rear object cannot be seen due to the front object (occluding object) at the right camera viewpoint. Furthermore, the inside of the broken line extending in a fan shape from the right display viewpoint is a region where the rear object cannot be seen due to the front object (occluding object) at the right display viewpoint.
Considering a positional relationship between the right camera viewpoint and the right display viewpoint, the shaded region of the rear object existing on the far side is not visible from the right camera viewpoint, but is visible from the right display viewpoint, that is, the right eye of the user. This region is an occlusion region by a front object (an occluding object) when an image captured by the right camera is displayed on the right display.
Meanwhile,
Considering a positional relationship between the right camera viewpoint and the right display viewpoint, the occlusion region that has occurred on the right side as viewed from the user in the conventional arrangement does not occur. Note that an occlusion region indicated by hatching is generated on the left side as viewed from the user, but this can be compensated by the left camera image captured by the left camera 101L on the opposite side.
In this manner, by configuring the interval between the left camera 101L and the right camera 101R to be wider than the interval (interocular distance) between the left display 108L and the right display 108R, it is possible to reduce the occlusion region generated by the occluding object.
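For reference, the following is a simplified two-dimensional numerical sketch (not part of the embodiments; all values are hypothetical) of why widening the camera interval reduces the occlusion band on the outer side of an occluding edge. The sketch compares where the shadow boundary of the edge falls on the rear object as seen from the eye and as seen from a camera disposed in front of the eye, for lateral camera offsets of 0 mm and 20 mm. The band on the inner side grows instead, but, as described above, that band can be compensated by the camera on the opposite side.

    def occlusion_band_width(edge_x, d_front, d_rear, cam_x, cam_z):
        # Eye at the origin looking along +z; the camera sits cam_z metres in
        # front of the eye and cam_x metres to the outer side. Returns the width
        # of the strip on the rear wall that the eye sees but the camera does
        # not, on the outer side of an occluding edge at (edge_x, d_front).
        x_eye = edge_x * d_rear / d_front
        x_cam = cam_x + (edge_x - cam_x) * (d_rear - cam_z) / (d_front - cam_z)
        return max(0.0, x_cam - x_eye)

    # Hypothetical numbers: hand edge 10 cm to the side at 0.4 m, wall at 1.0 m,
    # camera 66 mm in front of the eye; lateral offset 0 mm versus 20 mm outward.
    for shift in (0.0, 0.02):
        band = occlusion_band_width(0.10, 0.4, 1.0, cam_x=shift, cam_z=0.066)
        print(f"offset {shift * 1000:.0f} mm -> occlusion band {band * 100:.1f} cm")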
The distance measurement sensor 102 is provided, for example, between the left camera 101L and the right camera 101R, at the same height as the left camera 101L and the right camera 101R. However, the position of the distance measurement sensor 102 is not particularly limited, and the distance measurement sensor 102 may be provided at any position from which it can sense in the direction of the user's line-of-sight.
Among the four images illustrated in
Furthermore, among the four images illustrated in
Each of images A to D in
On the other hand,
Of the four images illustrated in
Furthermore, among the four images illustrated in
It can be seen that, whether the distance from the user's eye to the wall (rear object) is 1 m or 5 m, and whether the hand (front object) is one hand or both hands, the occlusion region is reduced as compared with the conventional arrangement, although a slight occlusion region remains. From this simulation result, it can be seen that an arrangement in which the interval between the left camera 101L and the right camera 101R is wider than the interval (interocular distance) between the left display 108L and the right display 108R, as in the present technology, is effective in reducing the occlusion region.
Next, processing by the information processing apparatus 200 will be described with reference to
The information processing apparatus 200 uses the left camera image captured by the left camera 101L and the depth image obtained by the distance measurement sensor 102 to generate the left-eye display image at the left display viewpoint (the viewpoint of the left eye of the user) where the left camera 101L does not actually exist. The left-eye display image is displayed on the left display 108L.
Furthermore, the information processing apparatus 200 uses the right camera image captured by the right camera 101R and the depth image obtained by the distance measurement sensor 102 to generate the right-eye display image at the right display viewpoint (the viewpoint of the right eye of the user) where the right camera 101R does not actually exist. The right-eye display image is displayed on the right display 108R.
The left camera 101L, the right camera 101R, and the distance measurement sensor 102 are controlled by a predetermined synchronization signal, perform image-capturing and sensing at a frequency of, for example, about 60 times/second or 120 times/second, and output a left camera image, a right camera image, and a depth image to the information processing apparatus 200.
The following processing is executed for each image output (this unit is referred to as a frame). Note that generation of the left-eye display image from the left display viewpoint displayed on the left display 108L will be described below with reference to
In the case of generating the left-eye display image, of the left camera 101L and the right camera 101R, the left camera 101L closest to the left display 108L is set as the main camera, and the right camera 101R second closest to the left display 108L is set as the sub camera. Then, a left-eye display image is created on the basis of the left camera image captured by the left camera 101L as the main camera, and an occlusion region in the left-eye display image is compensated using the right camera image captured by the right camera 101R as the sub camera.
First, in step S101, the latest depth image generated by performing depth estimation from the information obtained by the distance measurement sensor 102 is projected onto the left display viewpoint as a virtual viewpoint to generate a first depth image (left display viewpoint). This is processing for generating a synthesized depth image at the left display viewpoint in step S103 described later.
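As an illustration of the projection in step S101, the following Python sketch forward-warps a depth map from the distance measurement sensor viewpoint to the left display viewpoint, assuming pinhole intrinsic matrices K_src and K_dst and a rigid transform T_dst_from_src obtained by calibration. The function name, the conventions, and the simple Z-test used here are assumptions for explanation and do not represent the actual implementation.

    import numpy as np

    def project_depth(depth_src, K_src, K_dst, T_dst_from_src, dst_shape):
        # Forward-warp a depth map from a source viewpoint (e.g. the distance
        # measurement sensor) to a destination viewpoint (e.g. the left display).
        # Where several source pixels land on the same destination pixel, the
        # nearer one wins (a simple Z-test). 0 marks pixels with no depth (holes).
        h, w = depth_src.shape
        v, u = np.mgrid[0:h, 0:w]
        z = depth_src.ravel()
        pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])
        pts_src = np.linalg.inv(K_src) @ pix * z          # back-project to 3-D
        pts_dst = T_dst_from_src[:3, :3] @ pts_src + T_dst_from_src[:3, 3:4]
        with np.errstate(divide='ignore', invalid='ignore'):
            proj = K_dst @ pts_dst
            ud, vd = proj[0] / proj[2], proj[1] / proj[2]
        zd = pts_dst[2]
        inside = ((z > 0) & (zd > 0) & np.isfinite(ud) & np.isfinite(vd)
                  & (ud > -0.5) & (ud < dst_shape[1] - 0.5)
                  & (vd > -0.5) & (vd < dst_shape[0] - 0.5))
        out = np.full(dst_shape, np.inf)
        np.minimum.at(out, (np.round(vd[inside]).astype(int),
                            np.round(ud[inside]).astype(int)), zd[inside])
        out[np.isinf(out)] = 0.0
        return out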
Next, in step S102, the past synthesized depth image (left display viewpoint) generated in the processing in step S103 in the past frame (previous frame) is subjected to the deformation processing in consideration of the variation in the position of the user to generate the second depth image (left display viewpoint).
The deformation in consideration of the variation of the position of the user means, for example, deforming the depth image of the left display viewpoint obtained before the user's position changed so that all of its pixels coincide with those of the depth image of the left display viewpoint after the change. This is also processing for generating the synthesized depth image at the left display viewpoint in step S103 described later.
Next, in step S103, the first depth image generated in step S101 and the second depth image generated in step S102 are synthesized to generate the latest synthesized depth image (left display viewpoint) (image illustrated in
Note that, in order to use the synthesized depth image (left display viewpoint) at the time of the past frame for processing of the current frame, it is necessary to store the synthesized depth image (left display viewpoint) generated by the processing in the past frame by buffering.
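The merge rule used in step S103 is not specified in detail. The following sketch assumes one plausible rule in which the freshly measured depth (first depth image) is preferred and the warped past depth (second depth image) only fills its holes; a near-wins Z-test over the two images would be another possible rule.

    import numpy as np

    def synthesize_depth(first_depth, second_depth):
        # Prefer the freshly measured depth (first depth image); fall back to
        # the warped past depth (second depth image) only where the latest
        # projection left a hole. 0 denotes "no depth". Illustrative rule only.
        return np.where(first_depth > 0, first_depth, second_depth)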
Next, in step S104, pixel values of colors of the left display viewpoint are sampled from the left camera image captured by the left camera 101L that is the main camera closest to the left display viewpoint that is the virtual viewpoint. A left-eye display image (left display viewpoint) is generated by this sampling.
In order to perform sampling from the left camera image, first, the latest synthesized depth image (left display viewpoint) generated in step S103 is projected onto the left camera viewpoint to generate a synthesized depth image (left camera viewpoint) (image illustrated in
The projection of the left camera image (left camera viewpoint) onto the left display viewpoint will be described. When the synthesized depth image (left display viewpoint) created in step S103 described above is projected onto the left camera viewpoint as described above, it is possible to grasp a correspondence relationship between the pixels of the left display viewpoint and the left camera viewpoint, that is, which pixel of the synthesized depth image (left display viewpoint) each pixel of the synthesized depth image (left camera viewpoint) corresponds to. This pixel correspondence relationship information is stored in a buffer or the like.
By using the pixel correspondence relationship information, each pixel of the left camera image (left camera viewpoint) can be projected onto each corresponding pixel in the left display viewpoint, and the left camera image (left camera viewpoint) can be projected onto the left display viewpoint. As a result, the pixel value of the color of the left display viewpoint can be sampled from the left camera image. By this sampling, a left-eye display image (left display viewpoint) (image illustrated in
However, an occlusion region BL is generated in the left-eye display image (left display viewpoint) as illustrated in
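The sampling in step S104 and the detection of the occlusion region BL can be pictured with the following sketch: for every display pixel having a synthesized depth value, the corresponding left camera pixel is looked up, and pixels whose three-dimensional point is hidden from the camera are flagged as occlusion. The depth test against a camera-viewpoint depth map (depth_cam, which could be produced with project_depth above) stands in for the pixel correspondence bookkeeping described in the text; the tolerance and all names are illustrative assumptions.

    import numpy as np

    def sample_colors(depth_disp, K_disp, K_cam, T_cam_from_disp,
                      cam_image, depth_cam, occ_tol=0.02):
        # cam_image is assumed to be an H x W x 3 color image. For every display
        # pixel with a depth value, look up the color of the corresponding camera
        # pixel; flag pixels whose 3-D point is hidden from the camera (deeper
        # than the camera-viewpoint depth by more than occ_tol metres).
        h, w = depth_disp.shape
        v, u = np.mgrid[0:h, 0:w]
        z = depth_disp.ravel()
        pts = np.linalg.inv(K_disp) @ np.stack([u.ravel(), v.ravel(), np.ones(h * w)]) * z
        pts_cam = T_cam_from_disp[:3, :3] @ pts + T_cam_from_disp[:3, 3:4]
        with np.errstate(divide='ignore', invalid='ignore'):
            proj = K_cam @ pts_cam
            uc = (proj[0] / proj[2]).reshape(h, w)
            vc = (proj[1] / proj[2]).reshape(h, w)
        zc = pts_cam[2].reshape(h, w)
        ch, cw = depth_cam.shape
        inside = ((depth_disp > 0) & (zc > 0) & np.isfinite(uc) & np.isfinite(vc)
                  & (uc > -0.5) & (uc < cw - 0.5) & (vc > -0.5) & (vc < ch - 0.5))
        ui = np.round(np.where(inside, uc, 0)).astype(int)
        vi = np.round(np.where(inside, vc, 0)).astype(int)
        visible = inside & (zc <= depth_cam[vi, ui] + occ_tol)
        out = np.zeros((h, w, 3), dtype=cam_image.dtype)
        out[visible] = cam_image[vi[visible], ui[visible]]
        return out, ~visible        # second output: occlusion mask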
Next, in step S105, the occlusion region BL in the left-eye display image (left display viewpoint) is compensated. The occlusion region BL is compensated by sampling a color pixel value from a right camera image captured by the right camera 101R, which is a sub camera second closest to the left display viewpoint.
In order to perform sampling from the right camera image, first, the synthesized depth image (left display viewpoint) generated in step S103 is projected onto the right camera viewpoint to generate a synthesized depth image (right camera viewpoint) (image illustrated in
Then, using the synthesized depth image (right camera viewpoint), the right camera image (right camera viewpoint) (image illustrated in
Since the occlusion region BL illustrated in
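The compensation in step S105 can then be sketched as follows, assuming that the same sampling routine has been run once with the left camera (main camera) and once with the right camera (sub camera) and that both occlusion masks are available; only the pixels hidden from the main camera but visible from the sub camera are overwritten.

    import numpy as np

    def compensate_occlusion(main_image, main_occ, sub_image, sub_occ):
        # Fill pixels that are occluded in the main-camera sampling result but
        # visible in the sub-camera sampling result; also report what remains
        # uncompensated for the later steps S106 and S108.
        out = main_image.copy()
        fill = main_occ & ~sub_occ
        out[fill] = sub_image[fill]
        return out, main_occ & sub_occ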
Next, in step S106, an occlusion region (remaining occlusion region) remaining in the left-eye display image without being compensated by the processing in step S105 is compensated. Note that, in a case where all the occlusion regions are compensated by the processing of step S105, step S106 does not need to be performed. In that case, the left-eye display image whose occlusion region has been compensated in step S105 is finally output as a left-eye display image to be displayed on the left display 108L.
This compensation of the remaining occlusion region is performed by sampling from the deformed left-eye display image generated by applying deformation in consideration of variation in the position of the user to the left-eye display image (left display viewpoint), which is the final output in the past frame (previous frame) in step S107. When this deformation is performed, the synthesized depth image in the past frame is used, and the movement amount of the pixel is determined on the assumption that there is no shape change in the subject as the imaging target.
Next, in step S108, filling processing is performed using a color compensation filter or the like in order to compensate for the residual occlusion region remaining in the left-eye display image without being compensated in the process of step S106. Then, the left-eye display image subjected to the filling processing in step S108 is finally output as a left-eye display image to be displayed on the left display 108L. Note that, in a case where all the occlusion regions are compensated by the processing of step S106, step S108 does not need to be performed. In this case, the left-eye display image generated in step S106 is finally output as a left-eye display image to be displayed on the left display 108L.
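The concrete form of the color compensation filter in step S108 is not specified; as a placeholder only, the following sketch repeatedly replaces residual hole pixels with the average of their valid four-neighbours, assuming an RGB display image.

    import numpy as np

    def fill_holes(image, hole_mask, iterations=8):
        # Each pass replaces a hole pixel with the average of its valid
        # 4-neighbours (wrap-around at the image borders is ignored for brevity).
        img = image.astype(np.float32).copy()
        hole = hole_mask.copy()
        for _ in range(iterations):
            if not hole.any():
                break
            acc = np.zeros_like(img)
            cnt = np.zeros(hole.shape, dtype=np.float32)
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nb_valid = np.roll(~hole, (dy, dx), axis=(0, 1))
                acc += np.roll(img, (dy, dx), axis=(0, 1)) * nb_valid[..., None]
                cnt += nb_valid
            ready = hole & (cnt > 0)
            img[ready] = acc[ready] / cnt[ready, None]
            hole = hole & ~ready
        return img.astype(image.dtype)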
Moreover,
As described above, the left-eye display image to be displayed on the left display 108L is generated.
The processing in the first embodiment is performed as described above. According to the present technology, by disposing the left camera 101L and the right camera 101R such that the interval between the left camera 101L and the right camera 101R is wider than the interocular distance of the user, it is possible to reduce the occlusion region caused by the occluding object. Moreover, by compensating the occlusion region with the image captured by the color camera 101, it is possible to generate a display image with a reduced occlusion region or a left-eye display image and a right-eye display image without an occlusion region.
Next, a second embodiment of the present technology is described. The configuration of the HMD 100 is similar to that of the first embodiment.
As described in the first embodiment, in the present technology, the depth image of the left display viewpoint as the virtual viewpoint is generated for generating the left-eye display image, and the depth image of the right display viewpoint as the virtual viewpoint is generated for generating the right-eye display image. However, the distance measurement result by the distance measurement sensor 102 for generating the depth image may include an error (Hereinafter, this is referred to as a distance measurement error.). In the second embodiment, the information processing apparatus 200 generates a left-eye display image and a right-eye display image, and performs processing of detecting and correcting a distance measurement error.
Here, detection of a distance measurement error will be described with reference to
In the generation of the left-eye display image, the synthesized depth image generated in step S103 is projected onto the left camera viewpoint in step S104, and further, the synthesized depth image is projected onto the right camera viewpoint in step S105. At this time, focusing on any pixel in the synthesized depth image of the projection source, in a case where there is no distance measurement error, sampling is performed from each of the left camera image and the right camera image obtained by capturing the same position by the left camera and the right camera as illustrated in
On the other hand, in a case where there is a distance measurement error, the image is sampled on the basis of an erroneous depth value, and thus pixel values are sampled from a left camera image and a right camera image obtained by capturing different positions by the left camera and the right camera. Therefore, in the generation of the left-eye display image from the left display viewpoint, in a region in which the result of sampling the pixel value from the left camera image and the result of sampling the pixel value from the right camera image differ greatly, it can be determined that the depth value in the synthesized depth image of the projection source is wrong, that is, that there is a distance measurement error.
In the case of
On the other hand, in the case of
Next, processing by the information processing apparatus 200 will be described with reference to
Similarly to the first embodiment, the information processing apparatus 200 uses the left camera image captured by the left camera 101L and the depth image obtained by the distance measurement sensor 102 to generate the left-eye display image at the left display viewpoint (the viewpoint of the left eye of the user) where the left camera 101L does not actually exist. The left-eye display image is displayed on the left display 108L.
Furthermore, similarly to the first embodiment, the information processing apparatus 200 uses the right camera image captured by the right camera 101R and the depth image obtained by the distance measurement sensor 102 to generate the right-eye display image at the right display viewpoint (the viewpoint of the right eye of the user) where the right camera 101R does not actually exist. The right-eye display image is displayed on the right display 108R.
Note that the definitions of the left camera viewpoint, the right camera viewpoint, the left display viewpoint, the right display viewpoint, and the distance measurement sensor viewpoint are similar to those in the first embodiment.
The left camera 101L, the right camera 101R, and the distance measurement sensor 102 are controlled by a predetermined synchronization signal, perform image-capturing and sensing at a frequency of, for example, about 60 times/second or 120 times/second, and output a left camera image, a right camera image, and a depth image to the information processing apparatus 200.
Similarly to the first embodiment, the following processing is executed for each image output (this unit is referred to as a frame). Note that generation of the left-eye display image from the left display viewpoint displayed on the left display 108L will be described with reference to
In the second embodiment, the distance measurement sensor 102 outputs, in one frame, a plurality of depth image candidates used in the processing of the information processing apparatus 200. Pixels at the same position in the plurality of depth image candidates have different depth values. Hereinafter, the plurality of depth image candidates may be referred to as a depth image candidate group. It is assumed that each depth image candidate is ranked in advance based on the reliability of its depth value. This ranking can be performed using an existing algorithm.
First, in step S201, the latest depth image candidate group obtained by the distance measurement sensor 102 is projected onto the left display viewpoint to generate a first depth image candidate group (left display viewpoint).
Next, in step S202, the past determined depth image candidate (left display viewpoint) generated in the processing in step S209 in the past frame (previous frame) is subjected to the deformation processing in consideration of the variation in the position of the user to generate the second depth image candidate (left display viewpoint). The deformation considering the variation of the user position is similar to that in the first embodiment.
Next, in step S203, both the first depth image candidate group (left display viewpoint) generated in step S201 and the second depth image candidate (left display viewpoint) generated in step S202 are collectively set as a full depth image candidate group (left display viewpoint).
Note that, in order to use the determined depth image (left display viewpoint) at the time point of the past frame for the processing of the current frame, it is necessary to store the determined depth image (left display viewpoint) generated as a result of the processing in step S209 in the past frame by buffering.
Next, in step S204, one depth image candidate (left display viewpoint) having the best depth value is output from the full depth image candidate group (left display viewpoint). The depth image candidate having the best depth value is set as the best depth image. The best depth image is a depth image candidate having the highest reliability (first reliability) among a plurality of depth image candidates ranked in advance on the basis of the reliability of the depth value.
Next, in step S205, pixel values of colors of the left display viewpoint are sampled from the left camera image captured by the left camera 101L that is the main camera closest to the left display viewpoint that is the virtual viewpoint. As a result, the first left-eye display image is generated.
In order to perform sampling from the left camera image, first, the best depth image (left display viewpoint) output in step S204 is projected onto the left camera viewpoint to generate the best depth image (left camera viewpoint). A Z-test is performed on portions that overlap in depth, and drawing is performed preferentially for the nearer surface.
Then, using the best depth image (left camera viewpoint), the left camera image (left camera viewpoint) captured by the left camera 101L is projected onto the left display viewpoint. This projection processing is similar to step S104 in the first embodiment. The first left-eye display image (left display viewpoint) can be generated by this sampling.
Next, in step S206, color pixel values are sampled from the right camera image captured by the right camera 101R as the sub camera for all the pixels constituting the display image displayed on the left display 108L. The sampling from the right camera image is performed in a similar manner to step S105 using the best depth image instead of the synthesized depth image in step S105 of the first embodiment. As a result, the second left-eye display image (left display viewpoint) is generated.
Steps S204 to S208 are configured as a loop process, and this loop process is executed a predetermined number of times, with the number of depth image candidates included in the depth image candidate group as an upper limit. In a case where the loop process has not yet been executed the predetermined number of times, the process proceeds to step S208 (No in step S207).
Next, in step S208, the first left-eye display image (left display viewpoint) generated in step S205 is compared with the second left-eye display image (left display viewpoint) generated in step S206. In this comparison, the pixel values of pixels at the same position are compared in the region that is not an occlusion region in either the first left-eye display image (left display viewpoint) or the second left-eye display image (left display viewpoint). Then, the depth value of a pixel for which the difference between the pixel values is a predetermined value or more is determined to be a distance measurement error and is invalidated.
Since the first left-eye display image (left display viewpoint) is a result of sampling from the left camera image and the second left-eye display image (left display viewpoint) is a result of sampling from the right camera image, it can be said that, where the pixel values of pixels at the same position differ by a predetermined value or more, there is a high possibility that the pixel values were sampled from portions of the left camera image and the right camera image in which the left camera 101L and the right camera 101R captured different objects, as illustrated in
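The comparison and invalidation in step S208 can be sketched as follows, assuming 8-bit RGB sampling results and an illustrative difference threshold.

    import numpy as np

    def detect_depth_errors(first_img, second_img, occ_first, occ_second, thresh=30):
        # Where neither sampling result is occluded but the colors differ by
        # more than a threshold, the underlying depth value is treated as a
        # distance measurement error. The threshold and the per-channel absolute
        # difference are illustrative choices.
        both_visible = ~occ_first & ~occ_second
        diff = np.abs(first_img.astype(np.int16) - second_img.astype(np.int16)).max(axis=-1)
        return both_visible & (diff >= thresh)      # True = invalidate this depth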
Steps S204 to S208 are configured as a loop process, and after the determination of the distance measurement error is performed in step S208, the process returns to step S204, and steps S204 to S208 are performed again.
As described above, one best depth image having the best depth value is output from the depth image candidate group in step S204. However, in step S204 in the loop process of the second cycle, the pixels determined to be invalid in step S208 among the pixels of the best depth image output in the previous loop are replaced with the pixel values of the depth image candidate having the second highest reliability, and the replaced image is output as the best depth image. Moreover, in step S204 in the loop process of the third cycle, the pixels determined to be invalid among the pixels of the best depth image output in the loop of the second cycle are replaced with the pixel values of the depth image candidate having the third highest reliability, and the replaced image is output as the best depth image. Each time the loop process is repeated in this manner, the rank is lowered and the replaced best depth image is output for the pixels determined to be invalid in step S208.
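The rank-lowering replacement performed in step S204 from the second loop pass onward might look like the following, assuming the depth image candidate group is held as a stack ordered from the highest to the lowest reliability.

    def next_best_depth(best_depth, invalid_mask, candidate_stack, rank):
        # Keep already-accepted depth values; only the pixels invalidated in
        # step S208 take the value of the candidate with the next lower
        # reliability. candidate_stack is assumed to be ordered from highest
        # to lowest reliability; rank selects the candidate for this pass.
        out = best_depth.copy()
        out[invalid_mask] = candidate_stack[rank][invalid_mask]
        return out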
Then, when the loop process is executed a predetermined number of times, the loop ends, and the process proceeds from step S207 to step S209. Then, in step S209, the best depth image to be processed at the end of the loop is determined as the depth image of the left display viewpoint of the current frame.
Note that a pixel whose depth value is determined to be invalid in step S208 no matter which depth image candidate is used is compensated using a value estimated from the depth values of surrounding pixels, one of the depth values of the depth image candidates, or the like.
Note that the occlusion region in the first left-eye display image (left display viewpoint) is compensated using the second left-eye display image (left display viewpoint). This compensation can be realized by processing similar to the compensation in step S105 of the first embodiment. The first left-eye display image (left display viewpoint) in which the occlusion region has been compensated with the second left-eye display image (left display viewpoint) is set as the left-eye display image. Furthermore, at the time of generating the left-eye display image, for pixels that are not in the occlusion region of either the first left-eye display image (left display viewpoint) or the second left-eye display image (left display viewpoint) but whose pixel values still differ, that is, pixels that remained determined to be invalid in step S208 until the end, the pixel value of the first left-eye display image is used.
Next, in step S210, the occlusion region (remaining occlusion region) that remains in the left-eye display image without being compensated by the compensation using the second left-eye display image is compensated. Note that, in a case where all the occlusion regions are compensated using the second left-eye display image, step S210 does not need to be performed. In this case, the left-eye display image compensated by the second left-eye display image is finally output as the left-eye display image to be displayed on the left display 108L.
This compensation of the remaining occlusion region is performed by sampling from the deformed left-eye display image, which is generated in step S211 by applying deformation in consideration of the variation in the position of the user to the left-eye display image (left display viewpoint) that was the final output in the past frame (previous frame), similarly to step S107 in the first embodiment.
Next, in step S212, filling processing is performed using a color compensation filter or the like in order to compensate for the remaining occlusion region that remains in the left-eye display image without being compensated in the processing of step S210. Then, the left-eye display image subjected to the filling processing is finally output as the left-eye display image to be displayed on the left display 108L. Note that, in a case where all the occlusion regions are compensated by the processing of step S210, step S212 does not need to be performed. In this case, the left-eye display image generated in step S210 is finally output as the left-eye display image to be displayed on the left display 108L.
The processing in the second embodiment is performed as described above. According to the second embodiment, similarly to the first embodiment, it is possible to generate a left-eye display image and a right-eye display image with a reduced occlusion region or without an occlusion region, and further detect and correct a distance measurement error.
Although the embodiment of the present technology has been specifically described above, the present technology is not limited to the above-described embodiments, and various modifications based on the technical idea of the present technology are possible.
First, a modification of the hardware configuration of the HMD 100 will be described. The configuration and arrangement of the color camera 101 and the distance measurement sensor 102 included in the HMD 100 according to the present technology are not limited to those illustrated in
Next, a modification of the processing by the information processing apparatus 200 will be described.
In the embodiments, in order to generate the left-eye display image of the left display viewpoint, processing of projecting the synthesized depth image of the left display viewpoint to the left camera viewpoint in step S104 and further projecting the synthesized depth image of the left display viewpoint onto the right camera viewpoint in step S105 is performed.
Furthermore, in order to generate the right-eye display image of the right display viewpoint, it is necessary to project the synthesized depth image of the right display viewpoint onto the right camera viewpoint in step S104, and further project the synthesized depth image of the right display viewpoint onto the left camera viewpoint in step S105. Therefore, it is necessary to project the synthesized depth image four times in the processing of each frame.
On the other hand, in this modification, in order to generate the left-eye display image of the left display viewpoint, the synthesized depth image of the right display viewpoint is projected onto the right camera viewpoint in step S105. This is the same processing as the processing of projecting the synthesized depth image of the right display viewpoint onto the right camera viewpoint performed in step S104 for generating the right-eye display image of the right display viewpoint on the opposite side, and thus can be realized by using the result.
Furthermore, in order to generate the right-eye display image of the right display viewpoint, the synthesized depth image of the left display viewpoint is projected onto the left camera viewpoint in step S105. This is the same as the processing of projecting the synthesized depth image of the left display viewpoint onto the left camera viewpoint performed in step S104 for generating the left-eye display image of the left display viewpoint on the opposite side, and thus can be realized by using the result.
Note that, for this purpose, it is necessary to pay attention to the order of the processing for generating the left-eye display image and the processing for generating the right-eye display image. Specifically, after the synthesized depth image (left display viewpoint) is projected onto the left camera viewpoint in step S104 for generating the left-eye display image, before the synthesized depth image (right display viewpoint) is projected onto the right camera viewpoint for generating the left-eye display image, it is necessary to project the synthesized depth image (right display viewpoint) onto the right camera viewpoint in step S104 for generating the right-eye display image.
Then, the projection of the synthesized depth image (right display viewpoint) for generating the left-eye display image onto the right camera viewpoint uses the processing result of step S104 for generating the right-eye display image. Furthermore, the processing result of step S104 for generating the left-eye display image is used for the projection of the synthesized depth image (left display viewpoint) for generating the right-eye display image onto the left camera viewpoint.
Therefore, the projection processing in each frame is only processing of projecting the depth image of the left display viewpoint onto the left camera viewpoint and processing of projecting the depth image of the right display viewpoint onto the right camera viewpoint, and the processing load can be reduced as compared with the embodiments.
Furthermore, in the embodiments, in order to generate the left-eye display image from the left display viewpoint, the pixel values of colors are sampled from the right camera image captured by the right camera 101R in step S105 described above. Furthermore, in order to generate a right-eye display image from the right display viewpoint, color pixel values are sampled from the left camera image captured by the left camera 101L. In order to reduce the calculation amount of the sampling processing, sampling may be performed in an image space having a resolution lower than the resolution of the original camera.
Furthermore, in step S105 of the first embodiment, in order to compensate for the occlusion region of the left-eye display image generated in step S104, the sampling processing is performed only on pixels in the occlusion region. However, the sampling processing may be performed on all the pixels of the left-eye display image in step S105, and the pixel values of the pixels constituting the left-eye display image may be determined by a weighted average with the sampling result of step S104. When the sampling result of step S104 and the sampling result of step S105 are blended, blending and blurring processing are performed not only for the pixel of interest but also for its peripheral pixels, so that it is possible to suppress the generation of an unnatural hue due to differences between the cameras, particularly at a boundary portion where sampling is performed from only one camera.
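This blending can be sketched as follows, with an illustrative main-camera weight and a small box-blur feathering of the weight map so that the blend changes gradually near boundaries where only one camera contributed; the weight value and the blur radius are assumptions, not values taken from the disclosure.

    import numpy as np

    def blend_samples(main_img, sub_img, occ_main, occ_sub, w_main=0.7, blur_px=2):
        # Weight map: w_main where both cameras contribute, 1 where only the
        # main camera is valid, 0 where only the sub camera is valid. Pixels
        # occluded in both cameras still need the temporal / filling steps.
        w = np.full(occ_main.shape, w_main, dtype=np.float32)
        w[occ_sub & ~occ_main] = 1.0
        w[occ_main] = 0.0
        # Feather the weights with a small separable box blur so that the blend
        # changes gradually around boundaries instead of switching abruptly.
        for axis in (0, 1):
            acc = np.zeros_like(w)
            for d in range(-blur_px, blur_px + 1):
                acc += np.roll(w, d, axis=axis)
            w = acc / (2 * blur_px + 1)
        w = w[..., None]
        return (w * main_img + (1.0 - w) * sub_img).astype(main_img.dtype)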
Moreover, there is a case where the HMD 100 includes a sensor camera other than the color camera 101, which is used for recognition of the user position, for distance measurement, and the like. In that case, the pixel information obtained by the sensor camera may also be sampled by a method similar to step S104. In a case where the sensor camera is a monochrome camera, the following processing may be performed.
A monochrome image captured by the monochrome camera is converted into a color image (in the case of RGB, R, G, and B are set to the same values), and blending and blurring processing are performed in a similar manner to the above-described modification.
The sampling result from the color image and the sampling result from the monochrome image are converted into the hue, saturation, value (HSV) space, and their brightness values are matched so that there is no abrupt change in brightness at the boundary between the color image and the monochrome image (see the sketch after these options).
The color image is converted into a monochrome image, and all processing is performed on the monochrome image. At this time, blending or blurring processing similar to the above-described modification may be performed in the monochrome image space.
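As an illustration of the brightness matching in the HSV-based option above, the following sketch uses the fact that the value channel of HSV equals max(R, G, B) and applies a single global gain to the monochrome-derived pixels. Both inputs are assumed to be 8-bit three-channel images, and the single global gain is a simplification of the boundary adjustment described above.

    import numpy as np

    def match_brightness(color_sample, mono_sample):
        # HSV value channel = max(R, G, B). Scale the monochrome-derived pixels
        # so that their mean brightness matches that of the color-derived pixels.
        v_color = color_sample.max(axis=-1).mean()
        v_mono = mono_sample.max(axis=-1).mean()
        gain = v_color / max(float(v_mono), 1e-6)
        out = np.clip(mono_sample.astype(np.float32) * gain, 0, 255)
        return out.astype(color_sample.dtype)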
The present technology can also have the following configurations.
(1)
A head mount display including:
The head mount display according to (1), in which
The head mount display according to (1) or (2), in which
The head mount display according to any one of (1) to (3), in which
The head mount display according to (4), in which
The head mount display according to (4) or (5), in which
The head mount display according to any one of (1) to (6), in which
The head mount display according to (7), in which
The head mount display according to any one of (1) to (8), in which
The head mount display according to (9), in which
The head mount display according to any one of (1) to (10), in which
The head mount display according to (11), in which
The head mount display according to any one of (1) to (12), in which
The head mount display according to any one of (1) to (13), in which
The head mount display according to any one of (1) to (14), in which
The head mount display according to (3), in which
An information processing apparatus configured to:
The information processing apparatus according to (17), in which
The information processing apparatus according to (17) or (18), in which
The information processing apparatus according to any one of (17) to (19), in which
The information processing apparatus according to any one of (17) to (20), in which
The information processing apparatus according to any one of (17) to (21), in which
The information processing apparatus according to (22), in which
An information processing method including:
Number | Date | Country | Kind |
---|---|---|---|
2021-170118 | Oct 2021 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2022/037676 | 10/7/2022 | WO |