The present invention pertains to a distance information generation apparatus that obtains distance information for an object in a real space, and a distance information generation method.
A technique for obtaining a position or motion of a real object and, in accordance therewith, performing information processing or giving a warning is used in many fields, such as electronic content, robots, vehicles, surveillance cameras, unmanned aircraft (drones), and IoT (Internet of Things). For example, in the field of electronic content, motion by a user who is wearing a head-mounted display is detected, and, to correspond thereto, a game is caused to progress and be reflected to a virtual world being displayed, whereby it is possible to realize a virtual experience having a sense of immersion.
LiDAR (Light Detection and Ranging) is one technique for obtaining position information for a real object. LiDAR irradiates a real object with light and observes the reflected light to derive the distance to the real object. Two forms of LiDAR are in practical use: dToF (direct Time of Flight), which obtains a distance on the basis of the time difference between the irradiation of pulse-shaped light and the observation of the reflected light, and iToF (indirect Time of Flight), which obtains a distance on the basis of a phase difference of light whose intensity is varied periodically (for example, refer to PTLs 1 and 2, and NPL 1).
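The two time-of-flight principles mentioned above can be sketched with the standard textbook relations. This is a minimal illustration only: the function names and the example modulation frequency are assumptions made for this sketch and do not come from the cited literature.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def dtof_distance(round_trip_seconds: float) -> float:
    """Direct ToF: the light travels to the object and back, so halve the path."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


def itof_distance(phase_shift_rad: float, modulation_hz: float) -> float:
    """Indirect ToF: a phase shift of 2*pi equals one modulation period of round-trip delay."""
    return SPEED_OF_LIGHT * phase_shift_rad / (4.0 * math.pi * modulation_hz)


def itof_unambiguous_range(modulation_hz: float) -> float:
    """Largest distance that can be measured before the phase wraps around."""
    return SPEED_OF_LIGHT / (2.0 * modulation_hz)


if __name__ == "__main__":
    print(dtof_distance(10e-9))              # round trip of 10 ns
    print(itof_distance(math.pi / 2, 20e6))  # quarter-cycle shift at 20 MHz (hypothetical value)
```

Under these relations, a higher modulation frequency gives finer phase resolution but a shorter unambiguous range.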
[PTL 1] Japanese Patent Laid-Open No. 2019-152616, [PTL 2] Japanese Patent Laid-Open No. 2012-49547
[NPL 1] Dr. David Horsley, “World’s first MEMS ultrasonic time-of-flight sensors,” [online], TDK Technologies & Products Press Conference 2018, [searched on Jul. 8, 2020], Internet <electronics.tdk.com/download/2431644/f7219af118484fa9afc46dc1699bacca/02-presentation-summary.pdf>
Regardless of the method employed, typical distance measurement techniques often obtain distance values for only a limited number of points on a real object, or obtain information only at a coarse granularity. Attempting to measure and obtain detailed information impairs versatility, because the apparatus becomes large-scale or a long time is required to output a result. Going forward, information processing that uses the distance to a real object, as well as its position or motion, is expected to diversify further, and there is a need to easily obtain detailed position information at higher accuracy.
The present invention is made in the light of such problems, and an objective thereof is to provide a technique for accurately and easily obtaining detailed position information for a real object.
One aspect of the present invention pertains to a distance information generation apparatus. The distance information generation apparatus includes a measurement depth image obtainment unit configured to obtain data regarding a measurement depth image that expresses a distribution of distance values to an object and is obtained by using image capturing, an upsampling unit configured to generate a candidate depth image obtained by upsampling, by a predetermined method, distance values expressed by the measurement depth image, a reliability determination unit configured to, on the basis of information different to the measurement depth image, determine, for each pixel or for each region, a reliability for distance values expressed by the candidate depth image, and an output data generation unit configured to, on the basis of the reliability, read out distance values to be employed from the candidate depth image, and generate and output an output depth image having the distance values to be employed as pixel values.
Another aspect of the present invention pertains to a distance information generation method. The distance information generation method includes a step for obtaining data regarding a measurement depth image that expresses a distribution of distance values to an object and is obtained by using image capturing, a step for generating, and storing in a memory, a candidate depth image obtained by upsampling, by a predetermined method, distance values expressed by the measurement depth image, a step for, on the basis of information different to the measurement depth image, determining, for each pixel or for each region, a reliability for distance values expressed by the candidate depth image, and a step for reading out, on the basis of the reliability, distance values to be employed from the candidate depth image, and generating and outputting an output depth image having the distance values to be employed as pixel values.
Note that any combination of the above components, and any conversion of the expressions of the present invention among a method, an apparatus, and so forth, are also effective as aspects of the present invention.
According to the present invention, it is possible to accurately and easily obtain detailed position information for a real object.
The present embodiment pertains to a technique for generating distance information for an object. Specifically, a two-dimensional distribution of distance values that has coarse granularity and is obtained with a conventional distance measurement technique is accurately supplemented (upsampled) to generate detailed distance information. The distance measurement technique employed in this case is not limited to any particular kind, and it is possible to employ any measurement method or means in practical use, such as a TOF sensor or a stereo camera.
By using a stereo camera, it is possible to use the principle of triangulation to obtain the distance on the basis of position deviation for images of the same real object 6 in a stereo image captured by two cameras separated to the left and right. These are all known techniques. A TOF sensor and a stereo camera can be comprehended as “image capturing” means in the point of detecting a two-dimensional distribution for light from the real objects 6, and therefore these are generically named as an image capturing apparatus in the present embodiment. However, the image capturing apparatus 8, as needed, also has a function for obtaining a two-dimensional distribution for distance values by performing a calculation on an image which is a result of having performed image capturing. Alternatively, this function may be provided in the distance information generation apparatus 10.
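The triangulation principle mentioned above for a stereo camera can be sketched as follows. The sketch assumes rectified, horizontally separated pinhole cameras with identical focal lengths, which is a common simplification and is not stated in the present description; the function and parameter names are hypothetical.

```python
def stereo_depth(focal_length_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from the positional deviation (disparity) of the same point in two images.

    focal_length_px: focal length expressed in pixels (assumed equal for both cameras)
    baseline_m:      distance between the two camera centres in metres
    disparity_px:    horizontal shift of the point between left and right images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical numbers: 700 px focal length, 10 cm baseline, 35 px disparity.
    print(stereo_depth(700.0, 0.1, 35.0))
```

Because depth is inversely proportional to disparity, a one-pixel disparity error costs more accuracy for distant objects than for near ones.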
The image capturing apparatus 8 according to the present embodiment also has a function for capturing a color image of the real objects 6. For example, the image capturing apparatus 8 may include a typical image sensor such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor that both detects reflected light for the TOF described above and captures a color image. In a case of using a stereo camera to obtain a distance, one of the captured images may be used as the color image. Alternatively, as long as the correspondence between angles of view is established, a camera for capturing a color image may be provided separately from a camera for distance measurement.
In any case, the image capturing apparatus 8 outputs, to the distance information generation apparatus 10 at a predetermined timing or at a predetermined frequency, a two-dimensional distribution of distance values to the real objects 6 and data for a color image which is a two-dimensional distribution of color information for the real objects 6. Note that, as described below, the image capturing apparatus 8 may capture a polarized image having a plurality of orientations. In this case as well, it may be that a polarizing plate for which an orientation can be changed according to rotation is mounted on the front surface of a camera lens or a polarizer layer having a different main axis angle is provided for an image capturing element included in an image sensor, to thereby obtain a two-dimensional distribution of distance values, a color image, and a polarized image by using the same image sensor, or it may be that a polarizing camera is separately provided.
The distance information generation apparatus 10 obtains the two-dimensional distribution of distance values and color image data from the image capturing apparatus 8, and upsamples the distance values to thereby generate more detailed distance information. In other words, from a two-dimensional distribution of distance values which has gaps or coarse granularity and is obtained by using TOF or a stereo image, the distance information generation apparatus 10 generates data for a two-dimensional distribution in which distance values are uniformly expressed at a predetermined resolution. Below, the former is referred to as a measurement depth image, and the latter is referred to as an output depth image.
In generating an output depth image, the distance information generation apparatus 10 switches, within the image plane, the method used for upsampling on the basis of separately obtained information that expresses the real objects 6, such as a color image. Note that the image capturing apparatus 8 and the distance information generation apparatus 10 may be realized as the same apparatus. Alternatively, the distance information generation apparatus 10 alone, or both the image capturing apparatus 8 and the distance information generation apparatus 10, may be a portion of an apparatus that uses distance information to perform information processing, such as a game apparatus, a portable terminal, a personal computer, or a head-mounted display.
(c) illustrates an output depth image generated by the distance information generation apparatus 10. The distance information generation apparatus 10 can upsample a measurement depth image as appropriate to thereby generate a depth image in which distance values are expressed in particularly fine units. As a result, it is possible to also identify the shape of the surface of each object and not only the distance in units of objects, and it is possible to accurately perform object recognition, motion detection, etc. As a result, it is possible to improve the accuracy of information processing that uses these items of data, and broaden an application range.
The CPU 23 executes an operating system stored in the storage unit 34, to thereby control the entirety of the distance information generation apparatus 10. The CPU 23 also executes various programs that are read out from a removable recording medium and loaded into the main memory 26 or are downloaded via the communication unit 32. The GPU 24 has a geometry engine function and a rendering processor function, and performs a drawing process or image analysis according to a command from the CPU 23. The main memory 26 includes a RAM (Random-Access Memory), and stores a program or data that is necessary for processing.
The distance information generation apparatus 10 includes a measurement depth image obtainment unit 50 that obtains data regarding a measurement depth image, a color image obtainment unit 56 that obtains data regarding a color image, an upsampling unit 52 that upsamples distance values, a reliability determination unit 58 that determines a reliability of an upsampling result for each pixel or for each region, an output data generation unit 60 that, on the basis of the reliability, selects an upsampling result and generates an output depth image, and an output unit 62 that outputs the generated output depth image.
The measurement depth image obtainment unit 50 obtains data regarding a depth image measured by the image capturing apparatus 8. In a case where the distance information generation apparatus 10 continues to generate distance information, the measurement depth image obtainment unit 50 obtains data for a measurement depth image at a predetermined frame rate. The color image obtainment unit 56 obtains data regarding a color image obtained by being captured by the image capturing apparatus 8. The color image is captured at the same timing as the measurement depth image obtained by the measurement depth image obtainment unit 50. As described above, the color image and the measurement depth image are made to have corresponding fields of view.
For example, as described above, if both are obtained by the same image sensor, the fields of view match by themselves. In a case of using different cameras, it is possible to obtain correspondence for the fields of view by prior calibration. The upsampling unit 52 uses a predetermined method to upsample the measurement depth image obtained by the measurement depth image obtainment unit 50, to thereby generate a depth image that expresses candidate distance values (hereinafter, referred to as a candidate depth image).
For interpolation of pixel values or expansion of an image, various methods have been put into practical use in the past, such as a bilinear method, a nearest neighbor method, a bilateral method, a median method, and Gaussian interpolation. Furthermore, in recent years, a CNN (Convolutional Neural Network) has become widely known as a technique for image processing in the field of deep learning. All of these methods are effective in making a typical image look satisfactory but, when they are applied to a depth image, it may not be possible to achieve sufficient accuracy from the perspective of how the shape of a real object actually changes.
For example, by using the bilinear method or a CNN, there is slow change for a shape having a step, such as an outline portion of an object, and a result in which an inclination is gentler than the actual step is more likely to be achieved. This is common for these methods, and is due to using regression analysis to fit a continuous function to pixel values. In contrast, if using a method that directly uses pixel values for sample points without using regression, how sample points are selected, etc., impacts the accuracy of distance values.
For example, the median method takes the median of the pixel values for a predetermined number of neighboring sample points as the pixel value for a target pixel. In this case, because pixel values not selected as sample points are not reflected in a result, there are cases where accuracy is not produced in a region of an image of an object surface that changes smoothly, such as an inclined plane. In this manner, in upsampling that takes a depth image as a processing target, there are shapes (change of distance) for which it is difficult to reflect an entity, regardless of the method.
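The contrasting weaknesses described above can be reproduced in one dimension. The following is a minimal sketch, not the embodiment's implementation: linear interpolation stands in for the bilinear method, the median variant takes the median of the k nearest sample points, and all function names and the choice k=3 are assumptions for illustration.

```python
from statistics import median_low


def upsample_linear(samples, factor):
    """Linear interpolation between neighbouring samples (1-D analogue of bilinear)."""
    out = []
    for left, right in zip(samples, samples[1:]):
        for k in range(factor):
            t = k / factor
            out.append((1 - t) * left + t * right)
    out.append(samples[-1])
    return out


def upsample_median(samples, factor, k=3):
    """Each output pixel takes the median of its k nearest sample points."""
    n_out = (len(samples) - 1) * factor + 1
    out = []
    for p in range(n_out):
        pos = p / factor  # output position expressed in sample coordinates
        nearest = sorted(range(len(samples)), key=lambda i: abs(i - pos))[:k]
        # median_low always returns one of the sample values, never a blend
        out.append(median_low([samples[i] for i in nearest]))
    return out


if __name__ == "__main__":
    step = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]   # object outline (step in distance)
    slope = [1.0, 2.0, 3.0, 4.0]            # inclined plane
    print(upsample_linear(step, 4))    # contains in-between values: the step is softened
    print(upsample_median(step, 4))    # only 1.0 and 3.0: the step is preserved
    print(upsample_linear(slope, 4))   # smooth ramp
    print(upsample_median(slope, 4))   # staircase made of the sample values only
```

In this toy setting the interpolating method softens the step while the median method turns the inclined plane into a staircase, which is the motivation for selecting between methods per pixel or per region.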
Accordingly, in the present embodiment, for example, results of hypothetically performing upsampling by a plurality of methods are obtained as candidate depth images, and a result having a high reliability is selected for each pixel or for each region. In this case, as illustrated, the upsampling unit 52 has a plurality of independent processing mechanisms including a first processing unit 54a, a second processing unit 54b, a third processing unit 54c, ..., and upsamples a depth image by respectively different methods. For example, the first processing unit 54a, the second processing unit 54b, and the third processing unit 54c upsample a measurement depth image by a CNN, the median method, and the nearest neighbor method, respectively.
Methods of upsampling performed by the upsampling unit 52 are determined in advance and allocated to the first processing unit 54a, the second processing unit 54b, the third processing unit 54c, .... Accordingly, each of the first processing unit 54a, the second processing unit 54b, the third processing unit 54c, ... holds, in an internal memory, various parameters or a filter necessary for processing. In addition, the first processing unit 54a which uses a CNN holds a result of learning a continuous function to be employed. This learning result may be obtained from a server or the like to which the distance information generation apparatus 10 is connected via a network. Alternatively, the first processing unit 54a itself may be provided in a cloud environment.
Note that, although the first processing unit 54a, the second processing unit 54b, the third processing unit 54c, ... are illustrated in the figure as functional blocks that perform upsampling independently, the number of processing units is not particularly limited. For example, even with one processing unit, it is possible to invalidate a result for which a reliability is less than or equal to a threshold to thereby remove a distance value for which a large error is predicted from output data. In a case of providing a plurality of processing units in the upsampling unit 52, the greater the number thereof the more a processing load will increase, leading to a delay in outputting a result. In addition, a method that tends to have low reliability has a low frequency of results thereof being employed, and is largely wasteful overall.
Accordingly, an appropriate number is set according to the processing performance of the distance information generation apparatus 10 and the accuracy required for output data. It is typical for approximately four processing units to be provided. The reliability determination unit 58 uses a color image obtained by the color image obtainment unit 56 to derive, for each pixel or for each region, a reliability of a processing result in the upsampling unit 52 from a perspective such as a characteristic of the surface or the shape of an object, or a measurement status. For example, the reliability determination unit 58 uses a CNN to extract an edge region that represents an outline of an object from the color image. This is a typical use of a CNN.
Therefore, the reliability determination unit 58 imparts a high reliability to a result of upsampling by using the median method with respect to a pixel included in an edge region or in a region within a predetermined range from the region, and imparts a high reliability to a result of upsampling by using a CNN with respect to a pixel in another region. In this manner, weaknesses due to using the various methods described above are respectively compensated, and distance values having a high accuracy can be achieved for all pixels.
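The edge-based assignment described above can be sketched as follows, assuming that an edge mask has already been extracted from the color image by some means. The function names, the square neighbourhood used for the "predetermined range", and the string labels are hypothetical choices for this sketch.

```python
def near_edge(edge_mask, margin):
    """Mark every pixel that lies on an edge or within `margin` pixels of one."""
    h, w = len(edge_mask), len(edge_mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not edge_mask[y][x]:
                continue
            for yy in range(max(0, y - margin), min(h, y + margin + 1)):
                for xx in range(max(0, x - margin), min(w, x + margin + 1)):
                    out[yy][xx] = True
    return out


def preferred_method(edge_mask, margin=1):
    """Per pixel, name the method that receives the high reliability."""
    mask = near_edge(edge_mask, margin)
    return [["median" if flag else "cnn" for flag in row] for row in mask]


if __name__ == "__main__":
    edges = [[False, False, True, False, False]]
    print(preferred_method(edges, margin=1))
```

The margin widens the region in which the median result is trusted, so that pixels just beside an outline do not inherit the softened step of the CNN result.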
In a case where the upsampling unit 52 performs upsampling by using one method, the reliability determination unit 58 may determine whether or not to employ a result of the upsampling, as a reliability. For example, in a case where the upsampling unit 52 performs only upsampling by using a CNN, the reliability determination unit 58 sets upsampling results for pixels included in an edge region as to be rejected and sets other regions as to be employed. In the former case, invalid data is assigned as the value of corresponding pixels in an output depth image.
A reliability determined by the reliability determination unit 58 in this manner may be a rank imparted to a plurality of upsampling results or may be something that expresses employment/rejection for one upsampling result. In other words, a reliability is not limited to being expressed as a number, and may be information expressing whether or not to employ the result of a given method. The means by which the reliability determination unit 58 determines a reliability is not limited to a CNN, and any typical image analysis technique such as edge extraction, template matching, or feature point extraction may be used.
In addition, the grounds on which the reliability determination unit 58 determines a reliability are not limited to whether or not a pixel is in an edge region. For example, the reliability determination unit 58 may use a color image to perform subject recognition by using a CNN, and identify, on the basis of information registered in advance, an upsampling method suitable for an estimated shape as well as an unsuitable upsampling method to thereby increase the reliability of the former and decrease the reliability of the latter.
Alternatively, the reliability determination unit 58 may adjust the reliability of a method that performs upsampling directly from sample points, in accordance with the density of the sample points, in other words, the density of distance values obtained through measurement. For example, in a case of performing upsampling by the median method, pixels for which no pixel having an obtained distance value lies within a predetermined range, or for which the number of such pixels does not reach a predetermined number, are identified, and a low reliability is imparted to the results of the median method for these pixels. Alternatively, the reliability determination unit 58 may identify a pixel having a low reliability for a distance measurement value in a measurement depth image, and impart a low reliability to all upsampling methods for a region in a predetermined range from this pixel.
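The density-based adjustment for the median method can be sketched as a count of measured samples in a window around each pixel. This is an illustration under assumptions: missing measurements are represented by None, the predetermined range is a square window, and the function name and labels are hypothetical.

```python
def median_reliability(measured, radius, min_samples):
    """Label each pixel 'high' or 'low' by how many measured values lie nearby.

    measured:    2-D list of distance values, None where nothing was measured
    radius:      half-size of the square window standing in for the predetermined range
    min_samples: predetermined number of samples the window must contain
    """
    h, w = len(measured), len(measured[0])
    result = []
    for y in range(h):
        row = []
        for x in range(w):
            count = sum(
                1
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
                if measured[yy][xx] is not None
            )
            row.append("high" if count >= min_samples else "low")
        result.append(row)
    return result


if __name__ == "__main__":
    sparse = [[1.0, None, None, None, None]]
    print(median_reliability(sparse, radius=1, min_samples=1))
```

Setting min_samples to 1 covers the case where no measured value lies within the range at all; larger values express the requirement of a minimum sample count.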
As cases where the reliability of a distance measurement value is low, there are cases for a region having many specular reflection components, a region in which only a portion of a surface texture changes, an image for a black object, and where a plurality of local maximums are detected for the number of photons observed in a TOF sensor. The reliability determination unit 58 may analyze a color image to thereby identify a pixel estimated to have a low measurement value reliability due to these situations. At this time, a CNN may be used to estimate such a pixel on the basis of a result learned in advance.
Alternatively, the reliability determination unit 58 may use a polarized image described above to identify a pixel having a low measurement value reliability. A polarizing camera that can capture a color polarized image due to a polarizer layer being provided on an upper layer of a color filter is widely known. For example, a polarizer that includes a fine wire grid is provided in an upper layer for the image capturing element at main axis angles 0°, 45°, 90°, and 135°, and light transmitted through the polarizers and the color filter is converted to an electric charge and read out, whereby polarized images for four types of orientations can be obtained as color images (for example, refer to Japanese Patent Laid-Open No. 2012-80065).
In this case, it is possible to add together, for each pixel, the polarized images for four orientations to thereby obtain a color image with natural light. In addition, a dependency of polarization intensity with respect to orientation is obtained for each pixel to thereby enable identification of a reflection characteristic for an object surface, in other words, whether a specular reflection component is dominant or a diffuse reflection component is dominant (for example, Japanese Patent Laid-Open No. 2009-58533). Accordingly, the reliability determination unit 58 identifies a region where a specular reflection component is dominant from the polarized images for the four orientations, and reduces the reliability of a distance value measured in this region or an upsampling result that uses this.
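The per-pixel processing of the four polarized images can be sketched with the standard linear Stokes parameters. The classification rule below (a fixed threshold on the degree of linear polarization) is only a simplified stand-in and is not the method of the cited publication; the threshold value and all function names are assumptions.

```python
import math


def stokes_from_four(i0, i45, i90, i135):
    """Linear Stokes parameters from intensities at 0, 45, 90 and 135 degrees."""
    s0 = (i0 + i45 + i90 + i135) / 2.0  # total intensity, proportional to the summed image
    s1 = i0 - i90
    s2 = i45 - i135
    return s0, s1, s2


def degree_of_linear_polarization(i0, i45, i90, i135):
    """How strongly the observed intensity depends on polarizer orientation (0 to 1)."""
    s0, s1, s2 = stokes_from_four(i0, i45, i90, i135)
    return math.hypot(s1, s2) / s0 if s0 > 0 else 0.0


def is_specular_dominant(i0, i45, i90, i135, threshold=0.3):
    """Hypothetical rule: strong orientation dependence is treated as specular."""
    return degree_of_linear_polarization(i0, i45, i90, i135) > threshold


if __name__ == "__main__":
    print(degree_of_linear_polarization(0.5, 0.5, 0.5, 0.5))  # no orientation dependence
    print(is_specular_dominant(1.0, 0.5, 0.0, 0.5))           # strong orientation dependence
```

A pixel flagged by such a rule would have the reliability of its measured distance value, and of upsampling results derived from it, reduced.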
Furthermore, the reliability determination unit 58 may use relative motion information between the image capturing apparatus 8 and a real object as a ground for a reliability determination. For example, in a case where the distance information generation system 12 according to the present embodiment is mounted in an unillustrated head-mounted display, the reliability determination unit 58 obtains a measurement value from a motion sensor caused to be incorporated in the head-mounted display. When the head-mounted display has high acceleration or angular velocity, blurring (motion blur) arises in a captured color image.
In this case, because an error will arise in image recognition using this color image, the accuracy of a reliability based on this will itself worsen. Accordingly, for example, the reliability determination unit 58 may suspend processing for determining a reliability in a time period in which a parameter that expresses the magnitude of motion, such as an acceleration or angular velocity measured by a motion sensor or a speed obtained from these, exceeds a threshold. Alternatively, for this time period, the reliability determination unit 58 may set the reliability of all upsampling results to low, or may suspend the output itself of an output depth image.
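One of the options described above, setting the reliability of all upsampling results to low while motion is large, can be sketched as a simple gate. The function name, the dictionary representation of per-method reliabilities, and the numeric values are hypothetical.

```python
def gate_by_motion(scores, motion_magnitude, threshold, low_score=0.0):
    """Return per-method reliabilities, forced to `low_score` while motion is large.

    scores:           mapping from upsampling method name to its reliability
    motion_magnitude: e.g. acceleration or angular velocity from a motion sensor
    threshold:        magnitude above which the color image is assumed to be blurred
    """
    if motion_magnitude > threshold:
        return {method: low_score for method in scores}
    return dict(scores)


if __name__ == "__main__":
    scores = {"cnn": 0.9, "median": 0.7}
    print(gate_by_motion(scores, motion_magnitude=1.0, threshold=3.0))  # unchanged
    print(gate_by_motion(scores, motion_magnitude=5.0, threshold=3.0))  # all low
```

Suspending the reliability determination or the output itself would replace the dictionary rewrite with skipping the frame altogether.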
Note that it is possible to similarly obtain motion information in a case where the distance information generation system 12 is immobilized and a real object is regarded as an object in which a motion sensor is mounted, such as a controller. Motion sensors may be mounted in both of the distance information generation system 12 and a real object. In any case, it is sufficient if relative motion for the two is identified and processing for determining reliability is switched according to the magnitude thereof, and motion information may be obtained by using an external camera in place of a motion sensor.
Stored in advance in the reliability determination unit 58 is a table that associates a condition enabling high accuracy to be achieved and a condition likely to result in low accuracy, with each upsampling method. Here, a condition likely to result in low accuracy means a condition that will not result in more than a certain level of accuracy even if a filter coefficient is changed or no matter how much training is performed in the case of a CNN, and can be identified by a prior experiment or a learning result. Similarly, by using a prior experiment or a learning result, a condition enabling high accuracy to be achieved is registered together with an optimal filter coefficient, algorithm, or the like.
Therefore, at a time of operation, the reliability determination unit 58 increases the reliability of an upsampling method for which an analysis result of a color image or the like matches a condition that enables high accuracy to be achieved. In addition, the reliability is lowered for an upsampling method for which the analysis result matches a condition likely to result in low accuracy. Alternatively, it may be that one upsampling method that is to be a base is defined in advance and, for only a portion in which this method satisfies a condition likely to result in low accuracy, the reliability is determined in order to select another method for which a higher accuracy can be achieved.
In a case of evaluating reliability from many sides in consideration of the reliability of a measurement value itself as described above, for example, the reliability determination unit 58 may calculate a score S(i) expressing the reliability of an i-th upsampling method in the following manner.
Here, si(x) is a function that expresses a reliability score with respect to an element x. The element x is, for example, something such as whether it is an edge region or not, as well as the shape, the color, the reflection characteristic, and a relative speed of an object. There are cases where the function si(x) changes due to the upsampling method even with the same element x, and cases where the function si(x) has the same change regardless of upsampling method.
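A score of this kind can be sketched as a table of per-element functions, one table per upsampling method. The exact formula for S(i) is not reproduced here, so combining the element scores by summation is an assumption of this sketch, as are the element names, the score values, and the speed limit of 1.0.

```python
from typing import Callable, Dict, Mapping


def speed_penalty(speed: float) -> float:
    """Element score that is the same regardless of upsampling method."""
    return -1.0 if speed > 1.0 else 0.0


# Hypothetical table: method name -> element name -> score function s_i(x).
SCORE_TABLE: Dict[str, Dict[str, Callable[..., float]]] = {
    "cnn": {
        "edge": lambda is_edge: -1.0 if is_edge else 1.0,  # differs per method
        "relative_speed": speed_penalty,
    },
    "median": {
        "edge": lambda is_edge: 1.0 if is_edge else 0.0,   # differs per method
        "relative_speed": speed_penalty,
    },
}


def total_score(method: str, observation: Mapping[str, object]) -> float:
    """Combine the element scores of one method (summation is assumed here)."""
    return sum(fn(observation[name]) for name, fn in SCORE_TABLE[method].items())


if __name__ == "__main__":
    pixel = {"edge": True, "relative_speed": 0.2}
    for method in SCORE_TABLE:
        print(method, total_score(method, pixel))
```

The edge element illustrates a function that changes with the upsampling method, while the relative-speed element illustrates one that behaves identically for every method.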
On the basis of the reliability determined by the reliability determination unit 58, the output data generation unit 60 determines and reads out, for each pixel, a distance value having a high reliability from candidate depth images generated by the upsampling unit 52, to thereby generate an output depth image. In a case where the reliability determination unit 58 assigns a reliability rank to upsampling results, the output data generation unit 60 reads out the upsampling result having the highest rank. In a case where the reliability determination unit 58 calculates the above-described score value, the output data generation unit 60 reads out the upsampling result that has achieved the highest score value.
In addition, the output data generation unit 60 does not read out an upsampling result for pixels for which the reliability determination unit 58 has determined the rejection of an upsampling result, or pixels that all have score values less than or equal to a threshold. In this case, the output data generation unit 60 stores a value indicating invalid, such as 0, to these pixels in the output depth image.
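The read-out performed by the output data generation unit 60 can be sketched as a per-pixel selection of the candidate with the highest score, with an invalid value stored where every score is at or below the threshold. The data layout (dictionaries of 2-D lists keyed by method name) and the function name are assumptions for illustration.

```python
INVALID = 0.0  # value indicating an invalid pixel, following the example of 0 in the text


def build_output(candidates, scores, threshold):
    """Pick, per pixel, the distance value of the best-scoring method.

    candidates: method name -> 2-D list of candidate distance values
    scores:     method name -> 2-D list of reliability scores (same shape)
    threshold:  if even the best score is <= threshold, the pixel is invalid
    """
    methods = list(candidates)
    first = candidates[methods[0]]
    h, w = len(first), len(first[0])
    output = []
    for y in range(h):
        row = []
        for x in range(w):
            best = max(methods, key=lambda m: scores[m][y][x])
            if scores[best][y][x] <= threshold:
                row.append(INVALID)
            else:
                row.append(candidates[best][y][x])
        output.append(row)
    return output


if __name__ == "__main__":
    candidates = {"cnn": [[1.0, 2.0, 3.0]], "median": [[1.5, 2.5, 3.5]]}
    scores = {"cnn": [[0.9, 0.2, 0.1]], "median": [[0.4, 0.8, 0.1]]}
    print(build_output(candidates, scores, threshold=0.3))
```

With a single upsampling method, the same function reduces to the employ/reject decision: the only candidate is kept where its score exceeds the threshold and is replaced by the invalid value elsewhere.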
Note that the output data generation unit 60 may also perform filtering processing on a depth image that includes distance values read out from candidate depth images. For example, it is possible to perform smoothing by using a Gaussian filter or the like to thereby prevent distance values for adjacent pixels from changing unnaturally due to different upsampling methods. In this case, it may be that the output data generation unit 60 performs smoothing processing only near a boundary between regions in which different upsampling results are employed.
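Smoothing restricted to the boundaries between regions that employ different upsampling results can be sketched as follows. The 3x3 kernel, the 4-neighbour boundary test, and the label image recording which method was employed per pixel are assumptions of this sketch, and invalid pixels are not treated specially here.

```python
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))  # small Gaussian-like weights


def on_boundary(labels, y, x):
    """True if a 4-neighbour of (y, x) employs a different upsampling method."""
    h, w = len(labels), len(labels[0])
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        yy, xx = y + dy, x + dx
        if 0 <= yy < h and 0 <= xx < w and labels[yy][xx] != labels[y][x]:
            return True
    return False


def smooth_boundaries(depth, labels):
    """Apply the kernel only at pixels adjacent to a change of method."""
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for y in range(h):
        for x in range(w):
            if not on_boundary(labels, y, x):
                continue
            total = weight = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        k = KERNEL[dy + 1][dx + 1]
                        total += k * depth[yy][xx]
                        weight += k
            out[y][x] = total / weight  # normalise by the weights actually used
    return out


if __name__ == "__main__":
    depth = [[1.0, 1.0, 3.0, 3.0]]
    labels = [["cnn", "cnn", "median", "median"]]
    print(smooth_boundaries(depth, labels))
```

Pixels away from a method boundary keep the distance values that were read out, so the smoothing does not undo the accuracy gained inside each region.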
The output unit 62 outputs an output depth image generated by the output data generation unit 60. An output destination may be another module that uses the output depth image to perform information processing, or may be a storage region or the like in the distance information generation apparatus 10. Note that the measurement depth image obtainment unit 50 or the color image obtainment unit 56 may instantly obtain a measurement depth image or a color image in a pixel column order by which the image capturing apparatus 8 has performed measurement or image capturing.
Each processing unit in the upsampling unit 52 internally holds a line buffer for temporarily storing a number of rows of data from among a measurement depth image that are necessary for upsampling, and a line buffer for temporarily storing an upsampled result. Therefore, the output data generation unit 60, for each pixel, reads out a distance value to be employed from the line buffer that temporarily stores upsampled results, to thereby generate an output depth image, and sequentially supplies the output depth image to the output unit 62. In this manner, each functional block starts its own processing without waiting for processing for one frame in a previous functional block, to thereby enable measured distance information to be refined and outputted with a low delay.
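The row-by-row operation described above can be sketched with Python generators, in which each stage starts working as soon as a row arrives rather than waiting for a whole frame. Nearest neighbor upsampling is used because it needs only a single buffered input row; the function names and the boolean reliability rows are assumptions for illustration.

```python
from typing import Iterable, Iterator, List

Row = List[float]


def upsample_rows_nearest(rows: Iterable[Row], factor: int) -> Iterator[Row]:
    """Yield upsampled rows as each measured row arrives (nearest neighbor)."""
    for row in rows:
        wide = [value for value in row for _ in range(factor)]  # horizontal repeat
        for _ in range(factor):                                 # vertical repeat
            yield list(wide)


def select_rows(candidate_rows: Iterable[Row],
                employ_rows: Iterable[List[bool]],
                invalid: float = 0.0) -> Iterator[Row]:
    """Read out employed values row by row; rejected pixels become invalid."""
    for candidate, employ in zip(candidate_rows, employ_rows):
        yield [v if ok else invalid for v, ok in zip(candidate, employ)]


if __name__ == "__main__":
    measured = iter([[1.0, 2.0], [3.0, 4.0]])
    employ = iter([[True, True, True, False]] * 4)
    for out_row in select_rows(upsample_rows_nearest(measured, 2), employ):
        print(out_row)  # each row is available before the next input row is read
```

Methods that look at several neighbouring rows would keep a correspondingly larger number of rows in their line buffers, at the cost of a few rows of additional latency.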
Meanwhile, the reliability determination unit 58 analyzes the color image to thereby determine, for each pixel or for each region, a reliability for a distance value expressed by a candidate depth image (S12). In addition, it may be that, on the basis of a polarized image obtained together with the color image, the reliability determination unit 58 obtains a reflection characteristic for an object surface or obtains relative motion information between the image capturing apparatus 8 and an object from a motion sensor and, on the basis thereof, detects a time period in which motion blur occurs and reflects this time period to the reliability. Note that, among a color image analysis result, an object reflection characteristic, and information regarding a relative motion between the image capturing apparatus 8 and an object, all may be reflected to reliability or any one or two may be reflected to reliability.
For example, it may be that a color image is not used, and only a reflection characteristic according to a polarized image and/or motion information is set as a ground for a reliability determination. In any case, the reliability determination unit 58 generates a reliability image 76 in which a reliability is associated for each pixel in an image plane corresponding to an output depth image. On the basis of information regarding the reliability expressed by the reliability image 76, the output data generation unit 60 reads out, from a candidate depth image 74, a distance value to be employed, to thereby generate a final output depth image 78 (S14).
The output unit 62 outputs data for the generated output depth image to an external module or stores the generated output depth image in an internal storage region. These processes as above are actually performed in units of pixel columns, and the next subsequent process is started without waiting for processing for one frame, whereby distance information is outputted at high speed. Illustrated processing is repeated for a measurement depth image 70 that is for a respective time and is obtained by the image capturing apparatus 8 at a predetermined rate, whereby the output depth image 78 can also be outputted at this rate.
In contrast, in the candidate depth image 96 according to the weighted median method, distance values are discrete in regions 102 that express the inclination of object surfaces, and smooth change as with the true values 92 is not achieved. By using reliabilities based on the edge image illustrated in
According to the present embodiment described above, measured depth images are upsampled, and a real object or measurement situation is also identified by using other means, whereby a reliability is assigned for upsampling results, and a result to be employed is selected for each pixel. For example, a pixel is classified by whether or not the pixel is in an edge region that expresses an object outline, and distance values achieved by respectively suitable methods are employed. Alternatively, an upsampling result is invalidated for a region in which insufficient accuracy is predicted.
As a result, unlike typical image interpolation, it is possible to easily derive a good result while using conventional methods, even for a depth image, for which accuracy of values rather than appearance is required. In addition, the reliability of an upsampling result is determined by using a CNN or using an object surface reflection characteristic based on a polarized image or motion information according to a motion sensor, whereby it is possible to increase the accuracy of selecting distance values and consequently output distance information at high accuracy and high resolution.
Description is given above based on an embodiment of the present invention. The above-described embodiment is an example, and a person skilled in the art would understand that various variations can be made to combinations of the respective components or processes of the embodiment, and that these variations are within the scope of the present invention.
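8: Image capturing apparatus
10: Distance information generation apparatus
23: CPU
24: GPU
26: Main memory
50: Measurement depth image obtainment unit
52: Upsampling unit
54a: First processing unit
54b: Second processing unit
54c: Third processing unit
56: Color image obtainment unit
58: Reliability determination unit
60: Output data generation unit
62: Output unit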
As above, the present invention can be used in various apparatuses such as a distance information generation apparatus, an information processing apparatus, a game apparatus, a content processing apparatus, or a head-mounted display, and a system that includes these.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2020/035924 | 9/24/2020 | WO |