The present technology relates to an image processing device, an image processing method, a program, and an information processing system, and enables detection of a parallax with high precision.
Conventionally, depth information has been acquired by using polarization information. For example, an image processing device disclosed in PTL 1 performs positioning of polarization images obtained from a plurality of viewpoints by using depth information (a depth map) that indicates a distance to an object and that is generated by a stereo matching process in which captured multi-viewpoint images are used. In addition, the image processing device generates normal line information (a normal line map) on the basis of polarization information detected by use of the positioned polarization images. Moreover, the image processing device increases the precision of the depth information by using the generated normal line information.
Further, NPL 1 describes generating depth information with high precision by using normal line information obtained on the basis of polarization information and depth information obtained by a ToF (Time of Flight) sensor.
Incidentally, the image processing device disclosed in PTL 1 generates depth information on the basis of a parallax detected by a stereo matching process in which captured multi-viewpoint images are used. For this reason, precise detection of a parallax in a flat portion through the stereo matching process is difficult, whereby there is a possibility that depth information cannot be obtained with high precision. In the case where a ToF sensor is used as in NPL 1, depth information cannot be obtained under a condition where no projection light arrives or a condition where return light is hardly detected. Further, the power consumption becomes large because projection light is needed.
Therefore, an object of the present technology is to provide an image processing device, an image processing method, a program, and an information processing system for enabling precise detection of a parallax almost without the influences of an object shape, an image capturing condition, and the like.
A first aspect of the present technology is an image processing device including:
In this technology, the parallax detecting section performs, by using normal line information in respective pixels based on a polarization image, the cost adjustment processing on the cost volume indicating, for each pixel and each parallax, a cost corresponding to the similarity among multi-viewpoint images including the polarization image. In the cost adjustment processing, cost adjustment of the parallax detection target pixel is performed on the basis of a cost calculated, with use of normal line information in the parallax detection target pixel, for a pixel in a peripheral region based on the parallax detection target pixel. Also, in the cost adjustment, at least one of weighting in accordance with the normal line difference between normal line information in the parallax detection target pixel and normal line information in a pixel in the peripheral region, weighting in accordance with the distance between the parallax detection target pixel and the pixel in the peripheral region, or weighting in accordance with the difference between a luminance value of the parallax detection target pixel and a luminance value of the pixel in the peripheral region, may be performed on the cost calculated for the pixel in the peripheral region.
The parallax detecting section performs the cost adjustment processing for each of normal line directions among which indefiniteness is generated on the basis of normal line information, and detects a parallax at which the similarity becomes maximum, by using the cost volume having undergone the cost adjustment processing performed for each of the normal line directions. Further, the cost volume is generated with each parallax used as a prescribed pixel unit, and on the basis of a cost in a prescribed parallax range based on a parallax of a prescribed pixel unit at which the similarity becomes maximum, the parallax detecting section detects a parallax at which the similarity becomes maximum with a resolution higher than the prescribed pixel unit. Moreover, a depth information generating section is provided to generate depth information on the basis of the parallax detected by the parallax detecting section.
A second aspect of the present technology is an image processing method including:
A third aspect of the present technology is a program for causing a computer to process multi-viewpoint images including a polarization image, the program for causing the computer to execute:
It is to be noted that the program according to the present technology can be provided by a recording medium such as an optical disk, a magnetic disk, or a semiconductor memory, or a communication medium such as a network, for providing various program codes in a computer-readable format to a general-purpose computer capable of executing the various program codes. As a result of provision of such a program in a computer-readable format, a process corresponding to the program can be executed in a computer.
A fourth aspect of the present technology is an information processing system including:
According to the present technology, cost adjustment processing is executed, for each pixel and each parallax, on a cost volume indicating a cost corresponding to the similarity among multi-viewpoint images including a polarization image, with use of normal line information that is obtained for each pixel and that is based on the polarization image, so that, from the cost volume having undergone the cost adjustment processing, a parallax at which the similarity becomes maximum is detected with use of parallax-based costs of a parallax detection target pixel. Therefore, the parallax can be precisely detected almost without the influences of an object shape, an image capturing condition, and the like. It is to be noted that the effects described herein are just examples, and thus, are not limitative. Additional effects may be further provided.
Hereinafter, embodiments for implementing the present technology will be explained. It is to be noted that the explanation will be given in accordance with the following order.
The imaging section 21 outputs a polarization image signal, which is obtained by capturing an image of a desired object, to the normal line information generating section 31 and the depth information generating section 35. Further, the imaging section 22 generates a polarization image signal or non-polarization image signal obtained by capturing an image of the desired object from a viewpoint that is different from that of the imaging section 21, and outputs the signal to the depth information generating section 35.
The normal line information generating section 31 of the image processing device 30 generates normal line information indicating a normal direction for each pixel on the basis of the polarization image signal supplied from the imaging section 21, and outputs the normal line information to the depth information generating section 35.
The depth information generating section 35 calculates, for each pixel and each parallax, a cost indicating the similarity among images by using two image signals taken from different viewpoints and supplied from the imaging section 21 and the imaging section 22, thereby generating a cost volume. Further, the depth information generating section 35 performs cost adjustment processing on the cost volume by using the image signal supplied from the imaging section 21 and the normal line information generated by the normal line information generating section 31. The depth information generating section 35 detects, from the cost volume having undergone the cost adjustment processing, a parallax at which the similarity becomes maximum, by using parallax-based costs of a parallax detection target pixel. For example, the depth information generating section 35 accomplishes the cost adjustment processing on the cost volume by performing a filtering process for each pixel and each parallax with use of normal line information regarding a process target pixel for the cost adjustment processing and a pixel in a peripheral region based on the process target pixel. Alternatively, the depth information generating section 35 may calculate a weight on the basis of the normal line difference, the positional difference, and the luminance difference between the process target pixel and the pixel in the peripheral region, and perform the filtering process for each pixel and each parallax by using the calculated weight and the normal line information generated by the normal line information generating section 31, thereby accomplishing the cost adjustment processing on the cost volume. The depth information generating section 35 calculates a depth for each pixel from the detected parallax, the baseline length between the imaging section 21 and the imaging section 22, and the focal distance, thereby generating depth information.
Next, operation of each of the sections of the imaging device 20 will be explained. The imaging section 21 generates a polarization image signal in which three or more polarization directions are used.
The imaging section 22 may have a configuration similar to that of the imaging section 21, or may have a configuration using no polarization plate 212 or no polarizer 214. The imaging section 22 outputs the generated image signals (or polarization image signals) to the image processing device 30.
The normal line information generating section 31 of the image processing device 30 acquires a normal line on the basis of the polarization image signals.
When the coordinate system is changed from the polarization model expression indicated in Expression (1), Expression (5) is obtained. A polarization degree ρ in Expression (5) is calculated on the basis of Expression (6), and an azimuth angle φ in Expression (5) is calculated on the basis of Expression (7). It is to be noted that the polarization degree ρ represents an amplitude of the polarization model expression, and the azimuth angle φ represents a phase of the polarization model expression.
Moreover, it has been known that a zenith angle θ can be calculated on the basis of Expression (8) using a polarization degree ρ and a refractive index n of an object. It is to be noted that, in Expression (8), a coefficient k0 is calculated on the basis of Expression (9), and k1 is calculated on the basis of Expression (10). Further, the coefficients k2 and k3 are calculated on the basis of expressions (11) and (12), respectively.
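For reference, the forward relation between the polarization degree and the zenith angle that Expression (8) inverts is commonly written, for diffuse reflection at an object with refractive index n, as follows. The closed-form coefficients k0 to k3 of Expressions (8) to (12) are not reproduced in this excerpt, so the display equation below is given only as the commonly used relation and is an assumption rather than a quotation of the patent's expressions.

```latex
\rho = \frac{\left(n - \tfrac{1}{n}\right)^{2}\sin^{2}\theta}
           {2 + 2n^{2} - \left(n + \tfrac{1}{n}\right)^{2}\sin^{2}\theta
            + 4\cos\theta\sqrt{n^{2} - \sin^{2}\theta}}
```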
Therefore, the normal line information generating section 31 can generate normal line information N (Nx, Ny, Nz) by calculating the azimuth angle φ and the zenith angle θ through the above calculation. Nx in the normal line information N represents an x-axis direction component, and is calculated on the basis of Expression (13). Further, Ny is a y-axis direction component, and is calculated on the basis of Expression (14). Moreover, Nz represents a z-axis direction component, and is calculated on the basis of Expression (15).
Nx=cos(φ)·sin(θ) (13)
Ny=sin(φ)·sin(θ) (14)
Nz=cos(θ) (15)
The normal line information generating section 31 generates the normal line information N for each pixel, and outputs the normal line information generated for each pixel to the depth information generating section 35.
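As a concrete illustration of the processing described above, the following Python sketch generates the normal line information N from intensities captured through polarizers at 0, 45, 90, and 135 degrees. The four-angle fit, the helper name normal_from_polarization, and the zenith_from_degree callable (standing in for Expression (8), whose coefficients are not reproduced here) are assumptions for illustration; the polarization degree, azimuth angle, and normal components follow Expressions (6), (7), and (13) to (15).

```python
import numpy as np

def normal_from_polarization(i0, i45, i90, i135, zenith_from_degree):
    """Minimal sketch of normal generation from four polarization directions.
    i0..i135: intensities observed through polarizers at 0/45/90/135 degrees.
    zenith_from_degree: callable implementing Expression (8) (zenith angle from
    polarization degree and refractive index); passed in because the exact
    coefficients k0..k3 are not reproduced from the patent."""
    # Fit the polarization model I(v) = C + A*cos(2(v - phi)) from the four samples
    c_param = 0.25 * (i0 + i45 + i90 + i135)      # non-polarized component C
    s1 = i0 - i90                                 # 2*A*cos(2*phi)
    s2 = i45 - i135                               # 2*A*sin(2*phi)
    amplitude = 0.5 * np.hypot(s1, s2)            # amplitude A of the model
    phi = 0.5 * np.arctan2(s2, s1)                # azimuth angle (phase), Expression (7)
    rho = amplitude / np.maximum(c_param, 1e-6)   # polarization degree, Expression (6)
    theta = zenith_from_degree(rho)               # zenith angle, Expression (8)
    nx = np.cos(phi) * np.sin(theta)              # Expression (13)
    ny = np.sin(phi) * np.sin(theta)              # Expression (14)
    nz = np.cos(theta)                            # Expression (15)
    return nx, ny, nz
```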
The local match processing section 361 detects, for each pixel in one captured image, a corresponding point in the other captured image by using image signals generated by the imaging sections 21 and 22.
[Math. 4]
CAD(i,d)=|Li−Ri+d| (16)
CZSAD(i,d)=Σ(x,y)|(Lxy−L̄i)−(Rx+d,y−R̄i+d)| (17)
In Expression (16), “Li” represents a luminance value of a process target pixel i in a left viewpoint image, and “d” represents a pixel unit distance from a reference position in a right viewpoint image, and corresponds to the parallax. “Ri+d” represents a luminance value of a pixel at which the parallax d from the reference position in the right viewpoint image is generated. Further, in Expression (17), “x, y” represents a position in a window, the bar Li represents an average luminance value in a peripheral region based on the process target pixel i, and the bar Ri+d represents an average luminance value in a peripheral region based on the position at which the parallax d from the reference position is generated. In addition, in the case where Expression (16) or (17) is used, when the calculated value is smaller, the similarity is higher.
In addition, in a case where a non-polarization image signal is supplied from the imaging section 22 to the local match processing section 361, the local match processing section 361 generates a non-polarization image signal on the basis of the polarization image signal supplied from the imaging section 21, and performs the local matching process. For example, since the aforementioned parameter C represents a non-polarization light component, the local match processing section 361 uses, as the non-polarization image signal, a signal indicating the pixel-based parameter C. Moreover, since usage of the polarization plate and the polarizer results in deterioration of sensitivity, the local match processing section 361 may perform gain adjustment on the non-polarization image signal generated from the polarization image signal such that a sensitivity equal to that of the non-polarization image signal from the imaging section 22 can be obtained.
The local match processing section 361 generates a cost volume by calculating a similarity for each pixel in the left viewpoint image and for each parallax.
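A minimal sketch of the local matching process of Expression (16) is shown below; the function name cost_volume_ad and the maximum parallax d_max are assumptions, and the window-based ZSAD cost of Expression (17) could be substituted in the inner loop. Columns for which the shifted position falls outside the right image keep an infinite cost so that they are never selected.

```python
import numpy as np

def cost_volume_ad(left, right, d_max):
    """Cost volume per Expression (16): CAD(i, d) = |Li - Ri+d|.
    left and right are rectified grayscale images (H, W) from the two viewpoints.
    Returns an array of shape (d_max + 1, H, W); a smaller cost means a higher similarity."""
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    h, w = left.shape
    volume = np.full((d_max + 1, h, w), np.inf, dtype=np.float32)
    for d in range(d_max + 1):
        # compare each left-image pixel with the right-image pixel shifted by the parallax d
        volume[d, :, :w - d] = np.abs(left[:, :w - d] - right[:, d:])
    return volume
```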
The cost volume processing section 363 performs cost adjustment processing on the cost volume generated by the local match processing section 361 such that parallax detection can be performed with high precision. The cost volume processing section 363 performs the cost adjustment processing on the cost volume by performing, for each pixel and each parallax, a filtering process with use of normal line information regarding a process target pixel for the cost adjustment processing and a pixel in a peripheral region based on the process target pixel. Alternatively, the depth information generating section 35 may perform the cost adjustment processing on the cost volume by calculating a weight on the basis of the normal line difference, positional difference, and luminance difference between the process target pixel and a pixel in the peripheral region, and by performing, for each pixel and each parallax, a filtering process with use of the calculated weight and the normal line information generated by the normal line information generating section 31.
Next, a case of calculating a weight on the basis of the normal line difference, the positional difference, and the luminance difference between a process target pixel and a pixel in a peripheral region, and performing a filtering process with use of the calculated weight and the normal line information generated by the normal line information generating section 31, will be explained.
The weight calculation processing section 3631 calculates a weight according to the normal line information, the positions, and the luminances of a process target pixel and a peripheral pixel. The weight calculation processing section 3631 calculates a distance function value on the basis of the normal line information regarding the process target pixel and the peripheral pixel, and calculates the weight for the peripheral pixel by using the calculated distance function value and the positions and/or luminances of the process target pixel and a pixel in the peripheral region.
The weight calculation processing section 3631 calculates a distance function value by using the normal line information regarding the process target pixel and the peripheral pixel. For example, it is assumed that normal line information Ni=(Ni,x, Ni,y, Ni,z) is about a process target pixel i, and normal line information Nj=(Nj,x, Nj,y, Nj,z) is about a peripheral pixel j. In this case, the distance function value dist(Ni−Nj) between the process target pixel i and the peripheral pixel j in the peripheral region is calculated by Expression (18) to indicate the normal line difference.
[Math. 5]
dist(Ni−Nj)=√((Ni,x−Nj,x)²+(Ni,y−Nj,y)²+(Ni,z−Nj,z)²) (18)
By using the distance function value dist(Ni−Nj) and using, for example, a position Pi of the process target pixel i and a position Pj of the peripheral pixel j, the weight calculation processing section 3631 calculates a weight Wi,j for the peripheral pixel with respect to the process target pixel on the basis of Expression (19). It is to be noted that, in Expression (19), σs is a parameter for adjusting a space similarity, σn is a parameter for adjusting a normal line similarity, and Ki is a normalization term. The parameters σs, σn, and Ki are set in advance.
In addition, by using the distance function value dist(Ni−Nj), the position Pi and a luminance value Ii of the process target pixel i, and the position Pj and a luminance value Ij of the peripheral pixel j, the weight calculation processing section 3631 may calculate the weight Wi,j for the pixel in the peripheral region on the basis of Expression (20). It is to be noted that, in Expression (20), a parameter σc represents a parameter for adjusting a luminance similarity. The parameter σc is previously set.
The weight calculation processing section 3631 calculates respective weights for the peripheral pixels relative to the process target pixel, and outputs the weights to the filter processing section 3633.
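The following sketch illustrates one way the weight Wi,j could be formed from the three differences mentioned above, using Gaussian kernels. Expressions (19) and (20) are not reproduced verbatim in this excerpt, so the kernel shapes, the example sigma values, and the deferral of the normalization term Ki to the caller are assumptions.

```python
import numpy as np

def weight(pos_i, pos_j, normal_i, normal_j, lum_i, lum_j,
           sigma_s=3.0, sigma_n=0.1, sigma_c=10.0):
    """Bilateral-style weight combining spatial distance, normal line difference
    (Expression (18)) and luminance difference. The Gaussian kernels and example
    sigma values are assumptions; normalization by Ki (e.g., the sum of weights
    over the peripheral region) is left to the caller."""
    dist_n = np.linalg.norm(np.asarray(normal_i) - np.asarray(normal_j))  # Expression (18)
    dist_s = np.linalg.norm(np.asarray(pos_i) - np.asarray(pos_j))        # positional difference
    dist_c = abs(lum_i - lum_j)                                           # luminance difference
    return (np.exp(-dist_s**2 / (2.0 * sigma_s**2))
            * np.exp(-dist_n**2 / (2.0 * sigma_n**2))
            * np.exp(-dist_c**2 / (2.0 * sigma_c**2)))
```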
The peripheral parallax calculation processing section 3632 calculates a parallax in a peripheral pixel relative to the process target pixel.
The peripheral parallax calculation processing section 3632 calculates a parallax dNj for each of peripheral pixels relative to the process target pixel, and outputs the parallaxes dNj to the filter processing section 3633.
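Expression (21) for the parallax dNj is not reproduced in this excerpt. As a stand-in, the sketch below uses the common disparity-plane formulation (as in PatchMatch-style stereo), in which the normal line information of the process target pixel defines a local plane and the peripheral pixel's parallax is obtained by moving along that plane; the function name and this particular formulation are assumptions, not the patent's expression.

```python
def peripheral_parallax(d_i, pos_i, pos_j, normal_i):
    """Parallax hypothesized for peripheral pixel j, assuming the process target
    pixel i and its neighborhood lie on one plane oriented by normal_i = (nx, ny, nz).
    pos_i and pos_j are (x, y) pixel coordinates; d_i is the parallax hypothesis at i."""
    nx, ny, nz = normal_i
    nz = nz if abs(nz) > 1e-6 else 1e-6          # avoid division by zero for grazing normals
    dx = pos_j[0] - pos_i[0]                     # horizontal offset j - i (pixels)
    dy = pos_j[1] - pos_i[1]                     # vertical offset j - i (pixels)
    return d_i - (nx * dx + ny * dy) / nz        # affine parallax over the local plane
```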
The filter processing section 3633 performs a filtering process on the cost volume calculated by the local match processing section 361, by using the weights for the peripheral pixels calculated by the weight calculation processing section 3631 and using the parallaxes in the peripheral pixels calculated by the peripheral parallax calculation processing section 3632. By using the weight Wi,j for a pixel j in the peripheral region of the process target pixel i calculated by the weight calculation processing section 3631 and using the parallax dNj in the pixel j in the peripheral region of the process target pixel i, the filter processing section 3633 calculates the cost volume having undergone the filtering process, on the basis of Expression (22).
[Math. 9]
CNi,d=ΣjWi,j·Cj,dNj (22)
The cost of a peripheral pixel is calculated for each parallax d, and the parallax d is a pixel unit value, that is, an integer value. However, the parallax dNj in a peripheral pixel calculated on the basis of Expression (21) is not limited to integer values. Thus, in the case where the parallax dNj is not an integer value, the filter processing section 3633 calculates the cost Cj,dNj at the parallax dNj by using costs obtained at parallaxes close to the parallax dNj.
The filter processing section 3633 obtains a cost CNi,d in the process target pixel on the basis of Expression (22) by using the weights for respective peripheral pixels calculated by the weight calculation processing section 3631 and the costs Cj,dNj in the parallax dNj at each of the peripheral pixels calculated by the peripheral parallax calculation processing section 3632. Further, the filter processing section 3633 calculates the cost CNi,d for each parallax by regarding each pixel as a process target pixel. In the manner described so far, the filter processing section 3633 performs cost adjustment processing on a cost volume by using a relationship between the normal line information, the positions, and the luminances of a process target pixel and a peripheral pixel such that a parallax at which the similarity becomes maximum in variation of the cost due to the difference in parallaxes is emphasized. The filter processing section 3633 outputs the cost volume having undergone the cost adjustment processing, to the minimum value search processing section 365.
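A sketch of the accumulation of Expression (22) for one process target pixel and one parallax hypothesis is given below. The dictionary-based interface and the linear interpolation between the two nearest integer parallaxes for a non-integer dNj are assumptions made for illustration.

```python
import numpy as np

def filtered_cost(cost_volume, weights, peripheral_parallaxes):
    """CN_{i,d} of Expression (22): weighted sum of the peripheral pixels' costs
    sampled at their normal-derived parallaxes dNj. cost_volume has shape (D, H, W);
    weights maps each peripheral pixel (y, x) to W_{i,j}, and peripheral_parallaxes
    maps it to dNj for the parallax hypothesis under evaluation."""
    D = cost_volume.shape[0]
    total = 0.0
    for (yj, xj), w_ij in weights.items():
        d_nj = peripheral_parallaxes[(yj, xj)]
        d_nj = min(max(d_nj, 0.0), D - 1.0)          # clamp to the valid parallax range
        d0 = int(np.floor(d_nj))
        d1 = min(d0 + 1, D - 1)
        frac = d_nj - d0
        # cost at the non-integer parallax dNj, interpolated from the nearest integer parallaxes
        c_j = (1.0 - frac) * cost_volume[d0, yj, xj] + frac * cost_volume[d1, yj, xj]
        total += w_ij * c_j                          # accumulation of Expression (22)
    return total
```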
It is to be noted that, when the weight Wi,j is “1” in Expression (22) or (25), the filter processing section 3633 performs the cost adjustment processing through the filtering process based on the normal line information. Also, when the weight Wi,j calculated on the basis of Expression (19) is used, the cost adjustment processing is performed through the filtering process based on the normal line information and a distance in the plane direction at the same parallax. Furthermore, when the weight Wi,j calculated on the basis of Expression (20) is used, the cost adjustment processing is performed through the filtering process based on the normal line information, a distance in the plane direction at the same parallax, and the luminance change.
The minimum value search processing section 365 detects, on the basis of the cost volume having undergone the filtering process, a parallax at which image similarity becomes maximum. In the cost volume, a cost at each parallax is indicated for each pixel, and, when the cost is smaller, the similarity is higher, as described above. Therefore, the minimum value search processing section 365 detects, for each pixel, a parallax at which the cost becomes minimum.
The minimum value search processing section 365 performs parabola fitting by using costs in a successive parallax range including the minimum value from among the parallax-based costs in a target pixel. For example, by using costs in a successive parallax range centered on the parallax dx having the minimum cost Cx of the costs calculated for the respective parallaxes, that is, a cost Cx−1 at a parallax dx−1 and a cost Cx+1 at a parallax dx+1, the minimum value search processing section 365 obtains, as the parallax in the target pixel, a parallax dt that is displaced from the parallax dx by a displacement amount δ at which the cost becomes minimum, on the basis of Expression (23). Thus, the parallax dt having decimal precision is calculated from the integer-unit parallax d and is outputted to the depth calculating section 37.
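The standard three-point parabola fit is assumed here for Expression (23); the sketch below returns the decimal-precision parallax dt from the per-parallax costs of the target pixel, assuming dx is not at the boundary of the parallax range.

```python
def subpixel_parallax(costs, d_x):
    """Three-point parabola fit around the integer parallax d_x with the minimum cost.
    costs: per-parallax cost list of the target pixel (the standard fit is an assumption
    for Expression (23)); d_x must be an interior index so that d_x-1 and d_x+1 exist."""
    c_m, c_0, c_p = costs[d_x - 1], costs[d_x], costs[d_x + 1]
    denom = c_m - 2.0 * c_0 + c_p
    delta = 0.0 if denom == 0 else 0.5 * (c_m - c_p) / denom   # displacement amount
    return d_x + delta                                          # decimal-precision parallax dt
```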
In addition, the parallax detecting section 36 may detect a parallax by including indefiniteness among normal lines. In this case, the peripheral parallax calculation processing section 3632 calculates the parallax dNj in the aforementioned manner by using the normal line information Ni indicating one of normal lines having indefiniteness thereamong. Further, by using normal line information Mi indicating the other normal line, the peripheral parallax calculation processing section 3632 calculates a parallax dMj on the basis of Expression (24), and outputs the parallax dMj to the filter processing section 3633.
In the case of performing a filtering process involving normal-line indefiniteness, the filter processing section 3633 performs the cost adjustment processing indicated in Expression (25) on each pixel as a process target pixel, by using the weight for each peripheral pixel calculated by the weight calculation processing section 3631 and the parallax dMj in the peripheral pixel calculated by the peripheral parallax calculation processing section 3632. The filter processing section 3633 outputs the cost volume having undergone the cost adjustment processing, to the minimum value search processing section 365.
[Math. 12]
CMi,d=ΣjWi,j·Cj,dMj (25)
The minimum value search processing section 365 detects, for each pixel, a parallax at which the cost becomes minimum on the basis of the cost volume having undergone the filtering process based on the normal line information N and the cost volume having undergone the filtering process based on the normal line information M.
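One simple way to realize this search is to take, for each pixel and each parallax, the smaller of the two filtered costs and then detect the minimum; this element-wise combination is an assumption about how the two cost volumes are merged, since the text above only states that both volumes are used.

```python
import numpy as np

def min_search_two_volumes(cost_n, cost_m):
    """Per-pixel parallax selection over the two filtered cost volumes obtained with
    the normal line information N and M (reflecting the indefiniteness between normals).
    Both volumes have shape (D, H, W); returns the integer parallax with minimum cost."""
    combined = np.minimum(cost_n, cost_m)       # keep the better of the two hypotheses
    return np.argmin(combined, axis=0)          # (H, W) map of minimum-cost parallaxes
```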
The depth calculating section 37 generates depth information on the basis of a parallax detected by the parallax detecting section 36.
Z=Lb×f/dt (26)
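The depth calculation of Expression (26) can be applied to the whole parallax map as in the sketch below; the function name and the masking of zero parallaxes are assumptions added for robustness.

```python
import numpy as np

def depth_from_parallax(parallax, baseline, focal_length_px):
    """Expression (26): Z = Lb * f / dt. parallax is the decimal-precision parallax map,
    baseline is the distance Lb between the imaging sections, and focal_length_px is the
    focal distance f expressed in pixels. Zero parallaxes are masked to avoid division by zero."""
    parallax = np.asarray(parallax, dtype=np.float32)
    depth = np.full_like(parallax, np.inf)
    valid = parallax > 0
    depth[valid] = baseline * focal_length_px / parallax[valid]
    return depth
```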
At step ST2, the image processing device generates normal line information. The image processing device 30 generates normal line information indicating a normal direction in each pixel on the basis of the polarization images acquired from the imaging device 20. Then, the process proceeds to step ST3.
At step ST3, the image processing device generates a cost volume. The image processing device 30 performs a local matching process by using the image signal of a captured polarization image acquired from the imaging device 20 and the image signal of a captured image taken from a viewpoint that is different from that of the captured polarization image, and calculates, for each pixel and each parallax, a cost indicating the similarity between the images. The image processing device 30 thus generates a cost volume indicating the costs of the pixels for each parallax. Then, the process proceeds to step ST4.
At step ST4, the image processing device performs cost adjustment processing on the cost volume. By using the normal line information generated at step ST2, the image processing device 30 calculates a parallax in a pixel in a peripheral region of a process target pixel. Further, the image processing device 30 calculates a weight according to the normal line information, the positions, and the luminances of the process target pixel and the peripheral pixel. Moreover, by using the parallax in the pixel in the peripheral region or using the parallax in the pixel in the peripheral region and the weight for the process target pixel, the image processing device 30 performs the cost adjustment processing on the cost volume such that the parallax at which the similarity becomes maximum is emphasized. Then, the process proceeds to step ST5.
At step ST5, the image processing device performs minimum value search processing. The image processing device 30 acquires a parallax-based cost in a target pixel from the cost volume having undergone the filtering process, and detects a parallax at which the cost becomes minimum. In addition, the image processing device 30 regards each pixel as a target pixel, and detects, for each pixel, a parallax at which the cost becomes minimum. Then, the process proceeds to step ST6.
At step ST6, the image processing device generates depth information. The image processing device 30 calculates a depth for each pixel on the basis of the focal distance of the imaging sections 21 and 22, a baseline length representing the distance between the imaging section 21 and the imaging section 22, and the minimum cost parallax detected for each pixel at step ST5, and generates depth information indicating depths of respective pixels. It is to be noted that step ST2 may be followed by step ST3, or step ST3 may be followed by step ST2.
As explained so far, the first embodiment enables detection of a parallax for each pixel with higher precision than detection of a parallax enabled by a local matching process. In addition, with use of the detected precise parallax, depth information in each pixel can be generated with precision, whereby a precise depth map can be obtained without use of projection light, etc.
The imaging section 21 outputs, to the normal line information generating section 31 and the depth information generating section 35a, a polarization image signal obtained by capturing an image of a desired object. Further, the imaging section 22 outputs, to the depth information generating section 35a, a non-polarization image signal or a polarization image signal obtained by capturing an image of the desired object from a viewpoint that is different from that of the imaging section 21. Moreover, the imaging section 23 outputs, to the depth information generating section 35a, a non-polarization image signal or a polarization image signal obtained by capturing an image of the desired object from a viewpoint that is different from the viewpoint of the imaging sections 21 and 22.
The normal line information generating section 31 of the image processing device 30a generates, for each pixel, normal line information indicating a normal direction on the basis of the polarization image signal supplied from the imaging section 21, and outputs the normal line information to the depth information generating section 35a.
The depth information generating section 35a calculates, for each pixel and each parallax, a cost representing the similarity between images by using two image signals taken from different viewpoints and supplied from the imaging section 21 and the imaging section 22, and generates a cost volume. Further, the depth information generating section 35a calculates, for each pixel and each parallax, a cost representing the similarity between images by using two image signals taken from different viewpoints and supplied from the imaging section 21 and the imaging section 23, and generates a cost volume. Moreover, the depth information generating section 35a performs cost adjustment processing on each of the cost volumes by using the image signal supplied from the imaging section 21 and using the normal line information generated by the normal line information generating section 31. Further, by using the parallax-based costs of a parallax detection target pixel, the depth information generating section 35a detects, from the cost volumes having undergone the cost adjustment processing, a parallax at which the similarity becomes maximum. The depth information generating section 35a calculates a depth of each pixel from the detected parallax and from the baseline length and the focal distance between the imaging section 21 and the imaging section 22, and generates depth information.
Next, operation of each section of the imaging device 20a will be explained. The configurations of the imaging sections 21 and 22 are similar to those in the first embodiment. The configuration of the imaging section 23 is similar to that of the imaging section 22. The imaging section 21 outputs a generated polarization image signal to the normal line information generating section 31 of the image processing device 30a. Further, the imaging section 22 outputs a generated image signal to the image processing device 30a. In addition, the imaging section 23 outputs a generated image signal to the image processing device 30a.
The configuration of the normal line information generating section 31 of the image processing device 30a is similar to that in the first embodiment. The normal line information generating section 31 generates normal line information on the basis of a polarization image signal. The normal line information generating section 31 outputs the generated normal line information to the depth information generating section 35a.
The configuration of the local match processing section 361 is similar to that in the first embodiment. By using captured images obtained by the imaging sections 21 and 22, the local match processing section 361 calculates, for each pixel in one of the captured images, the similarity in a corresponding point in the other captured image, and generates a cost volume. The local match processing section 361 outputs the generated cost volume to the cost volume processing section 363.
The configuration of the local match processing section 362 is similar to that of the local match processing section 361. By using the captured images obtained by the imaging sections 21 and 23, the local match processing section 362 calculates, for each pixel in one of the captured images, the similarity in a corresponding point in the other captured image, and generates a cost volume. The local match processing section 362 outputs the generated cost volume to the cost volume processing section 364.
The configuration of the cost volume processing section 363 is similar to that in the first embodiment. The cost volume processing section 363 performs cost adjustment processing on the cost volume generated by the local match processing section 361 such that a parallax can be detected with high precision, and outputs the cost volume having undergone the cost adjustment processing to the minimum value search processing section 366.
The configuration of the cost volume processing section 364 is similar to that of the cost volume processing section 363. The cost volume processing section 364 performs cost adjustment processing on the cost volume generated by the local match processing section 362 such that a parallax can be detected with high precision, and outputs the cost volume having undergone the cost adjustment processing to the minimum value search processing section 366.
As in the first embodiment, the minimum value search processing section 366 detects, for each pixel, a most similar parallax, that is, a parallax at which the minimum value of the similarity is indicated, on the basis of the cost volume having undergone the cost adjustment. In addition, as in the first embodiment, the depth calculating section 37 generates depth information on the basis of the parallax detected by the parallax detecting section 36.
Similar to the first embodiment, the second embodiment enables detection of a parallax for each pixel with high precision, whereby a precise depth map can be obtained. In addition, according to the second embodiment, a parallax can be detected by using not only image signals obtained by the imaging sections 21 and 22 but also an image signal obtained by the imaging section 23. This more reliably enables precise detection of a parallax for each pixel, compared to the case where a parallax is calculated on the basis of image signals obtained by the imaging sections 21 and 22.
Further, the imaging sections 21, 22, and 23 may be arranged side by side in one direction, or may be arranged in two or more directions. For example, in the imaging device 20a, the imaging section 21 and the imaging section 22 are horizontally arranged while the imaging section 21 and the imaging section 23 are vertically arranged. In this case, for an object part for which precise detection of a parallax is difficult with image signals obtained by imaging sections that are arranged side by side in the horizontal direction, precise detection of the parallax can be accomplished on the basis of image signals obtained by imaging sections that are arranged side by side in the vertical direction.
In the aforementioned embodiments, detection of a parallax and generation of depth information with use of image signals that are obtained without any color filter, have been explained. However, the image processing device may have a color mosaic filter or the like provided to the imaging sections, and accomplish detection of a parallax and generation of depth information with use of color image signals generated by the imaging sections. In this case, it is sufficient for the image processing device to perform demosaic processing by using image signals generated by the imaging sections to generate image signals for respective color components and to use pixel luminance values calculated from the image signals for the respective color components, for example. In addition, the image processing device generates normal line information by using pixel signals of polarization pixels that are generated by the imaging sections and that have the same color components.
The technology according to the present disclosure is applicable to various products. For example, the technology according to the present disclosure may be implemented as a device mounted on a mobile body which is any one of automobiles, electric automobiles, hybrid electric automobiles, motorcycles, bicycles, personal mobilities, airplanes, drones, ships, robots, and the like.
The vehicle control system 12000 includes a plurality of electronic control units connected to each other via a communication network 12001. In the example depicted in
The driving system control unit 12010 controls the operation of devices related to the driving system of the vehicle in accordance with various kinds of programs. For example, the driving system control unit 12010 functions as a control device for a driving force generating device for generating the driving force of the vehicle, such as an internal combustion engine, a driving motor, or the like, a driving force transmitting mechanism for transmitting the driving force to wheels, a steering mechanism for adjusting the steering angle of the vehicle, a braking device for generating the braking force of the vehicle, and the like.
The body system control unit 12020 controls the operation of various kinds of devices provided to a vehicle body in accordance with various kinds of programs. For example, the body system control unit 12020 functions as a control device for a keyless entry system, a smart key system, a power window device, or various kinds of lamps such as a headlamp, a backup lamp, a brake lamp, a turn signal, a fog lamp, or the like. In this case, radio waves transmitted from a mobile device as an alternative to a key or signals of various kinds of switches can be input to the body system control unit 12020. The body system control unit 12020 receives these input radio waves or signals, and controls a door lock device, the power window device, the lamps, or the like of the vehicle.
The outside-vehicle information detecting unit 12030 detects information about the outside of the vehicle including the vehicle control system 12000. For example, the outside-vehicle information detecting unit 12030 is connected with an imaging section 12031. The outside-vehicle information detecting unit 12030 causes the imaging section 12031 to capture an image of the outside of the vehicle, and receives the captured image. On the basis of the received image, the outside-vehicle information detecting unit 12030 may perform processing of detecting an object such as a human, a vehicle, an obstacle, a sign, a character on a road surface, or the like, or processing of detecting a distance thereto.
The imaging section 12031 is an optical sensor that receives light and outputs an electric signal corresponding to a received light amount of the light. The imaging section 12031 can output the electric signal as an image, or can output the electric signal as information about a measured distance. In addition, the light received by the imaging section 12031 may be visible light, or may be invisible light such as infrared rays or the like.
The in-vehicle information detecting unit 12040 detects information about the inside of the vehicle. The in-vehicle information detecting unit 12040 is, for example, connected with a driver state detecting section 12041 that detects the state of a driver. The driver state detecting section 12041, for example, includes a camera that images the driver. On the basis of detection information input from the driver state detecting section 12041, the in-vehicle information detecting unit 12040 may calculate a degree of fatigue of the driver or a degree of concentration of the driver, or may determine whether the driver is dozing.
The microcomputer 12051 can calculate a control target value for the driving force generating device, the steering mechanism, or the braking device on the basis of the information about the inside or outside of the vehicle which information is obtained by the outside-vehicle information detecting unit 12030 or the in-vehicle information detecting unit 12040, and output a control command to the driving system control unit 12010. For example, the microcomputer 12051 can perform cooperative control intended to implement functions of an advanced driver assistance system (ADAS) which functions include collision avoidance or shock mitigation for the vehicle, following driving based on a following distance, vehicle speed maintaining driving, a warning of collision of the vehicle, a warning of deviation of the vehicle from a lane, or the like.
In addition, the microcomputer 12051 can perform cooperative control intended for automatic driving, which makes the vehicle travel autonomously without depending on the operation of the driver, or the like, by controlling the driving force generating device, the steering mechanism, the braking device, or the like on the basis of the information about the outside or inside of the vehicle which information is obtained by the outside-vehicle information detecting unit 12030 or the in-vehicle information detecting unit 12040.
In addition, the microcomputer 12051 can output a control command to the body system control unit 12020 on the basis of the information about the outside of the vehicle which information is obtained by the outside-vehicle information detecting unit 12030. For example, the microcomputer 12051 can perform cooperative control intended to prevent a glare by controlling the headlamp so as to change from a high beam to a low beam, for example, in accordance with the position of a preceding vehicle or an oncoming vehicle detected by the outside-vehicle information detecting unit 12030.
The sound/image output section 12052 transmits an output signal of at least one of a sound and an image to an output device capable of visually or auditorily notifying information to an occupant of the vehicle or the outside of the vehicle. In the example of
In
The imaging sections 12101, 12102, 12103, 12104, and 12105 are, for example, disposed at positions on a front nose, sideview mirrors, a rear bumper, and a back door of the vehicle 12100 as well as a position on an upper portion of a windshield within the interior of the vehicle. The imaging section 12101 provided to the front nose and the imaging section 12105 provided to the upper portion of the windshield within the interior of the vehicle obtain mainly an image of the front of the vehicle 12100. The imaging sections 12102 and 12103 provided to the sideview mirrors obtain mainly an image of the sides of the vehicle 12100. The imaging section 12104 provided to the rear bumper or the back door obtains mainly an image of the rear of the vehicle 12100. The imaging section 12105 provided to the upper portion of the windshield within the interior of the vehicle is used mainly to detect a preceding vehicle, a pedestrian, an obstacle, a signal, a traffic sign, a lane, or the like.
Incidentally,
At least one of the imaging sections 12101 to 12104 may have a function of obtaining distance information. For example, at least one of the imaging sections 12101 to 12104 may be a stereo camera constituted of a plurality of imaging elements, or may be an imaging element having pixels for phase difference detection.
For example, the microcomputer 12051 can determine a distance to each three-dimensional object within the imaging ranges 12111 to 12114 and a temporal change in the distance (relative speed with respect to the vehicle 12100) on the basis of the distance information obtained from the imaging sections 12101 to 12104, and thereby extract, as a preceding vehicle, a nearest three-dimensional object in particular that is present on a traveling path of the vehicle 12100 and which travels in substantially the same direction as the vehicle 12100 at a predetermined speed (for example, equal to or more than 0 km/hour). Further, the microcomputer 12051 can set a following distance to be maintained in front of a preceding vehicle in advance, and perform automatic brake control (including following stop control), automatic acceleration control (including following start control), or the like. It is thus possible to perform cooperative control intended for automatic driving that makes the vehicle travel autonomously without depending on the operation of the driver or the like.
For example, the microcomputer 12051 can classify three-dimensional object data on three-dimensional objects into three-dimensional object data of a two-wheeled vehicle, a standard-sized vehicle, a large-sized vehicle, a pedestrian, a utility pole, and other three-dimensional objects on the basis of the distance information obtained from the imaging sections 12101 to 12104, extract the classified three-dimensional object data, and use the extracted three-dimensional object data for automatic avoidance of an obstacle. For example, the microcomputer 12051 identifies obstacles around the vehicle 12100 as obstacles that the driver of the vehicle 12100 can recognize visually and obstacles that are difficult for the driver of the vehicle 12100 to recognize visually. Then, the microcomputer 12051 determines a collision risk indicating a risk of collision with each obstacle. In a situation in which the collision risk is equal to or higher than a set value and there is thus a possibility of collision, the microcomputer 12051 outputs a warning to the driver via the audio speaker 12061 or the display section 12062, and performs forced deceleration or avoidance steering via the driving system control unit 12010. The microcomputer 12051 can thereby assist in driving to avoid collision.
At least one of the imaging sections 12101 to 12104 may be an infrared camera that detects infrared rays. The microcomputer 12051 can, for example, recognize a pedestrian by determining whether or not there is a pedestrian in imaged images of the imaging sections 12101 to 12104. Such recognition of a pedestrian is, for example, performed by a procedure of extracting characteristic points in the imaged images of the imaging sections 12101 to 12104 as infrared cameras and a procedure of determining whether or not the object is a pedestrian by performing pattern matching processing on a series of characteristic points representing the contour of the object. When the microcomputer 12051 determines that there is a pedestrian in the imaged images of the imaging sections 12101 to 12104, and thus recognizes the pedestrian, the sound/image output section 12052 controls the display section 12062 so that a square contour line for emphasis is displayed so as to be superimposed on the recognized pedestrian. The sound/image output section 12052 may also control the display section 12062 so that an icon or the like representing the pedestrian is displayed at a desired position.
One example of a vehicle control system to which the technology according to the present disclosure is applicable has been explained above. The imaging devices 20 and 20a of the technology according to the present disclosure are applicable to the imaging section 12031, etc., among the components in the above explanation. The image processing devices 30 and 30a of the technology according to the present disclosure are applicable to the outside-vehicle information detecting unit 12030, among the components in the above explanation. Accordingly, when the technology according to the present disclosure is applied to a vehicle control system, depth information can be acquired with precision. Thus, when the three-dimensional shape of an object is recognized with use of the acquired depth information, information necessary to lessen fatigue of a driver or necessary to perform automatic driving can be acquired with high precision.
The series of processes described herein can be executed by hardware, software, or a combination thereof. In a case where the processes are executed by software, a program in which a process sequence is recorded can be executed after being installed into a memory incorporated in dedicated hardware of a computer. Alternatively, the program can be executed after being installed into a general-purpose computer that is capable of executing various processes.
For example, the program may be previously recorded in a hard disk, an SSD (Solid State Drive), or a ROM (Read Only Memory), as a recording medium. Alternatively, the program can be temporarily or persistently stored (recorded) in a removable recording medium such as a flexible disc, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto optical) disc, a DVD (Digital Versatile Disc), a BD (Blu-Ray Disc (registered trademark)), a magnetic disc, or a semiconductor memory card. Such a removable recording medium can be provided as what is called package software.
Alternatively, the program may not be installed into the computer from the removable recording medium but may instead be transferred from a download site to the computer in a wireless or wired manner over a network such as a LAN (Local Area Network) or the Internet. The computer can receive the program thus transferred and install the program into an internal recording medium such as a hard disk.
It is to be noted that the effects described herein are just examples, and thus, are not limitative. Any additional effect, which is not described herein, may be provided. In addition, the present technology should not be interpreted as being limited to the aforementioned embodiments. The embodiments disclose the present technology by way of exemplification. It is obvious that a person skilled in the art can modify the embodiments or provide a substitute therefor within the scope of the gist of the present technology. That is, in order to assess the gist of the present technology, the claims should be taken into consideration.
The image processing device according to the present technology can have the following configurations.
(1)
An image processing device including:
The image processing device according to (1), in which
The image processing device according to (2), in which
The image processing device according to (2) or (3), in which
The image processing device according to any one of (2) to (4), in which
The image processing device according to any one of (1) to (5), in which
The image processing device according to any one of (1) to (6), in which
The image processing device according to any one of (1) to (7), further including:
With the image processing device, the image processing method, the program, and the information processing system according to the present technology, cost adjustment processing is performed on a cost volume indicating, for each pixel and each parallax, costs each corresponding to the similarity among multi-viewpoint images including a polarization image, with use of normal line information in each pixel based on the polarization image. From the cost volume having undergone the cost adjustment processing, a parallax at which the similarity becomes maximum is detected with use of the parallax-based costs of a parallax detection target pixel. Thus, a parallax can be detected with high precision almost without the influences of an object shape, an image capturing condition, and the like. Accordingly, the present technology is suited for apparatuses, etc., that need to detect three-dimensional shapes with precision.
Number | Date | Country | Kind |
---|---|---|---|
2017-237586 | Dec 2017 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2018/038078 | 10/12/2018 | WO | 00 |