This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2007-310775, filed on Nov. 30, 2007; the entire contents of which are incorporated herein by reference.
The present invention relates to an image processing apparatus and an image processing method. Specifically, the invention relates to an apparatus and a method for measuring the distance to an object based on stereo disparity, using an input unit such as a camera.
Stereo vision, which measures the distance to an object from two cameras by triangulation, is an effective image processing technique used in various fields.
The most important and difficult problem in stereo vision is to search for corresponding points between the stereo images and to obtain the positional difference between each pair of corresponding points (i.e., the “disparity”). There are various methods of calculating the stereo disparity, and they are roughly divided into local methods and global methods.
In the local method, the (dis)similarity of local intensity patterns within a window is calculated using the SAD (Sum of Absolute Differences), the SSD (Sum of Squared Differences) or the NCC (Normalized Cross-Correlation), and the point on the epipolar line with the most similar intensity pattern is selected as the corresponding point. The local method has the advantages that the process is simple and that the disparity is basically obtained independently for each point, so that speed-ups, including parallel processing, are easily achieved. On the other hand, it has the drawback that the disparity cannot be obtained accurately at a point where there is insufficient intensity variation in the surrounding area.
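As a rough illustration of the local method, the following Python sketch computes the SAD between a window on one row of the reference (right) image and windows on the same row of the other (left) image, and selects the disparity with the smallest cost (winner-take-all). It assumes rectified rows, a window half-width `half`, and the convention d=x′−x used in this document; the function names and pixel values are hypothetical.

```python
def sad(left_row, right_row, x, d, half):
    """SAD between the window centred at x on the reference (right) row and the
    window centred at x + d on the left row (d = x' - x)."""
    return sum(abs(right_row[x + k] - left_row[x + d + k])
               for k in range(-half, half + 1))


def wta_disparity(left_row, right_row, x, d_min, d_max, half=1):
    """Local method: return (d, cost) with the smallest SAD along the epipolar line.
    The caller must keep the window around x inside right_row."""
    best_d, best_cost = None, float("inf")
    for d in range(d_min, d_max + 1):
        # skip shifts whose window would leave the left row
        if x + d - half < 0 or x + d + half >= len(left_row):
            continue
        cost = sad(left_row, right_row, x, d, half)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost


# A textured bright spot that appears 2 pixels further right in the left row
right_row = [10, 10, 50, 90, 50, 10, 10, 10, 10, 10]
left_row = [10, 10, 10, 10, 50, 90, 50, 10, 10, 10]
print(wta_disparity(left_row, right_row, x=3, d_min=0, d_max=4))  # (2, 0)
```

In a textureless run (for example, a row of constant intensity) every candidate d yields the same SAD, so the winner-take-all choice is arbitrary; this is the drawback of the local method noted above.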
In contrast, in the global method, an energy function of the disparities of all pixels is defined, and the combination of disparities that minimizes the function value is obtained (e.g., see V. Kolmogorov and R. Zabih, “Computing Visual Correspondence with Occlusions using Graph Cuts,” IEEE International Conference on Computer Vision (ICCV), 2001). Because the disparities are estimated globally, the global method can restore the disparity even in an area having no pattern.
Calculation of the stereo disparity may be generalized to the problem of selecting an adequate label fp from a set L of disparity candidate labels prepared in advance and allocating it to each point p∈P of an image P.
The labeling f that minimizes the energy function E(f) of the following expression (1) gives the disparity to be obtained.
E(f)=Edata(f)+Esmooth(f), (1)
where f=(f1, f2, . . . , fp, . . . , f|P|) is the labeling of all pixels of the image P, and |P| denotes the number of pixels.
Edata(f), the first term of expression (1), is referred to as the data term. It represents the degree of disagreement between the estimated labels and the observed data (when they agree, the degree of disagreement is normally “0”), and is given by expression (2).
Edata(f)=Σp∈P Dp(fp), (2)
where Dp(fp) represents the cost of allocating fp as the estimated label (disparity) of a pixel p.
The local method, in which the label (disparity) is estimated independently at each point, obtains the f that minimizes this first term alone. The second term, Esmooth(f), is referred to as the smoothing term. It denotes the degree of local non-smoothness and is given by expression (3).
Esmooth(f)=Σ(p,q)∈N Vp,q(fp,fq), (3)
where N is the set of pairs of adjacent points, and Vp,q(fp, fq) denotes the cost of allocating fp and fq as the labels of the points p and q, respectively.
A commonly used form of Vp,q(fp, fq) is given by expression (4).
Vp,q(fp,fq)=λ·T(fp≠fq), (4)
where T(·) is an operator which returns 1 when the condition provided as an argument is true, and returns 0 in other cases.
Thus T is “1” when fp is not equal to fq, and “0” when fp is equal to fq. Therefore, a penalty of the positive constant λ is imposed when the disparities of adjacent pixels differ, and “0” is imposed when they are the same. This model favors locally uniform disparity, that is, object surfaces that are locally parallel to the image plane, so surfaces with other inclinations are unlikely to be restored correctly.
For example, when the disparities of a road scene are restored by a stereo camera mounted on a car, the normal vector of the road surface is in general substantially orthogonal to the optical axis of the camera. The assumption of locally uniform disparity therefore does not hold for the road surface, and its disparity cannot be estimated correctly.
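The bias described above can be checked numerically. The sketch below evaluates E(f) of expressions (1) to (4) on a small image with a 4-neighbourhood. The idealized data cost (squared distance to a known true disparity) and the 4×4 road-like scene, whose disparity grows by one per image row, are illustrative assumptions. With λ=3, the correct slanted labeling costs more than a wrong fronto-parallel one.

```python
def potts(fp, fq, lam):
    """Expression (4): lam when adjacent labels differ, 0 when they are equal."""
    return lam if fp != fq else 0.0


def energy(labels, data_cost, lam, width):
    """E(f) = Edata(f) + Esmooth(f) for a row-major label list; N contains each
    pixel's right and lower neighbour (4-neighbourhood)."""
    height = len(labels) // width
    e_data = sum(data_cost(p, fp) for p, fp in enumerate(labels))
    e_smooth = 0.0
    for y in range(height):
        for x in range(width):
            p = y * width + x
            if x + 1 < width:
                e_smooth += potts(labels[p], labels[p + 1], lam)
            if y + 1 < height:
                e_smooth += potts(labels[p], labels[p + width], lam)
    return e_data + e_smooth


WIDTH = 4
true_d = [p // WIDTH for p in range(16)]  # disparity 0, 1, 2, 3 row by row (road-like)


def data_cost(p, d):
    # Idealized Dp(d): zero at the true disparity (an illustrative assumption)
    return (true_d[p] - d) ** 2


correct = list(true_d)  # follows the slanted surface exactly
fronto = [1] * 16       # a single fronto-parallel disparity
print(energy(correct, data_cost, 3.0, WIDTH))  # 36.0: 12 vertical label changes x 3
print(energy(fronto, data_cost, 3.0, WIDTH))   # 24.0: data term only
```

With the affine labels introduced later in this embodiment, the whole slanted surface would be a single label (α=0, β=1, γ=0 in expression (5)), so its smoothing cost would be zero.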
Accordingly, an advantage of an aspect of the present invention is to provide an image processing apparatus and an image processing method that enable highly accurate calculation of the disparity.
To achieve the above advantage, one aspect of the present invention provides an image processing apparatus including: an input unit configured to input a first image and a second image that are captured from different positions and have a common field of view; a disparity function storing unit configured to store disparity functions for obtaining the disparities of a plurality of target points on the first image from the coordinates of the individual target points; a first calculating unit configured to obtain the disparities from the coordinates of the target points based on the disparity functions; a second calculating unit configured to obtain, based on the obtained disparities, corresponding points on the second image corresponding to the target points; an intensity difference calculating unit configured to calculate the intensity difference between each target point and its corresponding point; an agreement calculating unit configured to calculate, for each target point, an agreement value that decreases as the disparity function of that target point becomes more similar to the disparity function of another target point located around it; and a disparity function selecting unit configured to obtain, while changing the disparity functions of the individual target points, the combination of disparity functions that minimizes the sum of the intensity differences and the agreement values over the plurality of target points.
Referring now to
A schematic view of an image processing apparatus 10 is shown in
The term “disparity function” refers to the disparity expressed as a function of the image position (x, y); its form is arbitrary as long as it is a function of the image position. In this embodiment, the disparity is expressed as a linear function of the image position, as shown in expression (5).
d=αx+βy+γ, (5)
where the parameter set f=(α, β, γ) of the disparity function is referred to as the “disparity affine parameter.” Since disparity affine parameters and disparity functions are in one-to-one correspondence, obtaining the disparity function of each point is equivalent to obtaining the disparity affine parameter of each point.
The disparities of all points in the image are collectively referred to as the “disparity map.” Likewise, the disparity affine parameters of all points are collectively referred to as the “disparity affine parameter map.” When serial numbers 1, 2, . . . , p, . . . , |P| are assigned to the pixels of the image, the disparity affine parameter map F is given by expression (6). F is the variable to be obtained.
F=(f1,f2, . . . ,fp, . . . ,f|P|) (6)
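A minimal sketch of how a disparity affine parameter map F can be expanded into a disparity map with expression (5). It assumes 0-based, row-major serial numbers (the text numbers pixels from 1) and pixel coordinates (x, y) equal to the column and row indices; the function names and example values are hypothetical.

```python
from typing import List, Tuple

Affine = Tuple[float, float, float]  # disparity affine parameter (alpha, beta, gamma)


def disparity(f: Affine, x: float, y: float) -> float:
    """Expression (5): d = alpha * x + beta * y + gamma."""
    alpha, beta, gamma = f
    return alpha * x + beta * y + gamma


def disparity_map(F: List[Affine], width: int) -> List[float]:
    """Evaluate each pixel's own disparity function at that pixel's position."""
    return [disparity(f, p % width, p // width) for p, f in enumerate(F)]


road = (0.0, 0.5, 1.0)  # disparity grows toward the bottom of the image
wall = (0.0, 0.0, 3.0)  # fronto-parallel surface: constant disparity
F = [road, road, wall, wall,
     road, road, wall, wall]  # a 4 x 2 image
print(disparity_map(F, width=4))
# [1.0, 1.0, 3.0, 3.0, 1.5, 1.5, 3.0, 3.0]
```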
The image input unit 12 inputs a plurality of images taken from different viewpoints using a camera.
The multiple-viewpoint images may be input by two or more cameras simultaneously or, when the scene to be input contains no moving object, by moving a single camera. The orientation of the cameras is arbitrary as long as their fields of view overlap.
In this embodiment, it is assumed that two cameras of identical configuration are arranged side by side in parallel to take a stereo image. The coordinate system shown in
Then, as shown in
In this case, if the point on the left image corresponding to the point (x, y) on the right image is (x′, y′), then y is equal to y′. Therefore, only the difference in horizontal position needs to be considered. In the description given below, this difference in horizontal position is referred to as the “disparity,” and is expressed as d=x′−x, with the right image as the reference image.
The image storing unit 14 stores the stereo images input by the image input unit 12 in an image memory.
The initializing unit 16 initializes the disparity function of each point of the reference image, that is, the disparity affine parameter map F.
The initial value may be an arbitrary value; for example, a disparity map calculated by block matching may be used as the initial value.
For each pixel p, the difference between the corresponding pixels of the stereo images is calculated for every disparity d in the search range (dmin≤d≤dmax). The difference between the corresponding pixels for the disparity d is calculated by expression (7):
Dp(d)=|I(p)−I′(p+d)|², (7)
where I and I′ are the stereo images, I(p) is the luminance value of the point p, and p+d denotes the point shifted horizontally from p by d.
In the description above, the difference is the square of the difference in luminance value between the corresponding pixels. However, it is also possible to use the sum of absolute luminance differences over the pixels around the corresponding pixels, the sum of squared luminance differences, or the normalized cross-correlation. Because the normalized cross-correlation measures agreement whereas the other measures express difference (disagreement), a suitable conversion, such as sign inversion, is required when it is used.
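The following sketch computes the costs Dp(d) of expression (7) for one image row over the search range, and shows one possible conversion of the normalized cross-correlation into a disagreement cost (1 − NCC, i.e. a sign inversion plus an offset that keeps the cost non-negative). Treating rows independently and giving an infinite cost to shifts that leave the image are assumptions of this sketch, and the function names are hypothetical.

```python
import math


def squared_diff_cost(ref_row, other_row, x, d):
    """Expression (7) on one row: Dp(d) = |I(p) - I'(p + d)|^2."""
    return (ref_row[x] - other_row[x + d]) ** 2


def cost_volume(ref_row, other_row, d_min, d_max):
    """Dp(d) for every pixel x of the row and every d in [d_min, d_max]."""
    volume = []
    for x in range(len(ref_row)):
        costs = []
        for d in range(d_min, d_max + 1):
            if 0 <= x + d < len(other_row):
                costs.append(squared_diff_cost(ref_row, other_row, x, d))
            else:
                costs.append(math.inf)  # corresponding point falls outside the image
        volume.append(costs)
    return volume


def ncc(a, b):
    """Normalized cross-correlation of two equally sized windows (1 = same pattern)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return num / den if den > 0 else 0.0


def ncc_cost(a, b):
    """Turn the agreement measure NCC into a disagreement cost in [0, 2]."""
    return 1.0 - ncc(a, b)


ref = [1, 2, 3, 4]
other = [0, 1, 2, 3, 4]  # the same pattern shifted right by one pixel
print(cost_volume(ref, other, 0, 1))  # [[1, 0], [1, 0], [1, 0], [1, 0]]
```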
The disparity function setting unit 18 supplies the intermediate result of the disparity affine parameter map F, received from the initializing unit 16 or from the disparity function selecting unit 24 described below, together with a disparity function fα, to the data term calculating unit 20 and the smoothing term calculating unit 22.
Setting a plurality of disparity functions fα in advance using prior knowledge of the scene to be input, and using them in sequence, improves the efficiency of the process.
A linear disparity function represents a plane in real space, for the following reason.
In the coordinate system shown in
x=X/Z, y=Y/Z, d=B/Z, (8)
where the focal length of the lens is omitted (normalized to 1) for simplicity.
When X, Y and Z are eliminated using expression (8) from the equation Z=pX+qY+r of a plane π in space, the equation of the plane π becomes the linear disparity function shown in expression (9):
d=αx+βy+γ, (9)
where α=−pγ, β=−qγ and γ=B/r.
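The algebra behind expression (9) is short: substituting X=xZ and Y=yZ from expression (8) into Z=pX+qY+r gives Z(1−px−qy)=r, so d=B/Z=B(1−px−qy)/r, which is linear in x and y. The sketch below checks this numerically for one hypothetical plane and point, with the focal length normalized to 1 and r≠0 (i.e., the plane does not pass through the optical center); the function names are illustrative only.

```python
import math


def plane_to_affine(p, q, r, B):
    """Map the plane Z = pX + qY + r to (alpha, beta, gamma) of expression (9)."""
    gamma = B / r
    return -p * gamma, -q * gamma, gamma


def project(X, Y, Z, B):
    """Expression (8) with the focal length normalized to 1."""
    return X / Z, Y / Z, B / Z


# Hypothetical plane, baseline and 3-D point lying on the plane
p, q, r, B = 0.0, -0.5, 10.0, 2.0
X, Y = 1.0, 2.0
Z = p * X + q * Y + r  # 9.0
x, y, d = project(X, Y, Z, B)
alpha, beta, gamma = plane_to_affine(p, q, r, B)
print(math.isclose(d, alpha * x + beta * y + gamma))  # True
```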
Since a disparity function represents a plane in real space, the disparity function setting unit 18 sets disparity functions corresponding to planes that can exist in real space. For example, in a road scene, objects can in most cases be assumed to lie above the road surface, so only the disparity functions of planes lying above the reference plane (the road) need to be considered.
The data term calculating unit 20 and the smoothing term calculating unit 22 generate a graph G as shown in
Each of the round nodes at the top and bottom represents a disparity affine parameter. The upper node (source) represents the disparity affine parameter fα set by the disparity function setting unit 18, and the lower node (sink) represents the intermediate result Fcur of the disparity affine parameter map. The square nodes p, q, r and s correspond to individual pixels; that is, the graph shown is an example for an image consisting of four laterally aligned pixels.
Each of these four nodes is connected to its adjacent nodes and to the upper and lower nodes (source and sink). These connections are referred to as “links,” and each link is assigned a weight calculated by the data term calculating unit 20 or the smoothing term calculating unit 22.
The data term calculating unit 20 assigns weights to the links connecting the source or the sink to each pixel node. The differences Dp(α′), Dq(α′), Dr(α′) and Ds(α′), calculated by the initializing unit 16, are assigned to the links from the source (α) to the nodes p, q, r and s.
Here, Dp(α′), for example, is the difference for the disparity given by the disparity function of the point p in the intermediate result Fcur of the disparity affine parameter map; Dq(α′), Dr(α′) and Ds(α′) are defined in the same manner.
The weight of the link from each node to the sink is the difference for the disparity specified by the disparity affine parameter fα supplied by the disparity function setting unit 18.
The smoothing term calculating unit 22 assigns weights to the links connecting adjacent nodes. For example, the weight Vp,q(fp, fq) assigned to the link connecting the pixel (node) p and the pixel (node) q is given by expression (10).
Vp,q(fp,fq)=λ·T(fp≠fq), (10)
where fp and fq denote the disparity affine parameters of the pixels (nodes) p and q, respectively, λ is a positive constant, and T(·) is an operator which returns “1” when the condition provided as an argument is true, and “0” otherwise.
In other words, Vp,q(fp, fq) is “0” when the disparity affine parameters of the pixels (nodes) p and q match, and “λ” when they differ. λ may have the same value for all pixels, or may be changed according to the luminance difference between the corresponding pixels.
The disparity function selecting unit 24 updates the disparity affine parameters by dividing the graph constructed by the data term calculating unit 20 and the smoothing term calculating unit 22 into two parts. The method of division is described below.
First, one part is assumed to include the source and the other part the sink. The set of nodes on the source side is denoted by S. The set of links going out from S toward nodes not in S is referred to as a cut, and the sum of the weights of the links in the cut is referred to as the cut capacity.
For example, in the case of division indicated by a dot line in
The cut having the minimum cut capacity among all possible cuts is referred to as the “minimum cut.” The disparity function selecting unit 24 divides the graph G by the minimum cut, which is obtained, for example, by a graph cut algorithm.
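To make the definitions of cut capacity and minimum cut concrete, the sketch below enumerates every node set S that contains the source but not the sink for a tiny two-pixel graph and returns the cut with the smallest capacity. The link weights are hypothetical, and brute-force enumeration is for illustration only; practical implementations use a max-flow/min-cut algorithm instead.

```python
from itertools import combinations


def cut_capacity(links, S):
    """Sum of the weights of the links going from S to nodes outside S."""
    return sum(w for (u, v), w in links.items() if u in S and v not in S)


def brute_force_min_cut(links, nodes, source, sink):
    """Try every S with the source in S and the sink outside S (exponential; tiny graphs only)."""
    inner = [n for n in nodes if n not in (source, sink)]
    best = None
    for k in range(len(inner) + 1):
        for subset in combinations(inner, k):
            S = {source, *subset}
            c = cut_capacity(links, S)
            if best is None or c < best[0]:
                best = (c, S)
    return best


# Hypothetical weights: source links carry Dp(current), sink links carry Dp(f_alpha)
links = {("s", "p"): 3, ("s", "q"): 1, ("p", "t"): 1, ("q", "t"): 4,
         ("p", "q"): 1, ("q", "p"): 1}
capacity, S = brute_force_min_cut(links, ["s", "p", "q", "t"], "s", "t")
print(capacity, sorted(S))  # 3 ['p', 's']
```

Under the update rule of this embodiment, pixel p (inside S) would be switched to fα, while pixel q would keep its current disparity function.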
After the division, the disparity function of each pixel (node) included in the subset S containing the source is updated to fα, and the disparity functions of the pixels (nodes) not included in S are left unchanged.
If not all of the disparity functions set by the disparity function setting unit 18 have been processed yet, the updated disparity affine parameter map F is supplied to the disparity function setting unit 18 as the intermediate result; once all of them have been processed, F is output as the final disparity data.
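The loop of candidate selection, graph construction, minimum cut and update closely resembles the α-expansion move used in graph-cut stereo. The Python sketch below implements such a loop on a single image row under several assumptions: candidate disparity functions are indexed labels, the data cost is an idealized stand-in for Dp, the minimum cut is found with a plain Edmonds-Karp max-flow, and the links between neighbours use the standard construction for submodular two-label terms derived from the Potts model of expression (10). These neighbour weights may differ in detail from the weights described for the graph G, so the code is a hedged illustration rather than the embodiment's exact implementation.

```python
from collections import deque


def potts(fp, fq, lam):
    """Expression (10): lam if the two disparity functions differ, else 0."""
    return lam if fp != fq else 0.0


def energy(labels, data_cost, lam):
    """E(f) on a single row: data term plus Potts smoothing between neighbours."""
    e = sum(data_cost(p, fp) for p, fp in enumerate(labels))
    return e + sum(potts(labels[p], labels[p + 1], lam) for p in range(len(labels) - 1))


def add_link(cap, u, v, w):
    if w > 0:
        cap[u][v] = cap[u].get(v, 0.0) + w
        cap[v].setdefault(u, 0.0)  # reverse entry for the residual graph


def min_cut_source_side(cap, s, t):
    """Edmonds-Karp max flow; returns the nodes still reachable from s (the set S)."""
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow


def expansion_move(labels, a, data_cost, lam):
    """One move: every pixel either keeps its function or switches to label a."""
    n = len(labels)
    cap = {node: {} for node in ["s", "t", *range(n)]}
    unary = [0.0] * n  # linear cost terms in x_p (x_p = 1 means p ends up in S)
    for p, cur in enumerate(labels):
        add_link(cap, "s", p, data_cost(p, cur))  # cut when p keeps its function
        add_link(cap, p, "t", data_cost(p, a))    # cut when p switches to a
    for p in range(n - 1):
        q = p + 1
        e00 = potts(labels[p], labels[q], lam)  # both keep
        e01 = potts(labels[p], a, lam)          # only q switches
        e10 = potts(a, labels[q], lam)          # only p switches
        unary[p] += e10 - e00
        unary[q] -= e10                         # both switching costs 0
        add_link(cap, q, p, e01 + e10 - e00)    # cut when q is in S and p is not
    for p, u in enumerate(unary):
        if u > 0:
            add_link(cap, p, "t", u)
        else:
            add_link(cap, "s", p, -u)
    S = min_cut_source_side(cap, "s", "t")
    return [a if p in S else cur for p, cur in enumerate(labels)]


def alpha_expansion(labels, num_labels, data_cost, lam, max_sweeps=10):
    """Cycle through all candidate functions until no move lowers E(f)."""
    best = energy(labels, data_cost, lam)
    for _ in range(max_sweeps):
        improved = False
        for a in range(num_labels):
            candidate = expansion_move(labels, a, data_cost, lam)
            e = energy(candidate, data_cost, lam)
            if e < best - 1e-12:
                labels, best, improved = candidate, e, True
        if not improved:
            break
    return labels, best


# Hypothetical candidates f_alpha = (alpha, beta, gamma) and a slanted row (y = 0)
candidates = [(0.0, 0.0, 2.0), (0.0, 0.0, 4.0), (0.5, 0.0, 2.0)]
true_d = [0.5 * x + 2.0 for x in range(8)]


def data_cost(p, label):
    alpha, beta, gamma = candidates[label]
    return (true_d[p] - (alpha * p + beta * 0 + gamma)) ** 2  # stand-in for Dp


labels, e = alpha_expansion([0] * 8, len(candidates), data_cost, lam=1.0)
print(labels, e)  # [2, 2, 2, 2, 2, 2, 2, 2] 0.0
```

Starting from a fronto-parallel initial map, the slanted candidate (0.5, 0.0, 2.0) is selected for every pixel, so the row is explained by a single disparity function with no smoothing penalty.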
According to the preferred embodiment of the invention, dense, highly accurate disparity data can be obtained from a stereo image regardless of the direction of inclination of the object surface or whether the surface has a pattern.
The invention is not limited to the embodiment described above as is; in practice, the components may be modified without departing from the scope of the invention. Various modes of the invention can be formed by suitably combining the plurality of components disclosed in the embodiment. For example, some components may be omitted from the full set of components shown in the embodiment. Furthermore, components from different embodiments may be combined as needed.
Other modifications may be made without departing from the scope of the invention.
In this embodiment, stereo vision with two cameras arranged side by side in parallel has been described. However, the cameras may be arranged vertically, or three or more cameras may be used.
In this embodiment, the graph cut is used as the energy minimization method. However, other optimization algorithms such as belief propagation may be employed.
In this embodiment, the case where the disparities of all pixels are estimated globally using the energy minimization method has been described. However, the process may be applied only to a specific area.
For example, the disparity may first be obtained by block matching, the inclination of the object surface then estimated, and the method described in this embodiment applied only to areas whose local inclination is not parallel to the image plane.
In this embodiment, the disparity function is a linear function. However, the invention is not limited thereto, and a quadratic function representing a curved surface, or other functions, may be employed.
Number | Date | Country | Kind |
---|---|---|---|
2007-310775 | Nov 2007 | JP | national |