The present invention relates to a pattern discriminating technology for discriminating a substance included in an image.
In Non-Patent Document 1, a method of extracting a feature value for pattern recognition (pattern discrimination) that discriminates a human from other objects in an image is disclosed. More specifically, the image is divided into plural areas arranged in a reticular (grid) pattern, and a histogram of the brightness gradient directions calculated for each of the areas is employed as the feature value.
In Patent Document 1 and Non-Patent Document 2, methods of extracting a feature value for classifying types of texture images are described. The technology described in Patent Document 1 employs, as the feature value, a simultaneous occurrence (co-occurrence) matrix whose elements P(i, j) count the number of pixel pairs in which a point shifted by a certain distance and direction from a point having concentration i has concentration j. Since similar patterns appear repeatedly at equal intervals in a texture image, a feature value representing the co-occurrence of concentration values at two points a fixed distance apart is effective for discriminating the texture. The technology in Non-Patent Document 2 extracts a feature value that is robust against illumination variation by using a co-occurrence matrix of brightness gradient directions instead of the concentration values of the image.
In the technology of Non-Patent Document 1, since the brightness gradient distribution is calculated independently for each of the areas segmented in the reticular pattern, structural information of the texture within an area is not reflected in the feature value. For example, even when the pattern in a given area is turned upside down, exactly the same feature value is calculated from that area, so information effective for discrimination is lost.
Since the feature values calculated by the technologies described in Patent Document 1 and Non-Patent Document 2 are based on the brightness, or the brightness gradient, at two spatially different points, structural information is reflected to some extent. However, since these feature values are based on a two-dimensional image, they do not describe the three-dimensional shape of the substance.
In order to solve such problems as described above, it is an object of the present invention to provide a pattern discriminating apparatus having a higher discriminating performance with respect to three-dimensional image data.
A pattern discriminating apparatus according to an embodiment of the present invention includes: a setting unit configured to set at least one area in a three-dimensional space in three-dimensional image data; a feature value calculating unit configured to calculate a pixel feature value for each pixel of the three-dimensional image data; a matrix calculating unit configured to (1) obtain at least one point on the three-dimensional coordinate in the area whose position is changed from a focused point on the three-dimensional coordinate in the area by a specific mapping, and (2) calculate a co-occurrence matrix which expresses the frequency of occurrence of combinations of the pixel feature value of the focused point in the area and the pixel feature values of the respective mapped points; and a discriminating unit configured to discriminate whether or not an object to be detected is imaged in the area on the basis of the combination of the specific mapping and the co-occurrence matrix, and of a learning sample of the object to be detected which has been learned in advance.
According to the present invention, a high discriminating performance with respect to the three-dimensional image data is achieved.
FIGS. 3(a) and 3(b) are drawings for explaining an example of a stereo image.
FIGS. 6(a) and 6(b) are drawings for explaining a parallax calculation.
FIGS. 7(a) and 7(b) are second drawings for explaining the candidate rectangle generating method.
FIGS. 11(a) and 11(b) are drawings for explaining a co-occurrence histogram in an image in the direction of the brightness gradient.
Referring now to the drawings, a pattern discriminating apparatus 10 according to an embodiment of the present invention will be described.
In this embodiment, it is assumed that two cameras are mounted in front of an automotive vehicle and a pedestrian existing in the direction of travel of the vehicle is detected as shown in
Referring now to
The first input unit 14 and the second input unit 16 input a stereo image shot from different points of view using the two cameras. The relative positions and orientations of the plural cameras are arbitrary as long as their fields of view overlap each other. In this embodiment, however, the stereo image is assumed to be shot with two identical cameras arranged side by side on the left and right, with their optical axes parallel.
The stereo camera coordinate system is a three-dimensional coordinate system whose origin O is set at the point of view (lens center) of the right camera; the straight line connecting the points of view of the left and right cameras is set as the X-axis, the perpendicularly downward direction as the Y-axis, and the direction of the optical axis of the camera as the Z-axis. The distance between the cameras (baseline length) is denoted by B, so the position of the left camera is expressed as (−B, 0, 0). For simplicity, when the road is modeled by a flat plane and its inclination in the horizontal direction is ignored as being minute, the road plane is expressed as Y=αZ+β in the stereo camera coordinate system. The character α indicates the inclination of the road plane viewed from the stereo camera, and the character β indicates the height of the stereo camera above the road surface. In the following description, α and β are collectively referred to as the road plane parameter. In general, the inclination of the road differs from place to place, and the cameras vibrate while the vehicle is traveling. Therefore, the road plane parameter changes from moment to moment as the vehicle moves.
The image coordinate system is a two-dimensional coordinate system set for each image. An x-axis is set in the horizontal direction of the right image and a y-axis in its vertical direction and, in the same manner, an x′-axis in the horizontal direction of the left image and a y′-axis in its vertical direction; the x-axis and x′-axis in the horizontal direction of the left and right images match the X-axis direction. In this case, y=y′ is satisfied where (x′, y′) is the point on the left image corresponding to a point (x, y) on the right image, so only the difference in position in the horizontal direction has to be considered. In the following description, this difference in the horizontal direction is referred to as the "stereo parallax", and is expressed as a parallax d=x′−x with the right image as the reference image. The stereo parallax is referred to simply as "parallax" below.
The first storage 18 and the second storage 20 store data of the two stereo images acquired by the first input unit 14 and the second input unit 16.
The setting unit 22 includes a parallax calculating unit 221, a parameter calculating unit 222, and a generating unit 223 as shown in
The parallax calculating unit 221 calculates the parallax d between the stereo images stored in the first storage 18 and the second storage 20.
As shown in the drawings, the parallax calculating unit 221 sets a window of (2w+1)×(2w+1) pixels around each point on the reference image and searches the other image along the horizontal direction for the position at which the normalized cross-correlation C between the windows, given by an expression (1), is maximized; the resulting horizontal displacement is taken as the parallax d.
N=(2w+1)² represents the number of pixels in the windows, f− and g− represent the averages of the brightness in the windows, and σ1² and σ2² represent the variances of the brightness in the respective windows; these values are given by the expressions (2) and (3) shown below. For reference, the characters f and g with a bar symbol on top are written as "f−" and "g−" in this description.
By searching in this manner for the point corresponding to each point on the reference image using the normalized cross-correlation C, the parallax calculating unit 221 obtains the parallax d for all the points, that is, a parallax map.
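For reference, a minimal sketch of this search is shown below. It assumes grayscale images held as NumPy arrays, a square window of half-width w, a bounded parallax search range, and an illustrative correlation threshold; expression (1) is taken to be the usual zero-mean normalized cross-correlation built from the quantities of expressions (2) and (3).

```python
import numpy as np

def ncc(f, g):
    """Normalized cross-correlation C between two equally sized windows
    (zero-mean form, built from the window means and variances of
    expressions (2) and (3))."""
    f = f.astype(np.float64)
    g = g.astype(np.float64)
    fm, gm = f.mean(), g.mean()
    num = ((f - fm) * (g - gm)).sum()
    den = np.sqrt(((f - fm) ** 2).sum() * ((g - gm) ** 2).sum())
    return num / den if den > 0 else 0.0

def parallax_map(right, left, w=3, d_max=64):
    """Dense parallax d = x' - x with the right image as the reference.
    Points whose window leaves the image or whose best match is weak keep -1
    ("undefined")."""
    H, W = right.shape
    disp = np.full((H, W), -1, dtype=np.int32)
    for y in range(w, H - w):
        for x in range(w, W - w):
            ref = right[y - w:y + w + 1, x - w:x + w + 1]
            best_c, best_d = -1.0, -1
            for d in range(0, min(d_max, W - w - 1 - x) + 1):
                cand = left[y - w:y + w + 1, x + d - w:x + d + w + 1]
                c = ncc(ref, cand)
                if c > best_c:
                    best_c, best_d = c, d
            if best_c > 0.8:          # correlation threshold (illustrative value)
                disp[y, x] = best_d
    return disp
```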
The parameter calculating unit 222 calculates a road plane parameter p=(α, β) using the parallax map calculated by the parallax calculating unit 221.
First of all, a method by which the parameter calculating unit 222 obtains the three-dimensional position (X, Y, Z) of a point (x, y) on the reference image from the parallax d of that point will be described. The expression (4) shown below holds between a point (X, Y, Z) in the three-dimensional space and its projections (x′, y) and (x, y) onto the left and right images.
When the expression (4) is solved for X, Y, and Z,
is obtained. The parameter calculating unit 222 obtains, using the expression (5) shown above, the three-dimensional position of each point on the reference image for which the parallax d has been obtained. The parameter calculating unit 222 then obtains the road plane parameter p=(α, β) by selecting, from the measured values, points at a short distance from the road plane and substituting the selected values into the equation of the road plane Y=αZ+β. A point at a short distance from the road plane is extracted as a point which satisfies the condition of the expression (6) shown below.
[Expression 5]
|Yp−Y|≦ΔY (6)
Here, ΔY is a threshold value for which an adequate value is set in advance. The symbol Yp represents the Y-coordinate of the intersection point between the reference road plane and a straight line that passes through the point (X, Y, Z) and is parallel to the Y-axis. The road plane parameter p of the reference road plane is measured, for example, on a flat road while the vehicle is stopped. If the road plane parameters of the reference road plane are denoted α̂ and β̂, Yp is obtained by the expression (7) shown below. The characters α and β with a hat symbol ^ on top are written as "α̂" and "β̂" in this description.
[Expression 6]
Yp = α̂Z + β̂ (7)
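For reference, a minimal sketch of this computation is shown below. Expression (5) is not reproduced in this text, so the sketch assumes the standard parallel-stereo form Z=B·f/d, X=x·Z/f, Y=y·Z/f (image coordinates measured from the optical center, focal length f in pixels); the least-squares fit and the value of ΔY are likewise illustrative choices, since the embodiment only states that selected points are substituted into Y=αZ+β.

```python
import numpy as np

def reconstruct(x, y, d, B, f):
    """Three-dimensional position (X, Y, Z) of a reference-image point (x, y)
    with parallax d, assuming the standard parallel-stereo form of
    expression (5): Z = B*f/d, X = x*Z/f, Y = y*Z/f."""
    Z = B * f / d
    return x * Z / f, y * Z / f, Z

def road_plane_parameter(points_xyd, B, f, alpha_ref, beta_ref, delta_Y=0.2):
    """Estimate the road plane parameter p = (alpha, beta) of Y = alpha*Z + beta.
    Points close to the reference road plane are selected with the condition
    |Yp - Y| <= delta_Y of expression (6), where Yp = alpha_ref*Z + beta_ref
    (expression (7)); the selected points are then fitted by least squares."""
    zs, ys = [], []
    for x, y, d in points_xyd:
        if d <= 0:
            continue                         # parallax undefined or invalid
        X, Y, Z = reconstruct(x, y, d, B, f)
        Yp = alpha_ref * Z + beta_ref        # expression (7)
        if abs(Yp - Y) <= delta_Y:           # expression (6)
            zs.append(Z)
            ys.append(Y)
    if len(zs) < 2:
        return alpha_ref, beta_ref           # fall back to the reference plane
    A = np.column_stack([zs, np.ones(len(zs))])
    alpha, beta = np.linalg.lstsq(A, np.array(ys), rcond=None)[0]
    return alpha, beta
```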
The generating unit 223 generates an area in which the pedestrian is included according to the following procedure.
The generating unit 223 sets a rectangle having an arbitrary point (x, y) on the parallax map at a midpoint of a lower side thereof as shown in
[Expression 7]
1/Z=(y−α)/β (8)
is obtained, so that the height h and the width w can be calculated from an expression (9) shown below.
In this manner, the size of the rectangle on the image varies depending on the position on the image in the vertical direction, that is, the y-coordinate. Also, in order to accommodate various sizes of the human, the generating unit 223 prepares plural types of the rectangles for the respective points (x, y) on the image as shown in
Subsequently, the generating unit 223 evaluates the probability that a human (pedestrian) is included in a rectangle R as shown in
Therefore, if di is a parallax of an arbitrary point in the rectangle R, the uniformity of the parallax can be evaluated by the number of points N which satisfies
[Expression 10]
|di−dp|≦Δd (11)
In order to take the influence of the size of the rectangle R into consideration, the count N is normalized by the area S=w×h of the rectangle R, and the generating unit 223 registers a rectangle R which satisfies the expression (12) shown below as a candidate area.
[Expression 11]
N̂ = N/S ≧ Nmin (12)
The value of Nmin is a threshold value for which an adequate value is set in advance. As shown in
Subsequently, by the process described above, the generating unit 223 generates N (N≧0) candidate areas R1 to RN.
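For reference, the candidate generation described above can be summarized in the sketch below. It assumes a focal length f in pixels and a principal-point row cy, so that (y−cy)/f gives the normalized coordinate used in expression (8); the rectangle size h=f·H/Z, w=f·W/Z is an assumed form of expression (9) with illustrative real-world pedestrian dimensions, the expected parallax dp=f·B/Z is likewise an assumption (its defining expression is not reproduced above), and Δd and Nmin are illustrative thresholds. Only one rectangle type per point is generated here, whereas the embodiment prepares plural types.

```python
import numpy as np

def candidate_areas(disp, alpha, beta, B, f, cy,
                    H=1.7, Wd=0.6, delta_d=2.0, n_min=0.5):
    """Generate candidate areas R1..RN from a parallax map `disp` (-1 = undefined).
    (y - cy)/f converts a pixel row to the normalized coordinate of
    expression (8), 1/Z = (y - alpha)/beta.  The rectangle size and the
    expected road-plane parallax dp are assumed forms (see lead-in text)."""
    rows, cols = disp.shape
    candidates = []
    for y in range(rows):
        inv_z = ((y - cy) / f - alpha) / beta       # expression (8)
        if inv_z <= 0:
            continue                                # row lies above the horizon
        h, w = int(f * H * inv_z), int(f * Wd * inv_z)
        if h == 0 or w == 0:
            continue
        dp = f * B * inv_z                          # expected road-plane parallax
        for x in range(cols):
            x0, y0 = x - w // 2, y - h              # (x, y) is the lower-side midpoint
            if x0 < 0 or y0 < 0 or x0 + w > cols:
                continue
            window = disp[y0:y, x0:x0 + w]
            n = np.count_nonzero((window >= 0) & (np.abs(window - dp) <= delta_d))
            if n / float(w * h) >= n_min:           # expressions (11) and (12)
                candidates.append((x0, y0, w, h))
    return candidates
```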
The setting unit 22 having the parallax calculating unit 221, the parameter calculating unit 222, and the generating unit 223 may set corresponding areas between the stereo images as shown by dotted rectangles in
The normalizing portion 24 normalizes the N areas R1 to RN set by the setting unit 22 to a predetermined size. Although the normalized size is arbitrary, in this embodiment all the areas are normalized to vertically elongated rectangles of 48×24 pixels.
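For reference, a minimal normalization sketch is shown below; the interpolation method is not specified in the embodiment, so nearest-neighbour sampling is used here for brevity.

```python
import numpy as np

def normalize_area(img, out_h=48, out_w=24):
    """Nearest-neighbour normalization of a candidate area to 48x24 pixels
    (any interpolation method may be used; this is only the simplest sketch)."""
    in_h, in_w = img.shape[:2]
    ys = np.arange(out_h) * in_h // out_h
    xs = np.arange(out_w) * in_w // out_w
    return img[np.ix_(ys, xs)]
```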
The feature value calculating unit 26 calculates the pixel feature value of the image data in each area normalized by the normalizing portion 24, pixel by pixel.
As the pixel feature value, for example, the direction of the brightness gradient is used. The direction of the brightness gradient is robust against illumination variation and the like, and is effective even in environments where the change in brightness is significant. When the change in brightness is relatively small, the brightness value itself may be used as the pixel feature value, possibly with a reduced number of brightness tones. In the following description, the case where the direction of the brightness gradient is used as the pixel feature value is described.
The feature value calculating unit 26 quantizes the calculated direction of the brightness gradient into an adequate number of discrete values; in this embodiment, it is quantized into eight directions as shown in the drawings.
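For reference, a minimal sketch of this quantization is shown below, assuming simple central-difference derivatives and eight 45-degree bins; the exact assignment of the labels 0 to 7 to directions follows the drawing and is therefore only assumed here.

```python
import numpy as np

def gradient_direction_8(img):
    """Quantize the brightness gradient direction of a grayscale image into
    eight discrete labels 0..7 (45-degree bins).  Central differences are used
    here for simplicity; Sobel or other derivative filters would work as well.
    Pixels with a near-zero gradient keep whatever bin arctan2 returns; the
    embodiment's handling of such pixels is not specified."""
    img = img.astype(np.float64)
    gy, gx = np.gradient(img)                   # derivatives along rows and columns
    theta = np.arctan2(gy, gx)                  # direction in (-pi, pi]
    return np.round(theta / (np.pi / 4)).astype(int) % 8
```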
The matrix calculating unit 28 extracts and outputs the co-occurrence feature for each normalized area. In this embodiment, the co-occurrence feature is a feature vector λ.
The matrix calculating unit 28 segments each area normalized by the normalizing portion 24 into meshes as shown in
Since the parallax d is calculated once per pixel, the data of a certain pixel (x, y) includes only one parallax d. For pixels whose three-dimensional position could not be determined, a label indicating "undefined" is stored instead.
A certain focused point (a certain focused pixel) r=(x, y, d) (the center point in the drawing) is mapped to a point r+δ by a displacement vector δ=(δx, δy, δd), and the co-occurrence matrix hij for that displacement vector is defined by the expression (13) shown below.
[Expression 12]
hij = #{x | I(x)=i, I(x+δ)=j} (13)
Here, the symbol # indicates the number of elements (frequency of occurrence) of the set represented inside the curly brackets. Also, I(x) indicates the pixel feature value at a point x on the image. In other words, the co-occurrence matrix hij of expression (13) is a two-dimensional histogram showing, within one mesh segment, the distribution of combinations of the pixel feature values i, j at the two points (the focused point and the mapping point) defined by the displacement vector δ. To obtain the distribution within one mesh segment, all the points (pixels) on the three-dimensional coordinate in that segment are set in sequence as the focused point r, and the co-occurrence matrix hij is accumulated. Since the co-occurrence matrix hij is defined for one displacement vector δ, when D types of displacement vectors δ are used, D two-dimensional histograms are generated. Such a two-dimensional histogram is called a co-occurrence histogram.
Part (b) of the corresponding drawing shows the two-dimensional displacement vectors δ at a Chebyshev distance of 1 from the focused point r. These two-dimensional vectors are defined for each depth. Since, for example, δ=(0, −1) can be substituted by δ=(0, 1) when the focused point and the mapping point are interchanged, there are four types δ1 to δ4 of two-dimensional displacement vectors δ at a Chebyshev distance of 1.
When the depth (parallax d) is broken into five stages before and after the focused point as shown in the drawings, 4×5=20 types of displacement vectors δ are obtained in total. Since the pixel feature value takes eight values, each co-occurrence matrix has 8×8=64 elements, and hence 20×64=1280-dimensional co-occurrence features are obtained for one mesh segment.
In this example, since the total number of meshes is 8×4=32, a 1280×32=40960-dimensional feature vector λ is generated from the one area. The N feature vectors λ obtained by performing the same process for the N areas are output to the discriminating unit 30.
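For reference, a minimal sketch of the calculation of expression (13) for one mesh segment is shown below. It assumes that the quantized gradient label and the quantized parallax stage are available per pixel (with −1 marking "undefined"), that the depth component δd of each displacement vector is expressed in the same stage units, and that mapping points are kept inside the segment; these details are not reproduced above, so they are assumptions. Concatenating the outputs of the 32 segments yields the 40960-dimensional feature vector λ described above.

```python
import numpy as np

def cooccurrence_features(labels, depth_stage, mesh, deltas, n_feat=8):
    """Co-occurrence features for one mesh segment mesh = (x0, y0, w, h):
    for every displacement vector delta = (dx, dy, dd) an n_feat x n_feat
    matrix h[i, j] = #{r | I(r) = i, I(r + delta) = j} (expression (13))
    is accumulated, and the matrices are concatenated."""
    x0, y0, w, h = mesh
    feats = []
    for dx, dy, dd in deltas:
        hist = np.zeros((n_feat, n_feat), dtype=np.int32)
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w):
                d = depth_stage[y, x]
                if d < 0:                          # "undefined" parallax
                    continue
                xm, ym = x + dx, y + dy
                if not (x0 <= xm < x0 + w and y0 <= ym < y0 + h):
                    continue                       # mapping point kept inside the segment
                if depth_stage[ym, xm] != d + dd:
                    continue                       # no pixel exists at the mapped depth
                hist[labels[y, x], labels[ym, xm]] += 1
        feats.append(hist.ravel())
    return np.concatenate(feats)                   # n_feat*n_feat*len(deltas) values
```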
The discriminating unit 30 performs the discriminating process using the N feature vectors λ calculated by the matrix calculating unit 28.
More specifically, a linear function g(λ) is defined by the expression (14) shown below for the feature vector λ, and the discriminating unit 30 determines whether or not the substance is the object to be detected (the pedestrian in this embodiment) depending on the magnitude of the output of the function g(λ). For example, when the value of the function g(λ) is larger than a threshold value, the discriminating unit 30 discriminates that the substance is a pedestrian.
g(λ)=wTλ+b (14),
where “w” is a vector having the same number of dimensions as λ and “b” is a constant term. “T” represents the transference of the vector. “w” and “b” are learned in advance using a learning sample of the object to be detected (a pedestrian in this embodiment). As a learning method, for example, a support vector machine may be used.
The output unit 32 outputs the result of the discriminating process performed by the discriminating unit 30, that is, whether or not the substance is a pedestrian.
According to this embodiment, pattern discrimination with a high degree of accuracy can be realized by calculating a feature value that reflects the three-dimensional shape of the substance.
In this embodiment, a case where a pedestrian existing in the direction of travel is detected using cameras installed at the front of the vehicle has been described. However, the cameras may be installed on the sides or the rear of the vehicle, and the invention may also be applied to cases where the cameras are mounted on moving bodies other than the vehicle, for example, a robot. The object on which the cameras are installed is not limited to moving bodies; the invention may also be applied to a monitoring camera.
In this embodiment, the case of detecting a human is described. However, the object to be detected is not limited thereto, and other objects may be detected. The invention is also applicable to the case of detecting plural classes of substances, such as detecting a human and a vehicle simultaneously.
In this embodiment, the stereo view in the case in which the two cameras are arranged side by side in parallel is described. However, the number of cameras is arbitrary as long as there are at least two, and the layout of the plural cameras is not limited as long as their fields of view overlap.
In this embodiment, the case in which the direction of the brightness gradient quantized into eight stages is used as the pixel feature value calculated by the feature value calculating unit 26 is described. However, the brightness gradient may instead be quantized into four directions by regarding opposite directions, such as upward and downward (2 and 6) or leftward and rightward (0 and 4), as the same direction, or the magnitude of the brightness gradient may be employed instead of its direction. Alternatively, output values of filters such as a Gaussian filter, a Sobel filter, or a Laplacian filter may be used. In addition, plural types of pixel feature values may be calculated for each pixel by combining some of these filters.
In this embodiment, a system that performs a depth restoring process using the stereo cameras as the first input unit 14 and the second input unit 16 is described. However, it is also possible to use an imaging apparatus which can input a three-dimensional image directly; for example, CT devices or MRI devices used in the medical field provide three-dimensional voxel data directly.
In this embodiment, the case where the matrix calculating unit 28 calculates the co-occurrence of features at two pixels is described. More generally, however, the co-occurrence feature over N pixels (N being an arbitrary integer of three or more) may be calculated. While the co-occurrence histogram generated from the co-occurrence of pixel feature values at two pixels is two-dimensional, an N-dimensional co-occurrence histogram is required for expressing the co-occurrence of pixel feature values at N pixels.
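For reference, a minimal sketch of a three-point co-occurrence histogram is shown below; the depth component is omitted for brevity, and the flattening of the 8×8×8 histogram is one possible choice, not one fixed by the embodiment.

```python
import numpy as np

def three_point_cooccurrence(labels, delta1, delta2, n_feat=8):
    """Three-dimensional co-occurrence histogram over a 2-D label image:
    h[i, j, k] = #{r | I(r) = i, I(r + delta1) = j, I(r + delta2) = k},
    where delta1 and delta2 are (dx, dy) displacement vectors."""
    H, W = labels.shape
    hist = np.zeros((n_feat, n_feat, n_feat), dtype=np.int32)
    for y in range(H):
        for x in range(W):
            p1 = (y + delta1[1], x + delta1[0])
            p2 = (y + delta2[1], x + delta2[0])
            if not (0 <= p1[0] < H and 0 <= p1[1] < W and
                    0 <= p2[0] < H and 0 <= p2[1] < W):
                continue
            hist[labels[y, x], labels[p1], labels[p2]] += 1
    return hist.ravel()          # 8**3 = 512 values per displacement pair
```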
Also, in this embodiment, the case where the matrix calculating unit 28 calculates the co-occurrence of the same type of feature (the direction of the brightness gradient) is described. However, the co-occurrence of different pixel feature values, such as the direction of the brightness gradient and the brightness, may also be calculated. Using plural pixel feature values may improve the accuracy of discrimination.
Also, in this embodiment, the methods by which the matrix calculating unit 28 obtains the mapping points by parallel movement with displacement vectors δ=(δx, δy, δd) are described. However, the invention is not limited thereto, and a rotational movement or another mapping may be used instead of the displacement vector.
The present invention is not limited to the embodiments described above as they are, and components may be modified and embodied in the implementation stage without departing from the scope of the invention. Also, various modes of the invention can be achieved by combining the plural components disclosed in the embodiments described above as needed. For example, several components may be eliminated from all the components shown in an embodiment. In addition, components in different embodiments may be combined as needed.
10 . . . pattern discriminating apparatus, 12 . . . feature extracting apparatus, 14 . . . first input unit, 16 . . . second input unit, 18 . . . first storage, 20 . . . second storage, 22 . . . setting unit, 24 . . . normalizing portion, 26 . . . feature value calculating unit, 28 . . . matrix calculating unit, 30 . . . discriminating unit, 32 . . . output unit
Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/JP10/00848 | 2/10/2010 | WO | 00 | 7/11/2012