The present invention relates to a feature amount calculation apparatus, and feature amount calculation method and program, that, in particular, sense and detect the position and presence of a target object (hereinafter referred to as “object”) from an image in the computer vision field.
A technology that detects a position of a person shown in an image is expected to be used in a variety of applications, such as video monitoring systems, vehicle driving support systems and automatic annotation systems for images and video, and such technology has been subject to extensive research and development in recent years.
In a scanning frame search type of detection method, an input image is finely raster-scanned using a variable-size rectangular scanning frame, an image feature within the scanned scanning frame is extracted, and it is determined whether or not a target object is shown in the scanning frame using a discriminator that has learned separately offline. Depending on the input image size, the number of scans per image ranges from tens of thousands to hundreds of thousands, and therefore a feature amount and the discriminator's processing computation amount greatly affect the detection processing speed. Consequently, selection of a low-cost feature amount effective for discrimination of a target object is an important factor affecting detection performance, and various feature amounts have been proposed for individual detection target objects, such as faces, people, and vehicles.
Generally, a sliding window method is widely used as an object detection method (see Non-Patent Literature 1 and Patent Literature 1, for example). In a sliding window method, an input image is finely raster-scanned using a rectangular scanning frame (window) of a prescribed size, an image feature is extracted from an image within each scanned window, and it is determined whether or not a person is shown in a target window. Objects of various sizes are detected by enlarging or reducing a window or input image by a predetermined ratio. A feature amount is extracted from each scanned window, and based on an extracted feature amount it is determined whether or not this is a detection target object. The above description refers to a still image, but the situation is similar for moving image processing using feature amounts in preceding and succeeding frames in the time domain, for instance, as in Non-Patent Literature 2.
One important factor affecting detection accuracy is the feature amount used in determining whether or not an object is a person, and various feature amounts have hitherto been proposed. A typical feature amount is the histogram of oriented gradients (hereinafter referred to as "HOG") feature amount proposed by Dalal et al. in Non-Patent Literature 1. An HOG is a feature amount obtained by dividing a window image of a prescribed size into small areas and creating a histogram of edge direction values within each local area. An HOG captures a silhouette of a person by using edge direction information, and has the effect of permitting local geometric changes by extracting a histogram feature for each small area; Non-Patent Literature 1 shows that excellent detection performance is achieved even for the INRIA data set, which includes various attitudes.
Patent Literature 1 is proposed as an improvement on the method in Non-Patent Literature 1. In Non-Patent Literature 1, an input window image is divided into small areas of a fixed size and an edge direction histogram is created from each of those small areas, whereas in Patent Literature 1 a method is proposed whereby various feature amounts are provided by making the small area size variable, and furthermore an optimal feature amount combination for discrimination is selected by means of boosting.
There is also Non-Patent Literature 3 as an improvement on the method in Non-Patent Literature 1. In Non-Patent Literature 1, edge directions are quantized into eight or nine directions, and an edge direction histogram is created for each angle. In Non-Patent Literature 3, co-occurrence histograms of oriented gradients (hereinafter referred to as "coHOG") features are proposed, in which the method is improved so that, in addition to the edge direction value of each pixel, a histogram of edge direction combinations between two pixels is also created for each of 30 offset positional relationships.
An HOG and coHOG both extract a feature amount from an edge image calculated from brightness I of an input image. An edge image comprises edge gradient θ and edge magnitude mag, and is found by means of equations 1 below.
An edge image found in this way is divided into B predetermined small areas, and edge gradient histogram Fb is found for each small area. The elements of the gradient histogram of each small area are taken as respective feature dimensions, and a multidimensional feature vector linking all of these is taken as feature amount F. Edge gradient histogram Fb is shown by equations 2 below.
[2]
F = {F0, F1, . . . , FB−1}
Fb = {f0, f1, . . . , fD−1},   b ∈ [0, B−1]   (Equations 2)
With an HOG, edge gradient values converted to 0 to 180 degrees are divided into nine directions and quantized, and a gradient histogram is calculated with an edge magnitude value as a weight. With a coHOG, edge gradient values of 0 to 360 degrees are divided into eight directions and quantized, and a histogram is calculated for each combination of gradient values of offset pixels of 30 surrounding points with each pixel within a local area as a reference point pixel. With a coHOG, an edge magnitude value is used for edge noise removal, and for pixels for which an edge magnitude value is greater than or equal to a threshold value, a number of events is counted for each gradient direction and for each gradient direction combination.
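As an illustrative sketch (not part of the claimed configuration), the two quantization schemes described above can be written as follows; the bin widths of 20 degrees (180/9 for an HOG) and 45 degrees (360/8 for a coHOG) follow directly from the quantization numbers given in the text.

```python
def quantize_hog(theta_deg):
    """Quantize an angle into 9 HOG bins over 0-180 degrees (unsigned gradient)."""
    t = theta_deg % 180.0
    return int(t // 20) % 9       # 180 / 9 = 20 degrees per bin

def quantize_cohog(theta_deg):
    """Quantize an angle into 8 coHOG bins over 0-360 degrees (signed gradient)."""
    return int((theta_deg % 360.0) // 45) % 8   # 360 / 8 = 45 degrees per bin
```

Under the HOG scheme, an edge at 190 degrees falls in the same bin as one at 10 degrees, whereas the coHOG scheme keeps them distinct.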
As shown in
When image data 11 (see
Next, histogram feature configuration section 13 counts the pixels included in a local area as a histogram for each edge direction value. Histogram feature configuration section 13 then links these per-local-area edge direction value histograms across all local areas and creates a feature vector (see
Determination section 15 determines whether or not a feature vector for input image data created in this way is a target object, using discriminant function 14 created beforehand by means of offline learning processing, and outputs the result.
A window image used in human detection generally permits fluctuation according to a person's attitude, and includes not only a person area for using edge data with respect to a background but also a background area (see input image data 11 in
However, the following problems remain to be solved in the conventional methods described in the cited literature.
(1) One problem is that, since edge information is extracted uniformly from within an image, when there are many edges in background pixels, noise is superimposed on a feature amount, and erroneous determination increases. That is to say, in all the conventional literature, noise is included in a feature vector since edge features generated within a window are handled uniformly and feature amounts are acquired uniformly.
Specifically, in the case of an HOG in Non-Patent Literature 1 and Patent Literature 2, edge directions are counted for all pixels within a cell, and therefore both an edge formed by background present in a cell and an edge formed by a person are counted uniformly in a histogram.
As shown in
However, with conventional technology, data of edge directions present in a local area is simply all uniformly counted in a feature vector, and therefore an originally unnecessary edge group arising from background data is counted in a histogram, and this becomes noise (see
The situation is also similar in the case of a coHOG described in Non-Patent Literature 3, and since, in the case of a coHOG, edge direction groups between neighboring pixels are counted, in addition to the above problem, for example, co-occurrences of an edge formed by a body-line of a person and an edge formed by the background are counted equally, and accuracy falls in a similar way due to the influence of noise.
Thus, with conventional technology, a feature amount extracts the gradient information of edges within an image uniformly from the pixels of the image, giving a structure in which the feature vector tends to be affected by background pixels, and erroneous determination is prone to occur when the background is complicated.
(2) Also, an edge feature comprises an edge magnitude and an edge direction. With conventional technology, an edge magnitude value is used only as a threshold value for noise removal or as information for weighting edge direction reliability. That is to say, wherever an edge is present, a feature amount is calculated uniformly using only its direction information. This problem is made clear by the images shown in
As in Non-Patent Literature 3, the edge magnitude value image shown in
With conventional technology, among the images shown in
Thus, with conventional technology, edge magnitude information has been used only for noise removal, and only edge gradient information has been utilized, without regard to magnitude.
The present invention has been implemented taking into account the problems described above, and it is therefore an object of the present invention to provide a feature amount calculation apparatus, and feature amount calculation method and program, that enable an outline of an object such as a silhouette line of a person, and an important feature arising from object outline and surrounding image changes, to be extracted by reducing the influence of background noise.
A feature amount calculation apparatus of the present invention calculates a feature amount of a target object from image data, and is provided with: a feature value calculation section that calculates an edge direction and edge magnitude as input image data pixel-unit feature values; an edge direction group calculation section that combines the edge directions of a plurality of pixels and calculates an edge direction group as an inter-pixel feature amount; a correlation value calculation section that takes all pixels or a predetermined pixel of the plurality of pixels used in the feature value calculation as pixels subject to correlation value calculation, and calculates a correlation value of the edge magnitudes between the pixels subject to correlation value calculation for each feature amount; and a histogram creation section that counts the feature amounts in a histogram for each correlation value, and creates the histogram as a feature vector.
A feature amount calculation method of the present invention calculates a feature amount of a target object from image data, and has: a step of calculating an edge direction and edge magnitude as input image data pixel-unit feature values; a step of combining the edge directions of a plurality of pixels and calculating an edge direction group as an inter-pixel feature amount; a step of taking all pixels or a predetermined pixel of the plurality of pixels used in the feature value calculation as pixels subject to correlation value calculation, and calculating a correlation value of the edge magnitudes between the pixels subject to correlation value calculation for each feature amount; and a step of counting the feature amounts in a histogram for each correlation value, and creating the histogram as a feature vector.
From another viewpoint, the present invention is a program for causing a computer to execute the steps of the above-described feature amount calculation method.
The present invention enables an outline of an object such as a silhouette line of a person, and an important feature arising from object outline and surrounding image changes, to be extracted by reducing the influence of background noise. In particular, the present invention is highly effective in suppressing erroneous determination in which a background image is determined to be a target object.
Now, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
First, terminology used in the embodiments will be explained. A “pixel value” includes a brightness value. “Edge magnitude” is information indicating a degree of change in a pixel value. Edge magnitude is expressed quantitatively by an “edge magnitude value” indicating a pixel value change amount. “Edge direction” indicates an edge gradient, and is a direction in which edge magnitude changes. An edge direction is expressed quantitatively by an “edge direction value” indicating a direction in which a degree of increase in a pixel value is greatest as an angle. An “edge direction group” is a group of edge directions for a plurality of positions in a previously defined specific arrangement relationship. An edge direction group is expressed as a group of edge direction values of each position. A “correlation value” is information quantitatively indicating a degree of edge magnitude correlation at the above plurality of positions, and is a value corresponding to an edge magnitude value change amount. “Edge gradient” has two meanings in this embodiment. The first meaning is edge gradient, as heretofore. The second meaning is an edge direction group and correlation value. “Connected edges” are a group of edges with edge magnitude connectivity. An “edge gradient group” is a collection of pixels with edge gradient (edge direction group and correlation value) connectivity. A “feature value” is information indicating a pixel-unit edge feature, and in this embodiment includes an edge magnitude value and edge direction value. A “feature amount” is information combining feature values, and in this embodiment includes an edge direction group. A “small area” is an image area forming a histogram creation unit, and is also referred to as a “local area” or “small block.”
As shown in
Input to feature amount calculation apparatus 100 is a scanning frame image (image data). Output from feature amount calculation apparatus 100 is a feature vector used in discrimination. It is desirable for a scanning frame image to undergo brightness correction by a brightness correction section (not shown) before being input to feature value calculation section 110.
Feature value calculation section 110 calculates an edge direction and edge magnitude for each pixel from input image data. Here, feature value calculation section 110 calculates an edge magnitude and edge direction for all pixels of an input image. Feature value calculation section 110 may also be referred to as an edge extraction section.
When input image data is provided, feature value calculation section 110 finds an edge direction for each pixel of the image data. For example, if a pixel at coordinates (x, y) is denoted by I (x, y), edge direction θ can be found by means of equations 3 and 4 below. Equations 3 are the same as equations 1 given earlier.
[3]
dx(x,y) = I(x+1,y) − I(x−1,y)
dy(x,y) = I(x,y+1) − I(x,y−1)   (Equations 3)
When equations 3 and 4 are used, θ is found as a number of degrees between 0 and 360. Here, the number of degrees may be divided by Q, and values quantized into Q directions may be used.
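The central differences of equations 3 and the Q-direction quantization described above can be sketched as follows. Equation 4 itself is not reproduced in this excerpt, so the standard arctan2 form is assumed for the direction angle; this is an illustrative sketch, not the claimed implementation.

```python
import numpy as np

def edge_direction(I, x, y, Q=8):
    """Central differences (equations 3), then the direction angle quantized
    into Q directions. arctan2 is assumed for equation 4."""
    dx = float(I[y, x + 1]) - float(I[y, x - 1])
    dy = float(I[y + 1, x]) - float(I[y - 1, x])
    theta = np.degrees(np.arctan2(dy, dx)) % 360.0   # 0 to 360 degrees
    return int(theta // (360.0 / Q)) % Q
```

For example, an image whose brightness increases with x yields direction bin 0 (a horizontal gradient), while one increasing with y yields the bin containing 90 degrees.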
With regard to the feature values of each pixel, a group of values of above edge direction θ for a plurality of pixels in N previously defined specific space-time arrangement relationships is taken as a feature value. A space-time here means a three-dimensional space comprising the two-dimensional space (x, y) in an image and the time domain t, and is decided uniquely by intra-image position (x, y) and time-domain value (t). An arrangement relationship in space-time can be defined by means of distances (ddx, ddy, ddt) or the like, such as nearby pixels within three-dimensional space-time with respect to a certain target pixel (x, y, t) in an image.
Feature value calculation is the same even if points of two or more pixels in space-time are used. Here, a description is given by way of example of a case in which two points are used.
Above, edge direction group (θv1, θv2) is calculated for each of pixels v1 and v2 in a previously defined specific arrangement relationship.
As shown in
Feature amount calculation section 120 performs the processing in
Edge direction group calculation section 121 and correlation value calculation section 122 perform the processing in
Correlation value calculation section 122 operates closely coupled with edge direction group calculation section 121.
Correlation value calculation section 122 calculates edge magnitude values mv1 and mv2 for each pixel by means of equation 5 below, for example, from pixel values for pixels v1 and v2 used by the above-described feature value calculation section when calculating feature values (θv1, θv2).
[4]
mv = √(dx(x,y)^2 + dy(x,y)^2)   (Equation 5)
Feature amount calculation section 120 calculates correlation value Cv1,v2 by means of equation 6 below, based on an edge magnitude value.
[5]
Cv1,v2 = G(mv1 − mv2)   (Equation 6)
Above, G(x) is a function that assigns a level according to the size of the edge magnitude difference value; G(x) = x may be used, or G(x) may be calculated by means of equation 7 below, using threshold values αk.
[6]
G(x) = k, if αk ≤ x < αk+1,   k ∈ [0, 1, 2, . . . , T−1]   (Equation 7)
The form of the G(x) equation is not restricted, but here it is assumed that T-stage correlation values having values of 0 to T−1 are output as C.
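A minimal sketch of the staircase quantizer of equation 7, assuming the thresholds αk are given as an ascending list of T+1 values and that the level depends on the absolute magnitude difference (an assumption for this sketch; equation 6 itself uses the signed difference):

```python
def make_G(alphas):
    """Staircase quantizer (equation 7): returns G with G(x) = k when
    alpha_k <= |x| < alpha_{k+1}. alphas is an ascending list of T+1
    thresholds, so the output is a level in 0..T-1."""
    def G(x):
        x = abs(x)                      # assumption: level depends on |difference|
        for k in range(len(alphas) - 1):
            if alphas[k] <= x < alphas[k + 1]:
                return k
        return len(alphas) - 2          # clamp out-of-range values to the top level
    return G
```

With thresholds [0.0, 0.1, 0.3, 1.01], for example, a magnitude difference of 0.2 is assigned level 1 of the T = 3 levels.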
Returning to
Input to histogram creation section 130, as the output of correlation value calculation section 122 of feature amount calculation section 120, is (θv1, θv2, Cv1,v2), comprising edge direction information and a corresponding correlation value, in a quantity (N) equivalent to a predetermined number of feature values.
Here, θv1 and θv2 can each take Q values from 0 to Q−1 based on edge direction quantization value Q. Cv1,v2 takes T values from 0 to T−1. Thus, a histogram is prepared in which each of the Q*Q*T possible (θv1, θv2, Cv1,v2) value groups is assigned to one bin of the histogram.
The number of pixels having a (θv1, θv2, Cv1,v2) feature value is counted in a histogram from pixels present in a local area of input image data, and a Q*Q*T-dimensional feature vector with a value of each bin as one feature vector dimension is generated.
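The Q*Q*T bin layout described above can be sketched as follows; the flattening order of the triple into a single index is an illustrative choice, not specified in the text.

```python
import numpy as np

def bin_index(theta1, theta2, c, Q=8, T=3):
    """Flatten a (theta_v1, theta_v2, C_v1,v2) triple into one of Q*Q*T bins."""
    return (theta1 * Q + theta2) * T + c

# a local-area histogram is then a Q*Q*T-dimensional feature vector
hist = np.zeros(8 * 8 * 3)
hist[bin_index(2, 5, 1)] += 1   # count one pixel having this feature value
```

With Q = 8 directions and T = 3 correlation levels, each local area yields a 192-dimensional feature vector.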
Histogram connection section 140 connects feature vectors of all blocks.
Thus, feature value calculation section 110 of feature amount calculation apparatus 100 divides input image data into previously specified small blocks, and calculates an edge magnitude value (a real number between 0.0 and 1.0) and an edge direction value as feature values for each small area. Feature amount calculation section 120 calculates inter-pixel edge direction values and correlation value (θv1, θv2, Cv1,v2) for N predetermined pixels. Histogram creation section 130 counts feature values in a histogram that sets a bin for each feature value (θv1, θv2, Cv1,v2), and performs calculation for each small area. Histogram connection section 140 connects these and outputs them as a feature vector.
An object detection apparatus can be implemented by combining above feature amount calculation apparatus 100 with discriminant function 14 and determination section 15 in
This determination section uses a discriminant function constructed beforehand by means of offline learning processing, determines whether or not an input feature vector is a target, and here outputs whether or not the object is a person.
That is to say, feature amount calculation apparatus 100 finds an edge magnitude correlation value, and creates a histogram by counting an edge direction group for each edge magnitude correlation value. By this means, feature amount calculation apparatus 100 can acquire a feature amount (histogram) indicating not only edge gradient correlation (connectivity) but also edge magnitude correlation (connectivity). That is to say, feature amount calculation apparatus 100 can extract an edge gradient group that characteristically appears in an object outline such as a person's silhouette from an image. Therefore, feature amount calculation apparatus 100 can reduce the influence of background noise and calculate a feature amount of an outline of an object such as a person.
The operation of feature amount calculation apparatus 100 configured as described above will now be explained.
First, the basic concept of the present invention will be explained.
In a feature amount calculation method of the present invention, when a feature vector is constructed by creating a histogram of feature values calculated using information of a plurality of pixels, a function is provided that determines correlation (including similarity) between pixels used in feature amount calculation, and a feature value histogram is constructed for each inter-pixel correlation value.
More specifically, information such as an edge magnitude value and input image pixel value is used to determine whether or not inter-pixel correlation is high, and an inter-pixel edge direction histogram feature having a series of edges represented by a silhouette line showing an object shape is extracted. In particular, an edge magnitude value is not binarized for noise removal as in conventional technology, but a real number is used.
With the present invention, when attention is focused on a small area, there is little change within either the collection of detection target object pixels or the collection of adjacent background area pixels; attention is therefore focused on the likelihood that, in a pixel group forming a series of edges appearing at the boundary, pixel values and edge magnitude values take close values. The fact that an actual pixel value varies according to the object, but inter-pixel correlation is similar between neighboring pixels of the same object, is utilized. A silhouette line of an object can be captured accurately by utilizing this feature. "Accurately" means capturing only a feature formed by edges from the same object, making it easier to capture a series of edges.
In addition, not only inter-pixel information with high correlation, but also feature values for each correlation value, are handled.
An edge formed by a background and an edge formed by a 3D object differ: in the silhouette line of a person or a leg in a person image, for example, there appears an amplitude of the edge magnitude value given by the 3D form, such as illustrated in
As shown in
Focusing attention on an edge magnitude image, there is a feature specific to a person image. An edge magnitude value has a predetermined maximum value at a boundary between a background area and a person area, and there is a hill-shaped amplitude shape in a direction perpendicular to a silhouette line (see
Utilizing this feature, an edge gradient between pixels on a contour line for which edge magnitudes have similar values is extracted. By this means, pixel connectivity is captured, and a person's silhouette can be extracted stably.
In addition, an inter-pixel edge feature (edge gradient) that straddles an edge magnitude contour line is extracted. Since a person has a three-dimensional shape, a feature (edge gradient group) appearing with edges for which a brightness value changes smoothly given the roundness of a person forming a group on the inner side (person area side) of a strong edge occurring at a boundary between a person and background is extracted.
That is to say, since an image area that includes many edge gradient groups has a high possibility of including an object outline, the present invention enables an object to be extracted stably.
A feature amount that takes the above into consideration is shown in equation form below. As stated above, for a feature amount of the present invention, two or more pixels are used, edge connectivity is determined using edge magnitude correlation, and the inter-pixel correlation and inter-pixel edge information (edge gradient) are extracted. For simplicity, equations 8 below are for a case in which an edge gradient group and edge magnitude correlation between two pixels are used.
In above equations 8, it is assumed that b ∈ [0, B−1] and d ∈ [0, D−1]. Feature vector F comprises B blocks, and connects the D-dimensional edge gradient histogram features of each block. Also, Q( ) and r( ) indicate quantization functions, and the number of dimensions D is decided based on an edge gradient direction quantization number and an edge magnitude quantization number. Q and Q′ in equations 8 may be the same or different.
In each block, correlation between intra-block pixel (xi, yi) and neighboring pixel (xj, yj) is calculated based on an edge magnitude difference, and edge gradient value θ and relative angle dθ are counted in histogram features as a pair.
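A sketch of one block's histogram along the lines of equations 8, under illustrative assumptions: a single neighbor offset, the relative angle dθ computed on already-quantized directions, and correlation levels taken from fixed thresholds (the quantization functions Q( ) and r( ) of the text are simplified here).

```python
import numpy as np

def block_feature(theta, mag, Q=8, T=3, offset=(0, 1), alphas=(0.0, 0.1, 1.01)):
    """Histogram for one block: for each pixel and one neighbor offset, count
    the pair (theta_i, relative angle d_theta) together with the correlation
    level derived from the edge magnitude difference. Names are illustrative."""
    H, W = theta.shape
    dy, dx = offset
    hist = np.zeros(Q * Q * T)
    for y in range(max(0, -dy), min(H, H - dy)):
        for x in range(max(0, -dx), min(W, W - dx)):
            t1 = theta[y, x]
            t2 = theta[y + dy, x + dx]
            dtheta = (t2 - t1) % Q                    # relative angle, quantized
            diff = abs(mag[y, x] - mag[y + dy, x + dx])
            c = int(np.searchsorted(alphas, diff, side='right')) - 1
            c = min(max(c, 0), T - 1)                 # clamp to T correlation levels
            hist[(t1 * Q + dtheta) * T + c] += 1
    return hist
```

Connecting such a histogram across all B blocks gives the feature vector F of equations 8.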
A comparative description of this embodiment and a conventional method will now be given.
With a conventional method, only the edge direction image in
In contrast, in this embodiment, both the edge magnitude image in
A comparative description will now be given of a feature vector of this embodiment and of a conventional method.
As shown in
As shown in
In contrast, as shown in
That is to say, whereas noise is superimposed in a conventional method (see
Feature value calculation section 110 (see
An edge magnitude in
The operation of feature amount calculation apparatus 100 will now be described.
a is image data input to feature value calculation section 110 of feature amount calculation apparatus 100 (see
e is a correlation value calculated by feature value calculation section 110, and
In feature value calculation section 110, input image data is divided into blocks. The unit of division is called a small block (small area).
Feature value calculation section 110 calculates an edge magnitude and edge direction (edge gradient) for the entirety (all pixels) of input image data.
Feature amount calculation section 120 combines feature values of a plurality of pixels of input image data and calculates an inter-pixel feature amount. To be more precise, feature amount calculation section 120 takes all pixels or a predetermined pixel of the plurality of pixels as pixels subject to correlation value calculation, and calculates a correlation value between the pixels subject to correlation value calculation.
Histogram creation section 130 performs division into the above small blocks (where a plurality of pixels are included in a small block), and creates a histogram for each divided small block.
Here, a series of edges are captured using edge magnitude values of local area k in
A range indicated by a feature vector in local area k in
Histograms are created on a per small block basis, for all small blocks, histograms are integrated, and a histogram drawing is finally created (see
When creating histograms for each small block, the first pixel (calculation-start pixel = pixel of interest) of a coHOG (method using two pixels) is each pixel included in the relevant small block. As the second pixel (nearby edge), a pixel outside the small block is also applicable.
As shown in
In step S2, feature value calculation section 110 calculates edge direction θ and edge magnitude m for each pixel in accordance with above equations 4 and 5.
In step S3, feature amount calculation section 120 performs processing for each small block. At this time, feature amount calculation section 120 also performs division into small blocks.
In step S4, histogram connection section 140 connects feature vectors of all blocks, outputs a scan image data feature vector, and terminates this processing flow.
In step S12, histogram creation section 130 counts a calculated edge direction group and correlation value (edge gradient) in a histogram, and returns to above step S11. In this way, feature amount calculation section 120 repeatedly performs processing for each pixel in a block within the dotted-line frame in
As shown in
[Implementation Example]
In this embodiment, the way in which a histogram dimension is defined is arbitrary. Therefore, application and adaptation to various feature values are possible taking two or more pixels into consideration as with a conventional method coHOG or LBP. For comparison, Non-Patent Literature 1 (Dalal's HOG) and a conventional method coHOG are compared, and the efficacy of this embodiment is verified.
An INRIA data set often used in human detection algorithm evaluation, proposed in Non-Patent Literature 1, was used as the database for the experiment. Also, 2,416 person images and 1,218 background images not showing persons were prepared as learning images. Rectangular images of ten places randomly clipped from the prepared 1,218 background images are used as background samples, in accordance with information from the INRIA website: http://pascal.inrialpes.fr/data/human/
Chart 1 in
In this ROC curve, as a result of comparison with a coHOG method using similar learning data and detection data, a 2 to 4% improvement in performance at false positive rates of 1e-4 and 1e-5 was confirmed, verifying efficacy.
Here, in this embodiment, a calculation equation has been given that finds a correlation value based on an edge magnitude difference between a plurality of pixels in correlation value calculation, but in addition to an edge magnitude difference, calculation may also be performed using a pixel value difference and space-time distance, as in equation 9 below.
[8]
Cv1,v2 = α*G1(mv1 − mv2) + β*G2(Iv1 − Iv2) + γ*G3(dist(v1,v2))   (Equation 9)
In above equation 9, α, β, and γ indicate real numbers between 0.0 and 1.0 and are constants representing the weights of each term. Also, Iv represents a pixel value for pixel v. Furthermore, dist( ) indicates a function that returns an inter-pixel distance value, and may be found by means of a Euclidean distance or the like. Each G may be found by the method given in above equation 7.
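A sketch of equation 9, taking each Gi as the absolute difference (an identity-style choice for this sketch) and dist( ) as the Euclidean distance; the weight values shown are illustrative, not values given in the text.

```python
import math

def correlation_value(m1, m2, i1, i2, p1, p2, alpha=1.0, beta=0.5, gamma=0.5):
    """Equation 9 sketch: weighted sum of the edge magnitude difference,
    pixel value difference, and spatial distance terms for pixels v1, v2."""
    g1 = abs(m1 - m2)          # G1: edge magnitude difference term
    g2 = abs(i1 - i2)          # G2: pixel value difference term
    g3 = math.dist(p1, p2)     # G3: Euclidean inter-pixel distance
    return alpha * g1 + beta * g2 + gamma * g3
```

In practice each Gi could instead be the staircase quantizer of equation 7, as the text notes.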
If input is a color image, edge directions and edge magnitudes may be calculated by means of equations 10 through 13 below, using values of three elements of input color data.
[9]
dRx(x,y) = IR(x+1,y) − IR(x−1,y)
dRy(x,y) = IR(x,y+1) − IR(x,y−1)
dGx(x,y) = IG(x+1,y) − IG(x−1,y)
dGy(x,y) = IG(x,y+1) − IG(x,y−1)
dBx(x,y) = IB(x+1,y) − IB(x−1,y)
dBy(x,y) = IB(x,y+1) − IB(x,y−1)   (Equations 10)

mRv = √(dRx(x,y)^2 + dRy(x,y)^2)
mGv = √(dGx(x,y)^2 + dGy(x,y)^2)
mBv = √(dBx(x,y)^2 + dBy(x,y)^2)   (Equations 11)

mv = mRv, MaxColId = R, if mRv = max(mRv, mGv, mBv)
mv = mGv, MaxColId = G, if mGv = max(mRv, mGv, mBv)
mv = mBv, MaxColId = B, if mBv = max(mRv, mGv, mBv)   (Equations 12)
Subscripts R, G, and B assigned to variables in above equations 10 through 13 indicate a case in which an input color image is an image having three elements RGB, but a different color space may also be used, such as YCbCr.
In such a case, presence or absence of correlation may be determined as shown in equation 14 below according to whether or not a MaxColId value (R or G or B) used in edge magnitude and edge direction calculation has the same value.
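The per-channel computation of equations 10 through 12 can be sketched as follows, keeping the channel with the largest magnitude and recording it as MaxColId. The channel loop and the arctan2 direction form are illustrative assumptions; equation 13 (the direction calculation) is not reproduced in this excerpt.

```python
import numpy as np

def color_edge(I, x, y):
    """Per-channel central differences (equations 10-11); keep the channel
    with the largest edge magnitude and record it as MaxColId (equations 12).
    I is an H x W x 3 array in RGB order."""
    best_mag, best_theta, max_col = -1.0, 0.0, None
    for c, name in enumerate("RGB"):
        dx = float(I[y, x + 1, c]) - float(I[y, x - 1, c])
        dy = float(I[y + 1, x, c]) - float(I[y - 1, x, c])
        mag = (dx * dx + dy * dy) ** 0.5
        if mag > best_mag:
            best_mag = mag
            best_theta = np.degrees(np.arctan2(dy, dx)) % 360.0
            max_col = name
    return best_mag, best_theta, max_col
```

Equation 14 then compares the MaxColId values of two pixels to decide presence or absence of correlation, as described above.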
Edge direction and edge correlation value calculation is performed for Y, Cb, and Cr channels in color space YCbCr and feature vectors are calculated for each, and when all are used, the performance indicated by “x” symbols in
Performance can be significantly improved by performing feature value calculation in a YCbCr space in this way.
As described in detail above, according to this embodiment, feature amount calculation apparatus 100 is provided with feature value calculation section 110 that calculates an input image data pixel-unit feature value, feature amount calculation section 120 that combines the feature values of a plurality of pixels and calculates an inter-pixel feature amount, histogram creation section 130 that counts the feature values for each correlation value of pixels used in the feature value calculation, and creates the histogram as a feature vector, and histogram connection section 140 that connects feature vectors of all blocks. Also, feature amount calculation section 120 is provided with edge direction group calculation section 121 that calculates a group of edge directions, and correlation value calculation section 122 that takes all pixels or a predetermined pixel among the plurality of pixels as pixels subject to correlation value calculation and calculates a correlation value between the pixels subject to correlation value calculation.
According to such a configuration, a feature value and the correlation value between the pixels from which that feature value is calculated are both utilized as feature information, so that inter-pixel correlation and connectivity are captured. Feature extraction thus becomes possible for a pixel group with connectivity (an edge gradient group), such as the silhouette shape of an object, noise due to the background can be suppressed, and object detection accuracy can be improved. That is, feature vector extraction can be performed taking inter-pixel linkage and connectivity into consideration, and a feature amount that improves object detection accuracy can be generated.
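The block-wise histogram pipeline described above can be sketched as follows. This is an illustrative simplification with assumed bin counts and names; the actual binning and pair selection are defined by the embodiment:

```python
from collections import defaultdict
import itertools

def block_feature_vector(pixel_pairs, n_dir_bins=8, n_corr_vals=2):
    """Count (edge direction pair, correlation value) occurrences for one
    block and return the histogram as a flat feature vector (sketch).
    Each entry of pixel_pairs is (direction bin 1, direction bin 2, corr)."""
    hist = defaultdict(int)
    for theta1, theta2, corr in pixel_pairs:
        hist[(theta1 % n_dir_bins, theta2 % n_dir_bins, corr)] += 1
    # fixed bin ordering so every block yields a vector of the same length
    return [hist[(a, b, c)]
            for a in range(n_dir_bins)
            for b in range(n_dir_bins)
            for c in range(n_corr_vals)]

def image_feature_vector(blocks):
    """Connect (concatenate) the feature vectors of all blocks."""
    return list(itertools.chain.from_iterable(
        block_feature_vector(b) for b in blocks))
```

Separating the histogram by correlation value is what lets the final feature vector distinguish connected edge pairs from incidental ones.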
It is also possible to represent an inter-pixel edge direction pair using a relative angle, as (θv1, dθv2-v1), rather than as (θv1, θv2).
As shown in
As shown in example 1 in
However, as shown in example 2 in
Embodiment 2 is an example of a case in which the feature type used to construct the histogram feature is not an edge direction value but a feature such as an LBP (Local Binary Pattern) feature.
As shown in
Feature amount calculation section 220 calculates an LBP feature from input image data, and combines the LBP features of a plurality of pixels and calculates an inter-pixel feature amount.
As shown in
LBP feature amount calculation section 221 calculates an LBP feature amount.
Of the plurality of pixels used in the LBP feature amount calculation, correlation value calculation section 222 takes, as the pixels subject to correlation value calculation, a pixel group at which 0 and 1 are reversed in the LBP bit string, or that pixel group together with the center pixel, and calculates the correlation between the pixels subject to correlation value calculation.
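The LBP bit string and the "0/1 reversal" pixel group can be sketched as follows. The threshold convention (neighbor ≥ center gives 1) and the clockwise neighbor ordering are assumptions, since the excerpt does not specify them:

```python
def lbp_bits(patch):
    """8-bit LBP string for the center pixel of a 3x3 patch:
    bit = 1 where the neighbor is >= the center value."""
    c = patch[1][1]
    # clockwise neighbor order starting at the top-left pixel
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    return [1 if patch[r][col] >= c else 0 for r, col in order]

def reversal_positions(bits):
    """Neighbor positions where the bit string flips between 0 and 1;
    these pixels (optionally together with the center pixel) form the
    pixel group subject to correlation value calculation."""
    return [i for i in range(len(bits)) if bits[i] != bits[i - 1]]
```

For a patch whose top row is brighter than the center, the bit string is 11100000 and the reversals occur at the two boundaries of the bright run, which is where the edge-magnitude and pixel-value correlation of this embodiment would be evaluated.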
Feature amount calculation section 220 performs the processing in
The operation of feature amount calculation apparatus 200 configured as described above will now be explained.
As shown in
With a conventional LBP feature, to what extent a bit string sequence such as shown in
As shown in
In step S4, histogram connection section 140 connects feature vectors of all blocks, outputs a scan image data feature vector, and terminates this processing flow.
In step S12, histogram creation section 130 counts a calculated edge direction group and correlation value in the histogram, and returns to step S11 above. In this way, feature amount calculation section 220 repeats the processing for each pixel in a block within the dotted-line frame in
According to this embodiment, feature amount calculation apparatus 200 is provided with feature amount calculation section 220 that calculates an LBP feature from input image data and calculates a new feature amount taking into account the correlation of the plurality of pixels referenced when performing LBP feature calculation. In feature amount calculation section 220, LBP feature amount calculation section 221 calculates an LBP feature amount, and correlation value calculation section 222 takes, from among the plurality of pixels used in the LBP feature amount calculation, a pixel group at which 0 and 1 are reversed in the LBP bit string, or that pixel group together with the center pixel, as the pixels subject to correlation value calculation, and calculates the correlation between those pixels.
In this embodiment, by using the edge magnitude and pixel value correlation of pixels at which the 0s and 1s of an LBP bit string are reversed, or of those pixels together with the center pixel, the same kind of effect is achieved as in Embodiment 1: a feature value and the correlation value between the pixels from which that feature value is calculated are both utilized as feature information, so that inter-pixel relationships and connectivity are captured. Feature extraction therefore becomes possible for a pixel group with connectivity (an edge gradient group), such as the silhouette shape of an object, noise due to the background can be suppressed, and object detection accuracy can be improved. Thus, feature vector extraction can be performed taking inter-pixel linkage and connectivity into consideration, and a feature amount that improves object detection accuracy can be generated.
LBP feature amount calculation section 221 may, of course, also perform feature amount calculation for each channel of a YCbCr space in the same way as in Embodiment 1.
The above description presents examples of preferred embodiments of the present invention, but the scope of the present invention is not limited to these. The present invention can be applied to any kind of apparatus as long as it is an electronic device having a feature amount calculation apparatus that calculates a feature amount of a target object from image data.
A feature amount calculation apparatus and method of the present invention have, as a first feature and effect, the capture of edge sequences and edge amplitude relationships by utilizing inter-pixel correlation. As the feature value, an edge direction based histogram may be used, or a histogram feature based on pixel value gradient information, such as LBP, may be used.
In the above embodiments, the term “feature amount calculation apparatus” has been used, but this is simply for convenience of description, and terms such as “object detection apparatus” and “object detection method” or the like may also be used for an apparatus and method respectively.
The above-described feature amount calculation apparatus can also be implemented by means of a program that causes a computer to execute this feature amount calculation method. This program is stored in a computer-readable storage medium.
The disclosure of Japanese Patent Application No. 2010-65246, filed on Mar. 19, 2010, including the specification, drawings and abstract, is incorporated herein by reference in its entirety.
A feature amount calculation apparatus and feature amount calculation method according to the present invention are effective in discriminating a target object with a high degree of accuracy, and are suitable for use in an object detection apparatus or object tracking apparatus using image features or the like. Possible uses include video monitoring systems when a detection object is a person, animal, or the like, vehicle driving support systems, automatic annotation systems for images and video, and so forth.
| Number | Date | Country | Kind |
| --- | --- | --- | --- |
| 2010-065246 | Mar 2010 | JP | national |
| Filing Document | Filing Date | Country | Kind | 371c Date |
| --- | --- | --- | --- | --- |
| PCT/JP2011/001576 | 3/17/2011 | WO | 00 | 11/2/2011 |