Not Applicable
Not Applicable
Not Applicable
A portion of the material in this patent document is subject to copyright protection under the copyright laws of the United States and of other countries. The owner of the copyright rights has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the United States Patent and Trademark Office publicly available file or records, but otherwise reserves all copyright rights whatsoever. The copyright owner does not hereby waive any of its rights to have this patent document maintained in secrecy, including without limitation its rights pursuant to 37 C.F.R. §1.14.
1. Field of the Invention
This invention pertains generally to 3D computational geometry and graphics. More specifically, the invention provides a method of image characterization, and more particularly a method for detecting vanishing points, the vanishing direction, and the road width in a 2D image.
2. Description of Related Art
Due to the large variance between images, it is hard to identify all of these parameters for every image, even for human beings. Typically, a vanishing point is defined as the perspective projection of any set of parallel lines that are not parallel to the projection plane. Various methods have been proposed for vanishing point determination, such as those involving a support vector machine (SVM) algorithm, but owing to the complexity of training images on a neural network, such methods become computationally costly. Further, some algorithms are based on Random Sample Consensus (RANSAC) to determine the vanishing point. A RANSAC algorithm finds the best subset of edges, or supporting edges, where all the supporting edges finally converge at a vanishing point. The weakness of the RANSAC method is that if more than one vanishing point is to be determined, the number of iterations needed to detect the vanishing points increases.
An aspect of the present invention is a 3D computational graphical model that uses an edge scoring algorithm. The method of the present invention involves scoring each edge of an image via several properties, such as the edge length, the probability that it belongs to the vertical/horizontal plane boundaries, and the probability that it supports a VP with other edges. The method of the present invention is computationally very cheap and effective in determining the vanishing point, vanishing direction, and width of a road in a 2D image.
In one embodiment, a vanishing point can be detected by computing a set of parameters from the 2D image, and the angle corresponding to the vanishing point.
One aspect of the present invention is a method for detecting vanishing points, vanishing direction and road width in an input 2D image by identifying whether or not the input image comprises regular patterns and usable edges for vanishing point analysis. Preferably, scene identification of reliable images is performed with a rule-based method without use of training data. In one embodiment, identification of reliable images is based on the entropy of the histogram of oriented gradients (HoG) and the ratio of short edges. In one embodiment, identification of reliable images may be used for detection of man-made scenes or regular patterns.
Another aspect is a method for detecting the vanishing point from input images by utilizing an edge scoring method, wherein the edge scoring method includes determining a set of edges that occur a maximum number of times in the direction of the vanishing point.
In one embodiment, the edges are scored according to several properties, such as the edge length, the probability that the edge belongs to the vertical/horizontal plane boundaries, and the probability that it supports a VP with other edges. In one embodiment, the edge score is computed using a calculated histogram of oriented gradients (HoG).
In one embodiment, the detected VP is used for computing a depth map, calibrating the direction of a camera, or classifying the different planes in the image.
Another aspect is a method to estimate computer graphic model (CGM) parameters from a single 2D image, including the vanishing point, the vanishing direction, and the width of the ground plane. The method comprises three primary parts: (1) fast scene identification to identify if the composition of an image is appropriate for vanishing point analysis, (2) vanishing point detection based on a novel edge-scoring method, and (3) vanishing direction and road width estimation (VDE/RWE) based on a plane classification method.
To accelerate the computation of the three parts, the methods of each component were configured not only to improve the performance of that component itself, but also to facilitate the computation of the other components. The three primary components are configured to execute as a whole, or have the flexibility to execute independently for purposes other than CGM.
Another aspect of the present invention is estimation of the ground plane in an image without façade analysis. In one embodiment, estimation of the ground plane is performed via a segment-based method. In another embodiment, the analysis of the supporting edges of detected vanishing points is used to obtain a small number of plane boundary candidates. The method comprises a semi-supervised analysis (no training models) to identify plane boundaries with each plane initialized by a few segments only.
A further aspect is a method for estimating the vanishing direction from the center of the straight road to the vanishing point. Another aspect is a method for estimating the road width of the straight road based on a plane identification method. In one embodiment, the vanishing direction and road width are estimated using two lines originating from the vanishing point and spanning the ground plane. Preferably, the two lines are computed in constant time.
The calculated vanishing direction and road width may be used for image-based guiding or surveillance systems.
In one embodiment, a set of parameters comprising four scalars are calculated to generate an ellipsoid CGM for 3-D walkthrough simulation.
The systems and methods of the present invention can be integrated with software executable on computation devices such as computers, cameras, video recorders, mobile phones, or media players to quickly generate 3D environments from 2D scenes. The systems and methods may be used for computer graphics production, movie production, gaming, VR touring, and digital album viewers. When used in conjunction with GPS data, the systems and methods may be used with image-based guidance systems such as vehicle auto-guidance or mobile robots.
In a preferred embodiment, the VP detection method is used with an image-capturing device such as a camera.
Further aspects of the invention will be brought out in the following portions of the specification, wherein the detailed description is for the purpose of fully disclosing preferred embodiments of the invention without placing limitations thereon.
The invention will be more fully understood by reference to the following drawings which are for illustrative purposes only:
First, at block 12, scene identification (SI) is performed to identify whether the input image comprises regular patterns and usable edges for later processing steps.
Next, at block 14, vanishing point detection (VPD) is performed to detect the vanishing point(s) of the image. If the VPs are detected, the estimation of other parameters such as the direction for walk-through (vanishing direction, VD) and the width of the road to walk-through (RW) can be derived accordingly.
Accordingly, at block 16 vanishing direction estimation (VDE) is performed using the detected vanishing points to estimate the direction from the center of the straight road to the vanishing point. At block 18, road width estimation (RWE) is performed to estimate the width of the straight road which has both sides attached to vertical structures.
It is appreciated that although a particular objective of the present invention is to provide VP, VD, and RW for Computer Graphic Model (CGM) parameters, the systems and methods described herein may also be used in applications other than CGM. Many of the methods in the present invention are independent from CGM.
I. Scene Identification (SI)
Contents in images can vary greatly from one image to another. For example, an image with many long straight lines due to man-made structures, which may help in analyzing the sky, ground, and buildings, may be significantly different from an image having irregular, short edges due to trees and water. Most image processing or computer vision algorithms, depending on the image clues they choose, therefore have limitations on the type of images giving the best performance.
The vanishing point detection module 14 of the present invention is based primarily on edge or straight line information. Because a wrong result can be more detrimental than a missed detection, the scene identification module of the present invention acts to remove the images with unreliable clues, instead of searching for an "all-robust" algorithm for all cases. Thus, it avoids risky images and focuses on those that are good for analyzing. The returned results can have lower error rates and more appeal to users.
First, categories of images that are hard/easy to process are defined. As mentioned above, the use of different image clues can lead to different selections of good images, and thus it is desirable to have good images with salient edges or lines. The images with more regular edges, which are mostly caused by man-made structures such as buildings, markers, fences, or desks, are generally easier to solve. On the other hand, the edges caused by trees, snow, or water are usually less informative and time consuming to process. Accordingly, an object is to identify the scene categories of useful images and exclude the inappropriate images from further processing.
As scene identification 12 is an auxiliary task for vanishing point detection 14, the preferred method is simple and fast to save time. Two categories are defined: one that can be processed by VPD and one that cannot. Most of the accepted images (the category of images that can be processed by VPD) are man-made, and most of the removed ones (the category of images that cannot be processed by VPD) are natural-like. Therefore, though the problem is not fully identical to man-made/natural scene identification, the terms "Man-made" and "Natural" are used to denote the two categories.
Magnitude = (g_x^2 + g_y^2)^{1/2}
Orientation = atan(g_x/g_y)
Next, at step 56, the image is divided into blocks in the x- and the y-directions, and a histogram is calculated for each block at step 58.
Besides the single division of blocks, one can expand this to multiple levels of divisions such as 1×1, 2×2, 4×4, 8×8, and so on. This forms the so-called hierarchical HoG (H), in which everything is again cascaded together:
H = [h_1, h_{2_1} . . . h_{2_4}, h_{3_1} . . . h_{3_16} . . . ]^T
where h_1 is the 18-bin histogram (an 18×1 vector) of the whole image 64. h_2 has two blocks in each direction, for a total of four blocks indexed from h_{2_1} to h_{2_4}, and similarly for h_3, h_4, and so on. Hence, H also contains the magnitudes of orientations at different scales, which can represent a more informative gradient/edge distribution of the image. In a preferred method, we use the hierarchical HoG from 1×1 to 8×8 blocks.
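By way of example and not of limitation, the following Python sketch illustrates how a hierarchical HoG of the kind described above might be assembled. The function names, the use of numpy, and the gradient/orientation conventions (numpy's gradient and atan2 folded into 0 to 180 degrees) are illustrative assumptions rather than the exact implementation of the specification; each block histogram is L2-normalized as described later in this disclosure.

```python
import numpy as np

def block_hog(mag, ori, n_bins=18):
    """18-bin orientation histogram of one block, weighted by gradient magnitude.
    Orientations are assumed to lie in [0, 180) degrees."""
    bins = np.clip((ori / 180.0 * n_bins).astype(int), 0, n_bins - 1)
    h = np.zeros(n_bins)
    np.add.at(h, bins.ravel(), mag.ravel())
    return h

def hierarchical_hog(gray, levels=(1, 2, 4, 8)):
    """Cascade the per-block HoGs of the 1x1, 2x2, 4x4 and 8x8 divisions into one vector H."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.sqrt(gx ** 2 + gy ** 2)
    ori = np.degrees(np.arctan2(gy, gx)) % 180.0      # fold orientations into 0..180 degrees
    H, rows, cols = [], gray.shape[0], gray.shape[1]
    for n in levels:                                   # n x n blocks at this level
        ys = np.linspace(0, rows, n + 1, dtype=int)
        xs = np.linspace(0, cols, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                h = block_hog(mag[ys[i]:ys[i+1], xs[j]:xs[j+1]],
                              ori[ys[i]:ys[i+1], xs[j]:xs[j+1]])
                norm = np.linalg.norm(h)
                H.append(h / norm if norm > 0 else h)  # L2-normalize each block histogram
    return np.concatenate(H)
```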
As mentioned above, the objective of module 12 is to include the images with strong/reliable edge information and remove the less informative ones from the following processes. The following two measures were developed:
The computation of block 42 checks if there are sufficient blocks with dominant orientations at any HoG level. We use the four-level HoG to cover the different scales of image divisions: 1×1, 2×2, 4×4, and 8×8. The orientations are defined from 0 to 180 degrees and divided into 18 bins. To check the existence of dominant orientations in a block, we compute the entropy of its HoG. For the HoG of some block i, all bin values are normalized by the sum of the 18 bin values so that each becomes a probability value between 0 and 1. (The HoG normalized by the L2-norm can have a sum of bin values greater than 1.0; the normalization here makes the bin values probabilities that sum to 1.0.) We use the classic definition of entropy, E(i) = −Σ_{k=1}^{18} H(i,k) log H(i,k), where H(i,k) is the kth bin of block i and E(i) is the entropy value. A block whose entropy value is larger than Th_H1 is considered a Natural block because it does not show any dominant orientation. For each HoG block division level n, we further calculate the ratio of Natural blocks to the total number of blocks, P(n). For example, if we have 3 Natural blocks at level 2, and the total number of blocks at level 2 is four, we obtain P(2)=3/4. Another threshold, Th_H2, is defined to make decisions on P(1), P(2), P(3), and P(4).
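By way of a non-limiting illustration, the entropy test described above may be sketched in Python as follows. The natural logarithm is used here (the choice of base merely rescales the threshold Th_H1), and the threshold value itself is a placeholder supplied by the caller.

```python
import numpy as np

def block_entropy(hog_bins):
    """Entropy of one block's 18-bin HoG after normalizing the bins to probabilities."""
    total = hog_bins.sum()
    if total == 0:
        return 0.0
    p = hog_bins / total
    p = p[p > 0]                      # treat 0*log(0) as 0
    return float(-np.sum(p * np.log(p)))

def natural_block_ratio(block_hogs, th_h1):
    """P(n): the fraction of blocks at one division level whose entropy exceeds Th_H1,
    i.e. blocks showing no dominant orientation."""
    entropies = np.array([block_entropy(h) for h in block_hogs])
    return float(np.mean(entropies > th_h1))
```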
Unlike block 42, which uses HoG, block 44 depends only on the lengths of edges, and rejects the images with many irregular, short edges. Th_E1 is introduced for edge lengths, so the edges shorter than Th_E1 are defined as the short edges. This gives us a ratio R = (# short edges)/(# total edges), and a predefined Th_E2 is used to threshold R for a decision. To summarize, an image is considered "Man-made" if it satisfies both of the following criteria (a sketch of this decision rule follows the two criteria below):
(1) any of P(1), P(2), P(3), P(4) < Th_H2, so that we can observe sufficient blocks with regular orientations at some scale(s); and
(2) R < Th_E2, so that there is a sufficient number of long (and likely reliable) edges. Otherwise, the image is considered "Natural."
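A minimal sketch of this decision rule follows, assuming the thresholds Th_H2, Th_E1, and Th_E2 are supplied by the caller (their specific values are not reproduced here):

```python
def is_man_made(P_levels, edge_lengths, th_h2, th_e1, th_e2):
    """Accept an image for VPD if (1) some level has enough blocks with dominant
    orientations, i.e. any P(n) < Th_H2, and (2) the short-edge ratio R < Th_E2."""
    cond_orientations = any(p < th_h2 for p in P_levels)          # P(1)..P(4)
    n_short = sum(1 for length in edge_lengths if length < th_e1)
    R = n_short / float(len(edge_lengths)) if edge_lengths else 1.0
    return cond_orientations and R < th_e2
```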
To prepare for VPD 14, more information may be derived from HoG 40, since it contains abundant spatial characteristics of an image. Instead of considering the multiple levels of block divisions in our HoG, only the fourth level of blocks, the 8×8 blocks, is used here.
Two more measures that may be derived from the calculated HoG 4th level blocks are co-occurrence directions 46 and changing blocks of the image at block 48.
The found pairs of co-occurrence directions, both from left to right and from top to bottom, are accumulated in a co-occurrence table. We use the same settings as for HoG, so the 0 to 180 degrees are quantized into 18 bins and the co-occurrence table is 18×18. For each found pair (θ, φ), where φ comes from block j, the table entry (θ, φ) is accumulated by the HoG bin value corresponding to φ from block j. Finally, if an entry (θ, φ) is above a threshold, we take both directions as co-occurrence directions and will emphasize the edges in the co-occurrence directions for VPD.
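By way of example and not of limitation, one possible realization of the co-occurrence table is sketched below. The text does not fully specify how the direction pairs are found, so the sketch assumes each pair consists of the dominant orientation bin of a block and that of its right-hand or lower neighbor; this pairing rule and the threshold co_th are illustrative assumptions.

```python
import numpy as np

def cooccurrence_directions(block_hogs_8x8, co_th):
    """block_hogs_8x8: 8 x 8 x 18 array of 4th-level HoG blocks.
    Accumulate an 18x18 co-occurrence table over left-to-right and top-to-bottom
    neighbors and return the direction bins whose entries exceed co_th."""
    table = np.zeros((18, 18))
    n = block_hogs_8x8.shape[0]
    for i in range(n):
        for j in range(n):
            theta = int(np.argmax(block_hogs_8x8[i, j]))       # dominant bin of block (i, j)
            for di, dj in ((0, 1), (1, 0)):                    # right and bottom neighbors
                ni, nj = i + di, j + dj
                if ni < n and nj < n:
                    phi = int(np.argmax(block_hogs_8x8[ni, nj]))
                    table[theta, phi] += block_hogs_8x8[ni, nj, phi]
    pairs = np.argwhere(table > co_th)
    return set(pairs[:, 0]) | set(pairs[:, 1])                 # bins emphasized for VPD
```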
The other derived information, changing blocks 48, is designed to detect the blocks containing plane transitions.
Diff(i) = Σ_{block j connected to block i} Σ_{k=1}^{18} |H′(i,k) − H′(j,k)|, Eq. 1
where H′(i,k) is the HoG bin value normalized only by the L2-norm, and is thus different from H(i,k).
In the method of the present invention, the difference of neighboring blocks is a bit more complicated. All HoG bins are rearranged such that the bins correspond to edge directions instead of gradient directions. The following vertical and horizontal filters are separately applied to the rearranged HoGs of the 3×3 blocks where the block to check, i, is located at the center:
v_filter = [−1, −2, −1; 0, 0, 0; 1, 2, 1], h_filter = v_filter^T.
Two vectors of 18 directional differences, Dh(i, 1 . . . 18) and Dv(i, 1 . . . 18), can be obtained by applying the corresponding filters. We then summarize the 18 bins of differences into a single value D(i) by the weighted norm:
D(i) = Σ_{k=1}^{18} ((Wh(k)·Dh(i,k))^2 + (Wv(k)·Dv(i,k))^2)^{1/2}, Eq. 2
where Wh, Wv are two weighting vectors corresponding to the 18 directions.
Wh and Wv are designed to emphasize the discontinuity of edges parallel to the differential directions. For example, 0° and 180° would be emphasized after the horizontal block filtering, while 90° would be emphasized after the vertical block filtering. The current assignments of Wh and Wv are:
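By way of example and not of limitation, Eq. 2 may be computed as sketched below. Here hog_blocks is assumed to be an n×n×18 array of rearranged block HoGs, the weighting vectors Wh and Wv (whose actual assignments are not reproduced in this section) are passed in by the caller, and the block of interest is assumed to have a full 3×3 neighborhood.

```python
import numpy as np

V_FILTER = np.array([[-1, -2, -1],
                     [ 0,  0,  0],
                     [ 1,  2,  1]], dtype=float)
H_FILTER = V_FILTER.T

def changing_block_score(hog_blocks, i, j, Wh, Wv):
    """D(i) of Eq. 2 for block (i, j): apply the vertical and horizontal block filters
    to the 3x3 neighborhood of 18-bin HoGs, then take the weighted norm over the
    18 directional differences."""
    neigh = hog_blocks[i-1:i+2, j-1:j+2, :]                          # 3 x 3 x 18 neighborhood
    Dv = np.tensordot(V_FILTER, neigh, axes=([0, 1], [0, 1]))        # 18 vertical differences
    Dh = np.tensordot(H_FILTER, neigh, axes=([0, 1], [0, 1]))        # 18 horizontal differences
    return float(np.sum(np.sqrt((Wh * Dh) ** 2 + (Wv * Dv) ** 2)))
```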
II. Vanishing Point Detection (VPD)
In a 2D image, one can observe that the originally parallel lines in 3D show convergence. The points where these lines converge are called the “vanishing points” (VPs). One image can have multiple vanishing points, and a vanishing point can be either inside or outside the image.
For use with CGM, the target vanishing points are those inside the 2D image, and the two sides of the road to walk-through converge to one of them.
To find the vanishing points, the key is to choose the clues (usually the edges) from an image that really relate to the VPs. The most popular existing method is to use the “RANdom SAmple Consensus” (RANSAC) algorithm for finding the best subset of edges supporting the best VP. We call the subset of edges the “supporting edges” for this VP because all the extension lines from these edges can converge to this VP; on the contrary, other edges in the image are called the “outliers.” If multiple VPs are expected, one can keep feeding the outliers respective to the found VP(s) to RANSAC for finding more VPs. Hence, one needs to set the number of VPs, i.e., the maximum iterations to run RANSAC. Another popular method is to use “Expectation-Maximization” (EM) for grouping the edges and finding the best set of VPs, which is also based on a known number of VPs for an image. EM is also a good method to refine the positions of the VPs found by other methods.
The present invention applies a modified J-Linkage method for performing the vanishing point estimation step 14. While other methods may be implemented, e.g., Expectation-Maximization (EM) or RANdom SAmple Consensus (RANSAC), J-Linkage was chosen for two reasons: (1) J-Linkage can jointly decide all VPs in one pass; it is not an iterative method like RANSAC or EM; and (2) J-Linkage performs like an unsupervised clustering, so no predefined number of VPs is required. Therefore, J-Linkage is a method which can be fast and less restricted. However, the pure J-Linkage method still has its drawbacks for practical applications. In the following discussion, we first present the pure J-Linkage algorithm and then address these problems.
To start from the pure J-Linkage method for VP detection, the underlying data for calculating the VPs are again the edges. Canny edge detection is first used to extract the edges. The raw edges are then preprocessed such that the edges on the same straight lines are linked first and then the intersections between edges are removed. The resultant edges are all straight and have their lengths, directions, and positions recorded. According to the definition of VPs shown in the schematic diagram of
We let E denote the set of N edges, and V denote the set of M hypotheses of VPs (random guesses of VPs) by the edges in E. Usually M is tens to hundreds. J-Linkage first computes the “Preference Matrix” as shown in
Although not all of the M VPs are true, the true VPs have a higher probability of being included in V, since they are the common intersections of many edges. In addition, the edges supporting the same VP should have a similar "preference" of VPs, represented by their characteristic functions of the preference set. Hence, we can use these rows (characteristic functions) as a type of feature and group the similar edges together.
The grouping (or clustering) requires a distance metric such that we can calculate the similarity between two data points. In J-Linkage, the data are the characteristic functions representing the edges; they contain binary values and are appropriate for a point-set based distance metric. The Jaccard distance is such a metric, which is given by the following definition:
where a, b are two binary vectors of length M in our case. ∪ and ∩ are the "OR" and "AND" binary operators, so the results are also vectors of length M. |.| is an operator that counts the "1" elements in the resultant vector. The Jaccard distance is a true distance and lies between 0 and 1.
We perform a bottom-up grouping in which each edge is itself a cluster at the beginning. If two edges (their characteristic functions) are similar enough according to Eq. 4 (small d_J), the two edges are merged into the same cluster. Further merging requires a definition of the distance between two clusters containing multiple data, which is defined as:
d_J(A,B) = min_{a∈A, b∈B} d_J(a,b), Eq. 5
where A, B are two data clusters.
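By way of example and not of limitation, Eq. 5 and the underlying Jaccard distance may be sketched as follows. The Jaccard formula used here is the standard one, d_J(a,b) = (|a∪b| − |a∩b|)/|a∪b|, which is assumed to be the definition intended by Eq. 4 given the description of the ∪, ∩, and |.| operators above.

```python
import numpy as np

def jaccard_distance(a, b):
    """Eq. 4 (standard Jaccard distance): (|a OR b| - |a AND b|) / |a OR b| for two
    binary preference vectors of length M (characteristic functions of two edges)."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.count_nonzero(a | b)
    inter = np.count_nonzero(a & b)
    return 1.0 if union == 0 else (union - inter) / float(union)

def cluster_distance(A, B):
    """Eq. 5: the distance between two clusters is the minimum pairwise Jaccard
    distance between their members' characteristic functions."""
    return min(jaccard_distance(a, b) for a in A for b in B)
```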
The pure J-Linkage algorithm discussed above can jointly cluster the edges and obtain the corresponding VPs. As previously mentioned, however, J-Linkage has its drawbacks in practice. Because the generation of the M VP hypotheses is random, and the characteristic functions are fully determined by these hypotheses, it is possible to obtain different grouping results from different runs. If the true supporting edges dominate all edges, the distribution of the M VP locations can be stable, and so can the obtained VPs. On the contrary, if the ratio of true supporting edges is smaller, such as for images containing many noisy edges, the random process can lead to very unstable VPs, which are not desired in a real system.
The VP detection problem and the stability issue may be formulated by a Bayesian framework:
V* = argmax_V P(V|E) ∝ P(E|V)P(V), Eq. 6
where V* is our objective set of VPs.
The space of V, all possible combinations of VPs, is very large. Our process to obtain the set of edges (E) is deterministic, but J-Linkage is a non-deterministic process concerning a very small subspace of V. Therefore, the effectiveness of J-Linkage is highly dependent on the subspace of V that it chooses, i.e., the value of P(V).
Since the guess of V is based on E, it is possible to put some constraints, C, on E such that the guesses of V can be more reliable than samples based on the whole E. Assuming the edges are independent of each other, we can formulate the prior probability, P(V), as:
E′ = {e | e ∈ E, e satisfies C}, Eq. 7
P(V) ∝ Π_{e∈E′} p(e), Eq. 8
where p(e) is defined as the probability of edge e being a supporting edge of V.
A typical definition of C is to choose the edges according to their lengths. Long edges are indeed more reliable than short edges; however, they are not necessarily the true supporting edges. The VPD method 14 of the present invention incorporates lengths with the information derived from HoG in module 40, including the co-occurrence directions 46 and the changing blocks 48. The co-occurrence directions 46 are calculated from whole-image statistics, so edges in these directions are less likely to be noise. The changing blocks 48 provide the possible locations of plane transitions; edges along the transition boundaries are more likely to relate to VPs, while the edges inside planes may be just textures. In the method of the present invention, we define C as a fixed number of selected edges, where the selection is based on a compound edge score calculated from edge lengths, edge directions, and changing blocks. For some edge k, we have:
It is assumed that p(e) in Eq. 8 is proportional to Se(k). The VP detection method 28 is thus constructed as shown in
Next, at step 102, J-Linkage is run R times to obtain a total of K VPs. Using E′ to generate the hypotheses of VPs implies a higher P(V) in Eq. 8, and the resultant VPs are chosen from these hypotheses. Each run may give a different number of VPs, and we denote the total number of VPs from the R runs as K, K<<R×N′.
Next, at step 104, all the VPs from the R runs are clustered by the J-Linkage info. The R runs of J-Linkage will also help achieve the higher P(V). The K VPs from the R independent runs can have duplications or similar positions. To group these VPs, J-Linkage may be applied again. The preference matrix for the K VPs (K×N′) is transposed as shown in
Finally, at step 106, VPs are selected, proceeding from those outside the image to those inside the image. The K_c clusters are now ready, and the best VP is selected for each VP cluster. However, K_c is still larger than the number of true VPs, because similar wrong groupings of edges can still happen across several runs. Since K and K_c are usually very small, we search for the best VP in each VP cluster as its representative from the K VPs, and then choose the best representative among all clusters. To evaluate some VP v, Eq. 10 is used to calculate a VP score:
S(v) = log(VP cluster size(v)) · (Σ_{t=1}^{T} Se(t)), Eq. 10
where VP cluster size(v) gives the number of VPs in the same cluster where v resides.
The VP score relates to the probability of the cluster and is effective when we are comparing the representatives between clusters. Only the T supporting edges of v are counted, T<N′. If the best representative is good enough (S(v)>Sth, the threshold value for the VP score), it is chosen as our VP, and the whole cluster is removed, as well as its supporting edges. The rest of the clusters and edges are used for choosing other VPs, until no more VPs can be chosen. Moreover, the VP clusters are separated into two groups: clusters outside the image and clusters inside the image. As VPs outside the image correspond to most vertical and horizontal edges, we choose the outside VPs first to remove such edges, and then choose the inside ones.
In the present implementation, we set R=4, M=80˜100, N<=120, and Sth=30 (diagonal=512). CGM will only use the VPs inside the image (inside VPs).
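By way of example and not of limitation, Eq. 10 and a simplified version of the cluster-by-cluster selection may be sketched as follows. The sketch omits the removal of supporting edges and the outside-before-inside ordering described above, and the data layout (a mapping from a cluster id to candidate VPs with their supporting-edge scores) is an illustrative assumption.

```python
import math

def vp_score(cluster_size, supporting_edge_scores):
    """Eq. 10: S(v) = log(cluster size of v) * sum of the supporting edges' scores Se(t)."""
    return math.log(cluster_size) * sum(supporting_edge_scores)

def select_vps(clusters, s_th):
    """Simplified greedy selection: 'clusters' maps a cluster id to a list of
    (vp_position, supporting_edge_scores) candidates.  The best representative of
    each cluster is kept only if its score exceeds the threshold Sth."""
    chosen = []
    for members in clusters.values():
        best = max(members, key=lambda m: vp_score(len(members), m[1]))
        if vp_score(len(members), best[1]) > s_th:
            chosen.append(best[0])
    return chosen
```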
In certain embodiments, it is desirable to validate VPs for CGM. Determination of the VP ROI, described below, is specifically configured for CGM. The four-plane model, also described below, may also be helpful in the removal of inappropriate scenes and is also important for the vanishing direction estimation (VDE) step 16. However, for pure VP detection, it is sufficient to stop at the VP ROI.
VP ROI removes the inside VPs which may lack some parts for CGM to render.
If we focus on one inside VP, we can roughly describe the scene by a four-plane model: Back plane, Right plane, Left plane, Ground plane as shown in
The method 150 used to obtain the four lines separating the planes will be detailed below. At this time, we can use the inside VP and its supporting edges to calculate the directions with salient edges. These directions will be the candidates to construct the four lines separating the planes.
III. Vanishing Direction Estimation (VDE) and Road Width Estimation (RWE)
The vanishing direction (VD) is the central direction of the ground plane, and a viewer can keep walking on the ground plane along this direction in 3D toward the VP. The method 10 of the present invention estimates both the VD and the road width (RW) in the 2D image, so the boundaries between the vertical planes and the ground plane are needed. For the VDE and RWE calculation, the inside VP and the VP validation results from the previous discussion are used as input.
VDE and RWE are coupled together because they both relate to the ground plane, and the two problems can be readily solved if the ground plane is already estimated. The previously detailed four-plane model, which contains the definition of the ground plane, can be used in classifying the four planes with the four supporting lines.
The next step 154 is to initialize the segments belonging to each plane. Only the segments with high confidence are assigned plane labels, and the ambiguous segments are labeled as "undetermined," which are classified in the next step.
The third step 156 comprises a semi-supervised classification of the unlabeled segments based on the graph-Laplacian transductive learning. The labeled segments from the previous stage and the similarities between segments are used to guide the computation.
At the final step 158, the boundary of the ground plane can be identified, since all segments are labeled. The VD is estimated first, and then an appropriate position on the line of the VD is selected to estimate the RW.
Image segmentation is used to group neighboring pixels together based on similar colors or textures. As a result, the following plane initialization and classification use the "segment" as the data unit instead of the "pixel."
Though the superpixel method 152 already has some adaptiveness, it is still hard to use a fixed set of parameters for all images. To avoid over-segmentation for simple images, the logarithm of the number of edges is used as a rough measure of image complexity. The parameters are automatically adjusted, so simpler images tend to have larger segments. The other issue is the speed of segmentation, which is proportional to the image size. We resize images to diagonal=256 pixels for faster segmentation with satisfactory results.
The next step is to know which object/structure/region each segment belongs to. The ultimate goal of the method of the present invention is to solve for the ground plane for VDE and RWE, so the problem can be solved in a simpler way. This leads to the four-plane classification method 170 detailed in
Before executing the main processes, additional measures are computed from the segments 174. For each segment, we calculate the following quantities, which will be used in plane initialization and classification:
(a) Intensity histogram (I): 16 bins map to 0˜255 intensity values,
Both I and H are normalized. For some segment i, its intensity histogram I(i) is normalized by the number of pixels in the segment such that the sum of all bins=1. Its HoG H(i), however, was normalized by the L2-norm of the bin vector as described previously.
The similarities between one segment and its neighboring segments (module 178) are also computed, which can be used to construct a similarity matrix W at 186:
where Sim(i,j) is determined by the I, H, θ, and r of the two segments, and Sim(i,j) ≤ 1.0:
Sim(i,j) = wHoG·DS(H(i),H(j)) + wHEnt·(Ent(H(i))−Ent(H(j))) + wInt·DS(I(i),I(j)) + wTheta·(θ(i)−θ(j)) + wVPDist·(r(i)−r(j)), Eq. 12
where DS(a, b) is the dissimilarity calculated by one minus the cosine coefficient of feature vectors a and b, and Ent(·) is the entropy function. wHoG, wHEnt, wInt, wTheta, and wVPDist are the corresponding weights of the HoG dissimilarity, HoG entropy difference, intensity dissimilarity, angle difference, and VP distance difference. W is sparse, symmetric, and has 1s on the diagonal, as shown in
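By way of example and not of limitation, Eq. 12 may be computed per pair of neighboring segments as sketched below. The segment representation (a dictionary holding the intensity histogram I, the HoG H, the angle theta to the VP, and the distance r to the VP) and the weight values in w are illustrative assumptions; the specification does not fix the weights here.

```python
import numpy as np

def cosine_dissimilarity(a, b):
    """DS(a, b): one minus the cosine coefficient of feature vectors a and b."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 1.0 if denom == 0 else 1.0 - float(np.dot(a, b)) / denom

def entropy(h):
    """Entropy of a histogram after normalizing it to probabilities."""
    total = h.sum()
    if total == 0:
        return 0.0
    p = h / total
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def segment_similarity(seg_i, seg_j, w):
    """Eq. 12, with w holding the five weights wHoG, wHEnt, wInt, wTheta, wVPDist."""
    return (w['HoG']    * cosine_dissimilarity(seg_i['H'], seg_j['H'])
          + w['HEnt']   * (entropy(seg_i['H']) - entropy(seg_j['H']))
          + w['Int']    * cosine_dissimilarity(seg_i['I'], seg_j['I'])
          + w['Theta']  * (seg_i['theta'] - seg_j['theta'])
          + w['VPDist'] * (seg_i['r'] - seg_j['r']))
```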
Two assumptions were made regarding the input scene. First, the camera was at an upright angle when capturing the image; that is, the horizon in the image is exactly horizontal. Second, each of the 2D quadrants with respect to the VP contains exactly one supporting line. (This assumption is especially important for CGM, because the supporting lines in the 3rd and 4th quadrants are needed for defining a good ground plane; misclassification of the other three planes due to this assumption will not affect the CGM results.) As shown in
The initialization of plane segments in step 176 has two primary steps: (1) assigning four initial lines/planes, and (2) putting segments into each plane.
To assign the four initial lines/planes, one supporting direction for each supporting line is chosen from the angular histogram constructed by VPD.
Referring now to
According to the six lines shown in
The number of segments labeled could be very small because we set the supporting lines conservatively. The next process uses two types of features, edge and region, to explore more segments to label. We will grow the L and R planes downward and expand the B and G planes horizontally. The extension from the labeled segments also has two steps: (1) growing from the labeled segments by vertical/horizontal edges and adjusting the lines, and (2) growing from the labeled segments by regional properties.
For the growing based on edges, vertical edges are highly likely inside the L and R planes and separate the segments. (For purposes of the present application, the vertical edges were defined with a 5 degree tolerance; that is, 85°<edge direction<95° or 265°<edge direction<275°. Similarly, the horizontal edges were defined as −5°<edge direction<5° or 175°<edge direction<185°.) Starting from the labeled L and R segments, we trace downward and check the overlap between the boundary of a U segment and the vertical edges. If the overlap is larger than a predefined VERT_TH, this U segment is changed to L or R.
However, intersections of vertical and horizontal lines can be the critical transition positions between planes, such as the lighter blocks in image 142 of
On the contrary, the growing of the B and G planes is based on the horizontal edges.
Without adding more labels, B is most appropriate for representing 146 and plane 150. We again check the boundaries of the L, R, and U segments against the horizontal edges. They are updated to B if their boundary overlap is greater than HORI_TH. The maximum ground spanning lines need to be updated after the growing because more L and R segments may have been labeled.
Growing by edges checks the boundary of a segment, and growing by region properties checks the statistics of the whole segment. Given two thresholds, Sim_TH for Sim and H_TH for H, the B, L, and R planes can be further expanded by including the neighboring U segments if either of the following criteria is satisfied:
Sim(i,j) > Sim_TH, segment i ∈ B, L, or R, and segment j ∈ U; or
vertical components of H(j) > H_TH, segment j ∈ U, Eq. 13
In
Because it is hard to generate all possible cases for offline training of a supervised four-plane classifier, we use a semi-supervised method which can fully use the on-line data for classification. Such a method needs some labeled data for inferring the remaining, unlabeled data. In our case, the data are the segments and the labels correspond to the different planes.
We let N_P denote the number of labeled segments of some plane label P. These can give rough estimations of the two probabilities:
where KNN(i,P,k)=j if j ∈ P and sim(i,j) is the kth largest. (In a preferred implementation, K=3.)
We roll back the labeled segments with small likelihood probabilities to U for more reliable results. Moreover, some p(i|P)'s are forced to zero according to the angular location of segment i relative to the VP: a) right of the VP, p(i|L)=0; b) left of the VP, p(i|R)=0; c) above the VP, p(i|G)=0; d) below the VP, p(i|B)=0.
The semi-supervised classification is realized by the graph-Laplacian transductive method. This method is run four times, which is equal to the total number of plane labels. Each run targets only one plane label: the initialized segments belonging to the target label are considered "labeled," while all other segments are considered "unlabeled." The corresponding posterior probability, p(P|i), for each unlabeled segment i with respect to the plane P is calculated at the end of each run.
For one run, we use the subscript l to represent the labeled data indexes and u to represent the unlabeled data indexes, and n is the total number of segments (data). All vectors/matrices are rearranged so that the labeled data come first, followed by the unlabeled data. The objective is to minimize the following cost function C:
where A is the adjacency matrix obtained by setting the diagonal of W in Eq. 11 to zero, Y={y_1, y_2, . . . , y_l, y_{l+1}, . . . , y_n}^T is the vector of the posterior probabilities of the data, i.e., the labeled data have y=1.0, D is a diagonal matrix with D_ii = Σ_{j=1}^{n} A_ij, and L=D−A is called the Laplacian matrix.
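By way of example and not of limitation, the matrices A, D, and L may be formed from the similarity matrix W as follows:

```python
import numpy as np

def graph_laplacian(W):
    """L = D - A, where A is W with its diagonal set to zero and D is the diagonal
    degree matrix with D_ii = sum_j A_ij."""
    A = W.copy().astype(float)
    np.fill_diagonal(A, 0.0)
    D = np.diag(A.sum(axis=1))
    return D - A
```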
Since the costs between two labeled segments are fixed, we only minimize the terms associated with the unlabeled ones:
argmin_{Y_u} (Y_u^T L_uu Y_u + 2 Y_u^T L_ul Y_l),
This is convex, so the minimum can be obtained by setting the derivative with respect to Y_u to zero:
2L_ul Y_l + 2L_uu Y_u = 0, Y_u = −L_uu^{−1} L_ul Y_l, Eq. 16
After the four runs, we get p(B|i), p(L|i), p(R|i), and p(G|i). Segment i is assigned the label which gives the largest p(P|i).
To obtain more stable and precise results than Eq. 16, Eq. 14 can be extended with a likelihood term:
where λ is a weighting scalar (0.3 in our implementation) and S_u is a vector containing the likelihood probabilities of the unlabeled segments belonging to this label (p(i|P)). We then have the solution:
Y_u = (λI + L_uu)^{−1}(λS_u − L_ul Y_l), Eq. 18
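By way of example and not of limitation, the closed-form solutions of Eq. 16 and Eq. 18 may be computed as follows, assuming the rows and columns of the Laplacian are ordered with the labeled segments first, as described above:

```python
import numpy as np

def transduce(L, y_l, n_labeled, lam=None, s_u=None):
    """Closed-form label propagation over the Laplacian L (labeled data ordered first).
    With lam/s_u omitted this is Eq. 16, Y_u = -L_uu^{-1} L_ul Y_l; with the likelihood
    term it is Eq. 18, Y_u = (lam*I + L_uu)^{-1} (lam*S_u - L_ul Y_l)."""
    L_uu = L[n_labeled:, n_labeled:]
    L_ul = L[n_labeled:, :n_labeled]
    rhs = -L_ul @ y_l
    if lam is None:
        return np.linalg.solve(L_uu, rhs)
    return np.linalg.solve(lam * np.eye(L_uu.shape[0]) + L_uu, lam * s_u + rhs)
```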
We further introduce the "Robust Multi-Class Graph Transduction" method to apply more constraints to the transductive process. The idea is to let the resultant combination of Y_u and Y_l also follow the p(P)'s. We use Y_u from Eq. 18 as the initial value Y_{u,0}. This method adjusts Y_{u,0} by a function f which relates to L_uu, Y_{u,0}, Y_l, and p(P):
Y_u = Y_{u,0} − f(L_uu, Y_{u,0}, Y_l, p(P)), Eq. 19
We finally have all segments labeled, so the estimated area of each plane can be obtained accordingly. For CGM, the most important plane is the ground plane (G) which can determine both the vanishing direction and the road width. We search the two lower spanning lines in the 3rd and the 4th quadrants which define the ground plane area. The two spanning lines are called the “ground spanning lines” (GSLs) and can be estimated by the following steps:
(a) Mark the boundaries between G, L, R, and B (
(b) Calculate the angle between each boundary point and the VP. Generate an angular histogram by the point counts and get the peak directions (
(c) Estimate two spanning lines of G with the largest angular histogram responses, one between 180˜270 degrees and the other between 270˜360 degrees (
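By way of example and not of limitation, steps (a) through (c) may be sketched as follows. The angle convention (mathematical angles measured counterclockwise about the VP, with points given as (x, y)) and the histogram bin width are illustrative assumptions; an image coordinate system with a downward y-axis would require the quadrant ranges to be adapted accordingly.

```python
import numpy as np

def ground_spanning_lines(boundary_points, vp, n_bins=360):
    """Build an angular histogram of the ground-boundary points about the VP and pick
    the strongest direction in the 3rd (180-270 deg) and 4th (270-360 deg) quadrants
    as the two ground spanning lines (GSLs)."""
    pts = np.asarray(boundary_points, dtype=float)           # (x, y) boundary points
    angles = np.degrees(np.arctan2(pts[:, 1] - vp[1], pts[:, 0] - vp[0])) % 360.0
    hist, edges = np.histogram(angles, bins=n_bins, range=(0.0, 360.0))

    def peak(lo, hi):
        idx = np.arange(n_bins)[(edges[:-1] >= lo) & (edges[:-1] < hi)]
        best = idx[np.argmax(hist[idx])]
        return edges[best] + 180.0 / n_bins                   # bin center, in degrees

    return peak(180.0, 270.0), peak(270.0, 360.0)             # GSL angles, 3rd and 4th quadrants
```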
We call the angular histogram in
∀j (j=1,2), ∃i s.t. |A_edge(i) − A_ground(j)| < AMatch_TH, Eq. 20
where A_edge(·) is a supporting direction from the edge angular histogram, and A_ground(1) and A_ground(2) are the two GSLs from the ground boundary angular histogram.
The A_edge(·)'s are from the 3rd and 4th quadrants, between the maximum ground spanning lines and the minimum ground spanning lines. AMatch_TH is the match tolerance in angle (AMatch_TH=15 in a preferred implementation).
The VD is defined as the bisector of the angle spanned by the two GSLs.
Since the VD line is the bisector line of GSLs, which defines the two sides of the estimated ground plane, the corresponding points on the two sides of the road, such as points a and b in
The first task is to choose the point along the VD line to calculate the road width. This depends on the criteria from applications, and in CGM we want to have full vertical materials to render both the left and the right hand sides. The full material criterion implies that we are constrained by the shorter side of materials, and the shorter side corresponds to the shorter length from the VP to the intersection between the GSL and the image boundary. For example, in the illustration 160 of
IV. Evaluation Methods
This section details the tools module 30 illustrated in
The most intuitive measure of the estimated GSLs is the angle bias of each line. The true position of the VP (V) is the origin of the two GSLs (L1, L2), and we also calculate the angles of L1 and L2 with respect to V, θ1 and θ2. The angle bias ε_θ is defined as the sum of the absolute differences of the two line angles:
ε_θ = |θ_1 − θ̂_1| + |θ_2 − θ̂_2|, Eq. 21
If the estimated VDs were based on true VPs, i.e., V̂ = V, evaluations by angle bias are appropriate; however, if the VDs were based on previously estimated VPs, this measure cannot fully reflect the performance, because the bias can be caused by both the estimated VP and the estimated VD. Two VDs which are the bisectors of their GSLs may be in the same direction even though the VPs they point to are totally different.
If both the VP and the VD were estimated, the estimated VP can introduce a shift in location so it would be better to have a measure based on locations. We consider all image points on the estimated GSLs, and calculate the lengths of the true GSLs as |L1| and |L2|. For each point on the estimated GSLs, the distance between it and its corresponding true GSL can be calculated. The line position bias εl is defined as the sum of the point-line distances normalized by |L1|+|L2|:
The third VD measure is based on the area spanned by the two GSLs. The two areas spanned by the true GSLs and the estimated GSLs are denoted as A and Â, respectively. The area bias ε_A is defined as the number of differing pixels between A and Â, normalized by the area of A, denoted |A|:
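By way of example and not of limitation, the area bias may be computed from two binary ground-plane masks as follows; the mask representation is an illustrative assumption.

```python
import numpy as np

def area_bias(true_mask, est_mask):
    """Area bias: the number of pixels where the true ground area A and the estimated
    area differ (their symmetric difference), normalized by |A|."""
    true = true_mask.astype(bool)
    diff = np.logical_xor(true, est_mask.astype(bool))
    denom = true.sum()
    return float(diff.sum()) / denom if denom else 0.0
```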
Here we provide the evaluation results of the current VPD+VDE+RWE algorithms based on ε_A. The parameters of our algorithms were kept at the default settings. We chose 42 images with clear ground boundaries and marked their GSLs as the ground truth. Since both VPD and VDE contain non-deterministic processes, we tested 5 runs of the 42 images, for a total of 210 instances.
Embodiments of the present invention may be described with reference to flowchart illustrations of methods and systems according to embodiments of the invention, and/or algorithms, formulae, or other computational depictions, which may also be implemented as computer program products. In this regard, each block or step of a flowchart, and combinations of blocks (and/or steps) in a flowchart, algorithm, formula, or computational depiction can be implemented by various means, such as hardware, firmware, and/or software including one or more computer program instructions embodied in computer-readable program code logic. As will be appreciated, any such computer program instructions may be loaded onto a computer, including without limitation a general purpose computer or special purpose computer, or other programmable processing apparatus to produce a machine, such that the computer program instructions which execute on the computer or other programmable processing apparatus create means for implementing the functions specified in the block(s) of the flowchart(s).
Accordingly, blocks of the flowcharts, algorithms, formulae, or computational depictions support combinations of means for performing the specified functions, combinations of steps for performing the specified functions, and computer program instructions, such as embodied in computer-readable program code logic means, for performing the specified functions. It will also be understood that each block of the flowchart illustrations, algorithms, formulae, or computational depictions and combinations thereof described herein, can be implemented by special purpose hardware-based computer systems which perform the specified functions or steps, or combinations of special purpose hardware and computer-readable program code logic means.
Furthermore, these computer program instructions, such as embodied in computer-readable program code logic, may also be stored in a computer-readable memory that can direct a computer or other programmable processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the block(s) of the flowchart(s). The computer program instructions may also be loaded onto a computer or other programmable processing apparatus to cause a series of operational steps to be performed on the computer or other programmable processing apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable processing apparatus provide steps for implementing the functions specified in the block(s) of the flowchart(s), algorithm(s), formula (e), or computational depiction(s).
From the discussion above it will be appreciated that the invention can be embodied in various ways, including the following:
1. A method for identifying one or more image characteristics of an image, comprising: (a) inputting an image; (b) identifying edge information with respect to said image; and (c) identifying a vanishing point within said image based on said edge information.
2. A method as recited in claim 1, wherein identifying edge information comprises: (a) calculating a histogram of gradients (HoG) associated with the image; (b) determining a strength of edge information with respect to the image as a function of the calculated HoG; and (c) selecting the image for processing if the strength of the edge information meets a minimum threshold value.
3. A method as recited in claim 2, wherein the strength of the edge information is a function of an entropy value of the HoG and a ratio of short edges to the total number of edges within the image.
4. A method as recited in claim 2, wherein calculating the HoG comprises: (a) at each location of the image, calculating gradient values in vertical and horizontal directions of the image; (b) obtaining a magnitude of gradient values corresponding to each of said locations; (c) dividing the image into a plurality of blocks; and (d) calculating a histogram for each block within the plurality of blocks; (e) wherein the histogram comprises a plurality of bins each representing an orientation and accumulation of magnitudes of locations within an orientation.
5. A method as recited in claim 2, wherein identifying a vanishing point comprises: (a) determining a set of edges that occur a plurality of times in the direction of the vanishing point; and (b) applying a score to the set of edges.
6. A method as recited in claim 5, wherein the edges are scored according to one or more of the following properties: edge length, the probability that the edge belongs to a plane boundary within the image, and the probability that the edge supports a vanishing point with other edges.
7. A method as recited in claim 5, wherein each edge score is computed as a function of the calculated histogram of oriented gradients (HoG).
8. A method as recited in claim 7, wherein the vanishing point is validated for use in a 3D computer graphical model.
9. A method as recited in claim 2, further comprising classifying a plurality of planes associated with the image based on the identified vanishing point.
10. A method as recited in claim 2, wherein classifying a plurality of planes comprises: (a) segmenting the image such that neighboring pixels with similar colors or textures within the image are combined as a segment; (b) assigning segments with high confidence with one or more plane labels; (c) classifying unlabeled segments based on transductive learning; and (d) identifying a ground plane as a function of the labeled segments.
11. A method as recited in claim 2, wherein supporting edges of detected vanishing points are used to obtain candidates for a plane boundary associated with the ground plane.
12. A method as recited in claim 11, further comprising: (a) identifying a vanishing direction associated with the image based on the identified ground plane; (b) wherein the vanishing direction comprises a bisector of two boundaries associated with the ground plane.
13. A method as recited in claim 11, further comprising calculating a road width associated with the image at a location along the identified vanishing direction.
14. A system for identifying one or more image characteristics of an image, comprising: (a) a processor; and programming executable on the processor and configured for: (i) inputting an image; (ii) identifying edge information with respect to said image; and (iii) identifying a vanishing point within said image based on said edge information.
15. A system as recited in claim 14, wherein identifying edge information comprises: (a) calculating a histogram of gradients (HoG) associated with the image; (b) determining a strength of edge information with respect to the image as a function of the calculated HoG; and (c) selecting the image for processing if the strength of the edge information meets a minimum threshold value.
16. A system as recited in claim 15, wherein the strength of the edge information is a function of an entropy value of the HoG and a ratio of short edges to the total number of edges within the image.
17. A system as recited in claim 15, wherein calculating the HoG comprises: (a) at each location of the image, calculating gradient values in vertical and horizontal directions of the image; (b) obtaining a magnitude of gradient values corresponding to each of said locations; (c) dividing the image into a plurality of blocks; and (d) calculating a histogram for each block within the plurality of blocks; (e) wherein the histogram comprises a plurality of bins each representing an orientation and accumulation of magnitudes of locations within an orientation.
18. A system as recited in claim 15, wherein identifying a vanishing point comprises: (a) determining a set of edges that occur a plurality of times in the direction of the vanishing point; and (b) applying a score to the set of edges.
19. A system as recited in claim 18, wherein the edges are scored according to one or more of the following properties: edge length, the probability that the edge belongs to a plane boundary within the image, and the probability that the edge supports a vanishing point with other edges.
20. A system as recited in claim 18, wherein each edge score is computed as a function of the calculated histogram of oriented gradients (HoG).
21. A system as recited in claim 20, wherein the vanishing point is validated for use in a 3D computer graphical model.
22. A system as recited in claim 15, wherein said programming is further configured for classifying a plurality of planes associated with the image based on the identified vanishing point.
23. A system as recited in claim 15, wherein classifying a plurality of planes comprises: (a) segmenting the image such that neighboring pixels with similar colors or textures within the image are combined as a segment; (b) assigning segments with high confidence with one or more plane labels; (c) classifying unlabeled segments based on transductive learning; and (d) identifying a ground plane as a function of the labeled segments.
24. A system as recited in claim 15, wherein supporting edges of detected vanishing points are used to obtain candidates for a plane boundary associated with the ground plane.
25. A system as recited in claim 24, wherein said programming is further configured for: (a) identifying a vanishing direction associated with the image based on the identified ground plane; (b) wherein the vanishing direction comprises a bisector of two boundaries associated with the ground plane.
26. A system as recited in claim 24, wherein said programming is further configured for calculating a road width associated with the image at a location along the identified vanishing direction.
27. A system for identifying one or more image characteristics of an image, comprising: (a) a processor; and (b) programming executable on the processor and configured for: (i) inputting an image; (ii) identifying edge information with respect to said image; and (iii) identifying a vanishing point within said image based on said edge information; (iv) wherein identifying edge information comprises: calculating a histogram of gradients (HoG) associated with the image; determining a strength of edge information with respect to the image as a function of the calculated HoG; and selecting the image for processing if the strength of the edge information meets a minimum threshold value.
28. A system as recited in claim 27, wherein identifying a vanishing point comprises: (a) determining a set of edges that occur a plurality of times in the direction of the vanishing point; and (b) applying a score to the set of edges.
Although the description above contains many details, these should not be construed as limiting the scope of the invention but as merely providing illustrations of some of the presently preferred embodiments of this invention. Therefore, it will be appreciated that the scope of the present invention fully encompasses other embodiments which may become obvious to those skilled in the art, and that the scope of the present invention is accordingly to be limited by nothing other than the appended claims, in which reference to an element in the singular is not intended to mean “one and only one” unless explicitly so stated, but rather “one or more.” All structural, chemical, and functional equivalents to the elements of the above-described preferred embodiment that are known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the present claims. Moreover, it is not necessary for a device or method to address each and every problem sought to be solved by the present invention, for it to be encompassed by the present claims. Furthermore, no element, component, or method step in the present disclosure is intended to be dedicated to the public regardless of whether the element, component, or method step is explicitly recited in the claims. No claim element herein is to be construed under the provisions of 35 U.S.C. 112, sixth paragraph, unless the element is expressly recited using the phrase “means for.”