a illustrates a right-side view of a stereo-vision object detection system incorporated in a vehicle, viewing a relatively near-range object;
b illustrates a front view of the stereo cameras of the stereo-vision object detection system incorporated in a vehicle, corresponding to
c illustrates a top view of the stereo-vision object detection system incorporated in a vehicle, corresponding to
a illustrates a geometry of a stereo-vision system;
b illustrates an image-forming geometry of a pinhole camera;
FIGS. 15a and 15b respectively illustrate an integer-filtered-folded valid-count vector and a corresponding vector of differential values for a situation of a near-range object within an intermediate portion of the field-of-view of the stereo-vision system;
FIGS. 15c and 15d respectively illustrate an integer-filtered-folded valid-count vector and a corresponding vector of differential values for a situation of near-range objects within left-most and intermediate portions of the field-of-view of the stereo-vision system;
FIGS. 15e and 15f respectively illustrate an integer-filtered-folded valid-count vector and a corresponding vector of differential values for a situation of a near-range object within a right-most portion of the field-of-view of the stereo-vision system;
Referring to
The stereo-vision object detection system 10 incorporates a stereo-vision system 16 operatively coupled to a processor 18 incorporating or operatively coupled to a memory 20, and powered by a source of power 22, e.g. a vehicle battery 22.1. Responsive to information from the visual scene 24 within the field of view of the stereo-vision system 16, the processor 18 generates one or more signals 26 to one or more associated driver warning devices 28, VRU warning devices 30, or VRU protective devices 32 so as to provide, by one or more of the following ways, for protecting one or more VRUs 14 from a possible collision with the vehicle 12: 1) by alerting the driver 33 with an audible or visual warning signal from an audible warning device 28.1 or a visual display or lamp 28.2 with sufficient lead time so that the driver 33 can take evasive action to avoid the collision; 2) by alerting the VRU 14 with an audible or visual warning signal—e.g. by sounding a vehicle horn 30.1 or flashing the headlights 30.2—so that the VRU 14 can stop or take evasive action; 3) by generating a signal 26.1 to a brake control system 34 so as to provide for automatically braking the vehicle 12 if a collision with a VRU 14 becomes likely; or 4) by deploying one or more VRU protective devices 32—for example, an external air bag 32.1 or a hood actuator 32.2—in advance of a collision if a collision becomes inevitable. For example, in one embodiment, the hood actuator 32.2—for example, either a pyrotechnic, hydraulic or electric actuator—cooperates with a relatively compliant hood 36 so as to provide for increasing the distance over which energy from an impacting VRU 14 may be absorbed by the hood 36.
Referring also to
r=b·f/d, where d=dl−dr (1)
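By way of a numerical illustration of equation (1), the following sketch computes the down-range distance r from the disparity d for hypothetical camera parameters; the baseline, focal length, and disparity values below are illustrative assumptions, not values taken from the specification:

```python
def range_from_disparity(b_m, f_px, d_px):
    """Down-range distance r = b*f/d per equation (1).

    b_m  -- baseline between the stereo cameras, in meters
    f_px -- focal length expressed in pixels
    d_px -- disparity d = dl - dr, in pixels
    """
    if d_px <= 0:
        return float('inf')  # zero disparity corresponds to infinite range
    return b_m * f_px / d_px

# Hypothetical example (assumed values): a 0.3 m baseline, a 1000-pixel
# focal length, and a 15-pixel disparity give r = 0.3 * 1000 / 15 = 20 m.
print(range_from_disparity(0.3, 1000.0, 15.0))  # -> 20.0
```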
Referring to
Referring to
Referring to
Referring to
In accordance with one embodiment, an associated area correlation algorithm of the stereo-vision processor 78 provides for matching corresponding areas of the first 40.1 and second 40.2 stereo intensity-image components so as to provide for determining the disparity d therebetween and the corresponding range r thereof. The extent of the associated search for a matching area can be reduced by rectifying the input intensity images (I) so that the associated epipolar lines lie along associated scan lines of the associated first 38.1 and second 38.2 stereo-vision cameras. This can be done by calibrating the first 38.1 and second 38.2 stereo-vision cameras and warping the associated input intensity images (I) to remove lens distortions and alignment offsets between the first 38.1 and second 38.2 stereo-vision cameras. Given the rectified images (C), the search for a match can be limited to a particular maximum number of offsets (D) along the baseline direction, wherein the maximum number is given by the minimum and maximum ranges r of interest. For implementations with multiple processors or distributed computation, algorithm operations can be performed in a pipelined fashion to increase throughput. The largest computational cost is in the correlation and minimum-finding operations, which are proportional to the number of pixels 100 times the number of disparities. The algorithm can use a sliding sums method to take advantage of redundancy in computing area sums, so that the window size used for area correlation does not substantially affect the associated computational cost. The resultant disparity map (M) can be further reduced in complexity by removing extraneous objects such as road surface returns using a road surface filter (F).
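The sliding-sums idea referred to above can be illustrated with the following sketch, which uses a summed-area table so that each correlation-window sum costs a fixed number of operations regardless of window size; it is a generic SAD-based area-correlation example, not the particular implementation of the stereo-vision processor 78, and the window size and disparity search range are assumed for illustration:

```python
import numpy as np

def sad_disparity(left, right, max_disp, win=5):
    """Area-correlation (SAD) disparity search using summed-area tables,
    so the per-pixel cost is independent of the correlation window size.
    Assumes rectified images and max_disp well below the image width."""
    h, w = left.shape
    half = win // 2
    cost = np.full((h, w, max_disp), np.inf, dtype=np.float32)
    for d in range(max_disp):
        # absolute difference between the left image and the right image
        # shifted by the candidate disparity d along the baseline direction
        diff = np.abs(left[:, d:].astype(np.float32)
                      - right[:, :w - d].astype(np.float32))
        # summed-area table: each window sum becomes four table lookups
        sat = np.pad(diff.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
        wsum = (sat[win:, win:] - sat[:-win, win:]
                - sat[win:, :-win] + sat[:-win, :-win])
        cost[half:h - half, d + half:w - half, d] = wsum
    # disparity minimizing the window SAD at each pixel (borders default to 0)
    return np.argmin(cost, axis=2)
```

Because each window sum is formed from four table lookups, enlarging the correlation window leaves the cost of the search essentially unchanged, which is the redundancy the sliding-sums method exploits.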
The associated range resolution (Δr) is a function of the range r in accordance with the following equation:
Δr=r²·Δd/(b·f)
The range resolution (Δr) is the smallest change in range r that is discernible for a given stereo geometry, corresponding to a change Δd in disparity (i.e. disparity resolution Δd). The range resolution (Δr) increases with the square of the range r, and is inversely related to the baseline b and focal length f, so that range resolution (Δr) is improved (decreased) with increasing baseline b and focal length f distances, and with decreasing pixel sizes which provide for improved (decreased) disparity resolution Δd.
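For example, under the relation Δr=r²·Δd/(b·f) given above, a short sketch (with an assumed 0.3 m baseline, 1000-pixel focal length, and one-pixel disparity resolution, none of which are taken from the specification) shows how the range resolution degrades with the square of the range:

```python
def range_resolution(r_m, b_m, f_px, delta_d_px=1.0):
    """Smallest discernible change in range at range r for a disparity
    resolution delta_d: grows with r squared, shrinks with larger b and f."""
    return (r_m ** 2) * delta_d_px / (b_m * f_px)

# Assumed parameters: b = 0.3 m, f = 1000 pixels, delta_d = 1 pixel.
# At 5 m the resolution is 0.083 m; at 20 m it is about 1.33 m.
print(range_resolution(5.0, 0.3, 1000.0))   # -> 0.0833...
print(range_resolution(20.0, 0.3, 1000.0))  # -> 1.333...
```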
Alternatively, a CENSUS algorithm may be used to determine the range-map image 80 from the associated first 40.1 and second 40.2 stereo intensity-image components, for example, by comparing rank-ordered difference matrices for corresponding pixels 100 separated by a given disparity d, wherein each difference matrix is calculated for each given pixel 100 of each of the first 40.1 and second 40.2 stereo intensity-image components, and each element of each difference matrix is responsive to a difference between the value of the given pixel 100 and a corresponding value of a corresponding surrounding pixel 100.
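For reference, a minimal sketch of the census transform in its commonly published form is given below: each pixel is encoded as a bit string recording which surrounding pixels are darker than it, and candidate matches are then compared by Hamming distance over the disparity search. The rank-ordered difference matrices described above may differ in detail from this generic formulation, and the 5×5 window is an assumption.

```python
import numpy as np

def census_transform(img, win=5):
    """Encode each pixel as a bit string, one bit per surrounding pixel,
    set when that neighbor is darker than the center pixel.
    (Image borders wrap around here, which is acceptable for a sketch.)"""
    h, w = img.shape
    half = win // 2
    codes = np.zeros((h, w), dtype=np.uint64)
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(img, (-dy, -dx), axis=(0, 1))
            codes = (codes << np.uint64(1)) | (shifted < img).astype(np.uint64)
    return codes

def hamming(a, b):
    """Number of differing bits between two census codes."""
    return bin(int(a) ^ int(b)).count("1")
```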
More particularly, the first stereo-vision camera 38.1 generates a first intensity-image component 40.1 of each real-world point P from a first viewpoint 42.1, and the second stereo-vision camera 38.2 generates a second intensity-image component 40.2 of each real-world point P from a second viewpoint 42.2, wherein the first 42.1 and second 42.2 viewpoints are separated by the above-described baseline b distance. Each of the first 40.1 and second 40.2 intensity-image components has the same total number of pixels 100 organized into the same number of rows 96 and columns 98, so that there is a one-to-one correspondence between pixels 100 in the first intensity-image component 40.1 and pixels 100 of like row 96 and column 98 locations in the corresponding second intensity-image component 40.2, and a similar one-to-one correspondence between pixels 100 in either the first 40.1 or second 40.2 intensity-image components and pixels 100 of like row 94 and column 102 locations in the corresponding range-map image 80, wherein each pixel value of the first 40.1 or second 40.2 intensity-image components corresponds to an intensity value at the given row 96 and column 98 location, whereas the pixel values of the corresponding range-map image 80 represent the corresponding down-range coordinate r of that same row 94 and column 102 location.
For a given real-world point P, the relative locations of corresponding first 52.1 and second 52.2 image points thereof in the first 40.1 and second 40.2 intensity-image components are displaced from one another in their respective first 40.1 and second 40.2 intensity-image components by an amount—referred to as disparity—that is inversely proportional to the down-range coordinate r of the real-world point P. For each first image point 52.1 in the first intensity-image component 40.1, the stereo vision processor 78 locates—if possible—the corresponding second intensity-image point 52.2 in the second intensity-image component 40.2 and determines the down-range coordinate r of the corresponding associated real-world point P from the disparity between the first 52.1 and second 52.2 image points. This process is simplified by aligning the first 38.1 and second 38.2 stereo-vision cameras so that for each first image point 52.1 along a given row coordinate 96, JROW in the first intensity-image component 40.1, the corresponding associated epipolar curve in the second intensity-image component 40.2 is a line along the same row coordinate 96, JROW in the second intensity-image component 40.2, and for each second image point 52.2 along a given row coordinate 96, JROW in the second intensity-image component 40.2, the corresponding associated epipolar curve in the first intensity-image component 40.1 is a line along the same row coordinate 96, JROW in the first intensity-image component 40.1, so that corresponding first 52.1 and second 52.2 image points associated with a given real-world point P each have the same row coordinate 96, JROW so that the corresponding first 52.1 and second 52.2 image points can be found from a one-dimensional search along a given row coordinate 96, JROW. An epipolar curve in the second intensity-image component 40.2 is the image of a virtual ray extending between the first image point 52.1 and the corresponding associated real-world point P, for example, as described further by K. Konolige in “Small Vision Systems: Hardware and Implementation,” Proc. Eighth Int'l Symp. Robotics Research, pp. 203-212, October 1997, (hereinafter “KONOLIGE”), which is incorporated by reference herein. The epipolar curve for a pinhole camera will be a straight line. The first 38.1 and second 38.2 stereo-vision cameras are oriented so that the focal planes 48.1, 48.2 of the associated lenses 44.1, 44.2 are substantially coplanar, and may require calibration as described by KONOLIGE or in Application '059, for example, so as to remove associated lens distortions and alignment offsets, so as to provide for horizontal epipolar lines that are aligned with the row coordinates 96, JROW of the first 38.1 and second 38.2 stereo-vision cameras.
Accordingly, with the epipolar lines aligned with common horizontal scan lines, i.e. common row coordinates 96, JROW, of the first 38.1 and second 38.2 stereo-vision cameras, the associated disparities d of corresponding first 52.1 and second 52.2 image points corresponding to a given associated real-world point P will be exclusively in the X, i.e. horizontal, direction, so that the process of determining the down-range coordinate r of each real-world point P implemented by the stereo vision processor 78 then comprises using a known algorithm—for example, either what is known as the CENSUS algorithm, or an area correlation algorithm—to find a correspondence between first 52.1 and second 52.2 image points, each having the same row coordinate 96, JROW but a different column coordinate 98, ICOL in their respective first 40.1 and second 40.2 intensity-image components, the associated disparity d being either given by or responsive to the difference in corresponding column coordinates 98, ICOL. As one example, the CENSUS algorithm is described by R. Zabih and J. Woodfill in "Non-parametric Local Transforms for Computing Visual Correspondence," Proceedings of the Third European Conference on Computer Vision, Stockholm, May 1994; by J. Woodfill and B. Von Herzen in "Real-time stereo vision on the PARTS reconfigurable computer," in Proceedings of The 5th Annual IEEE Symposium on Field Programmable Custom Computing Machines, April 1997; by J. H. Kim, C. O. Park and J. D. Cho in "Hardware implementation for Real-time Census 3D disparity map Using dynamic search range," from Sungkyunkwan University School of Information and Communication, Suwon, Korea; and by Y. K. Baik, J. H. Jo and K. M. Lee in "Fast Census Transform-based Stereo Algorithm using SSE2," in The 12th Korea-Japan Joint Workshop on Frontiers of Computer Vision, 2-3 February 2006, Tokushima, Japan, pp. 305-309, all of which are incorporated herein by reference. As another example, the area correlation algorithm is described by KONOLIGE, also incorporated herein by reference. As yet another example, the disparity associated with each pixel 104 in the range-map image 80 may be found by minimizing either a Normalized Cross-Correlation (NCC) objective function, a Sum of Squared Differences (SSD) objective function, or a Sum of Absolute Differences (SAD) objective function, each objective function being with respect to disparity d, for example, as described in the following internet document: http://3dstereophoto.blogspot.com/2012/01/stereo-matching-local-methods.html, which is incorporated herein by reference, wherein along a given row coordinate 96, JROW of the first 40.1 and second 40.2 intensity-image components, for each column coordinate 98, ICOL in the first intensity-image component 40.1, the NCC, SSD or SAD objective functions are calculated for a first subset of pixels I1(u,v) centered about the pixel I1(ICOL, JROW), and a second subset of pixels I2(u,v) centered about the pixel I2(ICOL+DX, JROW), as follows:
the resulting disparity d is the value that minimizes the associated objective function (NCC, SSD or SAD). For example, in one embodiment, p=q=2.
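Assuming the (2p+1)×(2q+1) windows are centered on the candidate pixels, with p indexing columns and q indexing rows, the three objective functions and the associated one-dimensional search might be sketched as follows; negating the normalized cross-correlation so that minimization applies to all three costs is a convention assumed here rather than a detail taken from the specification, and row-boundary handling is omitted:

```python
import numpy as np

def window(img, col, row, p=2, q=2):
    """(2p+1) x (2q+1) patch of pixels centered about (col, row)."""
    return img[row - q:row + q + 1, col - p:col + p + 1].astype(np.float64)

def ssd(w1, w2):
    return np.sum((w1 - w2) ** 2)        # Sum of Squared Differences

def sad(w1, w2):
    return np.sum(np.abs(w1 - w2))       # Sum of Absolute Differences

def ncc_cost(w1, w2):
    # Negated normalized cross-correlation, so that smaller is better
    denom = np.sqrt(np.sum(w1 ** 2) * np.sum(w2 ** 2)) + 1e-12
    return -np.sum(w1 * w2) / denom

def best_disparity(I1, I2, col, row, max_disp, cost=sad, p=2, q=2):
    """Disparity DX along the row that minimizes the chosen objective,
    assuming (col, row) is far enough from the image borders."""
    w1 = window(I1, col, row, p, q)
    costs = [cost(w1, window(I2, col + dx, row, p, q))
             for dx in range(min(max_disp, I2.shape[1] - col - p))]
    return int(np.argmin(costs)) if costs else 0
```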
Regardless of the method employed, the stereo vision processor 78 generates the range-map image 80 from the first 40.1 and second 40.2 intensity-image components, each comprising an NROW×NCOL array of image intensity values, wherein the range-map image 80 comprises an NROW×NCOL array of corresponding down-range coordinate r values, i.e.:
wherein each column 94, ICOL and row 102, JROW coordinate in the range-map image 80 is referenced to, i.e. corresponds to, a corresponding column 96, ICOL and row 98, JROW coordinate of one of the first 40.1 and second 40.2 intensity-image components, for example, of the first intensity-image component 40.1, and CZ is a calibration parameter determined during an associated calibration process.
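If, consistent with equation (1), the calibration parameter CZ plays the role of the product b·f in appropriate units—an assumption made here for illustration, since the associated calibration process is not detailed above—the conversion from a disparity map to the range-map image 80 can be sketched as an element-wise operation, with a void value of zero wherever no valid disparity was found:

```python
import numpy as np

def range_map_from_disparity(disparity_map, CZ):
    """NROW x NCOL array of down-range coordinates r, with a void value (0)
    wherever no valid disparity was found (assumed void convention)."""
    r = np.zeros_like(disparity_map, dtype=np.float32)
    valid = disparity_map > 0
    r[valid] = CZ / disparity_map[valid]   # r = b*f/d, with CZ taken as b*f
    return r
```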
Referring to
Accordingly, the near-range detection and tracking performance based solely on the range-map image 80 from the stereo-vision processor 78 can suffer if the scene illumination is sub-optimal or when the object 50 lacks unique structure or texture, because the associated stereo matching range fill and distribution are below acceptable limits to ensure a relatively accurate object boundary reconstruction. For example, the range-map image 80 can generally be used for detection and tracking operations if the on-target range fill (OTRF) is greater than about 50 percent.
It has been observed that under some circumstances, the on-target range fill (OTRF) can fall below 50 percent with relatively benign scene illumination and seemingly relatively good object texture. For example, referring to
Referring to
Referring to
More particularly, in step (1002), a range-map image 80 is first generated by the stereo-vision processor 78 responsive to the first 40.1 and second 40.2 stereo intensity-image components, in accordance with the methodology described hereinabove. For example, in one embodiment, the stereo-vision processor 78 is implemented with a Field Programmable Gate Array (FPGA).
Referring to
Referring to
Then, referring also to
In step (1010), the folded valid-count vector 116′, H( ) is filtered with a smoothing filter, for example, in one embodiment, a central moving average filter, wherein, for example, in one embodiment, the corresponding moving average window comprises 23 elements, so that every successive group of 23 elements of the folded valid-count vector 116′, H( ) are averaged to form a resulting corresponding filtered value, which, in step (1012), is then replaced with a corresponding integer approximation thereof, so as to generate a resulting integer-filtered-folded valid-count vector 118′
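Steps (1010) and (1012) might be sketched as follows: a 23-element central moving average of the folded valid-count vector H( ), followed by rounding each filtered value to the nearest integer; the zero-padded treatment of the ends of the vector is an assumption, since the boundary handling is not spelled out above:

```python
import numpy as np

def integer_filtered(folded_valid_count, window=23):
    """Central moving average followed by integer approximation."""
    h = np.asarray(folded_valid_count, dtype=np.float64)
    kernel = np.ones(window) / window
    # mode='same' keeps the vector length; the ends are effectively
    # zero-padded, which is an assumed boundary treatment
    smoothed = np.convolve(h, kernel, mode='same')
    return np.rint(smoothed).astype(int)
```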
In step (1014), the integer-filtered-folded valid-count vector 118,
In step (1016), the vector of differential values 120′, Ḣ( ) is used to locate void regions 122 in the column space of the range-map image 80 and the first 40.1 and second 40.2 stereo intensity-image components. Generally, a particular void region 122 will be either preceded or followed—or both—in column space by a region 124 associated with valid range values 106. The differential value 120, Ḣ(j) at a left-most boundary of a void region 122 adjacent to a preceding region associated with valid range values 106 will be negative, and the differential value 120, Ḣ(j) at a right-most boundary of a void region 122 adjacent to a following region 124 associated with valid range values 106 will be positive. Accordingly, these differential values 120, Ḣ(j) may be used to locate the associated left 126.1 and right 126.2 column boundaries of a particular void region 122. For example, referring to
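A simplified sketch of how the differential values can be scanned for a falling edge (left column boundary) followed by a rising edge (right column boundary) is given below; the exact differencing scheme and any thresholds applied to the differential values are not spelled out above, so this is an assumed, minimal form that also treats a void region running to the image edge:

```python
import numpy as np

def void_region_boundaries(h_filtered):
    """Locate candidate void regions from the sign of the differential values
    of the integer-filtered-folded valid-count vector."""
    h = np.asarray(h_filtered, dtype=int)
    dh = np.diff(h)                        # differential values
    regions, left = [], None
    for j, d in enumerate(dh):
        if d < 0 and left is None:
            left = j                       # falling edge: left column boundary
        elif d > 0 and left is not None:
            regions.append((left, j + 1))  # rising edge: right column boundary
            left = None
    if left is not None:                   # void region runs to the image edge
        regions.append((left, len(h) - 1))
    return regions
```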
Conceivably, one of the left 126.1 or right 126.2 column boundaries of a particular void region 122 could be at a boundary of the range-map image 80, i.e. at either column 0 or column N−1. For example, referring to
is equal to zero.
Referring to
More particularly, in step (1602), for each void region 122, and beginning with the first void region 122.1 having the lowest row 94 of range pixels 104 that contains void values 108—prospectively corresponding to the nearest near-range object 50′—then in step (1604), the corresponding intensity pixels 100 of one of the first 40.1 or second 40.2 stereo intensity-image components are identified within the corresponding left 126.1 and right 126.2 column boundaries of the void region 122, for example, as illustrated in
Referring to
More particularly, in step (1610), the largest mode 138, 138.1—for example, the mode 138 having either the largest amplitude or the largest total number of associated intensity pixels 100—is first identified. Then, in step (1612), if the total count of intensity pixels 100 within the identified mode 138, 138.1 is less than a threshold, then, in step (1614), the next largest mode 138, 138.2 is identified and step (1612) is repeated, but for the total count of all identified modes 138, 138.1, 138.2. For example, in one embodiment, the threshold used in step (1612) is 60 percent of the total number of intensity pixels 100 within the vertically-bounded void region 130.
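Steps (1610) through (1614) amount to accumulating histogram modes, largest first, until the combined pixel count reaches the threshold (60 percent of the intensity pixels within the vertically-bounded void region in the embodiment described above); a sketch, assuming the modes have already been segmented from the image-intensity histogram and are supplied as per-mode pixel counts:

```python
def select_modes(mode_counts, total_pixels, fraction=0.60):
    """Greedily accumulate the largest histogram modes until their combined
    pixel count reaches the threshold (steps 1610-1614)."""
    threshold = fraction * total_pixels
    selected, accumulated = [], 0
    # consider the modes ordered from largest to smallest pixel count
    for index, count in sorted(enumerate(mode_counts),
                               key=lambda m: m[1], reverse=True):
        selected.append(index)
        accumulated += count
        if accumulated >= threshold:
            break
    return selected, accumulated >= threshold
```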
For example, referring to
If, in step (1612), the total count of intensity pixels 100 within the identified mode 138, 138.1 is greater than or equal to the threshold, then, in step (1616), the resulting intensity-image 90 of the prospective near-range object 50′ is classified by the object discrimination system 92, for example, in accordance with the teachings of U.S. patent application Ser. No. 11/658,758 filed on 29 Sep. 2008, entitled Vulnerable Road User Protection System, or U.S. patent application Ser. No. 13/286,656 filed on 16 Nov. 2011, entitled Method of Identifying an Object in a Visual Scene, which are incorporated herein by reference. For example, the prospective near-range object 50′ may be classified using any or all of the metrics of an associated feature vector described therein, i.e.
Accordingly, the stereo-vision object detection system 10 together with the associated first 1000.1 and second 1000.2 portions of the associated stereo-vision object detection process 1000 provide for detecting relatively near-range objects 50′ that might not otherwise be detectable from the associated range-map image 80 alone. Notwithstanding that the stereo-vision object detection system 10 has been illustrated in the environment of a vehicle 12 for detecting an associated vulnerable road user 14, it should be understood that the stereo-vision object detection system 10 is generally not limited to this, or any one particular application, but instead could be used in cooperation with any stereo-vision system 16 to facilitate the detection of objects 50, 50′ that might not be resolvable in the associated resulting range-map image 80, but for which there is sufficient intensity variation in the associated first 40.1 or second 40.2 stereo intensity-image components to be resolvable using an associated image-intensity histogram 132.
In accordance with another aspect, in situations where the region 109 of void values 108 is substantially limited to the near-range object 50′, the near-range object 50′ can be detected directly from the range-map image 80, for example, by analyzing the region 109 of void values 108 directly, for example, in accordance with the teachings of U.S. patent application Ser. Nos. 11/658,758 and 13/286,656, which are incorporated herein by reference, for example, by extracting and analyzing a harmonic profile of the associated silhouette 109′ of the region 109. For example, a region surrounding the region 109 of void values 108 may be first transformed to a binary segmentation image, which is then analyzed in accordance with the teachings of U.S. patent application Ser. Nos. 11/658,758 and 13/286,656 so as to provide for detecting and/or classifying the associated near-range object 50′.
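A minimal sketch of the binary-segmentation step might look like the following, where the range-map void value (taken here to be zero) and the size of the surrounding margin are assumptions, and the subsequent harmonic-profile analysis follows the incorporated applications and is not reproduced here:

```python
import numpy as np

def void_region_mask(range_map, row0, row1, col0, col1, void_value=0, margin=5):
    """Binary segmentation image of a region surrounding a block of void values:
    1 where the range-map pixel holds the void value, 0 elsewhere."""
    r0 = max(row0 - margin, 0)
    c0 = max(col0 - margin, 0)
    region = range_map[r0:row1 + margin, c0:col1 + margin]
    return (region == void_value).astype(np.uint8)
```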
Furthermore, notwithstanding that the stereo-vision processor 78, image processor 86, object detection system 88 and object discrimination system 92 have been illustrated as separate processing blocks, it should be understood that any two or more of these blocks may be implemented with a common processor, and that the particular type of processor is not limiting.
Yet further, it should be understood that the stereo-vision object detection system 10 is not limited in respect of the process by which the range-map image 80 is generated from the associated first 40.1 and second 40.2 stereo intensity-image components.
While specific embodiments have been described in detail in the foregoing detailed description and illustrated in the accompanying drawings, those with ordinary skill in the art will appreciate that various modifications and alternatives to those details could be developed in light of the overall teachings of the disclosure. It should be understood that any reference herein to the term "or" is intended to mean an "inclusive or" or what is also known as a "logical OR", wherein when used as a logic statement, the expression "A or B" is true if either A or B is true, or if both A and B are true, and when used as a list of elements, the expression "A, B or C" is intended to include all combinations of the elements recited in the expression, for example, any of the elements selected from the group consisting of A, B, C, (A, B), (A, C), (B, C), and (A, B, C); and so on if additional elements are listed. Furthermore, it should also be understood that the indefinite articles "a" or "an", and the corresponding associated definite articles "the" or "said", are each intended to mean one or more unless otherwise stated, implied, or physically impossible. Yet further, it should be understood that the expressions "at least one of A and B, etc.", "at least one of A or B, etc.", "selected from A and B, etc." and "selected from A or B, etc." are each intended to mean either any recited element individually or any combination of two or more elements, for example, any of the elements from the group consisting of "A", "B", and "A AND B together", etc. Yet further, it should be understood that the expressions "one of A and B, etc." and "one of A or B, etc." are each intended to mean any of the recited elements individually alone, for example, either A alone or B alone, etc., but not A AND B together. Furthermore, it should also be understood that unless indicated otherwise or unless physically impossible, that the above-described embodiments and aspects can be used in combination with one another and are not mutually exclusive. Accordingly, the particular arrangements disclosed are meant to be illustrative only and not limiting as to the scope of the invention, which is to be given the full breadth of the appended claims, and any and all equivalents thereof.
The instant application claims benefit of U.S. Provisional Application Ser. No. 61/584,354 filed on Jan. 9, 2012, which is incorporated herein by reference in its entirety.