This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2002-62439, filed on Mar. 7, 2002, the entire contents of which are incorporated herein by reference.
This application is related to the co-pending patent application Ser. No. 10/120,374 entitled “METHOD AND APPARATUS FOR PROCESSING PICTURES OF MOBILE OBJECT” filed on Apr. 12, 2002 by Shunske Kamijo, Masao Sakauchi, and Katsushi Ikeuchi.
1. Field of the Invention
The present invention relates to a method and apparatus for tracking moving objects (i.e., movable things such as cars, bicycles, and animals) in pictures by processing time-series pictures.
2. Description of the Related Art
Early detection of a traffic accident can not only improve the success rate of life saving through a speedy rescue operation, but also alleviate accident-related traffic congestion by speeding up the police inspection at the site. Therefore, various kinds of automation in the recognition of traffic accidents are expected. In order to achieve a high recognition rate of traffic accidents, it is necessary to correctly track moving objects by processing pictures captured by a camera.
In tracking moving objects, it is particularly difficult to differentiate overlapped moving objects in a picture from each other and to track moving objects without being affected by variations in luminance of the moving objects caused by, for example, the shadows of clouds or buildings.
The former difficulty can be overcome by a method of calculating a spatio-temporal correlation based on a spatio-temporal Markov model, which has been disclosed in the publication of JP 2001-148019-A whose inventors are the same as those of the present application. The latter difficulty can be overcome by using a differential picture (G. L. Gimel'farb, “Texture Modeling by Multiple Pairwise Pixel Interactions”, IEEE Trans. PAMI, Vol. 18, No. 11, 1996, pp. 1110–1114).
However, there are still technical challenges in achieving a high success rate of tracking moving objects with a suitable method of converting the differential picture for incorporation into the correlation-based method.
Accordingly, it is an object of the present invention to provide a method and apparatus for tracking moving objects whereby a high success rate of tracking moving objects can be achieved irrespective of overlapping between the moving objects in pictures and variations in luminance of the moving objects in the pictures.
In one aspect of the present invention, there is provided a moving object tracking method of processing first time-series pictures to track moving objects therein, the method comprising the steps of:
(a) generating second time-series pictures by converting the first time-series pictures to respective spatial differential frame pictures; and
(b) discriminating moving objects at a time t2 based on a correlation between one of the spatial differential frame pictures at a time t1 adjacent to the time t2 and another of the spatial differential frame pictures at the time t2, and also based on a discrimination result of moving objects included in said one of the spatial differential frame pictures at the time t1.
According to this configuration, owing to the step (a), the second time-series pictures are little affected by variations in luminance, and since the process of step (b) is performed on the second time-series pictures, it is possible to track moving objects with a high success rate even if the luminance of the moving objects varies in the pictures or the moving objects overlap each other in the pictures.
In the step (a), for example, a value of a pixel of interest in any of the spatial differential frame pictures is determined by converting a value of a corresponding pixel of interest of corresponding one of the first time-series pictures to a value I associated with a value H proportional to a value obtained by dividing a differential value between the value of the corresponding pixel of interest and values of its neighboring pixels by a maximum value among the corresponding pixel of interest and the neighboring pixels. Thereby, even if the pixel value is low and thus the differential value is small, it is possible to obtain edge information which is almost equivalent to that obtained in a case where the pixel value is high and the differential value is large, consequently increasing the success rate of tracking moving objects.
Further in the step (a), for example, if the value H is within a predetermined range, then the value I is determined by an approximate linear transformation of the value H; otherwise, the value I is determined by converting the value H to a value which is suppressed so as not to exceed an assumable maximum value Gmax of a pixel of the spatial differential frame picture. This allows the bit size of the value I to be reduced with little reduction of the edge information included in the distribution of H, achieving speedy image processing.
Other aspects, objects, and the advantages of the present invention will become apparent from the following detailed description taken in connection with the accompanying drawings.
Referring now to the drawings, wherein like reference characters designate like or corresponding parts throughout several views, a preferred embodiment of the present invention will be described below.
This apparatus is provided with an electronic camera 10 capturing the intersection and outputting the captured picture signal and a moving object tracking apparatus 20 processing pictures to automatically track moving objects.
Time series pictures shot by the electronic camera 10 are stored into an image memory 21 at a rate of, for example, 12 frames/sec.
The image memory 21 has a storage capacity of 3 frames or more, and the oldest frame picture is replaced with a new frame picture.
An image conversion section 22 copies each of the frame pictures in the image memory 21 into a buffer memory 23 and, referring to the copied frame picture, converts the corresponding frame picture in the image memory 21 to a spatial differential frame picture. This conversion is performed in two steps as follows:
Letting G(i, j) be a pixel value (brightness value) in the i-th row and j-th column of the original frame picture, a firstly converted pixel value H(i, j) in the i-th row and j-th column is expressed by the following equation.
H(i, j) = Σneighborpixels |G(i+di, j+dj) − G(i, j)|   (1)
Here, Σneighborpixels denotes a sum over di=−c to c and dj=−c to c, where c is a natural number. For example, when c=1, it denotes a sum over the values of the 8 pixels neighboring the pixel in the i-th row and j-th column. As the luminance varies, the pixel value G(i, j) and its neighboring pixel values G(i+di, j+dj) vary in the same way. Therefore, the picture of H(i, j) is not affected by variations in the luminance.
Generally, the higher the pixel value is, the higher the absolute value of the difference between the pixel value and a neighboring pixel value is. Even if the pixel value is low and thus the difference is small, it is desirable to obtain edge information almost equivalent to edge information obtained when the pixel value is high and the difference is large, in order to increase the success rate of tracking moving objects. Thus, H(i, j) is normalized as follows:
H(i, j) = Σneighborpixels |G(i+di, j+dj) − G(i, j)| / (G_{i,j,max} / Gmax)   (2)
where G_{i,j,max} denotes the maximum value among the original pixel values used in the calculation of H(i, j). For example, when c=1, G_{i,j,max} is the maximum pixel value of the 3×3 pixels centered on the i-th row and j-th column. Gmax denotes the assumable maximum value of the pixel value G(i, j). For example, when the pixel value is expressed in 8 bits, Gmax is 255. The following description will be given with reference to the case where c=1 and Gmax=255.
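As an illustration of equation (2), the following is a minimal sketch in Python rather than the embodiment's actual implementation. The function name `spatial_difference`, the nested-list picture layout, the skipping of neighbors outside the picture, and the handling of an all-zero neighborhood are assumptions made for this example only.

```python
def spatial_difference(G, i, j, c=1, g_max=255):
    """Normalized sum of absolute differences around pixel (i, j), per equation (2)."""
    rows, cols = len(G), len(G[0])
    total = 0
    local_max = G[i][j]
    for di in range(-c, c + 1):
        for dj in range(-c, c + 1):
            r, s = i + di, j + dj
            # Assumption: neighbors that fall outside the picture are skipped.
            if 0 <= r < rows and 0 <= s < cols:
                total += abs(G[r][s] - G[i][j])
                local_max = max(local_max, G[r][s])
    if local_max == 0:
        # Assumption: an all-zero neighborhood has no edge; avoid dividing by zero.
        return 0.0
    # Divide by (local maximum / Gmax) so that dark and bright edges respond alike.
    return total / (local_max / g_max)
```

With this normalization, an isolated pixel of value 200 and an isolated pixel of value 20, each surrounded by zeros, give approximately the same response of 8 Gmax, which matches the purpose of the normalization described above.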
The assumable maximum value of H(i, j) varies according to the moving objects. For example, when G(i, j)=Gmax and all the 8 neighboring pixels around the pixel of i-th row and j-th column have a pixel value of 0, H(i, j)=8 Gmax and thus H(i, j) cannot be expressed in 8 bits.
On the other hand, a histogram made for the value of H(i, j) on the edge portion of moving objects shows that most values of H in the edge portion are in the range of 50 to 110. That is, values of H higher than about 110 carry less edge information for tracking moving objects and are thus less important.
Accordingly, it is desirable to suppress parts having a high value of H so as to reduce the bit length of the converted pixel and thereby attain a high image processing speed. Thus, the second conversion is performed by converting H(i, j) to I(i, j) with the following equation having a sigmoid function.
I=Gmax/{1+exp(−β(H−α))} (3)
This sigmoid function has a good linearity for H being around α. Therefore, the threshold value α is set to the most frequent value, e.g., 80 in the frequency distribution of the values of H having edge information.
The image conversion section 22 converts the picture of pixel values G(i, j) to a picture of pixel values I(i, j) based on the above equations (2) and (3).
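The second conversion of equation (3) can be sketched as follows. The function name `sigmoid_convert` is hypothetical, and the value β=0.05 is an assumption chosen only for illustration, since the text gives a value for α (80) but none for β.

```python
import math


def sigmoid_convert(H, alpha=80.0, beta=0.05, g_max=255):
    """Map H to I with the sigmoid of equation (3); I never exceeds g_max."""
    # alpha=80 follows the example in the text; beta=0.05 is an assumed value.
    return g_max / (1.0 + math.exp(-beta * (H - alpha)))
```

In this sketch, H=α maps to Gmax/2, values of H around α are spread almost linearly, and very large values of H, such as 8 Gmax, are suppressed to at most Gmax so that I fits in 8 bits after rounding.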
Each of a background picture generation section 24, an ID generation/deletion section 25, and a moving object tracking section 27 performs a process based on spatial differential frame pictures in the image memory 21. Hereinafter, for abbreviation, the spatial differential frame picture is referred to as a frame picture.
The background picture generation section 24 is provided with a storage section and a processing section. This processing section accesses the image memory 21 to prepare histograms of respective pixels, each histogram having corresponding pixel values of all the frame pictures, for example, over the past 10 minutes to generate a picture with no moving object therein as a background picture whose each pixel value is the mode of the corresponding histogram, and to store the background picture into the storage section. This processing is repeated periodically to update the background picture.
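The per-pixel mode computation described above can be sketched as follows. This is an illustrative example under assumptions: the function name `background_from_frames` is hypothetical, frames are represented as nested lists of equal size, and the periodic update and the 10-minute window are left to the caller.

```python
from collections import Counter


def background_from_frames(frames):
    """Return a picture whose pixels are the most frequent value seen at each position."""
    rows, cols = len(frames[0]), len(frames[0][0])
    background = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for s in range(cols):
            # Histogram of the values this pixel took over all supplied frames.
            histogram = Counter(frame[r][s] for frame in frames)
            # The mode is taken as the background value for this pixel.
            background[r][s] = histogram.most_common(1)[0][0]
    return background
```

Because a moving object occupies a given pixel for only a small fraction of the frames, the mode tends to select the value of the road surface rather than that of a passing object.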
As shown in
The ID generation/deletion section 25 assigns a new cluster identification code ID to a block when it is determined that a moving object exists in the block. When it is determined that a moving object exists in a block adjacent to another block to which ID has been assigned, the ID generation/deletion section 25 assigns the same ID as assigned to said another block to the adjacent block. Said another block to which ID has been assigned may be one adjacent to an entrance slit. For example, in
Assignment of ID is performed to corresponding blocks in an object map storage section 26. In the above case, the object map storage section 26 stores object maps each having 60×80 blocks, and each block has, as block information, a flag indicating whether or not an ID has been assigned and, when the ID has been assigned, an ID number and a block motion vector described later. Note that ID=0 may be used for indicating no assignment of ID, without using the flag. Further, the most significant bit of ID may be the flag.
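One possible in-memory layout for such an object map is sketched below. The names `Block`, `NO_ID`, and `make_object_map` are hypothetical, and the sketch adopts the variant mentioned above in which ID=0 indicates that no ID is assigned. Treating 60 as the number of rows and 80 as the number of columns is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

NO_ID = 0  # ID=0 marks a block without an assigned ID (variant without a flag)


@dataclass
class Block:
    cluster_id: int = NO_ID
    # Block motion vector as (rows, columns); None until an ID is assigned.
    motion_vector: Optional[Tuple[int, int]] = None

    @property
    def assigned(self) -> bool:
        return self.cluster_id != NO_ID


def make_object_map(rows=60, cols=80):
    """Create an object map in which no block has an ID yet."""
    return [[Block() for _ in range(cols)] for _ in range(rows)]
```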
For each cluster having passed through an entrance slit, the moving object tracking section 27 assigns the same ID to blocks located in a moving direction, and deletes the same ID of blocks located in a direction opposite to the movement, that is, performs a tracking process for each cluster. The moving object tracking section 27 performs the tracking process for each cluster as far as it is within an exit slit.
The ID generation/deletion section 25 further checks whether or not an ID is assigned to the blocks in the exit slits EX1 to EX4 on the basis of contents of the object map storage section 26 and, if assigned, deletes the ID when the cluster having the ID has passed through an exit slit. For example, when transition has been performed from a state where an ID is assigned to blocks in the exit slit EX1 in
The moving object tracking section 27, as described later, generates an object map at a time t on the basis of an object map and a frame picture at a time (t−1) and a frame picture at the time t.
In a case where moving objects are shot from the front thereof at a low camera angle with respect to a road surface in order to shoot a wide area with one camera to track the moving objects, it is frequent that the moving objects are overlapped with each other in pictures as shown in (A) to (C) of
A block in the i-th row and j-th column at a time t is denoted by B(t: i, j). As shown in
Next, the motion vectors V2 and V3 are translated such that the tips of the motion vectors V2 and V3 both coincide with the center of the block B (18, 11). Then, the translated motion vectors V2 and V3 are inverted to obtain −V2 and −V3, respectively, as shown in
An evaluation value UD associated with a similarity (correlation) between the image in the box SBR2 of
UD=Σ|SP(t−1:i,j)−BP(t:i,j)| (4)
where SP(t−1 : i, j) and BP(t:i, j) denote pixel values on the i-th row and the j-th column in the box SBR2 of
Likewise, an evaluation value UD associated with a correlation between the image in the block SBR3 of
In the case of
In such a way, by using a motion vector of each block, different IDs can be assigned to blocks included in the cluster C123 including a plurality of moving objects at the time t, and thereby one cluster C123 can be divided into subclusters with different IDs.
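The evaluation value UD of equation (4) can be sketched as a sum of absolute differences between a block at the time t and the box obtained by moving that block by the inverse of a candidate motion vector in the frame picture at the time (t−1). The function name `evaluation_ud`, the (row, column) ordering of vectors, and the 8×8 block size are assumptions of this example; the block size is inferred from the value 64 used in equation (5).

```python
def evaluation_ud(prev_frame, cur_frame, block_top_left, motion_vector, block_size=8):
    """Sum of |SP(t-1) - BP(t)| over one block, per equation (4)."""
    top, left = block_top_left
    dy, dx = motion_vector  # assumed motion from time (t-1) to t, in pixels
    total = 0
    for r in range(block_size):
        for s in range(block_size):
            bp = cur_frame[top + r][left + s]             # pixel in the block at time t
            sp = prev_frame[top + r - dy][left + s - dx]  # pixel in the box at time (t-1)
            total += abs(sp - bp)
    return total
```

Evaluating this function once with the motion vector of each candidate cluster and taking the smaller value corresponds to choosing between the two candidate IDs in the description above.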
How to find out the block B(t−1: 11, 13) in the cluster C12 and the block B(t−1: 14, 13) in the cluster C3 both corresponding to the block B (t: 18, 11) belonging to the cluster C123 of
|V(18−i,11−j)−V(t−1:i,j)|<ΔV
where ΔV is a constant whose value is, for example, three times the number of pixels on one side of a block. In a case where a plurality of blocks corresponding to the block B(t: 18, 11) exist in the cluster C12 or in a case where a plurality of blocks corresponding to the block B(t: 18, 11) exist in the cluster C3, the evaluation value is determined for each of such blocks and ID corresponding to the least evaluation value is assigned to the block B(t: 18, 11).
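The selection of corresponding blocks by the above inequality can be sketched as follows. Several points are assumptions made for this example: the function name `corresponding_blocks`, the dictionary `prev_vectors` mapping block indices at the time (t−1) to motion vectors in pixels, the conversion of the block displacement to pixels by multiplying by the block size, and the use of the Euclidean norm.

```python
import math


def corresponding_blocks(target, prev_vectors, block_size=8, delta_v=None):
    """Return blocks at time (t-1) whose motion vector roughly points at the target block."""
    if delta_v is None:
        delta_v = 3 * block_size  # three times the pixels on one side of a block
    ti, tj = target
    matches = []
    for (i, j), (vy, vx) in prev_vectors.items():
        # Displacement from block (i, j) to the target block, in pixels (assumed unit).
        dy = (ti - i) * block_size
        dx = (tj - j) * block_size
        if math.hypot(dy - vy, dx - vx) < delta_v:
            matches.append((i, j))
    return matches
```

When more than one block is returned for a cluster, the evaluation value would be computed for each of them and the ID with the least evaluation value would be kept, as stated above.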
The above procedure is applied to other blocks belonging to the cluster C123 of
In the above case where ID3 is assigned to the block B(t: 18, 11), the motion vector of the block can be estimated to be almost equal to the vector V3. In order to obtain the motion vector of the block B(t: 18, 11) more accurately, the box SBR3 is shifted by one pixel at a time in a predetermined range whose center is coincident with that of the box SBR3, the evaluation value is obtained for every shift, and the motion vector of the block B(t: 18, 11) is determined to be a vector whose origin is the center of the box SBR3 when the evaluation value assumes the minimum (the highest correlation) and whose tip is the center of the block B(t: 18, 11). A motion vector of a block at a time t is determined by such a block matching each time when ID is assigned to the block.
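The refinement by block matching can be sketched as an exhaustive search around an initial vector. The function name `refine_motion_vector`, the search range of ±2 pixels, the (row, column) ordering of vectors, and the skipping of shifts that leave the picture are assumptions of this example, and only the difference term of equation (4) is used as the evaluation value.

```python
def refine_motion_vector(prev_frame, cur_frame, block_top_left, initial_vector,
                         search_range=2, block_size=8):
    """Shift the box one pixel at a time and keep the vector with the least UD."""
    top, left = block_top_left
    rows, cols = len(prev_frame), len(prev_frame[0])
    best_vector, best_ud = None, None
    for ddy in range(-search_range, search_range + 1):
        for ddx in range(-search_range, search_range + 1):
            dy, dx = initial_vector[0] + ddy, initial_vector[1] + ddx
            box_top, box_left = top - dy, left - dx
            # Assumption: shifts whose box leaves the previous frame are skipped.
            if not (0 <= box_top <= rows - block_size
                    and 0 <= box_left <= cols - block_size):
                continue
            ud = sum(
                abs(prev_frame[box_top + r][box_left + s] - cur_frame[top + r][left + s])
                for r in range(block_size)
                for s in range(block_size)
            )
            if best_ud is None or ud < best_ud:
                best_vector, best_ud = (dy, dx), ud
    return best_vector, best_ud
```

The returned vector runs from the center of the best-matching box at the time (t−1) to the center of the block at the time t, in line with the description above.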
In order to estimate the similarity more accurately, amounts described below are also taken into consideration.
A portion of the box SBR3 in
US = (S(t−1) − 64)^2   (5)
The smaller the evaluation value US is, the higher the correlation is. Likewise, assuming that ID of the block B(t: 18, 11) is equal to ID12, the number S of pixels belonging to the cluster C12 in the box SBR2 is determined to calculate the evaluation value US, and the value is denoted by US(ID12). In cases of
U=aUD+bUS which is a linear combination of the above equations (4) and (5) is defined as an evaluation function, and it is determined that the smaller the evaluation value U, the higher the similarity is, where a and b are positive constants and determined empirically such that the evaluation of similarity becomes more correct.
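The evaluation value US of equation (5) and the combination U=aUD+bUS can be sketched as follows. The names `evaluation_us` and `combine_ud_us`, the representation of the object map at the time (t−1) as a grid of block IDs, and the default weights are assumptions of this example, since the constants a and b are determined empirically.

```python
def evaluation_us(prev_id_map, box_top_left, cluster_id, block_size=8):
    """(S - 64)^2, where S counts box pixels that belong to the given cluster."""
    top, left = box_top_left
    s_count = 0
    for r in range(top, top + block_size):
        for c in range(left, left + block_size):
            # Each pixel takes the ID of the block that contains it.
            if prev_id_map[r // block_size][c // block_size] == cluster_id:
                s_count += 1
    return (s_count - block_size * block_size) ** 2


def combine_ud_us(ud, us, a=1.0, b=1.0):
    """Linear combination U = a*UD + b*US; a and b are placeholder weights."""
    return a * ud + b * us
```

A box lying entirely inside the assumed cluster gives US=0, while a box that partly covers blocks of another ID is penalized quadratically.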
For each block of
UN = (N(t) − 8)^2   (6)
The smaller the evaluation value UN is, the higher the correlation is. Similarly, in a case where it is assumed that ID of the block B(t: 18, 11) of
Further, when the error of a motion vector at the time (t−1) determined by block matching is large since almost the same pixel values are distributed, a case can be considered where the absolute value |U(ID12)−U(ID3)| of a difference in evaluation values of the linear combination U=aUD+bUS+cUN of the above equations (4) to (6) is less than a predetermined value, where c is also a positive constant. Therefore, by paying attention to motion vectors of blocks in the neighborhood of blocks B(t−1: 14, 13) and B(t−1: 11, 13) corresponding to the block B(t: 18, 11), the evaluation of similarity is made more correct. That is, an evaluation value UV is calculated using the following equation which includes a motion vector VC(t)=V3 of the block B(t−1: 14, 13) from time (t−1) to t corresponding to the block B(t: 18, 11) on the assumption that the block B(t: 18, 11) has ID3; and motion vectors VBi(t−1), for i=1 to NX and NX=NX3, of blocks (blocks each attached with a small black dot in
UV = Σ|VC(t) − VBi(t−1)|^2 / NX   (7)
where Σ denotes a sum over i=1 to NX. The smaller the evaluation value UV is, the higher the correlation is. Similarly, an evaluation value UV is calculated in regard to a motion vector VC=V2 of the block B(t−1: 11, 13) from time (t−1) to t corresponding to the B(t: 18, 11) on the assumption that the block B(t: 18, 11) has ID12; and motion vectors VBj(t−1), for j=1 to NX and NX=NX12, of blocks (blocks each attached with a mark x in
A linear combination of the above equations (4) to (7)
U=aUD+bUS+cUN+fUV (8)
is used as the evaluation function, and it is determined that the smaller the evaluation value is, the higher the similarity is, where f is also a positive constant and is determined empirically such that the evaluation of similarity becomes more correct.
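The remaining terms and the full evaluation function of equation (8) can be sketched as follows. All function names and default weights are hypothetical. The count N(t) is passed in as a number rather than computed, the return value for an empty neighbor list is an assumption, and the candidate values in the usage are made-up figures for illustration only.

```python
def evaluation_un(n):
    """UN = (N(t) - 8)^2, per equation (6); n is supplied by the caller."""
    return (n - 8) ** 2


def evaluation_uv(vc, neighbor_vectors):
    """UV = mean squared distance between VC(t) and the vectors VBi(t-1), per equation (7)."""
    nx = len(neighbor_vectors)
    if nx == 0:
        return 0.0  # assumption: no neighboring vectors means no penalty
    return sum((vc[0] - v[0]) ** 2 + (vc[1] - v[1]) ** 2 for v in neighbor_vectors) / nx


def combine_all(ud, us, un, uv, a=1.0, b=1.0, c=1.0, f=1.0):
    """U = a*UD + b*US + c*UN + f*UV, per equation (8); weights are placeholders."""
    return a * ud + b * us + c * un + f * uv


def choose_id(candidates):
    """Pick the ID whose evaluation value U is smallest (highest similarity)."""
    return min(candidates, key=candidates.get)
```

For example, `choose_id({"ID12": 20.0, "ID3": 16.0})` selects `"ID3"`, since the smaller evaluation value indicates the higher similarity.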
In such a way, not only is it determined whether ID is ID12 or ID3 in each block in the cluster C123 of
Similarly, an object map at a time (t+1) can be determined from the frame picture and object map at the time t and a frame picture at the time (t+1). Since C12 and C3 are discriminated from each other at the time t and the moving object M1 is separated from the moving object M2 in the frame picture at the time (t+1), as shown in
Note that an object map X at a time (t−1) may be copied into a work area as an object map Y prior to preparation of an object map at a time t, and a motion vector Vi of each block i whose ID is equal to IDα in the object map X may be replaced with a mean vector (ΣVj)/p, for j=1 to p, in the object map X, where V1 is a motion vector of a block corresponding to the block i and Vj for j=2 to p are motion vectors of blocks having ID=IDα and adjacent to the corresponding block. With such a procedure, in a case where errors of motion vectors are large since a texture in a block is similar to those in adjacent blocks, the errors are reduced. Copying to the work area is for uniquely determining the mean vector.
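The averaging of motion vectors can be sketched as follows. This example assumes that "adjacent" means the 8 surrounding blocks, that ID=0 means no ID is assigned, and that the means are read from the unmodified map and written to a separate copy; the function name `smooth_motion_vectors` and the separate `ids` and `vectors` grids are hypothetical.

```python
def smooth_motion_vectors(ids, vectors):
    """Replace each block's vector by the mean over itself and same-ID adjacent blocks."""
    rows, cols = len(ids), len(ids[0])
    smoothed = [row[:] for row in vectors]  # work copy; the input stays unchanged
    for i in range(rows):
        for j in range(cols):
            if ids[i][j] == 0 or vectors[i][j] is None:
                continue  # assumption: ID=0 means no ID is assigned
            group = []
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    r, s = i + di, j + dj
                    if (0 <= r < rows and 0 <= s < cols
                            and ids[r][s] == ids[i][j]
                            and vectors[r][s] is not None):
                        group.append(vectors[r][s])
            p = len(group)  # always at least 1, since the block itself is included
            smoothed[i][j] = (sum(v[0] for v in group) / p,
                              sum(v[1] for v in group) / p)
    return smoothed
```

Reading only from the unmodified grid means that each mean is independent of the order in which blocks are processed, which is the reason given above for copying to the work area.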
It is also possible to determine the object map at a time t by the steps of:
assuming that a block i in the object map at the time t, for example, block B(18, 11) of
as a unit correlation, calculating a correlation between the image of block i in the frame picture at a time t and the image in a box obtained by moving a box of the block i by a vector −Vi in the frame picture at a time (t−1), for example, the image in SBR3 of
determining a total sum of unit correlations over the entire blocks at the time t to produce the object map at the time t so that the total sum is maximized.
This algorithm is also effective when t is swapped with (t−1). In this case, when moving objects which have been overlapped are discriminated from each other, the overlapped moving objects can be discriminated from each other by reversing the time.
Although a preferred embodiment of the present invention has been described, it is to be understood that the invention is not limited thereto and that various changes and modifications may be made without departing from the spirit and scope of the invention.
For example, instead of the sigmoid function of the equation (3), a simple line function as shown in
Number | Date | Country | Kind |
---|---|---|---|
2002-062439 | Mar 2002 | JP | national |
Number | Name | Date | Kind |
---|---|---|---|
5210799 | Rao | May 1993 | A |
5838828 | Mizuki et al. | Nov 1998 | A |
6005493 | Taniguchi et al. | Dec 1999 | A |
6188778 | Higashikubo et al. | Feb 2001 | B1 |
20040131233 | Comaniciu et al. | Jul 2004 | A1 |
Number | Date | Country |
---|---|---|
0 420 657 | Apr 1991 | EP |
0 505 858 | Sep 1992 | EP |
0 807 914 | Nov 1997 | EP |
2001-148019 | May 2001 | JP |
WO 0031706 | Jun 2000 | WO |
Number | Date | Country |
---|---|---|
20030169340 A1 | Sep 2003 | US |