Embodiments described herein relate generally to a moving object tracking system and a moving object tracking method.
A moving object tracking system, for example, detects moving objects included in frames in a time series of images, and matches identical moving objects between frames, thereby tracking a moving object. This moving object tracking system records a tracking result of a moving object or identifies a moving object in accordance with the tracking result. That is, the moving object tracking system tracks a moving object, and conveys a tracking result to an observer.
The following three techniques have been suggested as the main techniques for tracking a moving object.
According to the first tracking technique, a graph is created from the detection results of adjacent frames, and the problem of finding a matching is formulated as a combinatorial optimization problem (an assignment problem on a bipartite graph) that maximizes a proper evaluation function, such that objects are tracked.
According to the second tracking technique, in order to track an object even when there are frames in which the moving object cannot be detected, information on the surroundings of the object is used to complement a detection. A concrete example is a technique that uses, in face tracking processing, information on the surroundings, for example, the upper part of the body.
According to the third tracking technique, an object is detected in advance in all frames of moving images, and the frames are linked together to track objects.
Furthermore, the following two methods have been suggested to manage tracking results.
The first tracking result managing method performs matching so that the moving objects can be tracked at intervals. According to the second managing method, a head region is detected and kept tracked even when the face of a moving object is invisible in a technique for tracking and recording a moving object. If there is a great pattern variation after the moving object is kept tracked as the identical person, records are separately managed.
However, the above-described conventional techniques have the following problems.
First, according to the first tracking technique, matching is performed only by the detection result of the adjacent frames, so that the tracking is interrupted when there are frames in which detections are unsuccessful during the movement of the object. The second tracking technique has been suggested as a technique for tracking the face of a person and uses information on the surroundings, for example, on the upper part of the body to cope with an interrupted detection. However, the problem of the second tracking technique is that it requires means adapted to detect parts other than a face, and that it is not adapted to the tracking of more than one object. According to the third tracking technique, a tracking result can be output only after all the frames containing the target object have been input in advance. Moreover, the third tracking technique is adapted to false positives (erroneous detection of an object which is not targeted for tracking), but is not adapted to interrupted tracking caused by false negatives (failure to detect an object which is targeted for tracking).
Moreover, the first tracking result managing method is a technique for processing the tracking of objects in a short time, and is not intended to improve the accuracy or reliability of a tracking processing result. According to the second tracking result managing method, only one of the results of tracking persons is output as an optimum tracking result. However, according to the second tracking result managing method, unsuccessful tracking attributed to the problem of tracking accuracy is recorded as an improper tracking result, and proportionate candidates cannot be recorded or an output result cannot be controlled depending on a result.
In general, according to one embodiment, a moving object tracking system includes an input unit, a detection unit, a creating unit, a weight calculating unit, a calculating unit, and an output unit. The input unit inputs time-series images captured by a camera. The detection unit detects all tracking target moving objects from each of the input images input by the input unit. The creating unit creates a combination of a path that links each moving object detected in a first image by the detection unit to each moving object detected in a second image following the first image, a path that links each moving object detected in the first image to an unsuccessful detection in the second image, and a path that links an unsuccessful detection in the first image to each moving object detected in the second image. The weight calculating unit calculates a weight for each path created by the creating unit. The calculating unit calculates a value for the combination of the paths to which the weights calculated by the weight calculating unit are allocated. The output unit outputs a tracking result based on the value for the combination of the paths calculated by the calculating unit.
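The three kinds of paths created by the creating unit can be illustrated with a minimal sketch. The names `MISS` and `create_paths` are hypothetical and chosen only for this illustration; the weight calculation is omitted because it is not specified in this part of the description.

```python
from itertools import product

# Hypothetical placeholder standing for an "unsuccessful detection" node.
MISS = None


def create_paths(first, second):
    """Enumerate candidate paths between two consecutive images.

    first, second: lists of moving objects detected in the first image
    and in the second image following it.
    """
    # Paths linking each detection in the first image to each detection in the second.
    paths = [(a, b) for a, b in product(first, second)]
    # Paths linking each detection in the first image to an unsuccessful detection.
    paths += [(a, MISS) for a in first]
    # Paths linking an unsuccessful detection to each detection in the second image.
    paths += [(MISS, b) for b in second]
    return paths


if __name__ == "__main__":
    for path in create_paths(["A", "B"], ["C"]):
        print(path)
```

With two detections in the first image and one in the second, this yields 2×1 + 2 + 1 = 5 candidate paths, to each of which a weight would then be allocated.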
Hereinafter, first, second, third, and fourth embodiments will be described in detail with reference to the drawings.
A system according to each embodiment is a moving object tracking system (moving object monitoring system) for detecting a moving object from images captured by a large number of cameras and tracking (monitoring) the detected moving object. In each embodiment, a person tracking system for tracking the movement of a person (moving object) is described as an example of a moving object tracking system. However, the person tracking system according to each of the later-described embodiments can also be used as a tracking system for tracking moving objects other than a person (e.g., a vehicle or an animal) by changing the processing for detecting the face of a person to detection processing suited to a moving object to be tracked.
The system shown in
The system having the configuration shown in
The person tracking system shown in
In each of the embodiments described below, when faces of more than one person are contained in pictures (time-series images, or moving images comprising frames) obtained by the cameras, the person tracking system as the moving object tracking system tracks each of the persons (faces). Alternatively, the system described in each embodiment is a system for detecting, for example, a moving object (e.g., person or vehicle) from a large number of pictures collected by a large number of cameras, and recording the detection results (scenes) in a recording device together with the tracking results. The system described in each embodiment may otherwise be a monitor system for tracking a moving object (e.g., the face of a person) detected from an image captured by a camera, and collating the feature amount of the tracked moving object (the face of the subject) with dictionary data (the feature amount of the face of a registrant) previously registered on a database (face database) to identify the moving object, and then reporting the identification result of the moving object.
First, the first embodiment is described.
The person tracking system (moving object tracking system) described in the first embodiment tracks, as a detection target, the face of a person (moving object) detected from images captured by cameras, and records a tracking result in a recording device.
The person tracking system shown in
Each of the cameras 1 photographs a monitor area allocated thereto. The terminal devices 2 process images captured by the cameras 1. The server 3 generally manages results of processing in the respective terminal devices 2. The monitor device 4 displays the processing results managed by the server 3. There may be more than one server 3 and more than one monitor device 4.
In the configuration example shown in
Each of the terminal devices 2 (2A, 2B) comprises a control unit 21, an image interface 22, an image memory 23, a processing unit 24, and a network interface 25.
The control unit 21 controls the terminal device 2. The control unit 21 comprises, for example, a processor which operates in accordance with a program, and a memory in which the program to be executed by the processor is stored. That is, the processor executes the program in the memory so that the control unit 21 achieves various kinds of processing.
The image interface 22 is an interface for inputting time-series images (e.g., moving images in predetermined frame units) from the cameras 1. When the camera 1 and the terminal device 2 are connected via the communication line 5, the image interface 22 may be a network interface. The image interface 22 also functions to digitize (A/D converts) the image input from the camera 1 and supply the digitized images to the processing unit 24 or the image memory 23. For example, the image captured by the camera and acquired by the image interface 22 is stored in the image memory 23.
The processing unit 24 processes the acquired image. For example, the processing unit 24 comprises a processor which operates in accordance with a program, and a memory in which the program to be executed by the processor is stored. As processing functions, the processing unit 24 comprises a face detecting unit 26 which detects a region of a moving object (the face of a person) if any, and a face tracking unit 27 which tracks the identical moving object to match the movements between the input images. These functions of the processing unit 24 may be obtained as functions of the control unit 21. Moreover, the face tracking unit 27 may be provided in the server 3 that can communicate with the terminal device 2.
The network interface 25 is an interface for communicating via the communication line (network). Each of the terminal devices 2 performs data communication with the server 3 via the network interface 25.
The server 3 comprises a control unit 31, a network interface 32, a tracking result managing unit 33, and a communication control unit 34. The monitor device 4 comprises a control unit 41, a network interface 42, a display unit 43, and an operation unit 44.
The control unit 31 controls the whole server 3. The control unit 31 comprises, for example, a processor which operates in accordance with a program, and a memory in which the program to be executed by the processor is stored. That is, the processor executes the program stored in the memory so that the control unit 31 achieves various kinds of processing. For example, the processor may execute the program in the control unit 31 of the server 3 to obtain a processing function similar to the face tracking unit 27 of the terminal device 2.
The network interface 32 is an interface for communicating with each of the terminal devices 2 and the monitor device 4 via the communication line 5. The tracking result managing unit 33 comprises a storage unit 33a, and a control unit for controlling the storage unit. The tracking result managing unit 33 stores, in the storage unit 33a, a tracking result of the moving object (the face of the person) acquired from each of the terminal devices 2. Not only information indicating the tracking results but also images captured by the cameras 1 are stored in the storage unit 33a of the tracking result managing unit 33.
The communication control unit 34 controls communications. For example, the communication control unit 34 adjusts a communication with each of the terminal devices 2. The communication control unit 34 comprises a communication measurement unit 37 and a communication setting unit 36. The communication measurement unit 37 finds a communication load such as a communication amount on the basis of the number of cameras connected to each of the terminal devices 2 or the amount of information such as the tracking result supplied from each of the terminal devices 2. The communication setting unit 36 sets the parameter of information to be output as a tracking result to each of the terminal devices 2 on the basis of the communication amount measured by the communication measurement unit 37.
The control unit 41 controls the whole monitor device 4. The network interface 42 is an interface for communicating via the communication line 5. The display unit 43 displays, for example, the tracking result supplied from the server 3 and the images captured by the cameras 1. The operation unit 44 comprises, for example, a keyboard or mouse to be operated by an operator.
Now, the configuration and processing in each unit of the system shown in
Each of the cameras 1 takes images of the monitor area. In the configuration example shown in
The face detecting unit 26 performs processing to detect all faces (one or more faces) present in the input images. The following techniques can be applied as specific processing methods for detecting faces. First, a prepared template is moved in an image to find a correlation value so that a position providing the highest correlation value is detected as the region of the face image. Otherwise, faces can be detected by a face extraction method that uses an eigenspace method or a subspace method. The accuracy of the face detection can be increased by detecting the position of a facial part such as an eye or a nose from the detected region of the face image. To such a face detection method, it is possible to apply a technique described in, for example, a document (Kazuhiro Hukui and Osamu Yamaguchi: “Facial Feature Point Extraction Method Based on Combination of Shape Extraction and Pattern Matching”, the journal of the Institute of Electronics, Information and Communication Engineers (D), vol. J80-D-II, No. 8, pp. 2170-2177 (1997)). For the above-mentioned eye or nose detection and the detection of a mouth region, it is possible to use a technique according to a document (Mayumi Yuasa and Akiko Nakashima: “Digital Make System based on High-Precision Facial Feature Point Detection”, 10th Image Sensing Symposium Proceedings, pp. 219-224 (2004)). Both of the techniques acquire information that can be dealt with as two-dimensionally arranged images and detect a face feature region from the information.
In the above-described processing, in order to only extract one face feature from one image, it is possible to find values of the correlation of all the images with the template, and output a position and a size that maximize the values. In order to extract more than one face feature, it is possible to find a local maximum value of a correlation value of the whole image, narrow down face candidate positions in consideration of overlap in one image, and finally find more than one face feature in consideration of the relation (time shift) with sequentially input past images.
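The single-face case of the template-based detection described above can be sketched as follows. This is an illustration only: the description does not specify the correlation measure, so a normalized cross-correlation is assumed here, and the function names `correlation` and `best_match` are hypothetical.

```python
import math


def correlation(patch, template):
    """Normalized cross-correlation of two equally sized 2-D lists (assumed measure)."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p = sum(p) / len(p)
    mean_t = sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p) *
                    sum((b - mean_t) ** 2 for b in t))
    # A flat patch or template has no defined correlation; treat it as 0.
    return num / den if den else 0.0


def best_match(image, template):
    """Move the template over the image and return ((row, col), score) of the best position."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = (0, 0), -2.0
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = correlation(patch, template)
            if score > best_score:
                best_pos, best_score = (y, x), score
    return best_pos, best_score


if __name__ == "__main__":
    template = [[1, 2], [3, 4]]
    image = [[0, 0, 0, 0],
             [0, 0, 1, 2],
             [0, 0, 3, 4],
             [0, 0, 0, 0]]
    print(best_match(image, template))
```

To extract more than one face, the same correlation map would be searched for local maxima instead of a single global maximum, as described above.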
The face tracking unit 27 performs processing to track the face of a person as a moving object. For example, a technique described in detail in the later third embodiment can be applied to the face tracking unit 27. The face tracking unit 27 integrates and optimally matches information such as the coordinates or size of the face of the person detected from the input images, and integrally manages and outputs, as a tracking result, the result of the matching of the identical persons throughout frames.
There is a possibility that the face tracking unit 27 may not determine a single result (tracking result) of the matching of persons in the images. For example, when there is more than one person moving around, there may be complicated movements such as crossing of the persons, so that the face tracking unit 27 obtains more than one tracking result. In this case, the face tracking unit 27 can not only output a result having the strongest likelihood in the matching as a first candidate but also manage the proportionate matching results.
The face tracking unit 27 also functions to calculate a reliability of a tracking result. The face tracking unit 27 can select a tracking result to be output, on the basis of the reliability. The reliability is determined in consideration of information such as the number of obtained frames and the number of detected faces. For example, the face tracking unit 27 can set a numerical value of reliability on the basis of the number of frames in which tracking is successful. In this case, the face tracking unit 27 can decrease the reliability of a tracking result indicating that only a small number of frames can be tracked.
The face tracking unit 27 may otherwise combine more than one standard to calculate a reliability. For example, when the similarity of a detected face image is available, the face tracking unit 27 can set the reliability of a tracking result showing a small number of frames in which tracking is successful but showing a high average similarity of face images to be higher than the reliability of a tracking result showing a large number of frames in which tracking is successful but showing a low average similarity of face images.
Note that in
First, suppose that the face tracking unit 27 has acquired N time-series face detection results (X1, . . . , XN) as face detection results (step S1). The face tracking unit 27 then judges whether the number N of the face detection results is greater than a predetermined number T (e.g., one) (step S2). When the number of the face detection results N is equal to or less than the predetermined number T (step S2, NO), the face tracking unit 27 sets the reliability to 0 (step S3). When judging that the number of the face detection results N is greater than the predetermined number T (step S2, YES), the face tracking unit 27 initializes a replication number (variable) t and a reliability r(X) (step S4). In the example shown in
When the replication number (variable) t and the reliability r(X) are initialized, the face tracking unit 27 judges whether the replication number t is smaller than the number of the face detection results N (step S5). When t<N (step S5, YES), the face tracking unit 27 calculates a similarity S(t, t+1) between Xt and Xt+1 (step S6). Further, the face tracking unit 27 calculates a movement amount D(t, t+1) of Xt and Xt+1, and a size L(t) of Xt (step S7).
In accordance with the similarity S (t, t+1), the movement amount D(t, t+1), and the L(t), the face tracking unit 27 calculates (updates) the reliability r(X) in the following manner.
If S(t, t+1)>θs, and if D(t, t+1)/L(t)<θd, then r(X)←r(X)*α,
If S(t, t+1)>θs, and if D(t, t+1)/L(t)>θd, then r(X)←r(X)*β,
If S(t, t+1)<θs, and if D(t, t+1)/L(t)<θd, then r(X)←r(X)*γ,
If S(t, t+1)<θs, and if D(t, t+1)/L(t)>θd, then r(X)←r(X)*δ.
After calculating (updating) the reliability r(X), the face tracking unit 27 increments the replication number t (t=t+1) (step S9), and returns to step S5. For the individual face detection results (scenes) X1, . . . , XN, reliabilities corresponding to the similarity S(t, t+1), the movement amount D(t, t+1), and the L(t) may also be calculated. However, the reliability of the whole tracking result is calculated here.
By repetitively performing the processing in steps S5 to S9, the face tracking unit 27 calculates the reliability of the tracking result comprising the acquired N time-series face detection results. That is, when judging in step S5 that t is not less than N (step S5, NO), the face tracking unit 27 outputs the calculated reliability r(X) as the reliability of the tracking result for the N time-series face detection results (step S10).
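The reliability calculation in steps S1 to S10 can be summarized by the following minimal sketch. Several points are assumptions made only for illustration: the initial reliability is taken as 1.0, the movement amount D is taken as the Euclidean distance between face positions, boundary cases where a value equals its threshold are treated as the "not similar" or "large movement" case, and the similarity measure is supplied by the caller. The names `Detection` and `tracking_reliability` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    x: float          # position of the detected face in the image
    y: float
    size: float       # size L(t) of the detected face
    feature: object   # whatever the similarity function compares


def tracking_reliability(detections, similarity, theta_s, theta_d,
                         alpha, beta, gamma, delta, T=1, r_init=1.0):
    """Return the reliability r(X) of a tracking result X1..XN."""
    n = len(detections)
    if n <= T:                    # step S2, NO -> step S3
        return 0.0
    r = r_init                    # step S4 (initial value assumed to be 1.0)
    for t in range(n - 1):        # steps S5 to S9 over adjacent pairs
        cur, nxt = detections[t], detections[t + 1]
        s = similarity(cur.feature, nxt.feature)             # step S6
        d = math.hypot(nxt.x - cur.x, nxt.y - cur.y)         # step S7 (assumed metric)
        similar = s > theta_s
        small_move = d / cur.size < theta_d
        if similar and small_move:
            r *= alpha
        elif similar:
            r *= beta
        elif small_move:
            r *= gamma
        else:
            r *= delta
    return r                      # step S10


if __name__ == "__main__":
    same = lambda a, b: 1.0 if a == b else 0.0   # toy similarity for the demo
    track = [Detection(0, 0, 10, "p"), Detection(1, 0, 10, "p"),
             Detection(50, 0, 10, "q")]
    print(tracking_reliability(track, same, 0.5, 1.0, 1.0, 0.8, 0.5, 0.1))
```

With parameters chosen between 0 and 1 so that δ is the smallest, a single pair of dissimilar, far-apart detections pulls the reliability of the whole tracking result down sharply, which matches the behavior described for mixed person detection results.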
In the processing example described above, the tracking result is a time series of face detection results. Specifically, each of the face detection results is made up of a face image and information on the position in the image. The reliability is a numerical value of 0 or more and 1 or less. The reliability is set so that the tracking result has a high reliability when the similarity between faces in adjacent frames is high and the movement amount is not great. For example, when detection results of different persons are mixed, the similarity decreases when the same comparison is made. In the reliability calculation processing described above, the face tracking unit 27 determines the degree of similarity and the amount of movement by comparing them with preset thresholds. For example, when a tracking result includes a set of images that are low in similarity and great in the movement amount, the face tracking unit 27 multiplies the reliability by the parameter δ to decrease the value of the reliability.
As shown in
When a “reliability of 70% or more” is set for the tracking result shown in
For example, an input image and a tracking result may be output as data for one tracking result candidate. As data for one tracking result candidate, an image (face image) which is a cutout image of a part located in the vicinity of the detected moving object (face) may be output in addition to the input image and the tracking result. In addition to such information, all the images that can be regarded as containing the identical moving object (face) and thus matched with one another (a predetermined reference number of images selected from the matched images) may be selectable in advance. In order to set these parameters (to set data to be output as one tracking result candidate), parameters designated by the operation unit 44 of the monitor device 4 may be set in the face tracking unit 27.
The tracking result managing unit 33 manages, on the server 3, the tracking results acquired from the terminal devices 2. The tracking result managing unit 33 of the server 3 acquires the above-described data for the tracking result candidate from each of the terminal devices 2, and records the data for the tracking result candidate acquired from the terminal device 2 in the storage unit 33a, and thus manages the data.
The tracking result managing unit 33 may collectively record the pictures captured by the cameras 1 as moving images in the storage unit 33a. Alternatively, only when a face is detected or only when a tracking result is obtained, the tracking result managing unit 33 may record pictures of this portion as moving images in the storage unit 33a. Otherwise, the tracking result managing unit 33 may only record the detected face region or person region in the storage unit 33a, or may only record, in the storage unit 33a, the best images judged to be most easily seen among tracked frames. In the present system, the tracking result managing unit 33 may receive more than one tracking result. Thus, the tracking result managing unit 33 may manage and store, in the storage unit 33a, the place of the moving object (person) in each frame, identification ID indicating the identity of the moving object, and the reliability of the tracking result, in such a manner as to match with the moving images captured by the cameras 1.
The communication setting unit 36 sets parameters for adjusting the amount of data as the tracking result acquired by the tracking result managing unit 33 from each terminal device. The communication setting unit 36 can set one or both of, for example, “a threshold of the reliability of the tracking result” and “the maximum number of tracking result candidates”. Once these parameters are set, the communication setting unit 36 can set each terminal device to transmit a tracking result having a reliability equal to or more than the set threshold when more than one tracking result candidate is obtained as a result of the tracking processing. When there is more than one tracking result candidate as a result of the tracking processing, the communication setting unit 36 can also set, for each terminal device, the number of candidates to be transmitted in descending order of reliability.
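How a terminal device might apply these two parameters can be sketched as follows. The function name `select_candidates` and the dictionary layout of a candidate are assumptions made only for this illustration.

```python
def select_candidates(candidates, min_reliability=None, max_candidates=None):
    """Pick the tracking result candidates to transmit.

    candidates: list of dicts, each with a "reliability" key (assumed layout).
    min_reliability: threshold of the reliability, or None if not set.
    max_candidates: maximum number of candidates, or None if not set.
    """
    # Rank candidates in descending order of reliability.
    ranked = sorted(candidates, key=lambda c: c["reliability"], reverse=True)
    if min_reliability is not None:
        ranked = [c for c in ranked if c["reliability"] >= min_reliability]
    if max_candidates is not None:
        ranked = ranked[:max_candidates]
    return ranked


if __name__ == "__main__":
    results = [{"id": 1, "reliability": 0.9},
               {"id": 2, "reliability": 0.4},
               {"id": 3, "reliability": 0.75}]
    print(select_candidates(results, min_reliability=0.7))
    print(select_candidates(results, max_candidates=1))
```

Either parameter may be used alone or both together, corresponding to setting one or both of the threshold and the maximum number.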
Furthermore, the communication setting unit 36 may set parameters under the instruction of the operator, or may dynamically set parameters on the basis of the communication load (e.g., communication amount) measured by the communication measurement unit 37. In the former case, the operator may use the operation unit to set parameters in accordance with an input value.
The communication measurement unit 37 monitors, for example, data amounts sent from the terminal devices 2 and thereby measures the state of the communication load. In accordance with the communication load measured by the communication measurement unit 37, the communication setting unit 36 dynamically changes the parameter for controlling the tracking result to be output to each of the terminal devices 2. For example, the communication measurement unit 37 measures the volume of moving images sent within a given period of time, or the amount of tracking results (communication amount). Thus, in accordance with the communication amount measured by the communication measurement unit 37, the communication setting unit 36 performs setting to change the output standard of the tracking result for each of the terminal devices 2. That is, in accordance with the communication amount measured by the communication measurement unit 37, the communication setting unit 36 changes the reference value for the reliability of the face tracking result output by each of the terminal devices, or adjusts the maximum number of transmitted tracking result candidates (the number N set so that the top N results are sent).
That is, when the communication load is high, data (data for the transmitted tracking result candidates) acquired from each of the terminal devices 2 has to be minimized in the whole system. In such a situation, the present system can be adapted to only output highly reliable tracking results or reduce the number of output tracking result candidates in accordance with the measurement result by the communication measurement unit 37.
That is, in the communication control unit 34, the communication setting unit 36 judges whether the communication setting of each of the terminal devices 2 is automatic setting or manual setting by the operator (step S11). When the operator has designated the contents of the communication setting of each of the terminal devices 2 (step S11, NO), the communication setting unit 36 determines parameters for the communication setting of each of the terminal devices 2 in accordance with the contents designated by the operator, and sets the parameters in each of the terminal devices 2. That is, when the operator manually designates the contents of the communication setting, the communication setting unit 36 performs the communication setting in accordance with the designated contents regardless of the communication load measured by the communication measurement unit 37 (step S12).
When the communication setting of each of the terminal devices 2 is automatic setting (step S11, YES), the communication measurement unit 37 measures the communication load in the server 3 attributed to the amount of data supplied from each of the terminal devices 2 (step S13). The communication setting unit 36 judges whether the communication load measured by the communication measurement unit 37 is equal to or more than a predetermined reference range (i.e., whether the communication state is a high-load communication state) (step S14).
When the communication load measured by the communication measurement unit 37 is judged to be equal to or more than the predetermined reference range (step S14, YES), the communication setting unit 36 judges a parameter for a communication setting that restrains the amount of data output from each of the terminal devices in order to lessen the communication load (step S15).
For example, in the example described above, to lessen the communication load, it is possible to provide a setting that raises the threshold for the reliability of a tracking result candidate to be output, or a setting that reduces the set maximum number of output tracking result candidates. When the parameter for lessening the communication load (parameter for restraining output data from the terminal devices) is determined, the communication setting unit 36 sets the determined parameter in each of the terminal devices 2 (step S16). Thus, the amount of data output from each of the terminal devices 2 is reduced, so that the communication load can be reduced in the server 3.
When the communication load measured by the communication measurement unit 37 is judged to be less than the predetermined reference range (step S17, YES), more data can be acquired from each of the terminal devices, so that the communication setting unit 36 judges a parameter for a communication setting that increases the amount of data output from each of the terminal devices (step S18).
For example, in the example described above, it is possible to provide a setting to lower the threshold for the reliability of a tracking result candidate to be output, or a setting to increase the set maximum number of output tracking result candidates. When the parameter expected to increase the amount of supplied data (parameter for relaxing the restraint on data output from the terminal devices) is determined, the communication setting unit 36 sets the determined parameter in each of the terminal devices 2 (step S19). Thus, the amount of data output from each of the terminal devices 2 is increased, so that more data is obtained in the server 3.
According to the communication setting processing described above, in the automatic setting, the server can adjust the amount of data from each of the terminal devices depending on the communication load.
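The flow of steps S11 to S19 can be sketched as follows. The reference range is modeled here by a lower bound `low` and an upper bound `high`, and the adjustment amounts (a reliability step of 0.1 and one candidate at a time) are arbitrary values chosen for illustration; none of these figures, nor the names `OutputSetting` and `update_setting`, come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutputSetting:
    min_reliability: float   # threshold for the reliability of output candidates
    max_candidates: int      # maximum number of output candidates


def update_setting(current: OutputSetting, load: float, low: float, high: float,
                   manual: Optional[OutputSetting] = None,
                   step: float = 0.1) -> OutputSetting:
    """Return the setting to apply to each terminal device."""
    if manual is not None:
        # Step S11, NO -> step S12: operator-designated contents take priority
        # regardless of the measured communication load.
        return manual
    if load >= high:
        # Step S14, YES -> steps S15 and S16: restrain the output data.
        return OutputSetting(min(1.0, current.min_reliability + step),
                             max(1, current.max_candidates - 1))
    if load < low:
        # Step S17, YES -> steps S18 and S19: allow more output data.
        return OutputSetting(max(0.0, current.min_reliability - step),
                             current.max_candidates + 1)
    # Load within the reference range: keep the current setting.
    return current


if __name__ == "__main__":
    setting = OutputSetting(min_reliability=0.7, max_candidates=3)
    print(update_setting(setting, load=90, low=20, high=80))
    print(update_setting(setting, load=10, low=20, high=80))
```

Both parameters are adjusted together in this sketch; an actual setting could equally change only the threshold or only the maximum number.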
The monitor device 4 is a user interface comprising the display unit 43 for displaying the tracking results managed by the tracking result managing unit 33 and images corresponding to the tracking results, and the operation unit 44 for receiving the input from the operator. For example, the monitor device 4 can comprise a PC equipped with a display section and a keyboard or a pointing device, or a display device having a touch panel. That is, the monitor device 4 displays the tracking results managed by the tracking result managing unit 33 and images corresponding to the tracking results in response to a request from the operator.
When there is more than one tracking result candidate, the monitor device 4 displays, in a guide screen B, the fact that there is more than one tracking result candidate, and displays, as a list, icons C1 and C2 for the operator to select the tracking result candidates. If the operator selects the icon of a tracking result candidate, tracking may be performed in accordance with the tracking result candidate of the selected icon. Moreover, when the operator selects the icon of a tracking result candidate, a tracking result corresponding to the icon selected by the operator is then displayed for the tracking result for this time.
In the example shown in
In the example shown in
According to the display screen shown in
The person tracking system according to the first embodiment described above can be applied to a moving object tracking system which detects and tracks a moving object in a monitored picture and records the image of the moving object. In the moving object tracking system according to the first embodiment described above, the reliability of tracking processing for a moving object is found. One tracking result is output for a highly reliable tracking result. For a low reliability, pictures can be recorded as tracking result candidates. Consequently, in the moving object tracking system described above, a recorded picture can be searched for afterwards, and at the same time, a tracking result or a tracking result candidate can be displayed and selected by the operator.
Now, the second embodiment is described.
The system according to the second embodiment tracks, as a detection target (moving object), the face of a person photographed by monitor cameras, recognizes whether the tracked person corresponds to previously registered persons, and records the recognition result in a recording device together with the tracking result. The person tracking system according to the second embodiment shown in
In the configuration example of the person tracking system shown in
In the person tracking system according to the present embodiment, the person identifying unit 38 calculates the feature information for identifying a person by using image groups that are judged to show the identical person on the basis of a face-containing image managed by the tracking result managing unit 33 and a tracking result (coordinate information) for the person (face). This feature information is calculated, for example, in the following manner. First, a facial part such as an eye, a nose or a mouth is detected in a face image. A face region is cut out in a given size and shape in accordance with the position of the detected part. Thickness information for the cut-out portion is used as a feature amount. For example, the thickness values of a region of m pixels×n pixels are directly used as a feature vector comprising m×n-dimensional information. The feature vectors are normalized by a method called a simple similarity method so that the length of each vector is 1, and an inner product is calculated to find a similarity degree that indicates the similarity between the feature vectors. Feature extraction is thus completed in the case of processing that uses one image to derive a recognition result.
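The normalization and inner product described above can be sketched as follows. This is a minimal illustration only; the function names normalize and simple_similarity and the representation of a cut-out region as a list of rows are assumptions made for the example and do not appear in the embodiment.

```python
import math

def normalize(vec):
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in vec]

def simple_similarity(patch_a, patch_b):
    """Inner product of the unit-length feature vectors of two m x n regions.

    Each patch is a list of m rows holding n pixel values (assumed layout).
    """
    # Flatten each m x n region into an (m*n)-dimensional feature vector.
    flat_a = [v for row in patch_a for v in row]
    flat_b = [v for row in patch_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("both regions must have the same m x n size")
    vec_a = normalize(flat_a)
    vec_b = normalize(flat_b)
    return sum(a * b for a, b in zip(vec_a, vec_b))
```

Because both vectors are scaled to length 1, a uniformly brighter copy of the same region yields a similarity of about 1, while regions with no overlapping non-zero pixels yield 0.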
However, more accurate recognizing processing can be performed by a moving-image-based calculation that uses sequential images. Thus, this technique is considered in the description of the present embodiment. That is, an image comprising m×n pixels is cut out of each of the sequentially obtained input images as in the case of the feature extraction means. A correlation matrix of feature vectors is found from the data, and orthonormal vectors are found by KL expansion. Thereby, a subspace representing the features of a face obtained from the sequential images is calculated.
In order to calculate the subspace, a correlation matrix (or covariance matrix) of feature vectors is found, and orthonormal vectors (eigenvectors) are found by the KL expansion of the matrix. The subspace is represented by selecting k eigenvectors in descending order of the corresponding eigenvalues and using the set of these eigenvectors. In the present embodiment, a correlation matrix Cd is found from feature vectors and diagonalized as Cd = φd Λd φd^T, thereby finding a matrix φd of the eigenvectors. This information serves as the subspace that represents the features of the face of the person currently targeted for recognition. The above-described processing for calculating the feature information may be performed in the person identifying unit 38, but may otherwise be performed in the face tracking unit 27 on the camera side.
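The subspace calculation described above can be illustrated by the following sketch, which assumes that the NumPy library is available. The function name face_subspace and the division of the correlation matrix by the number of frames are assumptions for the example; the scaling does not change the eigenvectors.

```python
import numpy as np

def face_subspace(feature_vectors, k):
    """Return k orthonormal basis vectors (as columns) spanning the subspace.

    feature_vectors: array-like of shape (num_frames, m*n), one row per frame.
    """
    x = np.asarray(feature_vectors, dtype=float)
    # Correlation matrix of the feature vectors (not mean-centered).
    corr = x.T @ x / x.shape[0]
    # eigh is suited to symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(corr)
    # Keep the k eigenvectors with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]
```

The returned matrix plays the role of the eigenvector matrix of the diagonalized correlation matrix, truncated to its k leading columns.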
Although more than one frame is used to calculate the feature information according to the technique in the embodiment described above, it is also possible to use a recognizing method which selects one or more frames that seem to be most suitable for the recognizing processing from frames obtained by tracking a person. In this case, a frame selecting method using any index may be used as long as the index shows the change of face conditions; for example, the directions of a face are found and a nearly full-faced frame is preferentially selected, or a frame showing the face in a greatest size is selected.
Furthermore, whether a previously registered person is present in a current image can be judged by comparing the similarity between an input subspace obtained by the feature extraction means and one or more previously registered subspaces. A subspace method or a multiple similarity method may be used as a calculation method for finding the similarity between subspaces. For the recognizing method in the present embodiment, it is possible to use a mutual subspace method described in, for example, a document (Kenichi Maeda and Sadakazu Watanabe: “Pattern Matching Method with Local Structure”, the journal of the Institute of Electronics, Information and Communication Engineers (D), vol. J68-D, No. 3, pp. 345-352 (1985)). According to this method, both recognition data in prestored registered information and input data are represented as subspaces calculated from images, and an “angle” between the two subspaces is defined as a similarity. The subspace input here is referred to as an input subspace. A correlation matrix Cin is likewise found for an input data row, and diagonalized as Cin = φin Λin φin^T, thereby finding an eigenvector matrix φin. An inter-subspace similarity (0.0 to 1.0) between the two subspaces represented by φin and φd is found, and used as a similarity for the recognition.
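One way of turning the “angle” between two subspaces into a value from 0.0 to 1.0 is sketched below, assuming that NumPy is available. The squared cosine of the smallest canonical angle is used here as the similarity; this particular choice and the function name subspace_similarity are assumptions for illustration, and the cited document should be consulted for the definition actually intended.

```python
import numpy as np

def subspace_similarity(phi_in, phi_d):
    """Similarity in [0.0, 1.0] between two subspaces.

    phi_in, phi_d: matrices whose columns are orthonormal basis vectors.
    """
    phi_in = np.asarray(phi_in, dtype=float)
    phi_d = np.asarray(phi_d, dtype=float)
    # Singular values of phi_in^T phi_d are the cosines of the canonical angles.
    cosines = np.linalg.svd(phi_in.T @ phi_d, compute_uv=False)
    # The largest cosine corresponds to the smallest angle; clip rounding error.
    return float(min(1.0, cosines[0] ** 2))
```

Identical subspaces give 1.0, mutually orthogonal subspaces give 0.0, and two lines at 45 degrees to each other give 0.5.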
When there is more than one face in an image, similarities to the feature information for the face images registered in the person information managing unit 39 are calculated in order in a round-robin manner, such that results for all the persons can be obtained. For example, if there are dictionaries for Y persons when X persons are walking, results for all of the X persons can be output by performing X×Y similarity calculations. When a recognition result cannot be output by calculation results in which m images are input (when the person is not judged to be any of the registered persons and a next frame is acquired to perform a calculation), the correlation matrix input to the above-mentioned subspace corresponding to one frame is added to the sum of correlation matrixes created by past frames, and the calculation of an eigenvector and the creation of a subspace are again conducted, such that the subspace on the input side can be updated. That is, to sequentially take and collate images of the face of a walker, images are acquired one by one to update the subspace simultaneously with a collation calculation, thereby enabling a calculation gradually increasing in accuracy.
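The frame-by-frame update of the input-side subspace can be sketched as follows, assuming that NumPy is available. The class name InputSubspace and its interface are hypothetical; the sketch only shows that the correlation matrix of each new frame is added to the running sum before the eigenvectors are recalculated.

```python
import numpy as np

class InputSubspace:
    """Accumulates per-frame correlation matrices and recomputes the subspace."""

    def __init__(self, dim, k):
        self.corr_sum = np.zeros((dim, dim))  # sum over past frames
        self.k = k                            # number of basis vectors kept

    def add_frame(self, feature_vector):
        """Add one frame and return the updated basis (columns)."""
        x = np.asarray(feature_vector, dtype=float)
        # Correlation matrix of a single frame is the outer product x x^T.
        self.corr_sum += np.outer(x, x)
        eigvals, eigvecs = np.linalg.eigh(self.corr_sum)
        order = np.argsort(eigvals)[::-1][:self.k]
        return eigvecs[:, order]
```

Each call to add_frame could be followed by a collation against the registered subspaces, so that the collation is repeated with a progressively better input subspace.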
When tracking results of the same scene are managed in the tracking result managing unit 33, more than one person identification result can be calculated. Whether to perform the calculation may be directed by the operator using the operation unit 44 of the monitor device 4. Alternatively, results may be always obtained, and necessary information may be selectively output in response to an instruction from the operator.
The person information managing unit 39 manages, person by person, the feature information obtained from an input image to recognize (identify) a person. Here, the person information managing unit 39 manages, as a database, the feature information created by the processing described in connection with the person identifying unit 38. The present embodiment assumes the same m×n feature vectors after feature extraction as the feature information obtained from an input image. However, face images before feature extraction may be used, and a subspace to be used or a correlation matrix immediately before KL expansion may be used. These are stored by using, as a key, a personal ID number for personal identification. Here, one piece of face feature information may be registered for one person, or feature information for more than one face may be retained to be available to recognition simultaneously with switching depending on the situation.
Similarly to the monitor device 4 described in the first embodiment, the monitor device 4 displays the tracking results managed by the tracking result managing unit 33 and images corresponding to the tracking results.
That is, in the example shown in
If a face image of one person displayed in the history display section H is selected, the selected input image is displayed in input image section I that shows the face image of the person targeted for identification. The input image sections I are displayed side by side in a person search result section J. A list of registered face images similar to the face images displayed in the input image sections I is displayed in the search result section J. The face images displayed in the search result section J are registered face images similar to the face images displayed in the input image sections I among face images of persons registered in the person information managing unit 39 in advance.
Although the list of face images to be candidates for the person corresponding to the input image is shown in the example shown in
Furthermore, in the example shown in
When there is more than one tracking result, the fact that there is more than one tracking result candidate is displayed in a guide screen L, and a list of icons M1 and M2 for the operator to select the tracking result candidates is displayed. If the operator selects any one of the icons M1 and M2, the contents of the face images and moving images displayed in the above-mentioned person search section may be set to be updated in accordance with the tracking result corresponding to the selected icon. The reason is that the image group used for a search may vary with varying tracking results. Even when the tracking result may change, the operator can visually check tracking result candidates in the display example shown in
The pictures managed in the tracking result managing unit can be searched for similarly to the pictures described in the first embodiment.
As described above, the person tracking system according to the second embodiment can be applied as a moving object tracking system for detecting and tracking a moving object in observation pictures captured by the cameras and comparing the tracked moving object with previously registered information and thereby identifying the moving object. In the moving object tracking system according to the second embodiment, a reliability of tracking processing for a moving object is found. For a highly reliable tracking result, identifying processing for the tracked moving object is performed by one tracking result. For a low reliability, identifying processing for the tracked moving object is performed by more than one tracking result.
Thus, in the moving object tracking system according to the second embodiment, a person can be identified from an image group based on tracking result candidates when an erroneous tracking result is easily made, for example, when the reliability is low. Accordingly, information (a moving object tracking result and a moving object identification result) regarding the tracked moving object can be correctly displayed in an easily recognizable manner to the manager or operator of the system at the place where the pictures are captured.
Now, the third embodiment is described.
The third embodiment includes processing that can be applied to the processing in the face tracking unit 27 of the person tracking system described above in the first and second embodiments.
As shown in
The processing unit 64 comprises a processor which executes a program, and a memory for storing the program. That is, the processor executes the program stored in the memory so that the processing unit 64 achieves various kinds of processing. In the configuration example shown in
The face detecting unit 72 is a function for detecting the region of a moving object when the moving object (the face of a person) is contained in an input image. The face detection result storage unit 73 is a function for storing images including the moving object as a detected tracking target over past several frames. The tracking result managing unit 74 is a function for managing tracking results. The tracking result managing unit 74 stores and manages tracking results obtained in later-described processing. When detection is unsuccessful in a frame during the movement of the object, the tracking result managing unit 74 again adds a tracking result or causes the output unit to output a processing result.
The graph creating unit 75 is a function for creating a graph from face detection results stored in the face detection result storage unit 73 and from tracking result candidates stored in the tracking result managing unit 74. The branch weight calculating unit 76 is a function for allocating weights to branches of the graph created by the graph creating unit 75. The optimum path set calculating unit 77 is a function for calculating a combination of paths that optimizes an objective function from the graph. The tracking state judging unit 78 is a function for judging whether the tracking is interrupted or the tracking is ended because the object has disappeared from the screen when there is a frame in which the detection of the object (face) is unsuccessful among tracking targets stored and managed by the tracking result managing unit 74. The output unit 79 is a function for outputting information such as tracking results output from the tracking result managing unit 74.
Now, the configuration and operation of each unit are described in detail.
The image interface 62 is an interface for inputting images including the face of a person targeted for tracking. In the configuration example shown in
The face detecting unit 72 performs processing to detect one or more faces in the input image. The technique described in the first embodiment can be applied as a specific processing method. For example, a prepared template is moved in an image to find a correlation value so that a position providing the highest correlation value is set as a face region. Otherwise, a face extraction method that uses an eigenspace method or a subspace method can be applied to the face detecting unit 72.
The face detection result storage unit 73 stores and manages detection results of the face targeted for tracking. In the third embodiment, the image in each of the frames of the pictures captured by the camera 51 is used as an input image, and “face information” corresponding to the number of face detection results obtained by the face detecting unit 72, the frame number of the moving image, and the number of detected faces is managed. The “face information” includes information such as the detection position (coordinates) of the face in the input image, identification information (ID information) provided to the identical person that is tracked, and a partial image (face image) of a detected face region.
For example,
The tracking result managing unit 74 stores and manages tracking results or detection results. For example, the tracking result managing unit 74 manages information tracked or detected from the preceding frame (t−1) to the frame t−T−T′ (T>=0 and T′>=0 are parameters). In this case, information indicating a detection result targeted for tracking processing is stored up to the frame image of t−T, and information indicating past tracking results is stored from the frame t−T−1 to the frame t−T−T′. The tracking result managing unit 74 may otherwise manage face information for the image of each frame.
The graph creating unit 75 creates a graph comprising peaks corresponding to states “unsuccessful detection during tracking”, “disappearance”, and “appearance”, in addition to peaks (face detection positions) corresponding to data for the face detection results stored in the face detection result storage unit 73 and the tracking results (information on the selected tracking target) managed in the tracking result managing unit 74. The “appearance” referred to here means a condition in which a person who is not present in the image of the preceding frame newly appears in the image of the subsequent frame. The “disappearance” means a condition in which a person present in the image of the preceding frame is not present in the image of the subsequent frame. The “unsuccessful detection during tracking” means a condition in which a face that should be present in the frame image is not detected. A “false positive” may also be taken into consideration as a peak to be added. This means a condition in which an object that is not a face is erroneously detected as a face. The addition of these peaks provides the advantage that tracking accuracy can be prevented from decreasing due to insufficient detection accuracy.
As shown in
The branch weight calculating unit 76 sets a weight, that is, a real value to a branch (path) set in the graph creating unit 75. This enables highly accurate tracking by considering both the probability of matching p(X) and the probability of mismatching q(X) between face detection results. In the example described in the present embodiment, a logarithm of the ratio between the matching probability p(X) and the mismatching probability q(X) is obtained to calculate a branch weight.
However, the branch weight has only to be calculated by considering the matching probability p(X) and the mismatching probability q(X). That is, the branch weight has only to be calculated as a value that indicates the relation between the matching probability p(X) and the mismatching probability q(X). For example, the branch weight may be a subtraction between the matching probability p(X) and the mismatching probability q(X). Alternatively, a function for calculating a branch weight may be created by using the matching probability p(X) and the mismatching probability q(X), and this predetermined function may be used to calculate a branch weight.
The matching probability p(X) and the mismatching probability q(X) can be obtained as feature amounts or random variables by using the distance between face detection results, the size ratio of face detection frames, a velocity vector, and a correlation value of a color histogram. A probability distribution is estimated by proper learning data. That is, the present person tracking system can prevent the confusion of tracking targets by considering both the probability of matching and the probability of mismatching between nodes.
For example,
In this case, the branch weight is calculated as the following value depending on the values of the probability p(X) and the probability q(X).
If p(X)>q(X)=0 (case A), log(p(X)/q(X))=+∞
If p(X)>q(X)>0 (case B), log(p(X)/q(X))=a(X)
If q(X)≧p(X)>0 (case C), log(p(X)/q(X))=−b(X)
If q(X)>p(X)=0 (case D), log(p(X)/q(X))=−∞
Note that a(X) and b(X) are nonnegative real values, respectively.
In the case A, as the mismatching probability q(X) is “0” and the matching probability p(X) is not “0”, the branch weight is +∞. The branch weight is positively infinite so that a branch is always selected in an optimization calculation.
In the case B, as the matching probability p(X) is greater than the mismatching probability q(X), the branch weight is a positive value. The branch weight is a positive value so that this branch is high in reliability and likely to be selected in an optimization calculation.
In the case C, as the matching probability p(X) is smaller than the mismatching probability q(X), the branch weight is a negative value. The branch weight is a negative value so that this branch is low in reliability and is not likely to be selected in an optimization calculation.
In the case D, as the matching probability p(X) is “0” and the mismatching probability q(X) is not “0”, the branch weight is −∞. The branch weight is negatively infinite so that this branch is never selected in an optimization calculation.
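The four cases A to D can be summarized in the following sketch. The function name branch_weight is hypothetical, and the rejection of the input p(X)=q(X)=0, which is not covered by any of the four cases, is an assumption made for the example.

```python
import math

def branch_weight(p, q):
    """Log-ratio branch weight from matching probability p and mismatching probability q."""
    if p < 0.0 or q < 0.0:
        raise ValueError("probabilities must be nonnegative")
    if p == 0.0 and q == 0.0:
        # Not covered by cases A to D; treated as an error in this sketch.
        raise ValueError("weight is undefined when both probabilities are 0")
    if q == 0.0:
        return math.inf       # case A: branch is always selected
    if p == 0.0:
        return -math.inf      # case D: branch is never selected
    return math.log(p / q)    # case B (positive) or case C (zero or negative)
```

For example, p=0.8 and q=0.2 give a positive weight of about 1.39, whereas swapping the two values gives about −1.39.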
The branch weight calculating unit 76 calculates a branch weight by logarithmic values of the probability of disappearance, the probability of appearance, and the probability of unsuccessful detection during tracking. These probabilities can be determined by previous learning using corresponding data (e.g., data stored in the server 53). Moreover, even when one of the matching probability p(X) and the mismatching probability q(X) is not accurately estimated, this issue can be addressed by providing the value of a given X with a constant value; for example, p(X)=constant value or q(X)=constant value.
The optimum path set calculating unit 77 calculates the total of the branch weights allocated by the branch weight calculating unit 76 with regard to each combination of the paths in the graph created by the graph creating unit 75, and calculates (optimization calculation) a combination of the paths that maximizes the total of the branch weights. A well-known combinatorial optimization algorithm can be used for this optimization calculation.
For example, if the probability described in connection with the branch weight calculating unit 76 is used, the optimum path set calculating unit 77 can find a combination of the paths providing the maximum posterior probability by the optimization calculation. A face continuously tracked from a past frame, a face that has newly appeared, and a face that has not been matched are obtained by finding the optimum path combination. The optimum path set calculating unit 77 records the result of the optimization calculation in the tracking result managing unit 74.
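The optimization calculation can be illustrated for a single pair of frames by the following sketch. It uses an exhaustive search that is practical only for a handful of faces, whereas the embodiment refers to a well-known combinatorial optimization algorithm; the function name best_assignment, the three weight arguments, the treatment of an unmatched tracked face as a single “vanish” state, and the restriction to finite weights are all assumptions for the example.

```python
from itertools import permutations

def best_assignment(match_w, vanish_w, appear_w):
    """Find the face-to-detection assignment with the largest total weight.

    match_w[i][j]: weight of linking tracked face i to detection j.
    vanish_w[i]:   weight of leaving tracked face i unmatched.
    appear_w[j]:   weight of treating detection j as a new face.
    Returns (total_weight, assignment); assignment[i] is a detection
    index or None. All weights are assumed to be finite.
    """
    n_tracks = len(match_w)
    n_dets = len(appear_w)
    # Each detection index may be used once; None marks an unmatched face.
    options = list(range(n_dets)) + [None] * n_tracks
    best_total, best = None, None
    for perm in set(permutations(options, n_tracks)):
        used = {j for j in perm if j is not None}
        total = sum(vanish_w[i] if j is None else match_w[i][j]
                    for i, j in enumerate(perm))
        total += sum(appear_w[j] for j in range(n_dets) if j not in used)
        if best_total is None or total > best_total:
            best_total, best = total, perm
    return best_total, list(best)
```

With match_w = [[2.0, -1.0], [-1.0, 1.5]] and all vanish and appear weights set to -0.5, the search links face 0 to detection 0 and face 1 to detection 1, which corresponds to two faces continuously tracked from the past frame.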
The tracking state judging unit 78 judges a tracking state. For example, the tracking state judging unit 78 judges whether the tracking of the tracking target managed in the tracking result managing unit 74 has ended. When judging that the tracking has ended, the tracking state judging unit 78 informs the tracking result managing unit 74 of the end of the tracking so that a tracking result is output to the output unit 79 from the tracking result managing unit 74.
If there is a frame in which a face as a moving object is unsuccessfully detected among tracking targets, the tracking state judging unit 78 judges whether this is attributed to the interruption of the tracking (unsuccessful detection) during tracking or the end of the tracking caused by disappearance from the frame image (captured image). Information including the result of such a judgment is reported to the tracking result managing unit 74 from the tracking state judging unit 78.
The tracking state judging unit 78 causes a tracking result to be output from the tracking result managing unit 74 to the output unit 79 in accordance with the following standards: (1) a tracking result is output frame by frame; (2) a tracking result is output in response to an inquiry from, for example, the server 53; (3) tracking information for matched frames is collectively output at the point where it is judged that there is no more person to be tracked in the screen; and (4) when tracking has continued for a given number of frames or more, the tracking is once judged to have ended and a tracking result is output.
The output unit 79 outputs information including the tracking results managed in the tracking result managing unit 74 to the server 53 which functions as a picture monitor device. Otherwise, the terminal device 52 may be provided with a user interface having a display unit and an operation unit so that the operator can monitor pictures and tracking results. In this case, the information including the tracking results managed in the tracking result managing unit 74 can be displayed on the user interface of the terminal device 52.
As the information managed in the tracking result managing unit 74, the output unit 79 outputs, to the server 53, face information, that is, the detection position of a face in an image, the frame number of moving images, ID information individually provided to the identical person that is tracked, and information (e.g., photography place) on an image in which a face is detected.
For the identical person (tracked person), the output unit 79 may output, for example, coordinates of a face in more than one frame, a size, a face image, a frame number, time, information on the summary of features, or information that matches the former information with images recorded in a digital video recorder (pictures stored in, for example, the image memory 63). Moreover, the face region images to be output may be all of the images being tracked, or only some of the images that are regarded as optimum under predetermined conditions (e.g., a face size, direction, whether eyes are open, whether an illumination condition is proper, or whether the likelihood of a face at the detection of a face is high).
As described above, according to the person tracking system of the third embodiment, the number of useless collations can be reduced and a load on the system can be lessened even when a great volume of face images detected from frame images of moving images input from, for example, monitor cameras are collated with the database. Moreover, even when the identical person makes complex movements, face detection results in frames can be reliably matched including unsuccessful detections, and a highly accurate tracking result can be obtained.
The person tracking system described above tracks a person (moving object) making complex behavior from images captured by a large number of cameras, and transmits information on a person tracking result to the server while reducing the load of a communication amount in the network. Thus, even if there is a frame in which a person targeted for tracking is unsuccessfully detected during the movement of this person, the person tracking system enables stable tracking of persons without discontinuing the tracking.
Furthermore, the person tracking system can record a tracking result in accordance with the reliability of the tracking of a person (moving object), or manage identification results of the tracked person. Thus, the person tracking system advantageously prevents the confusion of persons in tracking more than one person. Moreover, the person tracking system successively outputs tracking results for past frame images dating back N frames from the current point, which means that on-line tracking can be performed.
According to the person tracking system described above, a picture can be recorded or a person (moving object) can be identified on the basis of an optimum tracking result when tracking is properly performed. Moreover, according to the person tracking system described above, when it is judged that a tracking result is complex and there may be more than one tracking result candidate, tracking result candidates are presented to the operator in accordance with the condition of a communication load or the reliability of the tracking result, or the tracking result candidates can be used to ensure the recording and displaying of pictures or the identification of a person.
Now, the fourth embodiment is described with reference to the drawings.
In the fourth embodiment, a moving object tracking system (person tracking system) for tracking a moving object (person) appearing in time-series images obtained from cameras is described. The person tracking system detects the face of a person from the time-series images obtained from the cameras, and when more than one face can be detected, the person tracking system tracks the faces of these persons. The person tracking system described in the fourth embodiment is also applicable to a moving object tracking system intended for other moving objects (e.g., a vehicle or an animal) by changing a moving object detecting method suitably to such a moving object.
Moreover, the moving object tracking system according to the fourth embodiment detects a moving object (e.g., a person, a vehicle or an animal) from a great volume of moving images collected from monitor cameras, and records the corresponding scenes in a recording device together with the tracking result. The moving object tracking system according to the fourth embodiment also functions as a monitor system for tracking a moving object (e.g., a person or a vehicle) photographed by monitor cameras, and collating the tracked moving object with dictionary data previously registered on a database to identify the moving object, and then reporting the identification result of the moving object.
The moving object tracking system according to the fourth embodiment described below targets, for tracking, persons (faces of persons) present in images captured by the monitor cameras in accordance with tracking processing to which a properly set tracking parameter is applied. Moreover, the moving object tracking system according to the fourth embodiment judges whether a person detection result is appropriate for the estimation of the tracking parameter. The moving object tracking system according to the fourth embodiment uses the person detection result judged to be appropriate for the estimation of the tracking parameter as information for learning the tracking parameter.
The person tracking system according to the fourth embodiment shown in
Each of the terminal devices 102 comprises a control unit 121, an image interface 122, an image memory 123, a processing unit 124, and a network interface 125. The control unit 121, the image interface 122, the image memory 123, and the network interface 125 can be similar in configuration to the control unit 21, the image interface 22, the image memory 23, and the network interface 25 shown in
Similarly to the processing unit 24, the processing unit 124 comprises a processor which operates in accordance with a program, and a memory in which the program to be executed by the processor is stored. As processing functions, the processing unit 124 comprises a face detecting unit 126 which detects a region of a moving object when the moving object (the face of a person) is included in an input image, and a scene selecting unit 127. The face detecting unit 126 has a function for performing processing similar to that in the face detecting unit 26. That is, the face detecting unit 126 detects information (a region of a moving object) indicating the face of a person as a moving object from the input image. The scene selecting unit 127 selects a movement scene (hereinafter also referred to simply as a scene) of the moving object for use in the later-described estimation of the tracking parameter from the detection results by the face detecting unit 126. The scene selecting unit 127 will be described in detail later.
The server 103 comprises a control unit 131, a network interface 132, a tracking result managing unit 133, a parameter estimating unit 135, and a tracking unit 136. The control unit 131, the network interface 132, and the tracking result managing unit 133 can be similar to the control unit 31, the network interface 32, and the tracking result managing unit 33 shown in
The parameter estimating unit 135 and the tracking unit 136 each comprise a processor which operates in accordance with a program, and a memory in which the program to be executed by the processor is stored. That is, the processor executes the program stored in the memory so that the parameter estimating unit 135 achieves processing such as parameter setting processing. The processor executes the program stored in the memory so that the tracking unit 136 achieves processing such as tracking processing. The parameter estimating unit 135 and the tracking unit 136 may otherwise be obtained in such a manner that the processor executes the program in the control unit 131.
The parameter estimating unit 135 estimates a tracking parameter that indicates the standard for tracking the moving object (the face of a person) in accordance with the scene selected by the scene selecting unit 127 of the terminal device 102, and outputs the estimated tracking parameter to the tracking unit 136. The tracking unit 136 matches and tracks the identical moving objects (the faces of the persons) detected from the images by the face detecting unit 126, in accordance with the tracking parameter estimated by the parameter estimating unit 135.
The scene selecting unit 127 is described next.
The scene selecting unit 127 judges whether a detection result by the face detecting unit 126 is appropriate for the estimation of the tracking parameter. The scene selecting unit 127 performs two-stage processing including scene selecting processing and tracking result selecting processing.
First, the scene selecting processing determines a reliability as to whether a detection result row can be used for the estimation of the tracking parameter. The scene selecting processing judges a reliability on the basis of the fact that detection succeeds in a number of frames equal to or more than a predetermined threshold and the fact that detection result rows of persons are not confused. For example, the scene selecting unit 127 calculates a reliability from the relation of the positions of the detection result rows. The scene selecting processing is described with reference to
D(a, c)<rS(c)
is satisfied, where a is a detection result in frame t and c is a detection result in frame t−1. Here, D(a, b) is the distance (in pixels) between a and b in an image, S(c) is the size (in pixels) of the detection result c, and r is a parameter.
Even when there is more than one face detection result, a movement sequence of the identical person is obtained if the faces are moving at distant positions in an image, each within a range smaller than the predetermined threshold. This is used for learning the tracking parameter. In order to classify, person by person, the detection result rows of the persons, a judgment is made by comparing the pair of detection results between frames as follows:
D(ai, aj)>C, D(ai, cj)>C, D(ai, ci)<rS(ci), D(aj, cj)<rS(cj)
wherein ai and aj are detection results in frame t, and ci and cj are detection results in frame t−1. Here, D(a, b) is the distance (in pixels) between a and b in an image, and S(c) is the size (in pixels) of the detection result c. r and C are parameters.
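The two judgments above can be sketched in code as follows. This is only an illustrative sketch: the dictionary layout of a detection result (keys "x", "y", "size"), the use of the Euclidean distance for D, and the function names are assumptions that do not appear in the description.

```python
import math

def distance(p, q):
    """D(p, q): distance in pixels between two detection results (assumed Euclidean)."""
    return math.hypot(p["x"] - q["x"], p["y"] - q["y"])

def same_person(a, c, r):
    """Single-face condition D(a, c) < r * S(c).

    a is a detection result in frame t, c a detection result in frame t-1.
    """
    return distance(a, c) < r * c["size"]

def separable_pair(ai, aj, ci, cj, r, C):
    """Two-face condition from the description.

    The two faces must be farther apart than C pixels, and each face must stay
    close to its own detection result in the preceding frame.
    """
    return (distance(ai, aj) > C
            and distance(ai, cj) > C
            and same_person(ai, ci, r)
            and same_person(aj, cj, r))
```

For example, with r = 0.5 and a face of size 40 pixels, a detection that moved 10 pixels between frames satisfies the single-face condition (10 < 20), whereas one that moved 300 pixels does not.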
The scene selecting unit 127 can also select a scene by using an image feature amount to perform a regression analysis of the condition in which persons are dense in an image. Further, the scene selecting unit 127 can perform person identifying processing using images of detected faces in frames only during learning and thereby obtain a movement sequence individually for the identical person.
In order to exclude erroneous detection results, the scene selecting unit 127 excludes a detection result in which the size at the detected position has a variation equal to or less than a predetermined threshold, excludes a detection result having a movement equal to or less than a predetermined movement, or excludes a detection result by using character recognition information obtained by character recognition processing of the surrounding image region. Thus, the scene selecting unit 127 can exclude erroneous detections attributed to posters or characters.
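The first two exclusion rules can be illustrated with a short sketch. The way movement and size variation are measured here (extent of the positions and the spread between the largest and smallest size over a detection result row) is an assumption made for illustration; the description only states that a threshold is applied. The character recognition rule is not covered.

```python
import math

def is_static_detection(row, min_movement, min_size_variation):
    """Return True if a detection result row looks like a static false detection.

    row is a list of (x, y, size) tuples for one detection result row, in frame
    order (a hypothetical layout). A row is excluded when its movement or its
    size variation is equal to or less than the corresponding threshold.
    """
    xs = [p[0] for p in row]
    ys = [p[1] for p in row]
    sizes = [p[2] for p in row]
    movement = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    size_variation = max(sizes) - min(sizes)
    return movement <= min_movement or size_variation <= min_size_variation
```

A face printed on a poster yields the same position and size in every frame and is therefore excluded, whereas a walking person changes both.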
The scene selecting unit 127 attaches, to the data, the number of frames from which face detection results are obtained, and the reliability corresponding to the number of detected faces. The reliability is generally determined by information such as the number of frames from which faces are detected, the number of detected faces (the number of detections), the movement amount of the detected face, and the size of the detected face. The scene selecting unit 127 can calculate the reliability by, for example, the reliability calculation method described with reference to
The numerical values of the reliabilities can be determined on the basis of the number of frames in which tracking is successful, as shown in
The tracking result selecting processing is described next.
In the tracking result selecting processing, the scene selecting unit 127 judges whether each tracking result is likely to be a correct tracking result. For example, when tracking results shown in
As shown in
The parameter estimating unit 135 is described next.
The parameter estimating unit 135 estimates a tracking parameter by using the moving image row, the detection result row, and the tracking result that are obtained from the scene selecting unit 127. For example, suppose that N data D={X1, . . . , XN} obtained from the scene selecting unit 127 are observed for a suitable random variable X. When θ is the parameter of the probability distribution of X and X is assumed to follow a normal distribution, the estimated values are the average of D, μ=(X1+X2+ . . . +XN)/N, and the variance ((X1−μ)²+ . . . +(XN−μ)²)/N.
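As a minimal sketch of this estimation for a scalar random variable (for example, the movement amount between frames), the average and the variance can be computed as below. The function name is hypothetical.

```python
def estimate_normal_parameters(samples):
    """Estimate (mu, variance) of a normal distribution from observed data D.

    mu is the average of the samples and the variance divides by N,
    matching the formulas given in the description.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    mu = sum(samples) / n
    variance = sum((x - mu) ** 2 for x in samples) / n
    return mu, variance
```

For the data 2, 4, 4, 4, 5, 5, 7, 9 this gives an average of 5 and a variance of 4.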
The parameter estimating unit 135 may directly calculate a distribution instead of estimating a tracking parameter. Specifically, the parameter estimating unit 135 calculates a posterior probability p(θ|D), and calculates a matching probability by p(X|D)=∫p(X|θ) p(θ|D)dθ. This posterior probability can be calculated by p(θ|D)=p(θ) p(D|θ)/p(D) if the prior probability p(θ) of θ and the likelihood p(X|θ) are determined as in the normal distribution.
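One case in which this integral has a closed form is a normal likelihood with known variance combined with a normal prior on the mean θ. The sketch below assumes exactly that case; the prior parameters mu0 and tau0_sq, the known variance sigma_sq, and the function name are assumptions introduced for illustration and are not taken from the description.

```python
def posterior_predictive_normal(samples, sigma_sq, mu0, tau0_sq):
    """Return (mean, variance) of p(X|D) for a normal model with known variance.

    Likelihood: X ~ N(theta, sigma_sq). Prior: theta ~ N(mu0, tau0_sq).
    The posterior p(theta|D) is N(mu_n, tau_n_sq), and integrating
    p(X|theta) p(theta|D) over theta gives N(mu_n, sigma_sq + tau_n_sq).
    """
    n = len(samples)
    precision = 1.0 / tau0_sq + n / sigma_sq      # posterior precision of theta
    tau_n_sq = 1.0 / precision                     # posterior variance of theta
    mu_n = tau_n_sq * (mu0 / tau0_sq + sum(samples) / sigma_sq)
    return mu_n, sigma_sq + tau_n_sq
```

With no data the result reduces to the prior predictive N(mu0, sigma_sq + tau0_sq), and as more data are observed the predictive variance approaches sigma_sq.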
As the random variable, the amount of movement of the moving objects (faces of persons), a detection size, the similarities of various image feature amounts, and a moving direction may be used. The tracking parameter is an average or a variance-covariance matrix in the case of, for example, the normal distribution. However, various probability distributions may be used for the tracking parameter.
The tracking unit 136 is described next.
The tracking unit 136 performs optimum matching by integrating information such as the coordinates and size of the face of a person detected in the input images. The tracking unit 136 integrates tracking results in which the identical persons are matched in the frames, and outputs the integration result as a tracking result. When there is a complex movement such as crossing of persons in an image in which persons are walking, a single matching result may not be determined. In this case, the tracking unit 136 can not only output a result having the highest likelihood in the matching as a first candidate but also manage the proportionate matching results (i.e., output more than one tracking result).
The tracking unit 136 may output a tracking result through an optical flow or a particle filter which is a tracking technique for predicting the movement of a person. Such processing can be performed by a technique described in, for example, a document (Kei Takizawa, Mitsutake Hasebe, Hiroshi Sukegawa, Toshio Sato, Nobuyoshi Enomoto, Bunpei Irie, and Akio Okazaki: “Development of a Face Recognition System for Pedestrians, ‘Face Passenger’”, 4th Forum on Information Technology (FIT2005), pp. 27-28).
As a specific tracking technique, the tracking unit 136 can be provided by a unit having processing functions similar to the tracking result managing unit 74, the graph creating unit 75, the branch weight calculating unit 76, the optimum path set calculating unit 77, and the tracking state judging unit 78 that are described in the third embodiment and shown in
In this case, the tracking unit 136 manages information tracked or detected from the preceding frame (t−1) to the frame t−T−T′ (T>=0 and T′>=0 are parameters). The detection results up to t−1 are detection results targeted for tracking processing. The detection results from t−T−1 to t−T−T′ are past tracking results. The tracking unit 136 manages face information (a position in an image included in a face detection result obtained from the face detecting unit 126, the frame number of moving images, ID information individually provided to the identical person that is tracked, and a partial image of a detected region) for each frame.
The tracking unit 136 creates a graph comprising peaks corresponding to states “unsuccessful detection during tracking”, “disappearance”, and “appearance”, in addition to peaks corresponding to face detection information and tracking target information. Here, the “appearance” means a condition in which a person who is not present in the screen newly appears in the screen. The “disappearance” means a condition in which a person present in the screen disappears from the screen. The “unsuccessful detection during tracking” means a condition in which the face that is to be present in the screen is unsuccessfully detected. The tracking result corresponds to a combination of paths on this graph.
A node corresponding to the unsuccessful detection during tracking is added. Thus, even when there is a frame in which detection is temporarily prevented during tracking, the tracking unit 136 correctly performs matching using the frames before and after the above frame and can thus continue tracking. A weight, that is, a real value is set for a branch set in the graph creation. This enables more accurate tracking by considering both the probability of matching and the probability of mismatching between face detection results.
The tracking unit 136 determines a logarithm of the ratio between the two probabilities (the matching probability and the mismatching probability). However, as long as the two probabilities are considered, a subtraction of the probabilities can be used instead, or a predetermined function f(P1, P2) can be created. As feature amounts or random variables, the distance between face detection results, the size ratio of detection frames, a velocity vector, and a correlation value of a color histogram can be used. The tracking unit 136 estimates a probability distribution from suitable learning data. That is, the tracking unit 136 advantageously prevents the confusion of tracking targets by considering the mismatching probability as well.
When the matching probability p(X) and the mismatching probability q(X) of face detection information u and v between frames are provided for the above-mentioned feature amounts, a probability ratio log(p(X)/q(X)) is used to determine a branch weight between the peak u and the peak v in the graph. In this case, the branch weight is calculated as follows:
If p(X)>q(X)=0 (case A), log(p(X)/q(X))=+∞
If p(X)>q(X)>0 (case B), log(p(X)/q(X))=a(X)
If q(X)≧p(X)>0 (case C), log(p(X)/q(X))=−b(X)
If q(X)≧p(X)=0 (case D), log(p(X)/q(X))=−∞
Note that a(X) and b(X) are nonnegative real values. In the case A, as the mismatching probability q(X) is 0 and the matching probability p(X) is not 0, the branch weight is +∞, and the branch is always selected in an optimization calculation. The same applies to the other cases (case B, case C and case D).
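The four cases can be written as a single function. The following is a sketch only; the function name is hypothetical, and p and q stand for the matching probability p(X) and the mismatching probability q(X) already evaluated for one pair of face detection results.

```python
import math

def branch_weight(p, q):
    """Branch weight log(p/q) with the boundary cases A to D handled explicitly."""
    if p < 0 or q < 0:
        raise ValueError("probabilities must be nonnegative")
    if p == 0:
        return -math.inf      # case D: q >= p = 0
    if q == 0:
        return math.inf       # case A: p > q = 0
    # case B (p > q > 0) gives a positive value a(X);
    # case C (q >= p > 0) gives a nonpositive value -b(X).
    return math.log(p / q)
```

Handling the zero cases before taking the logarithm avoids a division by zero and a logarithm of zero, both of which would otherwise raise an error.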
The tracking unit 136 determines a branch weight by logarithmic values of the probability of disappearance, the probability of appearance, and the probability of unsuccessful detection during tracking. These probabilities can be determined by prior learning using corresponding data. In the created graph including branch weights, the tracking unit 136 calculates a combination of the paths that maximizes the total of the branch weights. This can be easily found by a well-known combinatorial optimization algorithm. For example, if the probabilities described above are used, a combination of the paths providing the maximum posterior probability can be found. By finding the optimum path combination, the tracking unit 136 can obtain a face continuously tracked from a past frame, a face that has newly appeared, and a face that has not been matched. The tracking unit 136 records the result of the processing described above in a storage unit 133a of the tracking result managing unit 133.
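To illustrate what maximizing the total of the branch weights means, the sketch below searches exhaustively over a simplified two-frame version of the problem. It is not the optimization algorithm of the embodiment: the states "disappearance" and "unsuccessful detection during tracking" are merged into a single weight for an unmatched tracking target, a single weight is used for "appearance", and all names are hypothetical. Exhaustive search is only practical for a handful of faces.

```python
import math

def best_assignment(match_w, n_detections, unmatched_track_w, new_detection_w):
    """Return (total weight, assignment) maximizing the sum of branch weights.

    match_w[i][j] is the branch weight between tracking target i and detection j.
    assignment[i] is the detection index matched to target i, or None.
    Detections left unmatched are treated as newly appeared faces.
    """
    n_tracks = len(match_w)
    best = (-math.inf, None)

    def search(i, used, total, assignment):
        nonlocal best
        if i == n_tracks:
            total += new_detection_w * (n_detections - len(used))
            if total > best[0]:
                best = (total, assignment)
            return
        # Option 1: target i has no detection in this frame.
        search(i + 1, used, total + unmatched_track_w, assignment + [None])
        # Option 2: target i is matched to a detection not used so far.
        for j in range(n_detections):
            if j not in used and match_w[i][j] > -math.inf:
                search(i + 1, used | {j}, total + match_w[i][j], assignment + [j])

    search(0, frozenset(), 0.0, [])
    return best
```

Branches with weight −∞ are never followed, which mirrors the statement that such matches are never selected. For realistic numbers of faces, a polynomial-time assignment or path algorithm would replace the exhaustive search.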
Now, the flow of the overall processing according to the fourth embodiment is described.
Time-series images captured by the cameras 101 are input to each of the terminal devices 102 by the image interface 122. In each of the terminal devices 102, the control unit 121 digitizes the time-series images input from the cameras 101 by the image interface, and supplies the digitized images to the face detecting unit 126 of the processing unit 124 (step S41). The face detecting unit 126 detects a face as a moving object targeted for tracking from the input frames of images (step S42).
When the face detecting unit 126 does not detect any face from the input images (step S43, NO), the control unit 121 does not use the input images for the estimation of the tracking parameter (step S44). In this case, no tracking processing is performed. When a face can be detected from the input images (step S43, YES), the scene selecting unit 127 calculates, from a detection result output by the face detecting unit 126, a reliability for judging whether the scene of the detection result can be used for the estimation of the tracking parameter (step S45).
After calculating the reliability of the detection result, the scene selecting unit 127 judges whether the calculated reliability of the detection result is higher than a predetermined reference value (threshold) (step S46). When judging that the calculated reliability of the detection result is lower than the reference value (step S46, NO), the scene selecting unit 127 does not use the detection result for the estimation of the tracking parameter (step S47). In this case, the tracking unit 136 performs the processing of tracking the person in the time-series input images by using the tracking parameter as it stood immediately before the update (step S58).
When judging that the calculated reliability of the detection result is higher than the reference value (step S46, YES), the scene selecting unit 127 retains (records) the detection result (scene), and calculates a tracking result based on this detection result (step S48). Moreover, the scene selecting unit 127 calculates a reliability of this tracking result, and judges whether the calculated reliability of the tracking result is higher than a predetermined reference value (threshold) (step S49).
When the reliability of the tracking result is lower than the reference value (step S49, NO), the scene selecting unit 127 does not use the detection result (scene) for the estimation of the tracking parameter (step S50). In this case, the tracking unit 136 performs the processing of tracking the person in the time-series input images by using the tracking parameter as it stood immediately before the update (step S58).
When judging that the reliability of the tracking result is higher than the reference value (step S49, YES), the scene selecting unit 127 outputs this detection result (scene) to the parameter estimating unit 135 as data for estimating a tracking parameter. The parameter estimating unit 135 judges whether the number of detection results (scenes) having high reliabilities is greater than a predetermined reference value (threshold) (step S51).
When the number of scenes having high reliabilities is smaller than the reference value (step S51, NO), the parameter estimating unit 135 does not estimate any tracking parameter (step S52). In this case, the tracking unit 136 performs the processing of tracking the person in the time-series input images by using the current tracking parameter (step S58).
When the number of scenes having high reliabilities is greater than the reference value (step S51, YES), the parameter estimating unit 135 estimates a tracking parameter on the basis of the scene provided from the scene selecting unit 127 (step S53). When the parameter estimating unit 135 estimates a tracking parameter, the tracking unit 136 performs the tracking processing in the scene retained in step S48 (step S54).
The tracking unit 136 performs the tracking processing by using both the tracking parameter estimated by the parameter estimating unit 135 and the retained tracking parameter as it stood immediately before the update. The tracking unit 136 compares the reliability of the tracking result obtained with the tracking parameter estimated by the parameter estimating unit 135 with the reliability of the tracking result obtained with the tracking parameter as it stood immediately before the update. When the reliability obtained with the estimated tracking parameter is lower than the reliability obtained with the tracking parameter as it stood immediately before the update (step S55), the tracking unit 136 only retains and does not use the tracking parameter estimated by the parameter estimating unit 135 (step S56). In this case, the tracking unit 136 performs the processing of tracking the person in the time-series input images by using the tracking parameter as it stood immediately before the update (step S58).
When the reliability obtained with the tracking parameter estimated by the parameter estimating unit 135 is higher than the reliability obtained with the tracking parameter as it stood immediately before the update, the tracking unit 136 replaces the latter with the tracking parameter estimated by the parameter estimating unit 135 (step S57). In this case, the tracking unit 136 tracks the person (moving object) in the time-series input images in accordance with the updated tracking parameter (step S58).
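The decisions in steps S43 to S57 can be summarized as the following sketch. The function signature, the threshold names, and the callables estimate and score are hypothetical; treating a value equal to the reference value as "not higher" is also an assumption, since the description only distinguishes higher and lower.

```python
def select_parameter(current, estimate, score, *, face_detected,
                     detection_reliability, tracking_reliability,
                     n_reliable_scenes, detection_threshold,
                     tracking_threshold, scene_threshold):
    """Return (tracking parameter to use, label of the deciding step).

    current: tracking parameter as it stood immediately before the update.
    estimate: callable returning a newly estimated tracking parameter (S53).
    score: callable giving the reliability of tracking with a parameter (S54, S55).
    """
    if not face_detected:
        return current, "S44"   # no face: scene unused, no tracking for this input
    if detection_reliability <= detection_threshold:
        return current, "S47"   # detection result not reliable enough
    if tracking_reliability <= tracking_threshold:
        return current, "S50"   # tracking result not reliable enough
    if n_reliable_scenes <= scene_threshold:
        return current, "S52"   # too few reliable scenes to estimate
    candidate = estimate()      # S53
    if score(candidate) > score(current):
        return candidate, "S57" # update to the estimated parameter
    return current, "S56"       # keep the previous parameter
```

In every branch except S44, the returned parameter is the one used for the tracking processing of step S58.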
As described above, the moving object tracking system according to the fourth embodiment finds a reliability of a tracking result of a moving object, and when the found reliability is high, the moving object tracking system estimates (learns) a tracking parameter and adjusts the tracking parameter for use in the tracking processing. According to the moving object tracking system of the fourth embodiment, when more than one moving object is tracked, the tracking parameter is adjusted for a variation originating from the change of photographing equipment or a variation originating from the change of photographing environments, so that the operator can save the trouble of teaching a right solution.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
2010-035207 | Feb 2010 | JP | national |
2010-204830 | Sep 2010 | JP | national |
This application is a Continuation Application of PCT Application No. PCT/JP2011/053379, filed Feb. 17, 2011 and based upon and claiming the benefit of priority from prior Japanese Patent Applications No. 2010-035207, filed Feb. 19, 2010; and No. 2010-204830, filed Sep. 13, 2010, the entire contents of all of which are incorporated herein by reference.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/JP2011/053379 | Feb 2011 | US |
Child | 13588229 | | US |