The present invention relates to object tracking processing.
Conventionally, there is a technique for tracking the path of movement of a person in a video image captured by a surveillance camera. When a person in a video image is collated, in real time while being tracked, with information on persons registered in a database (hereinafter referred to as "person information"), the collation processing needs to be completed in a short period of time so that the tracking processing can be performed on each frame without delay. However, as the number of persons in the video image and the number of items of person information increase, the calculation amount of the collation processing increases.
Conventional methods have been disclosed for limiting the number of persons to be collated in a video image so as to reduce the calculation amount when many persons are present in the video image. Japanese Patent Application Laid-Open No. 2008-40781 discloses a method for limiting the persons to be collated based on the number of collations in the past and time-series information regarding the person being tracked in the video image. According to this method, for example, the correctness of the person information for an object for which a long time has passed since the most recent collation can be confirmed preferentially. Japanese Patent Application Laid-Open No. 2020-9383 discloses a method for preferentially selecting, as a collation target, a person whose location changes significantly from the previous frame, that is, a person who is highly likely to move out of the frame soon. According to this method, even when there are many persons in the video image, the calculation amount can be reduced while preventing omission of collation processing for the persons appearing in the video image.
An object of the information processing apparatus according to the present disclosure is to reduce the calculation amount related to collation of a person in a video image.
An information processing apparatus, as one aspect of the present invention, determines tracking information that is a target of collation processing for performing association with person information, from among a plurality of items of tracking information corresponding to a plurality of persons detected from a video image frame, based on a feature amount of each of the plurality of persons, and executes collation processing in which the person information to be associated with the tracking information that is the target of the collation processing is identified based on a similarity between a feature amount of a person corresponding to the tracking information determined to be the target of the collation processing and a plurality of feature amounts stored in a storage device in association with a plurality of items of person information.
Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
A detailed description of a mode for carrying out the present invention will be given below with reference to the attached drawings. The embodiment described below is one example of a means for executing the present invention and should be modified or changed as appropriate depending on the configuration of the apparatus to which the present invention is applied and on various conditions; the present invention is not limited to the embodiments below. Additionally, a configuration may be made by appropriately combining parts of the embodiments to be described below.
An information processing apparatus 100 according to the present embodiment also functions as a person tracking apparatus that analyzes a video image in which a person captured by a network camera or the like appears, and acquires a path of movement of the person. In the present embodiment, an example of acquiring the path of movement of the same person from a single camera video image will be explained.
First, the processing of following the path of movement of a person treated in the present embodiment will be explained below. In the present embodiment, the process of following the path of short-time movement of a person is referred to as "tracking". Hereinafter, information on the path of movement obtained by the tracking processing is referred to as "tracking information". Tracking information is information in which a person is detected in each frame of a video image and items of information on the person's detection region in each frame are connected in time series. Hereinafter, the information regarding the detection regions connected in the tracking information is referred to as a "feature amount". The feature amount consists of the coordinates of the detection region and image features.
Additionally, hereinafter, the processing of associating items of tracking information on the same person is referred to as "person collation". In the present embodiment, the association is performed by linking a plurality of items of tracking information considered to be paths of movement of the same person by using an ID of a person that is present in the database. Additionally, hereinafter, the ID that associates a plurality of items of tracking information is referred to as "person information". When a person in the video image disappears from the screen for a long time during the tracking processing and then appears again, the tracking information is interrupted. However, when person collation is performed, the items of tracking information interrupted before and after the person disappears are associated as the path of movement of the same person.
The tracking information includes the tracking information in which the person information has not been determined and the tracking information in which the person information has been determined (first tracking information). Note that from among the items of the tracking information, the tracking information in which the person information has not been determined will be referred to below as “tracking information query (second tracking information)” so as to distinguish from the tracking information in which the person information has been determined. Additionally, person information includes person information (first person information) in which associated tracking information is present in a frame being processed and person information in which associated tracking information is not present in the frame being processed. Note that, hereinafter, the person information in which the associated tracking information is not present in the frame being processed, from among the items of the person information, will be referred to as “person information in which collation has not been established (second person information)”.
In addition, in the present embodiment, an explanation will be given of a method for determining, as a collation target, a combination of tracking information and person information for which the probability of determining the person information corresponding to a tracking information query is high, from among the tracking information queries and the items of person information. In the following, the combination of the tracking information and the person information to be collated is sometimes referred to as a "combination of collation target".
In the present embodiment, an index referred to as a degree of confirmation, which indicates the probability that the person information corresponding to a tracking information query will be determined, is calculated using the tracking information, and a combination of collation target is determined based on the degree of confirmation. Additionally, in the present embodiment, when there is a plurality of tracking information queries, the feature amount in the most recent frame and the feature amount in a past frame held in each tracking information query are compared, and a tracking information query in which the feature amount changes significantly is included in the combination of collation target.
The past frame is a frame that was processed earlier than the frame being processed; in the present embodiment, the past frame is the frame one earlier than the frame that has been acquired most recently. By treating only the tracking information queries in which the feature amount changes significantly as collation targets, it is determined, for a tracking information query corresponding to a person whose appearance does not change significantly with respect to the past frame, that the person information cannot be confirmed even if person collation is performed again, and the person collation processing is omitted.
In the procedure, first, a person is detected in each frame of the video image, and tracking information obtained by tracking the person over a plurality of frames is generated. Next, for the purpose of performing person collation, the degree of confirmation is determined for each tracking information query based on the tracking information. Specifically, the degree of confirmation is determined for each tracking information query based on the difference (amount of change) in the feature amount within that tracking information query. Then, based on the degree of confirmation, a combination of tracking information with a high degree of confirmation and person information that has not been collated with a tracking information query is determined to be the collation target. Finally, person collation is performed on the determined combination of collation target.
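The overall per-frame procedure described above can be summarized by the following minimal sketch in Python. The function names (detect_persons, update_tracking, degree_of_confirmation, collate) and the dictionary-based data structures are assumptions introduced only for illustration; they do not appear in the embodiment itself.

```python
# Minimal sketch of the per-frame procedure described above (hypothetical names).
# For each frame: detect persons, update the tracking information, compute a
# degree of confirmation for each tracking information query, and perform
# person collation only for the queries whose degree of confirmation is at or
# above a threshold.

def process_frame(frame, tracks, person_db, threshold,
                  detect_persons, update_tracking, degree_of_confirmation, collate):
    detections = detect_persons(frame)           # person detection (S303)
    update_tracking(tracks, detections)          # tracking / assignment (S304)

    # Determination of the combination of collation target (S305).
    queries = [t for t in tracks if t.get("person_id") is None]
    targets = [q for q in queries if degree_of_confirmation(q) >= threshold]

    # Person collation only for the selected queries (S306).
    for query in targets:
        person_id = collate(query, person_db)    # returns None when not identified
        if person_id is not None:
            query["person_id"] = person_id       # person information determined
    return tracks
```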
Next, a configuration of the information processing apparatus 100 in the present embodiment will be explained.
The information processing apparatus 100 has the following hardware configuration.
The central processing unit (CPU) 101 is a central calculation unit configured by at least one computer, performs calculations, logical decisions, and the like for various processes, and controls each configuration component connected to a system bus 108. The Read-Only Memory (ROM) 102 is a program memory and stores programs for control performed by the CPU 101 including various processing procedures to be described below. The Random Access Memory (RAM) 103 is used as a main memory of the CPU 101 and temporary storage regions such as a work area and the like. Furthermore, the program memory may be realized by loading a program into the RAM 103 from an external storage device and the like connected to the information processing apparatus 100.
The HDD 104 is a hard disk for storing electronic data and programs according to the present embodiment. An external storage device may be used as a device that plays a similar role. Here, the external storage device can be realized by, for example, a medium (recording medium) and an external storage drive for realizing access to the medium. For example, flexible discs (FD), CD-ROMs, DVDs, USB memories, MOs, flash memories, and the like are known as such media. Additionally, the external storage device may be a server device and the like connected by a network.
The display unit 105 is, for example, a CRT display, a liquid crystal display, or the like, and is a device that outputs images to a display screen. Note that the display unit 105 may be configured by an external device connected to the information processing apparatus 100 by wire or wirelessly. The operation unit 106 has a keyboard and a mouse and receives various types of operations from users. The communication unit 107 performs wired or wireless two-way communication with other information processing apparatuses, communication devices, external storage devices, and the like, by using well-known communication technology.
The information processing apparatus 100 includes, as functional units, an image acquisition unit 210, a detection unit 220, a tracking unit 230, a person collation unit 240, a display control unit 250, a storage unit 260, and a combination determination unit 270.
The image acquisition unit 210 acquires a video image or a series of images to be processed from an external device in time-series order. In addition, the image acquisition unit 210 also acquires frames cut from the acquired video image. Note that the image acquisition unit 210 also functions as an acquisition means for carrying out each of the processes described above. Details of the processing that the image acquisition unit (acquisition means) 210 performs will be described below.
The detection unit 220 acquires one frame in the video image that is a processing target acquired by the image acquisition unit 210, and detects a person from the acquired frame. Additionally, the detection unit 220 transmits information on regions (detection regions) surrounding all detected persons to the tracking unit 230. Details of the processing that the detection unit (detection means) 220 performs will be described below. The tracking unit (tracking means) 230 performs processing (tracking processing) for tracking a person in the video image based on the information acquired from the detection unit 220. Details of the processing that the tracking unit 230 performs will be described below.
The person collation unit 240 acquires a tracking information query and person information in which collation has not been established from the combination determination unit 270, and performs person collation based on these items of information. The display control unit 250 causes the display unit 105 to display at least one of the tracking information and the person information for each frame in the video image displayed on the screen. Details of the processing that the person collation unit (collation means) 240 performs will be described below.
The storage unit 260 stores a plurality of items of tracking information and person information. Additionally, the storage unit 260 manages databases related to the tracking information, the person information, and the like.
The combination determination unit 270 calculates a degree of confirmation based on the tracking information, and determines, from among the tracking information queries and the person information in which collation has not been established, a combination with a high degree of confirmation based on the degree of confirmation. Details of the processing that the combination determination unit (decision means) 270 performs will be described below.
Hereinafter, the contents of the processing of each functional unit (each means) described above in the present embodiment will be explained in detail.
First, in S301, the image acquisition unit 210 acquires a video image or a series of images to be processed from an external device in time-series order. The external device from which the images are acquired is, for example, an imaging device such as a camera; however, it is not limited to a camera and may be, for example, a server or a storage medium such as an external memory. Additionally, the external device may have a built-in camera or may acquire images from a remote camera via a network, for example, an IP network.
In the present embodiment, the image acquisition unit 210 acquires frames (frame images) cut from the video image to be processed. In the video image used as an example here, three persons, a person A, a person B, and a person C, appear, and the frames at time t1, time tm, and time tn are processed in order.
Since the person A has been facing forward since time t1 and his/her face is clearly visible, there is a high probability that the corresponding person information can be determined by person collation. That is, the person A is in a state in which the corresponding person information can easily be determined by person collation. In contrast, since the person B and the person C both face backward from time tm until just before time tn, they are in a state in which the person information is difficult to determine. Therefore, there is a low probability that the person information on the person B and the person C can be determined at time tm. However, at time tn, since the person B turns sideways, the face, which is information unique to each person, is reflected in the frame. Accordingly, there is a high probability that the person information on the person B can be determined. In the present embodiment, using such a video image as an example, person collation is performed only on a person whose appearance in the detection region has significantly changed from the previous frame, from among the persons corresponding to tracking information queries in which the person information has not been determined. In contrast, for a person whose appearance in the detection region changes little from the previous frame, it is determined that the probability of determining the person information is low even if person collation is performed again, and the person collation is omitted.
Referring back to the processing flow, in S302, the image acquisition unit 210 acquires one frame to be processed from the video image acquired in S301.
Next, in S303, the detection unit 220 performs processing for detecting a person from the frame. Specifically, the detection unit 220 acquires the frame of the video image from the image acquisition unit 210, performs detection processing on the frame, and acquires position information on the persons within the frame. In the present embodiment, detection is performed by a model that has been trained in advance to detect a person from an image. For example, the model is trained with a large amount of learning data, each item of which consists of a pair of an image of a person and a ground-truth image showing the position of the person in the image; by inputting a frame to the trained model, information on the position of the person within the frame can be output. Any machine learning algorithm may be used for training the model of the present embodiment; for example, an algorithm such as a neural network can be used.
In the present embodiment, it is assumed that information on the detection region of each person within the frame is acquired as the position information obtained by the detection unit 220. The information on the detection region here consists of the coordinates of the upper-left edge and the size of the detection region in the frame. For example, a rectangular detection region 411 is obtained for a person in an image 410.
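As one way to hold this detection-region information, the following sketch defines a simple data structure with the upper-left coordinates and the size of a region; the class name and fields are illustrative assumptions, not part of the embodiment.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DetectionRegion:
    """Detection region output for one person in one frame (illustrative)."""
    x: int            # x coordinate of the upper-left edge in the frame
    y: int            # y coordinate of the upper-left edge in the frame
    width: int        # width of the rectangular region
    height: int       # height of the rectangular region
    features: List[float] = field(default_factory=list)  # image features of the region

    def center(self) -> Tuple[float, float]:
        """Center position of the region, usable as a positional feature amount."""
        return (self.x + self.width / 2.0, self.y + self.height / 2.0)
```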
Next, in S304, the tracking unit 230 performs processing for tracking the person in the video image (tracking processing). Specifically, in the tracking processing, the tracking unit 230 generates or updates tracking information by assigning the information on the detection region of each person in each frame to different items of tracking information for each person. The tracking information is, as described above, information in which the feature amounts related to the detection regions of the person being tracked are connected over the frames in which the person has been detected.
In the present embodiment, the tracking unit 230 receives the information on each detection region from the detection unit 220, and assigns the information on each detection region of the most recent frame at that point in time, as a feature amount, to the tracking information generated for the previous frames. Details of the assignment processing that the tracking unit 230 performs using these items of information will be described below. When there is a high probability that the person in a detection region within the frame is a person who is present in the preceding frame (past frame), the tracking unit 230 adds a feature amount related to the detection region of the most recent frame to the tracking information on that same person. Additionally, when it is recognized that a new person has appeared, the tracking unit 230 generates new tracking information.
The assignment processing performed by the tracking unit 230 is performed by comparing the feature amount in the detection region of a person in the most recent frame with the feature amounts stored in the tracking information generated for the previous frames. Specifically, the similarity between the feature amount in the detection region of each person in the previous frame and the feature amount in the detection region of each person in the most recent frame is checked. Then, the feature amount of the most recent frame is added to the tracking information having the highest similarity to it. For example, when the position information (coordinates) of a detection region is used as a feature amount, the central position of the detection region can be used. Additionally, when image features are used, feature amounts acquired by applying color information, texture information, a convolutional neural network (CNN), or the like to the detection region of the frame can be used. In the present embodiment, the tracking information generated by the tracking unit 230 is transmitted to the storage unit 260, and the transmitted tracking information is managed by the storage unit 260.
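A minimal sketch of this assignment processing follows. It assumes feature amounts are numeric vectors, uses cosine similarity as one possible similarity measure for image features, and performs a simple greedy matching; the embodiment does not fix a particular similarity measure or matching algorithm, so these are illustrative choices.

```python
import math

def cosine_similarity(f1, f2):
    """One possible similarity measure between two feature vectors."""
    dot = sum(a * b for a, b in zip(f1, f2))
    norm = math.sqrt(sum(a * a for a in f1)) * math.sqrt(sum(b * b for b in f2))
    return dot / norm if norm > 0 else 0.0

def assign_detections(tracks, detections, new_track_threshold=0.5):
    """Add each detection of the most recent frame to the most similar tracking
    information; generate new tracking information for a newly appearing person.

    tracks:     list of {"features": [vector, ...], "person_id": ...}
    detections: list of {"features": vector}
    """
    for det in detections:
        best_track, best_sim = None, -1.0
        for track in tracks:
            sim = cosine_similarity(det["features"], track["features"][-1])
            if sim > best_sim:
                best_track, best_sim = track, sim
        if best_track is not None and best_sim >= new_track_threshold:
            best_track["features"].append(det["features"])   # update existing tracking information
        else:
            tracks.append({"features": [det["features"]], "person_id": None})  # new person
    return tracks
```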
Additionally, the storage unit 260 manages, for each tracking ID, the time of the frame in which the person was detected, the coordinates and size of the detection region, image features, and the like, as a set.
The items of tracking information whose tracking IDs are track7, track8, and track9 are assumed to be the paths of movement of the person A, the person B, and the person C in the video image, respectively.
To simplify the description, the tracking information whose tracking ID is track7 is referred to as "tracking information track7", and the tracking information queries whose tracking IDs are track8 and track9 are referred to as "tracking information query track8" and "tracking information query track9". Additionally, information indicating whether or not the person information corresponding to the tracking information has been determined is set in the column of "Person information determined" in the database.
Next, in S305, the combination determination unit 270 determines, based on the tracking information, a combination with a high degree of confirmation from among the tracking information queries and the person information in which collation has not been established. The detailed processing content of the combination determination unit 270 will be explained below.
First, in S3051, the combination determination unit 270 acquires a list of tracking information queries. That is, the combination determination unit 270 acquires a list containing only the tracking information queries. In this case, as an example, the combination determination unit 270 acquires information on the tracking information query track8 and the tracking information query track9 from the storage unit 260 as the list.
Next, in S3052, one tracking information query is selected from the list acquired in S3051. In this case, the tracking information query track8 is selected as an example. Hereinafter, the tracking information query selected in S3052 will be referred to as the "tracking information query being paid attention to".
Next, in S3053, the combination determination unit 270 calculates the degree of confirmation based on the tracking information. In the present embodiment, the combination determination unit 270 determines the degree of confirmation based on a comparison between feature amounts in the tracking information query. Specifically, the degree of confirmation is determined based on a distance in data space between the feature amount in the most recent frame and the feature amount in the past frame(s). Here, the degree of confirmation is a value of the difference (amount of change) between the feature amount in the most recent frame and the feature amount in one previous frame in the tracking information query.
For example, it is assumed that the difference in the feature amount of the tracking information query track8 is 20,000. In this case, the combination determination unit 270 determines (calculates) 20,000 that is the difference in the feature amount of the tracking information query track8, as the degree of confirmation of the tracking information query track8.
Next, in S3054, it is determined whether or not the degree of confirmation has been calculated for all the tracking information queries acquired in S3051. When the degree of confirmation has not been calculated for all the acquired tracking information queries, the process returns to S3051, and the processes described above are repeated. In contrast, when the degree of confirmation has been calculated for all the acquired tracking information queries, the process proceeds to S3055.
In the case of "NO" in S3054, when the process returns to S3051, the combination determination unit 270 selects the tracking information query track9. Subsequently, the combination determination unit 270 calculates the degree of confirmation based on the tracking information. At this time, when the difference in the feature amount of the tracking information query track9 is 700, this difference of 700 is determined (calculated) to be the degree of confirmation of the tracking information query track9.
The reason why the degree of confirmation of the tracking information query track8 is greater than that of the tracking information query track9 is that, as described above, at time tn−1 both the person B and the person C face backward, whereas at time tn only the person B turns sideways. That is, since the appearance of the person B changes significantly from time tn−1 to time tn, the amount of change in the tracking information query track8 corresponding to the person B is greater. Note that, in calculating the degree of confirmation, a distance between the feature amounts, for example, the Euclidean distance, may be used instead of the difference between the feature amounts; similar processing can be achieved in that case as well.
Next, in S3055, based on the calculated degree of confirmation, the combination determination unit 270 performs processing for determining the combination of collation target from among the tracking information queries and the person information stored in the storage unit 260 in which collation has not been established. Specifically, tracking information queries whose degree of confirmation is higher than a threshold are included in the combination of collation target.
For example, when the threshold is set to 10,000, the degree of confirmation of the tracking information query track8, which is 20,000, is equal to or above the threshold. Hence, the combination determination unit 270 includes the tracking information query track8 in the collation target. In contrast, since the degree of confirmation of the tracking information query track9 is 700, it is less than the threshold. Therefore, since the difference in the feature amount from the previous frame is small, the combination determination unit 270 determines that the result will not change even if person collation is performed again, and does not include the tracking information query track9 in the collation target. The person information to be included in the combination of collation target is determined by referring to the database related to person information in the storage unit 260. Note that the threshold is set in advance, for example, before the start of the processing.
In the present embodiment, the tracking information query track8, whose degree of confirmation is equal to or above the threshold, and all the person information in which collation has not been established are included in the collation target. Additionally, the tracking information queries whose degree of confirmation is less than the threshold are not included in the collation target. In the present embodiment, by limiting the combination of collation target in this way, it is possible to omit the person collation processing for tracking information queries for which the possibility of determining the person information is low.
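A minimal sketch of S3053 and S3055, under the assumption that feature amounts are numeric vectors, is shown below. The difference in the feature amount is computed here as the sum of absolute element-wise differences (the Euclidean distance mentioned above could be substituted), and queries at or above the threshold are each paired with all the person information in which collation has not been established.

```python
def degree_of_confirmation(query):
    """Difference (amount of change) between the most recent feature amount and
    the feature amount one frame earlier within a tracking information query."""
    latest, previous = query["features"][-1], query["features"][-2]
    return sum(abs(a - b) for a, b in zip(latest, previous))

def determine_combinations(queries, uncollated_person_info, threshold=10000):
    """S3055 sketch: include only the queries whose degree of confirmation is at
    or above the threshold, each combined with all the person information in
    which collation has not been established."""
    combinations = []
    for query in queries:
        if degree_of_confirmation(query) >= threshold:
            combinations.append((query, list(uncollated_person_info)))
    return combinations
```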
Here, the database related to person information that the storage unit 260 stores at time tn will be explained.
Hereinafter, the case in which there is tracking information for which the correspondence to person information has been determined prior to the most recent frame will be described as "tracking information and person information are associated". For example, the person information person1 is person information that has already been associated with tracking information.
Additionally, in the present embodiment, the person information has information indicating whether or not associated tracking information is present in the most recent frame. In this example, the person information person1 has associated tracking information present in the most recent frame, whereas the person information person2, person3, and person4 do not have associated tracking information in the frame being processed and are therefore items of person information in which collation has not been established.
Referring back to the processing flow, next, in S306, the person collation unit 240 performs person collation on the combination of collation target determined in S305.
For example, in the present embodiment, the degree of similarity of feature amounts is calculated for all combinations of the feature amounts of the tracking information query and the feature amounts of the tracking information associated with the person information. Then, when there is person information for which the average of the degrees of similarity exceeds a predetermined threshold (predetermined value), that person information is determined to be the person information corresponding to the tracking information query. Formula 1 below shows an example of the formula for calculating the average of the degrees of similarity for person collation of one tracking information query and one item of person information.
Here, similarity_mean shown in the above Formula 1 is the average of the degrees of similarity. The denominator is the total number of combinations of the feature amounts that the tracking information query has and the feature amounts that the tracking information associated with the person information of the collation target has. The numerator is the sum of the results obtained by calculating the degree of similarity for each of those combinations.
The meaning of each variable will be explained below. First, it is assumed that "tn" denotes the number of items of tracking information associated with the person information, and "ti" denotes a variable that varies from 1 to tn for each item of the tracking information associated with the person information. "fn(ti)" denotes the number of feature amounts that each item of tracking information associated with the person information has, and takes a different value for each item of the tracking information. It is assumed that "fi" denotes a variable that varies from 1 to fn(ti) for each feature amount that the tracking information associated with the person information has. It is assumed that "qfn" denotes the number of feature amounts of the tracking information query, and "qfi" denotes a variable that varies from 1 to qfn for each feature amount of the tracking information query. It is assumed that "f" denotes a vector of a feature amount. It is assumed that "F" denotes a function for calculating the degree of similarity between feature amounts. An example of the formula for the function F is shown in Formula 2 below.
Here, it is assumed that f denotes a vector of one feature amount, and that f1 and f2 are different vectors having the length nk. It is assumed that "k" denotes a variable that varies from 1 to nk over the elements of the feature amount vectors.
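Since the bodies of Formula 1 and Formula 2 are not reproduced here, the sketch below shows one consistent reading of the variable definitions above. The cosine form of F is an assumption made for illustration (any similarity function between two vectors of length nk could be substituted); similarity_mean averages F over every combination of a feature amount of the tracking information query and a feature amount of the tracking information associated with the person information.

```python
import math

def F(f1, f2):
    """Formula 2 (assumed form): similarity of two feature vectors of length nk."""
    dot = sum(a * b for a, b in zip(f1, f2))
    norm = math.sqrt(sum(a * a for a in f1)) * math.sqrt(sum(b * b for b in f2))
    return dot / norm if norm > 0 else 0.0

def similarity_mean(query_features, person_track_features):
    """Formula 1 (assumed form): average of F over all combinations of the qfn
    feature amounts of the tracking information query and the fn(ti) feature
    amounts of each of the tn items of tracking information associated with the
    person information.

    query_features:        [vector, ...]         (qfn feature amounts)
    person_track_features: [[vector, ...], ...]  (tn items, fn(ti) feature amounts each)
    """
    total, count = 0.0, 0
    for track_features in person_track_features:   # ti = 1 .. tn
        for f_track in track_features:              # fi = 1 .. fn(ti)
            for f_query in query_features:          # qfi = 1 .. qfn
                total += F(f_track, f_query)
                count += 1
    return total / count if count else 0.0
```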
When person collation is performed, the degree of similarity is calculated a number of times corresponding to the number of combinations of the feature amounts that the tracking information query has and the feature amounts that the tracking information associated with each item of person information has, as shown in the above Formula 1. Hence, as the number of person collations increases, the calculation amount increases. Therefore, the object of the present embodiment is to reduce the calculation amount by limiting the tracking information queries of the collation target based on the degree of confirmation.
In the present embodiment, as described above, the tracking information query track8 is a collation target with respect to the person information, and the tracking information query track9 is not. Accordingly, the person collation processing targeting the combination of the tracking information query track8 and the person information person2, person3, and person4 will be explained below.
When the person information corresponding to the tracking information query track8 is determined by this person collation, the element in the column of "Person information determined" of the tracking information track8 in the database is updated to indicate that the person information has been determined.
In contrast, with regard to the tracking information query track9, which has not been selected as the collation target (has not been included in the collation target), the element in the column of "Person information determined" in the database of the tracking information remains unchanged.
Next, in S307, the display control unit 250 updates the display screen of the display unit 105. The display control unit 250 outputs at least one of the tracking information and the person information for each frame in the video image. Specifically, the display control unit 250 causes a frame indicating the detection region to be displayed superimposed on each person in the video image displayed on the display unit 105.
Furthermore, the display control unit 250 may additionally output, in the vicinity of each frame, numerical values of information related to the person being displayed, such as a tracking ID or a person ID. In addition, the frames may be displayed in different colors according to differences in the tracking ID and the person ID. Note that the display control unit 250 may cause only the frame of the detection region to be displayed on the display unit 105, or may cause the tracking ID and the person ID to be displayed without displaying the frame. Furthermore, the colors of the frames to be displayed may be unified. Furthermore, the thickness of the frames may be changed according to differences in the tracking ID and the person ID.
Next, in S308, the CPU 101 determines whether or not a series of processes has been completed with respect to all the frames in the video image. As the result of determination, when a series of processes has been completed for all the frames in the video image, processing ends. In contrast, when a series of processes has not been completed for all the frames in the video image, the process returns to S302, and processes from S302 to S308 are repeated.
According to the method described above, in the processing of tracing the paths of movement of persons in a video image, it is possible to determine a combination with a high degree of confirmation from among the items of tracking information and person information, and to perform person collation on it. According to the present embodiment, since person collation can be omitted for tracking information for which the possibility of determining the person information is low, the number of person collations can be reduced, and the calculation amount can be reduced.
If the method and processing in the present embodiment are not performed, a calculation amount corresponding to the total number of combinations of the tracking information queries and the person information in which collation has not been established arises. In contrast, according to the information processing apparatus 100 in the present embodiment, it is possible to suppress the high-cost calculation due to person collation. Although the processing of the combination determination unit 270 adds some calculation in the present embodiment, the benefit of the reduction in the calculation amount due to the reduction in the number of person collations exceeds that increase.
In the present embodiment, all items of the person information in which collation has not been established are used for person collation. As described above, an explanation has been given of the database of person information that the storage unit 260 stores at time tn. Specifically, in the present embodiment, the combination determination unit 270 sets the combinations of the tracking information query track8 and all the person information person2, person3, and person4, in which collation has not been established, as the collation target.
According to the above method, the combination determination unit 270 can determine a combination of tracking information and person information for which the probability of determining the person information for the tracking information query is high, from among the tracking information queries and the person information in which collation has not been established.
Note that the processing order of the information processing apparatus 100 in the present embodiment is not limited to the order of processing described above.
Note that although it has been explained above that the storage unit 260 stores the tracking information and the person information, old information may be deleted. For example, when the number of feature amounts that the tracking information has exceeds a predetermined number, the old feature amounts are deleted so that the number of feature amounts does not exceed a predetermined number. An increase in memory load can be avoided by deleting old data.
Note that although it has been described above that the past frame used when the combination determination unit 270 calculates the degree of confirmation is the frame one earlier, the past frame may be a frame several frames earlier. For example, when a video image captured at a high frame rate is processed, since there are no significant changes in the video image from frame to frame, the difference in image features between a person whose appearance changes and a person whose appearance does not change becomes clearer by using a frame several frames earlier. Note that a user or the like can arbitrarily set which frame is used as the past frame in the calculation of the degree of confirmation.
Note that although it has been described above that the combination determination unit 270 determines the degree of confirmation by a method based on a distance in the data space, the degree of confirmation may also be determined by a method based on the degree of similarity between feature amounts in the tracking information query. Specifically, the degree of similarity between the feature amount in the most recent frame of the tracking information query and the feature amount in the past frame is calculated, and the degree of confirmation is determined based on the degree of similarity.
For example, the past frame in this case is the frame one earlier. In this case, the degree of similarity between the feature amount in the most recent frame of the tracking information query being paid attention to and the feature amount in the frame one earlier is calculated by Formula 3 below.
[Formula 3]
similarity_self = F(f_self_latest, f_self_latest−1)   (3)
Here, it is assumed that similarity_self in the above Formula 3 is the degree of similarity between feature amounts in the tracking information query. The function F is the same function as that shown in the above Formula 2. It is assumed that f_self_latest denotes the most recent feature amount of the tracking information query being paid attention to, and f_self_latest−1 denotes the feature amount that is one older than the most recent feature amount. The degree of confirmation is, for example, a value obtained by subtracting the degree of similarity similarity_self from 1. By calculating the degree of confirmation using this method, it is possible to omit collation processing for a tracking information query in which there is no significant change in the appearance on the image since the previous time, as in the present embodiment.
Additionally, the number of past frames used in the method based on the degree of similarity among feature amounts in the tracking information query is not limited to one, and may be two or more. Specifically, all frames of the tracking information query, a predetermined number of frames, or frames selected every few frames can be used as past frames.
For example, the degree of similarity between the feature amount of each of the past frames and the feature amount of the frame being processed is calculated by the above Formula 2. Then, the maximum value is taken from among the plurality of calculated degrees of similarity, and the value obtained by subtracting this maximum degree of similarity from 1 is used as the degree of confirmation. According to this method, the person collation can be omitted when the feature amount of the tracking information in which the person information has not been determined is similar to at least one of the feature amounts in the past frames.
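A sketch of this variant is shown below; the similarity function F of Formula 2 is passed in as a parameter, the most recent feature amount is compared with the selected past feature amounts, and 1 minus the maximum similarity is used as the degree of confirmation. The function name and the way past frames are sampled are illustrative assumptions.

```python
def confirmation_from_past_frames(query_features, F, step=1):
    """Degree of confirmation based on similarity to past feature amounts:
    1 - (maximum similarity between the most recent feature amount and the past
    feature amounts, taken over all past frames or every `step` frames)."""
    latest = query_features[-1]
    past = query_features[:-1][::step]
    if not past:
        return 1.0                      # no history: collation is always attempted
    best = max(F(latest, f_past) for f_past in past)
    return 1.0 - best
```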
Note that although it has been described above that the combination determination unit 270 calculates the degree of confirmation based on a comparison between feature amounts within a tracking information query, the degree of confirmation can also be determined based on a comparison of feature amounts between tracking information queries. Specifically, the degree of confirmation is determined based on the degree of similarity between the tracking information query being paid attention to and the other tracking information queries. When the degree of similarity is low, it is interpreted that there is a probability that feature amounts unique to each person, such as a face, are clearly reflected in the frame, and the degree of confirmation of the tracking information query being paid attention to is increased.
For example, when persons wearing similar clothing face backward, the similarity of image features between the persons becomes high; however, when at least one person turns sideways, the similarity of that person's appearance on the image relative to the other persons becomes low. The method for determining the degree of confirmation in this case will be explained below.
For example, in determining the degree of confirmation by using a tracking information query 621 as the query being paid attention to, the degree of similarity between its most recent feature amount 624 and a most recent feature amount 625 of a tracking information query 622, and the degree of similarity between the feature amount 624 and a most recent feature amount 626 of a tracking information query 623, are respectively calculated. Subsequently, the average of these degrees of similarity is calculated as in Formula 4 below.
Here, it is assumed that similarity_other in the above Formula 4 is the average of the degrees of similarity of feature amounts between tracking information queries. "n" is the number of all the tracking information queries. It is assumed that f_self denotes the most recent feature amount of the tracking information query being paid attention to, "f_i" denotes the most recent feature amount of another tracking information query, and "i" denotes a subscript that is different for each of the other tracking information queries. For example, a value obtained by subtracting the similarity average similarity_other from 1 is determined to be the degree of confirmation, and when the degree of confirmation is higher than a predetermined threshold, the tracking information query being paid attention to is included in the combination. Thus, the tracking information query being paid attention to is compared with the other tracking information queries, and as a result, it is possible to perform person collation at a timing when the similarity of image features between persons is low. According to this method, in addition to significantly reducing the calculation amount of person collation, there is an advantage of suppressing false collations.
Note that the combination determination unit 270 can also determine the degree of confirmation based on both the comparison between the feature amounts within a tracking information query explained above and the comparison of feature amounts between tracking information queries. Specifically, when the similarity between the feature amounts within a tracking information query is high and the similarity between the feature amount of that tracking information query and the feature amounts of the other tracking information queries is low, the combination can be determined so as to include this tracking information.
An example of calculating the degree of confirmation in this case is shown in the sketch below.
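The sketch below is one illustrative way to combine the two comparisons: similarity_self follows Formula 3, similarity_other follows the description of Formula 4, and the final combination (high when the query changes little within itself and differs from the other queries) is an assumed form, not one fixed by the embodiment.

```python
def similarity_self(query_features, F):
    """Formula 3: similarity between the most recent feature amount and the
    feature amount one frame earlier within the query being paid attention to."""
    return F(query_features[-1], query_features[-2])

def similarity_other(own_latest, other_latest_features, F):
    """Formula 4 (assumed form): average similarity between the most recent
    feature amount of the query being paid attention to and the most recent
    feature amounts of the other tracking information queries."""
    if not other_latest_features:
        return 0.0
    return sum(F(own_latest, f) for f in other_latest_features) / len(other_latest_features)

def combined_confirmation(query_features, other_latest_features, F):
    """Illustrative combination: the degree of confirmation is high when the
    similarity within the query is high and the similarity to the other queries
    is low, as described above."""
    s_self = similarity_self(query_features, F)
    s_other = similarity_other(query_features[-1], other_latest_features, F)
    return s_self * (1.0 - s_other)
```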
Thus, according to the method based on both the comparison between feature amounts within a tracking information query and the comparison of feature amounts between tracking information queries, the calculation amount increases slightly compared to the method based only on the latter comparison; however, false collations can be suppressed while still significantly reducing the calculation amount of person collation.
Note that when a part of the detection region in the most recent frame of tracking information is hidden, the combination determination unit 270 may forcibly lower the degree of confirmation. Specifically, the degree of confirmation is lowered for a tracking information query whose detection region in the most recent frame overlaps with that of another person. This is because, when a plurality of persons enters the image range of a detection region, the image features of the plurality of persons are included in a single feature amount, and the probability of confirmation is low even when person collation is performed. For example, the presence or absence of overlap is determined, for two detection regions, by comparing the magnitude relation between the xy coordinates at one end point of one rectangle and the xy coordinates at an end point of the other rectangle. According to this method, the calculation amount of the combination determination unit 270 can be reduced further than with the methods based on the degree of similarity described above.
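The overlap check described above can be sketched as a standard comparison of rectangle coordinates. The (x, y, width, height) representation follows the earlier description of a detection region; the function name is an illustrative assumption.

```python
def regions_overlap(r1, r2):
    """Return True when two rectangular detection regions overlap.

    Each region is (x, y, width, height) with (x, y) the upper-left edge.
    There is no overlap exactly when one rectangle lies entirely to the left
    of, to the right of, above, or below the other; this corresponds to the
    comparison of the magnitude relation between end-point coordinates."""
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    if x1 + w1 <= x2 or x2 + w2 <= x1:   # separated horizontally
        return False
    if y1 + h1 <= y2 or y2 + h2 <= y1:   # separated vertically
        return False
    return True
```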
Note that although it has been described above that the person collation unit 240 performs person collation based on image features in the detection region, the method of person collation is not limited to this method. A method using face authentication may be used. In this method, the person collation unit 240 performs face detection within the image range of a person associated with the tracking information, and applies a model of face authentication prepared in advance to the image range of the detected face.
The face authentication model is a model that outputs a different ID for each person when an image including a face is input. The face authentication model is applied to the detection region of the face of each item of tracking information, and when the result of face authentication acquired from the tracking information associated with the person information matches the result of face authentication acquired from the tracking information query, the person information corresponding to that tracking information is determined. Since the method using face authentication is based on information that is certainly different for each person, the accuracy of the result of person collation is higher compared to the other methods.
Note that although it has been described above that the display control unit 250 outputs information related to the tracking information or the person information for each frame, the processing of the display control unit 250 is not limited to this. For a person corresponding to tracking information in which the person information has not been determined for a certain period of time, the reason why the person information has not been able to be determined may be displayed (as text information). For example, information such as "person collation is not performed because there is no significant movement" is displayed around this person. According to this method, a user can know the reason why the person information regarding a person in the video image is not easily determined.
Although it has been explained in the above-described embodiment that the combination determination unit 270 determines the combination by limiting the tracking information queries based on the degree of confirmation, in a new embodiment described below that is different from the above, the person information to be collated is also further limited.
In the present embodiment, the information processing apparatus 100 further includes a narrowing unit (narrowing means) 810. The narrowing unit 810 performs processing for narrowing down candidates of person information corresponding to a tracking information query from among the items of person information in which collation has not been established. In the present embodiment, the narrowing unit 810 narrows down the candidates of person information corresponding to the tracking information query based on a comparison between the feature amounts of the tracking information query and the feature amounts of the tracking information associated with the person information, and further performs association between the tracking information query and the candidates. Hereinafter, the information in which a tracking information query and candidates of person information have been associated will be referred to as "association information". The narrowing unit 810 transfers the association information to the combination determination unit 270. The combination determination unit 270 determines the tracking information query of the collation target based on the degree of confirmation, and further determines the person information to be collated based on the association information.
The flow of processing in the present embodiment will be explained below.
Through the explanation of the processing for the frame at time tm and the frame at time tn, the method in which the narrowing unit 810 narrows down the candidates of person information and the combination determination unit 270 determines the combination of collation target based on the degree of confirmation and the association information will be described.
First, the processing at time tm will be explained. In the present embodiment, as in the above description, it is assumed that processing has already been performed on the frames earlier than the frame at time tm.
First, in S301, the image acquisition unit 210 acquires the video image to be processed, and processes from S302 to S304 are performed in the same manner as described above.
Next, in S305, the combination determination unit 270 determines the combination of collation target. The process in S305 in the present embodiment is carried out through the process flow of S3051 to S3055 described above.
Next, in S306, the person collation unit 240 performs person collation. Here, the degree of similarity between the feature amounts of the tracking information query and of the tracking information associated with the person information is calculated, and when there is a combination exceeding a predetermined threshold, the person information corresponding to the tracking information query is determined. However, in the present embodiment, unlike the embodiment explained above, when there is a plurality of items of person information whose degree of similarity to a tracking information query is high, the person information is not determined. For example, for a tracking information query, the presence or absence of person information whose degree of similarity exceeds the predetermined threshold is confirmed. When corresponding person information is present and the number of items of that person information is one, the person information corresponding to the tracking information query is determined. If two or more items of corresponding person information are present, the person information is not determined.
Next, in S901, the narrowing unit 810 determines the candidates of person information corresponding to a tracking information query in which the person information has not been determined. In the present embodiment, the narrowing unit 810 determines the candidates of person information based on the degree of similarity between the feature amounts of the tracking information query and the feature amounts of the tracking information associated with the person information in which collation has not been established.
Specifically, the narrowing unit 810 uses the degrees of similarity calculated in S306. For example, when a plurality of items of person information whose degree of similarity exceeds the predetermined threshold is present for a tracking information query, those items of person information are determined to be candidates corresponding to this tracking information query. Subsequently, the narrowing unit 810 stores the association information of the tracking information query and the candidates of person information in the storage unit 260.
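The rule of S306 and S901 described above can be sketched as follows, under the assumption that similarity_mean is the averaged similarity of Formula 1 and that tracking information and person information are represented as simple dictionaries (illustrative, not part of the embodiment): the person information is determined only when exactly one candidate exceeds the threshold, and otherwise the candidates are recorded as association information.

```python
def collate_and_narrow(query, uncollated_person_info, similarity_mean, threshold):
    """S306 / S901 sketch.

    Compute the averaged similarity of the query against each item of person
    information in which collation has not been established.
    - exactly one candidate above the threshold: determine the person information
    - two or more candidates above the threshold: do not determine; return the
      candidates as association information for the narrowing unit 810
    """
    candidates = [p for p in uncollated_person_info
                  if similarity_mean(query["features"], p["track_features"]) > threshold]
    if len(candidates) == 1:
        query["person_id"] = candidates[0]["person_id"]       # person information determined
        return None
    if len(candidates) >= 2:
        return {"query_id": query["track_id"],
                "candidates": [p["person_id"] for p in candidates]}  # association information
    return None                                               # no candidate exceeded the threshold
```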
Referring back to the processing flow, after S901, the processes in S307 and S308 are performed in the same manner as described above, and when unprocessed frames remain, the process returns to S302.
The above is an explanation of the processing on the image 420 (frame) at time tm. It has been explained above that the narrowing unit 810 narrows down the candidates of person information corresponding to the tracking information query, from among items of person information in which collation has not been established. Next, the processing on the image 430 (frame) at time tn will be explained, and the method in which the combination determination unit 270 determines the combination of collation target based on the degree of confirmation and the association information will be described. As described above, although, at time tm, both the person B and the person C turn backward, at time tn, only the person B turns sideways.
First, processes from S302 to S304 are performed as in the processing at time tm. Next, the processing performed by the combination determination unit 270 in S305 will be explained in detail.
First, in S3051, the combination determination unit 270 acquires a list of tracking information queries. Next, in S3052, the combination determination unit 270 selects one tracking information query. Next, in S3053, the combination determination unit 270 calculates the degree of confirmation for the tracking information query selected in S3052. Next, in S3054, whether or not the degree of confirmation has been calculated for all the tracking information queries acquired in S3051 is determined. If the degree of confirmation has not been calculated for all the acquired tracking information queries, the process returns to S3051 and the same processes are repeated. If the degree of confirmation has been calculated for all the acquired tracking information queries, the process proceeds to S3055. By the processing up to this stage, in the present embodiment, it is assumed that, as described above, the degree of confirmation obtained for the tracking information query track8 corresponding to the person B is higher than that obtained for the tracking information query track9 corresponding to the person C.
Next, in S3055, the combination determination unit 270 determines the combination of collation target. In the present embodiment, the combination determination unit 270 determines the tracking information query to be collated based on the degree of confirmation, and further determines the person information to be included in the combination based on the tracking information query and the association information. Here, the combination determination unit 270 determines to include the tracking information query track8 in the combination of collation target by the same method as described above.
Next, the combination determination unit 270 refers to the association information stored in the storage unit 260, and determines the person information person2 and person3, which are the candidates associated with the tracking information query track8, to be the person information to be included in the combination of collation target.
Next, in S306, the person collation unit 240 performs person collation by using the tracking information query and the person information in which collation has not been established that are included in the combination. In the present embodiment, it is assumed that the person information person2 is determined to correspond to the tracking information query track8.
Next, in S307, the display control unit 250 updates the display screen by a method similar to the method as described above. Finally, in S308, the CPU 101 determines whether or not a series of processes has been completed for all the frames. As the result of determination, when a series of processes has been completed for all the frames in the video image, the processing ends. In contrast, when a series of processes has not been completed for all the frames in the video image, the process returns to S302, and processes from S302 to S308 are repeated.
The above is an explanation regarding the processing for the frame at time tn. By the processing explained up to this stage, the narrowing unit 810 can narrow down the candidates of person information corresponding to the tracking information query, and the combination determination unit 270 can appropriately determine a combination of collation target based on the degree of confirmation and the association information.
As described above, by performing the processing described above using the configuration including the narrowing unit 810, the information processing apparatus 100 in the present embodiment can limit the number of items of person information to be included in the combination of collation target, and thereby the calculation amount of person collation can be further reduced.
Note that, in the above description, the method in which the narrowing unit 810 determines the candidates of person information corresponding to the tracking information query based on the similarity between the feature amounts of the tracking information query for the previous frames and the feature amounts of the person information has been explained; however, the method for determining the candidates is not limited to this method. For example, the narrowing unit 810 can also narrow down the candidates based on information regarding crossings of persons in the video image. Specifically, the occurrence of a crossing of persons in the video image is detected based on a plurality of items of tracking information, and the tracking information related to the crossing is estimated. Then, the person information corresponding to the tracking information queries associated before the occurrence of the crossing is narrowed down as candidates for the tracking information related to the crossing after the occurrence of the crossing.
An example of processing in this case will be explained with reference to
In the video image as shown in
The narrowing unit 810 first detects the occurrence of a crossing. The occurrence of a crossing is detected based on, for example, the positional relation of the detection regions obtained from the feature amounts that the plurality of items of tracking information have. Specifically, for a given item of tracking information, the distance between the center position of its detection region and the center position of the detection region of another item of tracking information is calculated for each frame in order to check whether a crossing has occurred. Then, when there is a frame in which the distance is less than a predetermined threshold, it is determined that a crossing occurred at the time corresponding to that frame.
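The check described above can be sketched as follows. This is a minimal hypothetical illustration: the data layout (per-frame center positions keyed by time) and the threshold value are assumptions made here and are not part of the embodiment.

```python
import math

# Hypothetical sketch of the crossing detection described above: for every
# frame, compare the center positions of the detection regions of two items
# of tracking information, and report a crossing when the distance between
# them falls below a predetermined threshold.
def detect_crossing(track_a_centers, track_b_centers, threshold=30.0):
    """track_*_centers: dict mapping frame time t -> (x, y) center position."""
    for t in sorted(set(track_a_centers) & set(track_b_centers)):
        ax, ay = track_a_centers[t]
        bx, by = track_b_centers[t]
        distance = math.hypot(ax - bx, ay - by)
        if distance < threshold:
            return t            # a crossing occurred at time t
    return None                 # no crossing detected
```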
In the example as shown in
Thus, the narrowing unit 810 can narrow down the candidates for person information based on the tracking information related to the crossing. According to this method, the person information can be narrowed down appropriately even when a crossing occurs.
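One possible reading of this narrowing, sketched below, is that the person information associated with the crossing tracks before the crossing is pooled and kept as the candidate set for each of those tracks afterwards, on the assumption that a crossing may swap the identities of the tracks involved. The names and data structures are illustrative assumptions only.

```python
# Hypothetical sketch of narrowing after a crossing: the person information
# that was associated with each crossing track before the crossing becomes
# the candidate set for every track involved in the crossing afterwards.
def narrow_after_crossing(pre_crossing_association, crossing_track_ids):
    # pre_crossing_association: dict mapping track id -> associated person id
    pooled = {pre_crossing_association[tid]
              for tid in crossing_track_ids
              if tid in pre_crossing_association}
    # After the crossing, every involved track keeps the pooled candidates.
    return {tid: sorted(pooled) for tid in crossing_track_ids}
```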
Note that, although it has been explained above that the combination determination unit 270 determines the combination of collation targets based on the association information and the degree of confirmation, the method for determining the combination is not limited to this. For example, it is also possible to first determine a combination of collation targets based on the degree of confirmation and the association information, and then determine the tracking information queries and person information not included in that combination to be a separate new combination.
For example, it has been explained above that the combination determination unit 270 determines the tracking information query track8 and the person information person2 and person3 to be a combination of collation targets. However, the remaining tracking information query track9 and the remaining person information person4, for which collation has not yet been established, may be determined to be a separate new combination. Thus, after one combination is determined, the remaining items are grouped into further combinations, and person collation is performed on each of them. In this way, person collation is performed after dividing the items into combinations that have a high probability of determining the person information for each tracking information query, and consequently, the number of unnecessary person collations can be further reduced.
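The division into a primary combination and a remaining combination can be sketched as follows. The helper function and the way the leftover items are grouped into a single remaining combination are assumptions for illustration; the identifiers follow the example in the text.

```python
# Hypothetical sketch of forming combinations: the primary combination is
# built from the degree of confirmation and the association information, and
# everything left over becomes a separate new combination.
def split_into_combinations(all_queries, all_persons, primary_queries, primary_persons):
    primary = (list(primary_queries), list(primary_persons))
    remaining = ([q for q in all_queries if q not in primary_queries],
                 [p for p in all_persons if p not in primary_persons])
    return [primary, remaining]

combinations = split_into_combinations(
    all_queries=["track8", "track9"],
    all_persons=["person2", "person3", "person4"],
    primary_queries=["track8"],
    primary_persons=["person2", "person3"])
# -> [(["track8"], ["person2", "person3"]), (["track9"], ["person4"])]
```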
Note that there is also a case in which, in the association information referred to by the combination determination unit 270, a plurality of tracking information queries is associated with a given item of person information. In this case, when the person information is determined for at least one of those tracking information queries, the remaining tracking information queries associated with the same candidate may be set as the next collation targets. This is because, once the person information corresponding to the tracking information query of a similar person is determined, the probability that the person information corresponding to the other tracking information queries can be determined increases.
For example, with respect to the result of the calculation of the degree of similarity during the person collation as shown in
Subsequently, when the person collation unit 240 determines that the person information corresponding to the tracking information query track8 is the person information person2, the combination determination unit 270 sets the tracking information query track9 as the collation target in the next loop. According to this method, the processing of calculating the degree of confirmation performed by the combination determination unit 270 can be omitted for the remaining tracking information queries associated with the same candidate.
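The prioritization described here can be sketched as follows. The data structures are illustrative assumptions; the example shows track9 becoming the next collation target once track8 is resolved to person2, without recalculating its degree of confirmation.

```python
# Hypothetical sketch of the prioritization described above: once person
# information is determined for one tracking information query, the other
# queries that shared the same candidate are queued as the next collation
# targets, skipping the degree-of-confirmation calculation for them.
def next_collation_targets(association_info, resolved_query, resolved_person):
    # association_info: dict mapping query id -> list of candidate person ids
    return [q for q, candidates in association_info.items()
            if q != resolved_query and resolved_person in candidates]

targets = next_collation_targets(
    {"track8": ["person2", "person3"], "track9": ["person2", "person3"]},
    resolved_query="track8", resolved_person="person2")
# targets == ["track9"]
```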
As explained above, according to the information processing apparatus 100 in the present embodiment, the number of collations and the amount of calculation can be reduced by determining, before the collation, the combination of collation targets corresponding to the persons in the video image.
The information processing apparatus 100 in the present embodiment is particularly effective in the following case, namely, a case in which the person information corresponding to a person in the video image cannot be determined even if person collation is performed on that person over a plurality of frames. This case can occur, for example, when a plurality of items of person information regarding persons wearing similar clothing is present in the database. In the prior art, there is a concern that the amount of calculation increases because, in such a case, collation processing is repeatedly performed on the same person in the video image. According to the information processing apparatus 100 in the present embodiment, a reduction in the amount of calculation can be expected in such cases.
Although the preferred embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of its gist. For example, among the functional blocks as shown in
The present invention can also be realized by processing in which a program that realizes one or more functions of the above embodiments is supplied to a system or device via a network or storage medium, and one or more processors in the computer of the system or device read and execute the program. In that case, the program and the storage medium storing the program constitute the present invention. In addition, the present invention can also be realized by a circuit (for example, an ASIC) that provides one or more functions.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2022-071955, filed Apr. 25, 2022, which is hereby incorporated by reference herein in its entirety.