This application claims the benefit of Korean Patent Application No. 10-2021-0137885 filed on Oct. 15, 2021, which is hereby incorporated by reference herein in its entirety.
The embodiments disclosed herein relate to a method of re-identifying the same person from a plurality of images taken at different places, and an apparatus for performing the same.
This work was supported by the A.I. Recognition & Tracking System Project through the National IT Industry Promotion Agency (NIPA, Korea), funded by the Ministry of Science and ICT (MSIT, Korea) Informatization Promotion Funds in 2021.
Person re-identification technology detects and tracks the same person across images taken by a plurality of different cameras. It is widely used not only in the field of security control but also in the process of tracking persons who have come into contact with an infected person in public places used by many unspecified persons, as in the recent COVID-19 pandemic.
However, when re-identification is performed based on bodily features, the performance of bodily feature extraction decreases as the photographing environment changes, and persons with similar bodily features may be present, so the possibility of error is high. Although accuracy may be improved when re-identification is performed based on facial features, there is still a limitation in terms of accuracy because situations arise, depending on the photographing angle, in which a face cannot be recognized.
Meanwhile, the above-described background technology corresponds to technical information that was possessed by the present inventor in order to contrive the present invention or that was acquired in the process of contriving the present invention, and cannot necessarily be regarded as well-known technology that had been known to the public prior to the filing of the present invention.
The embodiments disclosed herein are intended to provide a method of re-identifying the same person from a plurality of images taken at different places, and an apparatus for performing the method.
As a technical solution for accomplishing the above object, according to one embodiment, there is provided a person re-identification method of identifying the same person from images taken through a plurality of cameras, the person re-identification method including: detecting a person from an image taken by any one of a plurality of cameras; extracting bodily and movement path features of the detected person, and also extracting a facial feature of the detected person if the detection of the face of the detected person is possible; and matching the detected person for the same person against persons included in images taken by the plurality of cameras based on at least one of the bodily and facial features while reflecting a weight according to the movement path feature.
According to another embodiment, there is provided a computer program stored in a computer-readable storage medium to perform a person re-identification method of identifying the same person from images taken through a plurality of cameras in combination with a computer, which is hardware, wherein the method includes: detecting a person from an image taken by any one of a plurality of cameras; extracting bodily and movement path features of the detected person, and also extracting a facial feature of the detected person if the detection of the face of the detected person is possible; and matching the detected person for the same person against persons included in images taken by the plurality of cameras based on at least one of the bodily and facial features while reflecting a weight according to the movement path feature.
According to still another embodiment, there is provided a non-transitory computer-readable storage medium having stored thereon a computer program that, when executed by a computer, causes the computer to execute a person re-identification method of identifying the same person from images taken through a plurality of cameras, wherein the method includes: detecting a person from an image taken by any one of a plurality of cameras; extracting bodily and movement path features of the detected person, and also extracting a facial feature of the detected person if the detection of the face of the detected person is possible; and matching the detected person for the same person against persons included in images taken by the plurality of cameras based on at least one of the bodily and facial features while reflecting a weight according to the movement path feature.
According to still another embodiment, there is provided a computing apparatus for performing a person re-identification method of identifying the same person from images taken via a plurality of cameras, the computing apparatus including: an input/output interface configured to receive images from a plurality of cameras, and to output a result of person re-identification; storage configured to store a program for performing person re-identification; and a controller comprising at least one processor; wherein the controller, by executing the program, detects a person from an image taken by any one of the plurality of cameras, extracts bodily and movement path features of the detected person and also extracts a facial feature of the detected person if the detection of the face of the detected person is possible, and matches the detected person for the same person against persons included in images taken by the plurality of cameras based on at least one of the bodily and facial features while reflecting a weight according to the movement path feature.
The above and other objects, features, and advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
Various embodiments will be described in detail below with reference to the accompanying drawings. The following embodiments may be modified to various different forms and then practiced. In order to more clearly illustrate features of the embodiments, detailed descriptions of items that are well known to those having ordinary skill in the art to which the following embodiments pertain will be omitted. Furthermore, in the drawings, portions unrelated to descriptions of the embodiments will be omitted. Throughout the specification, like reference symbols will be assigned to like portions.
Throughout the specification, when one component is described as being “connected” to another component, this includes not only a case where the one component is “directly connected” to the other component but also a case where the one component is “connected to the other component with a third component disposed therebetween.” Furthermore, when one portion is described as “including” one component, this does not mean that the portion does not exclude another component but means that the portion may further include another component, unless explicitly described to the contrary.
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
In this specification, embodiments of a person re-identification method for identifying the same person from images taken by different cameras are introduced. In particular, in order to improve identification accuracy, the concept of a “movement path feature” is introduced for the first time and is used in the process of performing person re-identification. Accordingly, before the embodiments of the person re-identification method are described, the concept of a “movement path feature” will first be described with reference to the accompanying drawings.
First, the term “movement path feature” is defined as a probability value corresponding to the time taken to move along a specific path. In greater detail, the “movement path feature” is a probability value corresponding to the movement time of a specific person obtained according to a probability density function for the times (movement times) taken for a plurality of persons to move along a specific path.
A method of obtaining the probability density function used for the extraction of a movement path feature will be described in detail below.
It is assumed that a first camera 10 and a second camera 20 shown in the accompanying drawing are installed at a first location and a second location, respectively, and that the images taken at the two locations are transferred to a server connected to the two cameras.
In order to calculate the times taken for persons to move from the first location to the second location, the same person needs to be identified from among the persons included in the images taken at the two locations. The server connected to the two cameras 10 and 20 may extract facial features of the persons included in the images taken at the first and second locations by analyzing the images, and may identify the same person from the two images by comparing the extracted facial features. In the process of obtaining a probability density function for movement times, the accurate identification of the same person is required, so that only persons whose facial features can be extracted may be taken into consideration.
The server connected to the two cameras 10 and 20 may obtain a movement time between the first and second locations for persons who are determined to be the same persons. The server may determine that the difference between the time point at which person A appears in the image taken at the second location and the time point at which person A appears in the image taken at the first location is the movement time of person A. In addition, in a similar manner, the server may determine that the difference between the time point at which person B appears in the image taken at the second location and the time point at which person B appears in the image taken at the first location is the movement time of person B.
The server may calculate the movement times of a large number of persons over various paths by utilizing the identification numbers and photographing times of the cameras, and may then obtain a probability density function for the movement times by using the resulting distribution. A method of collecting the movement times of persons for a specific path and obtaining a probability density function for those movement times will be described in detail below.
Referring to the accompanying drawing, it is assumed that persons moving along a specific path are photographed at time points t1, t2, and t3. In this example, the movement times of the respective persons may be calculated as follows:
Person A: t3−t2
Person B: t3−t2
Person D: t3−t1
Person F: t2−t1
After calculating movement times for a large number of persons in this manner, the server may check the frequency for each movement time, may generate a histogram showing the frequency distribution, and may obtain a probability density function from the generated histogram.
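Purely as an illustration of the process just described, the following Python sketch estimates a probability density function from collected movement times. The variable names, the sample values, and the choice of a Gaussian kernel density estimate are all assumptions made for this example, not part of the disclosed embodiments; any density estimate (a normalized histogram, a parametric fit, etc.) could play the same role.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical movement times (in seconds) collected for one specific path,
# e.g. values such as t3 - t2 for persons A and B, t3 - t1 for person D.
movement_times = np.array([59.0, 61.0, 62.0, 63.0, 64.0, 65.0, 118.0, 120.0])

# A histogram shows the frequency distribution of the movement times ...
counts, bin_edges = np.histogram(movement_times, bins=5)

# ... and a kernel density estimate fitted to the same samples serves as
# the probability density function for movement times.
pdf = gaussian_kde(movement_times)

def movement_path_feature(movement_time: float) -> float:
    """Return the density value for an observed movement time; this value
    plays the role of the 'movement path feature' described above."""
    return float(pdf(movement_time)[0])

print(movement_path_feature(63.0))   # plausible time   -> relatively high value
print(movement_path_feature(600.0))  # implausible time -> value near zero
```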
The probability density function obtained according to the process described above may be used to extract a movement path feature, as will be described in detail below.
Referring to the accompanying drawing, a computing apparatus 100 for performing person re-identification includes an input/output interface 110, a controller 120, and storage 130.
The input/output interface 110 is a component for the input/output of data and commands. The input/output interface 110 may receive images from a plurality of cameras, and may display results obtained by performing person re-identification on the images or transmit the results to another apparatus. Furthermore, the input/output interface 110 may receive a command related to the performance of person re-identification, etc. from a user. The input/output interface 110 may include a component for receiving input such as a keyboard, hard buttons, or a touch screen, a component for performing output such as an LCD panel, and a component for performing input/output such as a wired/wireless communication port.
The controller 120 is a component including at least one processor such as a central processing unit (CPU), and controls the overall operation of the computing apparatus 100. In particular, the controller 120 may implement an artificial neural network model for performing person re-identification by executing a program stored in the storage 130 to be described later. A specific method by which the controller 120 generates an artificial neural network model for performing person re-identification and performs person re-identification using the artificial neural network model will be described in detail below.
The storage 130 is a component for storing data and a program, and may include at least one of various types of memory such as RAM, HDD, and SSD. A program for implementing an artificial neural network model for performing person re-identification may be stored in the storage 130.
A method of performing person re-identification according to an embodiment will be described in detail below with reference to the accompanying flowcharts.
As described above, the controller 120 of the computing apparatus 100 may perform person re-identification by executing the program stored in the storage 130 and thereby implementing an artificial neural network model.
Referring to the accompanying drawing, the artificial neural network model may include a person detector, a bodily feature extractor 420, a facial feature extractor 430, a movement path feature extractor 440, a trainer 450, and a matcher 460.
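As a rough, non-authoritative sketch of how these modules might be organized in code (all class and method names here are illustrative assumptions, not part of the disclosure):

```python
class PersonDetector:
    """Detects person regions in a frame (step 701, described below)."""
    def detect(self, frame): ...

class BodilyFeatureExtractor:
    """Corresponds to the bodily feature extractor 420; returns the first
    feature vector representing a bodily feature."""
    def extract(self, person_region): ...

class FacialFeatureExtractor:
    """Corresponds to the facial feature extractor 430; returns the second
    feature vector representing a facial feature when a face is detectable."""
    def face_detectable(self, person_region) -> bool: ...
    def extract(self, person_region): ...

class MovementPathFeatureExtractor:
    """Corresponds to the movement path feature extractor 440; maps a camera
    identifier and timestamp to a movement-time probability value."""
    def extract(self, camera_id, timestamp): ...

class Trainer:
    """Corresponds to the trainer 450; updates the probability density
    function for movement times using confirmed matches."""
    def update(self, movement_time): ...

class Matcher:
    """Corresponds to the matcher 460; matches extracted features against
    the gallery of previously seen persons."""
    def match(self, features, gallery): ...
```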
The operations of the respective modules will be described with reference to the flowcharts below.
Referring to the first flowchart, at step 701, the person detector detects at least one person from an image taken by any one of the plurality of cameras.
At step 702, the bodily feature extractor 420 may extract a bodily feature of each of the detected persons, the movement path feature extractor 440 may extract a movement path feature of each of the detected persons, and the facial feature extractor 430 may also extract a facial feature when the face of a detected person can be detected (e.g., when the visible face of the person detected from a screen corresponds to a predetermined proportion or more of a total facial region). The detailed steps included in step 702 may be configured in various manners and, in particular, may vary depending on whether a face can be detected, as will be described in detail below.
Referring to the corresponding flowchart, at step 801, the bodily feature extractor 420 extracts a first feature vector representing a bodily feature of each of the detected persons.
At step 802, the facial feature extractor 430 determines whether face detection is possible from the person regions 510 and 520 detected at step 701. For example, the facial feature extractor 430 may determine that face detection is possible if a facial region visible from each of the detected person regions 510 and 520 corresponds to a predetermined proportion or more of a total facial region. Otherwise, the facial feature extractor 430 may determine that face detection is not possible.
If it is determined that face detection is possible, the process proceeds to step 803, at which the facial feature extractor 430 extracts a second feature vector representing a facial feature of each of the detected persons. The extracted second feature vector may be used later in the process of identifying the same person and then training the movement path feature extractor 440 (i.e., updating the probability density function for movement times), and may also be used to calculate similarities with previously stored feature vectors at the same-person matching step. To this end, feature vectors representing facial features of persons detected from images previously taken by various cameras may also be stored in the gallery (see step 703 to be described later).
Step 804 is a step that may be selectively included, as indicated by the dotted line. At step 804, the movement path feature extractor 440 extracts a movement path feature of each of the detected persons. Step 804 is optional because identification of the same person based on a facial feature is considerably accurate, so once the second feature vector representing the facial feature has been extracted, there is no strict need to perform person re-identification by reflecting the movement path feature. However, even when the facial feature has been extracted, reflecting the movement path feature may yield slightly higher re-identification accuracy, so the process is configured to selectively include step 804.
At step 805, the trainer 450 trains the movement path feature extractor 440 using the second feature vector. Training the movement path feature extractor 440 refers to updating the probability density function for movement times; the method of obtaining a probability density function for movement times was discussed above.
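A minimal sketch of what this training could mean in practice, assuming the density is simply re-estimated whenever a facially confirmed movement time is added (the class and method names are hypothetical):

```python
import numpy as np
from scipy.stats import gaussian_kde

class MovementTimeModel:
    """Toy movement-time model for one camera pair."""

    def __init__(self):
        self.samples = []   # movement times confirmed via facial features
        self.pdf = None

    def add_confirmed_movement_time(self, seconds: float) -> None:
        # Step 805 in this sketch: append the new sample and refit the
        # probability density function for movement times.
        self.samples.append(seconds)
        if len(set(self.samples)) >= 2:   # KDE needs some spread in the data
            self.pdf = gaussian_kde(np.asarray(self.samples))

    def weight(self, seconds: float) -> float:
        # Neutral weight of 1.0 until enough samples have been collected.
        return float(self.pdf(seconds)[0]) if self.pdf is not None else 1.0
```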
Although step 805 may be performed after step 804 as shown in the flowchart, the order of the two steps is not limited thereto.
If it is determined at step 802 that the detection of the face of each of the detected persons is not possible, the process proceeds to step 807, at which the movement path feature extractor 440 extracts a movement path feature of each of the detected persons. In this case, same-person matching may be performed at step 703 based on the first feature vector representing the bodily feature extracted at step 801 and the movement path feature.
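Putting steps 801 to 807 together, the following is a hedged sketch of the branching just described, building on the hypothetical extractor classes above; the function name and signature are assumptions made for illustration.

```python
def extract_features(person_region, camera_id, timestamp,
                     body_ext, face_ext, path_ext,
                     use_path_with_face=False):
    """One possible realization of steps 801-807."""
    features = {"body": body_ext.extract(person_region)}            # step 801
    if face_ext.face_detectable(person_region):                     # step 802
        features["face"] = face_ext.extract(person_region)          # step 803
        if use_path_with_face:                                      # optional step 804
            features["path"] = path_ext.extract(camera_id, timestamp)
        # step 805 (training) would follow once the person has been
        # identified by means of the facial feature.
    else:                                                            # face not detectable
        features["path"] = path_ext.extract(camera_id, timestamp)   # step 807
    return features
```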
Another embodiment of step 702 will now be described with reference to a further flowchart.
Referring to this flowchart, at step 901, the facial feature extractor 430 first determines whether face detection is possible from the detected person regions.
If it is determined that face detection is possible, the process proceeds to step 902, at which the facial feature extractor 430 extracts a second feature vector representing a facial feature of each of the detected persons.
Step 903 is a step that may be selectively included like step 804 described above. At step 903, the movement path feature extractor 440 extracts a movement path feature of each of the detected persons.
At step 904, the trainer 450 trains the movement path feature extractor 440 using the second feature vector.
Although step 904 may be performed after step 903 as shown in the flowchart, the order of the two steps is not limited thereto.
If it is determined at step 901 that the detection of the face of each of the detected persons is not possible, the process proceeds to step 905, at which the bodily feature extractor 420 extracts a first feature vector representing a bodily feature of each of the detected persons.
At step 906, the movement path feature extractor 440 extracts a movement path feature of each of the detected persons.
Referring back to the earlier flowchart, at step 703, the matcher 460 matches the detected person against the persons included in the images taken by the plurality of cameras, based on the features extracted at step 702.
Depending on the types of features extracted at step 702, the method of performing same-person matching may vary slightly. A method of performing same-person matching based on bodily and movement path features will be described in detail below.
Referring to the accompanying drawing, it is assumed that person A is detected from an image taken at time point t3 and that a bodily feature and a movement path feature of person A are extracted.
Meanwhile, for person re-identification, features (bodily and facial features) of persons included in images taken by the same or a different camera at a previous time point are stored in advance in a database, and this database is referred to as a “gallery.”
In the embodiment shown in the drawing, feature vectors representing the bodily features of persons X, Y, and Z are stored in advance in the gallery.
The matcher 460 calculates similarities by comparing the feature vector representing the bodily feature of person A with the feature vectors representing the bodily features of persons X to Z stored in advance in the gallery, one by one. A method of calculating the similarity between feature vectors may be implemented in various manners, one of which is introduced below.
According to an embodiment, the similarity between two feature vectors may be calculated via cosine similarity. Each feature vector is expressed as one coordinate in a specific space, and the similarity may be determined by calculating the angle between the two vectors in a dot product space. If the two vectors to be compared are A=[A0, A1, . . . , An-1] and B=[B0, B1, . . . , Bn-1], respectively, the cosine similarity, which for vectors normalized to size 1 is simply their dot product, may be calculated according to Equation 1 below, and the similarity may be determined according to the calculated value:

cos θ = (A0B0 + A1B1 + . . . + An-1Bn-1) / (√(A0² + A1² + . . . + An-1²) × √(B0² + B1² + . . . + Bn-1²)) (Equation 1)

The cosine similarity has a value between −1 and 1; the closer the value is to 1, the higher the similarity.
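A minimal Python sketch of this calculation, using the standard cosine similarity (the example vectors are arbitrary values chosen for illustration):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product of the two vectors divided by the product of their norms;
    equivalently, the dot product after normalizing both vectors to size 1.
    The result lies between -1 and 1."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([0.20, 0.90, 0.10])
b = np.array([0.25, 0.85, 0.05])
print(cosine_similarity(a, b))  # close to 1 -> the features are very similar
```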
According to the method described above, the similarity is calculated by comparing the feature vector representing the bodily feature of person A with the feature vectors representing the bodily features of persons X to Z.
Now a method of reflecting a weight according to a movement path feature in a similarity between feature vectors will be described.
In this example, it is assumed that person X appeared in an image taken at time point t1, and that persons Y and Z appeared in images taken at time point t2, while person A was detected at time point t3.
For example, when, according to the probability density function for movement times, the probability value corresponding to the movement time (t3−t1) is 50% and the probability value corresponding to the movement time (t3−t2) is 10%, a weight corresponding to 50% may be added to or multiplied by the similarity between the feature vectors of person A and person X, and a weight corresponding to 10% may be added to or multiplied by the similarities between the feature vectors of person A and persons Y and Z.
Through this process, the matcher 460 may determine, based on the similarities in which the weights according to the movement path features are reflected, whether any of persons X to Z can be regarded as the same person as person A, may match persons determined to be the same person, and may output the matched persons as a re-identification result.
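The following sketch combines the two ingredients just described, using the multiplicative variant of the weight (one of the "added or multiplied" options mentioned above); the gallery layout, the threshold, and all names are assumptions made for illustration.

```python
import numpy as np

def match_person(query_vec, query_time, gallery, movement_model, threshold=0.5):
    """Match a detected person (e.g. person A at t3) against gallery entries
    (e.g. persons X, Y, Z), weighting bodily-feature similarity by the
    movement-time probability.

    gallery: iterable of (person_id, feature_vec, last_seen_time) tuples.
    movement_model: object with a weight(movement_time) method, such as the
    MovementTimeModel sketched earlier.
    """
    best_id, best_score = None, float("-inf")
    for person_id, vec, last_seen in gallery:
        sim = float(np.dot(query_vec, vec) /
                    (np.linalg.norm(query_vec) * np.linalg.norm(vec)))
        score = sim * movement_model.weight(query_time - last_seen)
        if score > best_score:
            best_id, best_score = person_id, score
    # Report a match only when the weighted similarity is high enough;
    # otherwise treat the person as newly observed.
    return best_id if best_score >= threshold else None
```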
In a similar manner, when a facial feature has also been extracted, matching may be performed using the facial feature together with, or instead of, the bodily feature.
As described above, the matcher 460 may perform matching that varies depending on the types of features extracted at the previous steps. More specifically: i) when bodily and movement path features are prepared, matching may be performed by reflecting weights according to the movement path features in the similarities between the feature vectors representing the bodily features; ii) when bodily and facial features are prepared, matching may be performed based on the similarities between the feature vectors representing the respective features; iii) when bodily, facial, and movement path features are all prepared, matching may be performed based on the similarities between the feature vectors representing the respective features while reflecting weights according to the movement path features therein; iv) when facial and movement path features are prepared, matching may be performed by reflecting weights according to the movement path features in the similarities between the feature vectors representing the facial features; and v) when only facial features are prepared, matching may be performed based on the similarities between the feature vectors representing the facial features.
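As an illustrative summary of cases i) to v), one possible dispatch is to average the similarities of whichever feature vectors are available on both sides and to apply the movement-time weight when a movement path feature is present; this is only a sketch, and all names in it are assumptions.

```python
import numpy as np

def _cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def combined_score(query, candidate, movement_weight=None):
    """query/candidate: dicts that may hold 'body' and/or 'face' vectors.
    movement_weight: movement-time probability weight, or None when no
    movement path feature was extracted (cases ii and v)."""
    sims = [_cos(query[k], candidate[k])
            for k in ("body", "face") if k in query and k in candidate]
    if not sims:                           # no comparable features
        return 0.0
    score = sum(sims) / len(sims)          # use whatever both sides share
    if movement_weight is not None:        # cases i, iii, and iv
        score *= movement_weight
    return score
```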
According to the above-described embodiments, person re-identification may be performed with high accuracy, even when the photographing environment changes or a determination based only on appearance information is ambiguous, by taking into consideration movement path features that reflect movement times between cameras.
Furthermore, an increase in re-identification accuracy may be expected from continuously training the model for extracting movement path features through comparisons between facial features.
The effects that can be obtained by the embodiments disclosed herein are not limited to the above-described effects, and other effects that have not been described above will be clearly understood by those having ordinary skill in the art, to which the present invention pertains, from the foregoing description.
The term ‘unit’ used in the above-described embodiments means software or a hardware component such as a field-programmable gate array (FPGA) or application-specific integrated circuit (ASIC), and a ‘unit’ performs a specific role. However, a ‘unit’ is not limited to software or hardware. A ‘unit’ may be configured to be present in an addressable storage medium, and also may be configured to run one or more processors. Accordingly, as an example, a ‘unit’ includes components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments in program code, drivers, firmware, microcode, circuits, data, a database, data structures, tables, arrays, and variables.
Each of the functions provided in components and ‘unit(s)’ may be coupled to a smaller number of components and ‘unit(s)’ or divided into a larger number of components and ‘unit(s).’
In addition, components and ‘unit(s)’ may be implemented to run one or more CPUs in a device or secure multimedia card.
The person re-identification method according to the embodiments described above may be implemented in the form of a computer program stored in a medium and executed by a computer, or in the form of a recording medium including instructions executable by a computer.
Furthermore, the person re-identification method according to the embodiments described above may be implemented as a computer program (or a computer program product) including instructions executable by a computer and stored in a computer-readable recording medium.
Accordingly, the person re-identification method according to the embodiments described above may be performed by executing the above-described computer program in a computing apparatus. The computing apparatus may include at least some of a processor, memory, a storage device, a high-speed interface connected to the memory and a high-speed expansion port, and a low-speed interface connected to a low-speed bus and the storage device.
In this case, the processor may process instructions within a computing apparatus. An example of the instructions is instructions which are stored in memory or a storage device in order to display graphic information for providing a Graphic User Interface (GUI) onto an external input/output device, such as a display connected to a high-speed interface. As another embodiment, a plurality of processors and/or a plurality of buses may be appropriately used along with a plurality of pieces of memory. Furthermore, the processor may be implemented as a chipset composed of chips including a plurality of independent analog and/or digital processors.
Furthermore, the memory stores information within the computing apparatus. As an example, the memory may include a volatile memory unit or a set of the volatile memory units. As another example, the memory may include a non-volatile memory unit or a set of the non-volatile memory units. Furthermore, the memory may be another type of computer-readable medium, such as a magnetic or optical disk.
In addition, the storage device may provide a large storage space to the computing apparatus. The storage device may be a computer-readable medium, or may be a configuration including such a computer-readable medium. For example, the storage device may also include devices within a storage area network (SAN) or other elements, and may be a floppy disk device, a hard disk device, an optical disk device, a tape device, flash memory, or a similar semiconductor memory device or array.
The above-described embodiments are intended for illustrative purposes. It will be understood that those having ordinary knowledge in the art to which the present invention pertains can easily make modifications and variations without changing the technical spirit and essential features of the present invention. Therefore, the above-described embodiments are illustrative and are not limitative in all aspects. For example, each component described as being in a single form may be practiced in a distributed form. In the same manner, components described as being in a distributed form may be practiced in an integrated form.
The scope of protection pursued via the present specification should be defined by the attached claims, rather than the detailed description. All modifications and variations which can be derived from the meanings, scopes and equivalents of the claims should be construed as falling within the scope of the present invention.
Number | Date | Country | Kind
---|---|---|---
10-2021-0137885 | Oct. 15, 2021 | KR | national