The present disclosure relates to data analysis techniques.
Face authentication techniques using deep learning utilize an authentication system trained with different face images of an identical human figure as training data. Among methods for creating such training data, there is known a method of automatically annotating a group of face images of an identical human figure, the group being identified by a method such as clustering based on feature values extracted from the face images. In automatic annotation by clustering, face images of a single human figure are in some cases classified into different clusters, and face images of different human figures are classified into an identical cluster. International Publication No. 2010/041377 discusses a method of selecting a cluster including display target images based on likelihood obtained as a result of clustering, and displaying the selected cluster and the images belonging to the cluster, so that an operator can correct the result of clustering.
However, in the method discussed in International Publication No. 2010/041377, a cluster with the highest likelihood in clustering is first selected as a display target, and subsequently, clusters are sequentially selected as display targets in ascending order of likelihood, starting from a cluster with the lowest likelihood. Because the degree of similarity between a cluster with high likelihood in clustering and a cluster with low likelihood in clustering is low, face images belonging to clusters with mutually low degrees of similarity are displayed together on a screen. This results in an issue that the displayed face images are likely to be determined to be of different human figures even though they are actually of an identical human figure, which is likely to cause an error in checking and correcting a result of clustering.
According to an aspect of the present disclosure, an information processing apparatus includes a clustering unit configured to perform clustering on a data group based on a feature value of each of a plurality of pieces of data, a determination unit configured to determine a representative cluster among clusters generated by the clustering unit, a first identification unit configured to identify a first cluster based on a degree of similarity between each of the clusters generated by the clustering unit and the representative cluster, a second identification unit configured to identify a second cluster based on a first degree of similarity, which is the degree of similarity between each of the clusters generated by the clustering unit and the representative cluster, and a second degree of similarity, which is a degree of similarity between each of the clusters generated by the clustering unit and the first cluster, and a selection unit configured to select at least one piece of data for display from among the representative cluster, the first cluster, and the second cluster.
Further features of the present disclosure will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
Exemplary embodiments of the present disclosure will be described below with reference to the accompanying drawings.
A first exemplary embodiment will now be described.
The system control unit 111 includes a central processing unit (CPU) and a graphics processing unit (GPU), and reads out programs stored in the ROM 112 or the HDD 114 to perform various types of processing. The RAM 113 is used as a temporary storage area, such as a main memory and work area of the system control unit 111. The ROM 112 stores various kinds of data and various kinds of programs. The HDD 114 is a secondary storage area that stores data used in performing clustering and various kinds of programs. Functions and processing of the information processing apparatus 100, which will be described below, are implemented by the system control unit 111 running programs stored in the ROM 112 or the HDD 114.
The display device 115 includes a display, and displays various kinds of information. The input device 116 includes a keyboard and a mouse, and accepts various kinds of operations performed by a user. The display device 115 and the input device 116 may be integrated like a touch panel display. The display device 115 may be a device that performs projection with a projector. The input device 116 may be a device that recognizes the position of a fingertip in a projected image with a camera.
The configuration of the information processing apparatus 100 is not limited to a configuration in which the HDD 114, the display device 115, and the input device 116 are arranged inside the housing of the information processing apparatus 100 as illustrated in
The acquisition unit 201 acquires a data group as a processing target. In the present exemplary embodiment, data as the processing target is face images. As a face image group serving as the processing target, the acquisition unit 201 collects a face image group estimated to belong to an identical human figure by using, for example, a system that searches images recorded by a monitoring camera for a specific human figure or web search with a name of the specific human figure as a search query. Processing target data is not limited to face images. Examples of a data group estimated to be of an identical type include print data collected for specific characters or handwriting image data, and image data collected for specific objects (an automobile of a specific type). Assume that a face image group that is estimated to be of an identical human figure and that is acquired by the acquisition unit 201 is an image group collected on the premise that face images of a human figure “A” are to be collected.
The extraction unit 202 extracts feature values from the data group, as the processing target, that is acquired by the acquisition unit 201. Details of a feature value extraction method will be described below.
The clustering unit 203 performs clustering of a plurality of pieces of data (the face image group here) based on the feature values extracted by the extraction unit 202. Details of the clustering will be described below.
The determination unit 204 determines a core cluster, i.e., a cluster composed of a representative image group of image groups acquired by the acquisition unit 201, among clusters generated by clustering performed by the clustering unit 203. The core cluster herein is a cluster composed of a face image group with the highest likelihood of being “A”. The core cluster is an example of a representative cluster. Details of a method of determining the core cluster will be described below.
The first identification unit 205 identifies a first cluster based on degrees of similarity between each cluster generated by clustering performed by the clustering unit 203 and the core cluster determined by the determination unit 204. In the present exemplary embodiment, the first identification unit 205 identifies a cluster composed of an image group with the lowest degree of similarity to an image group constituting the core cluster as the first cluster, among the clusters generated by clustering performed by the clustering unit 203. In other words, the first identification unit 205 identifies the cluster composed of the face image group with the lowest likelihood of being “A”. Details of a method of identifying the first cluster will be described below.
The second identification unit 206 identifies a second cluster based on degrees of similarity to the core cluster (a first degree of similarity) and degrees of similarity to the first cluster (a second degree of similarity), among the clusters generated by clustering performed by the clustering unit 203. In the present exemplary embodiment, the second identification unit 206 identifies a cluster with a degree of similarity to an image group constituting the core cluster within a middle range and with a degree of similarity to an image group constituting the first cluster within a middle range as the second cluster, among the clusters generated by the clustering unit 203. In other words, the second identification unit 206 identifies the cluster composed of the face image group with an intermediate value of likelihood of being “A” among the face image group acquired by the acquisition unit 201. Details of a method of identifying the second cluster will be described below.
The selection unit 207 selects face images as display targets from among the core cluster determined by the determination unit 204, the first cluster identified by the first identification unit 205, and the second cluster identified by the second identification unit 206. Details of a selection method will be described below. Each face image selected herein is an example of data for display.
The display control unit 208 performs control to display the face images selected by the selection unit 207 on the display device 115. Details of a display control method will be described below.
Next, the overall flow of processing performed by the information processing apparatus 100 according to the present exemplary embodiment will be described.
In step S301, the acquisition unit 201 acquires an image group as a processing target. In this step, the acquisition unit 201 may also acquire related information regarding each image. The related information is, for example, rank information indicating likelihood of being an acquisition target. Specifically, for a face image of the human figure “A” as the acquisition target, the rank information is numbers allocated as 1, 2, and so on in descending order from the face image with the highest likelihood of being the human figure “A”. The rank information is output from the acquisition source of the image group. For example, if a face image group of a specific human figure is acquired from a system that searches for human figures, degrees of similarity to a face image of the specific human figure serve as the rank information. The degree of similarity is, for example, likelihood obtained when matching is performed with the face image of the specific human figure as a search query. If face images are sequentially output in descending order of degrees of similarity, the output order serves as the rank information. If a face image group is collected from web search and the output order of face images in the search result reflects the likelihood, the output order serves as the rank information. The rank information is not necessarily unique, and images with an identical value may overlap with each other.
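The conversion from acquisition-source scores to rank information might be sketched as follows; the function name and the dense-ranking choice (images with an identical score share a rank value) are illustrative assumptions, not part of this disclosure.

```python
def rank_information(scores):
    """Assign rank numbers 1, 2, and so on in descending order of likelihood.

    Images with an identical score share the same rank value, so the rank
    information is not necessarily unique.
    """
    ordered = sorted(set(scores), reverse=True)
    rank_of_score = {s: i + 1 for i, s in enumerate(ordered)}
    return [rank_of_score[s] for s in scores]
```

For example, scores of 0.9, 0.7, 0.9, 0.5 yield ranks 1, 2, 1, 3: the two images with the highest score overlap at rank 1.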
In step S302, the extraction unit 202 performs pre-processing on each image acquired in step S301 to obtain a preferable feature in feature extraction. For face images as the processing target, the pre-processing is normalization of each face image. Specifically, the pre-processing includes performing face detection on a face image as the processing target to estimate positions of facial organ points, such as the eyes, nose, and mouth, and performing geometric transformation based on the positions of the estimated organ points to position the face at the center of the image. Other pre-processing includes removal of noise superimposed on the image and trimming.
In step S303, the extraction unit 202 extracts feature values from the images acquired in step S301. One method extracts feature values using a feature extraction model trained with a residual neural network (ResNet), which is a deep neural network, as described in Kaiming He, et al. The feature extraction model is trained, using a great number of images as training data, to output more similar feature values for images with higher likelihood of being of an identical human figure. In this step, for example, data having normalized values in the range of 0 to 1 in several hundred dimensions is obtained as feature values (Kaiming He, et al., “Deep Residual Learning for Image Recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)).
In step S304, the clustering unit 203 performs clustering of the image group acquired in step S301 using the feature values extracted in step S303. In the present exemplary embodiment, hierarchical clustering is used. The distance function for clustering (such as Euclidean distance or cosine similarity) and the method for measuring a distance between clusters (such as the group average method or Ward's method) are not limited. In the present exemplary embodiment, the following is a description of a case where cosine similarity (hereinafter referred to as a degree of similarity) is used as the distance function for feature values of face images.
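A minimal sketch of the cosine similarity used here as the degree of similarity between feature vectors (plain Python; no specific library is assumed):

```python
import math

def cosine_similarity(u, v):
    """Degree of similarity between two feature vectors: 1.0 for vectors with
    an identical direction, 0.0 for orthogonal vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```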
The process of hierarchical clustering typically includes calculating a degree of similarity between clusters and comparing it with a threshold to determine whether two clusters are regarded as an identical cluster, which changes the number of clusters to be generated. When the degree of similarity is used as the distance function in clustering as described above, a higher threshold makes it easier for clusters to be regarded as different clusters, which increases the number of clusters to be generated. In contrast, a lower threshold makes it easier for many clusters to be regarded as an identical cluster, which decreases the number of clusters to be generated. In the present exemplary embodiment, since at least three clusters, i.e., the core cluster, the first cluster, and the second cluster, are to be identified, a threshold with which at least three clusters will be generated is set in advance.
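The relation between the threshold and the number of generated clusters can be illustrated with a minimal agglomerative clustering sketch using the group average method; this is an assumption-laden illustration, not the disclosed implementation.

```python
def group_average_similarity(a, b, sim):
    """Group average similarity between two clusters of element indices."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

def hierarchical_clustering(n, sim, threshold):
    """Repeatedly merge the most similar pair of clusters while their
    similarity is at or above the threshold. With similarity as the distance
    function, a higher threshold stops merging earlier and so generates more
    clusters; a lower threshold generates fewer."""
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = group_average_similarity(clusters[x], clusters[y], sim)
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break  # no pair similar enough to be regarded as an identical cluster
        x, y = pair
        clusters[x] += clusters.pop(y)
    return clusters
```

With four elements forming two tight pairs, a threshold of 0.5 yields two clusters, while a threshold above the within-pair similarity yields four singletons.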
In step S305, the determination unit 204 determines the core cluster. The core cluster is the most representative cluster of the image groups as the processing target acquired in step S301, out of the plurality of clusters generated as a result of processing in step S304. Specifically, the core cluster is a cluster composed of an aggregation of face images with the highest likelihood of being the identical person (an aggregation of face images obviously representing the human figure A). Conditions used when the determination unit 204 determines the core cluster are as follows.
In the present exemplary embodiment, the determination unit 204 calculates a core cluster confidence score of each cluster under the above-described Conditions 1 to 3. The core cluster confidence score takes a value in the range from 0 to 1, for example. A higher value indicates that the human figure is more obviously the human figure A. Specifically, the determination unit 204 calculates a core cluster confidence score CC of each cluster using the following Expressions 1 to 4.
Signs in the above-described Expressions 1 to 4 represent the following.
A core cluster confidence score CC1 that takes a higher value as Condition 1 is more satisfied is calculated by using the above-described Expression 1. A core cluster confidence score CC2 that takes a higher value as Condition 2 is more satisfied is calculated by using the above-described Expression 2. A core cluster confidence score CC3 that takes a higher value as Condition 3 is more satisfied is calculated by using the above-described Expression 3.
The core cluster confidence scores CC1 to CC3 calculated as just described are substituted into the above-described Expression 4, whereby a comprehensive core cluster confidence score CC is calculated. Values of CC1 to CC3 based on Conditions 1 to 3 are used as described above, but an evaluation value for a core cluster confidence score calculated based on other conditions may be used. A value of any one of CC1 to CC3 may be selectively used. The above-described Expressions 1 to 3 are examples, and a specific expression is not required as long as an evaluation value of a core cluster confidence score based on Conditions 1 to 3 is represented.
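Since Expressions 1 to 4 themselves are not reproduced above, the following sketch reconstructs them from the worked example in this description (CC1 as the average within-cluster similarity, CC2 as the share of elements, CC3 as inverse-rank mass, and CC as their weighted sum with weights α1, β1, and γ1); it is a reconstruction under those assumptions, not the literal expressions.

```python
from itertools import combinations

def cc1_average_similarity(sim, members):
    """CC1 (Expression 1): mean pairwise similarity between cluster elements."""
    pairs = list(combinations(members, 2))
    return sum(sim[i][j] for i, j in pairs) / len(pairs)

def cc2_size_share(members, total):
    """CC2 (Expression 2): number of elements in the cluster over the total."""
    return len(members) / total

def cc3_rank_score(member_ranks, all_ranks):
    """CC3 (Expression 3): inverse-rank mass of the cluster relative to all
    elements; clusters holding more small-rank elements score higher."""
    return sum(1 / r for r in member_ranks) / sum(1 / r for r in all_ranks)

def core_cluster_confidence(cc1, cc2, cc3, alpha=0.3, beta=0.1, gamma=0.6):
    """CC (Expression 4): comprehensive score CC = alpha*CC1 + beta*CC2 + gamma*CC3."""
    return alpha * cc1 + beta * cc2 + gamma * cc3
```

With CC1 = 0.9, CC2 = 0.375, and CC3 = 0.59, this yields CC = 0.6615, consistent with the value of about 0.662 computed for the cluster 1 in the worked example.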
Subsequently, a specific procedure for determining the core cluster among the clusters 1 to 4 illustrated in
First, the determination unit 204 calculates the core cluster confidence score CC regarding each cluster using the above-described Expressions 1 to 4. Assume that the average value of degrees of similarity between elements is 0.9 in the cluster 1, 0.9 in the cluster 2, and 0.8 in the cluster 3, and 0.8 in the cluster 4. Assume that α1=0.3, β1=0.1, and γ1=0.6. As illustrated in
An example of calculating the core cluster confidence score CC regarding the cluster 1 will now be described.
The core cluster confidence score CC1 is a value obtained by dividing a sum of degrees of similarity between elements by the number of all combinations of elements, and thus is equal to the average value of degrees of similarity between the elements in a cluster. In the cluster 1, CC1=0.9. The core cluster confidence score CC2 is the ratio of the number of elements in a cluster to the total number of elements. The larger the number of elements in a cluster, the higher the core cluster confidence score CC2. In the cluster 1, CC2=9/(9+7+5+3)=0.375. The core cluster confidence score CC3 is a confidence score calculated based on the rank information. As a cluster includes more elements with smaller rank values, the core cluster confidence score CC3 becomes higher. In the cluster 1, CC3=(1/1+1/2+1/3+1/5+1/8+1/10×4)/(1/1+1/2+1/3+1/4+1/5+1/6+1/7+1/8+1/9+1/10×(24−9))=0.59. Finally, using CC1 to CC3 calculated as described above, the core cluster confidence score CC is obtained as CC=0.3×0.9+0.1×0.375+0.6×0.59=0.662.
Respective core cluster confidence scores of the clusters 2 to 4 are similarly calculated. Assuming that CC=0.419 is obtained in the cluster 2, CC=0.339 is obtained in the cluster 3, and CC=0.30 is obtained in the cluster 4 as the results of the calculation, it is the cluster 1 that has the highest core cluster confidence score. Consequently, the cluster 1 is determined to be the core cluster.
In step S306, the first identification unit 205 identifies a cluster with the lowest degree of similarity to the core cluster determined in step S305 as the first cluster. The first cluster may be hereinafter referred to as a cluster X. The degrees of similarity between the clusters have been already calculated in the clustering in step S304. The first identification unit 205 refers to the degrees of similarity between the clusters stored in the ROM 112 to identify the cluster with the lowest degree of similarity to the core cluster. In the example in
In step S307, the second identification unit 206 identifies, as the second cluster, a cluster whose degree of similarity to the core cluster determined in step S305 is within a middle range and whose degree of similarity to the cluster X identified in step S306 is within a middle range. The second cluster may be hereinafter referred to as a cluster Y. The middle range is a predetermined range within the degree of similarity, which is defined from a minimum value of 0 to a maximum value of 1.0; the middle range is, for example, a range from 0.45 to 0.55. In other words, the second identification unit 206 identifies the cluster composed of an image group with middle degrees of similarity to both the core cluster and the cluster X. In the case illustrated in
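Steps S306 and S307 might be sketched as follows, with the pairwise degrees of similarity held in a nested mapping; the function names and the mapping layout are illustrative assumptions.

```python
def identify_cluster_x(core, clusters, sim):
    """Step S306: the cluster with the lowest degree of similarity to the core."""
    candidates = [c for c in clusters if c != core]
    return min(candidates, key=lambda c: sim[core][c])

def identify_cluster_y(core, cluster_x, clusters, sim, low=0.45, high=0.55):
    """Step S307: a cluster whose degrees of similarity to both the core
    cluster and the cluster X fall within the middle range; None if absent."""
    for c in clusters:
        if c in (core, cluster_x):
            continue
        if low <= sim[core][c] <= high and low <= sim[cluster_x][c] <= high:
            return c
    return None
```

Returning None when no candidate exists corresponds to the case where no cluster Y is identified and re-clustering is considered.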
In step S308, the clustering unit 203 determines whether the cluster Y has been identified in step S307. If the clustering unit 203 determines that the cluster Y has been identified (YES in step S308), the processing proceeds to step S311. If the clustering unit 203 determines that the cluster Y has not been identified (NO in step S308), the processing proceeds to step S309. In step S309, the clustering unit 203 increments a clustering counter i indicating the number of times clustering has been performed, and the processing returns to step S304. In step S304, the clustering unit 203 performs re-clustering. In the re-clustering, in order to generate clusters as candidates for the cluster Y, the clustering unit 203 sets a threshold greater than that used in the previous clustering to increase the number of clusters. After that, the information processing apparatus 100 performs the processing of steps S305 to S307 again and attempts to identify a cluster corresponding to the cluster Y. However, because no cluster corresponding to the cluster Y may exist even after the re-clustering, an upper limit (for example, three) is set on the number of times clustering is performed. The clustering unit 203 increments the clustering counter i every time the processing enters the re-clustering step (step S309). In step S310, the clustering unit 203 determines whether the clustering counter i exceeds the predetermined number. If the clustering unit 203 determines that the clustering counter i exceeds the predetermined number (YES in step S310), the processing proceeds to step S311.
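The re-clustering loop of steps S304 to S310 might be condensed as follows; run_clustering and identify_y stand in for the processing described above, and the step size for raising the threshold is an assumed parameter.

```python
def find_display_clusters(run_clustering, identify_y, threshold, step=0.05, limit=3):
    """Re-cluster with a raised threshold (generating more clusters) until a
    cluster Y is identified or the clustering counter exceeds the limit."""
    for attempt in range(limit + 1):
        clusters = run_clustering(threshold + attempt * step)
        cluster_y = identify_y(clusters)
        if cluster_y is not None:
            return clusters, cluster_y
    return clusters, None  # proceed to image selection without a cluster Y
```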
In step S311, the selection unit 207 selects images for display from the clusters. The clusters herein are the core cluster, the cluster X, and the cluster Y identified by the processing so far. The selection unit 207 selects an identical number of images from an image group belonging to each cluster to prevent a lot of images of a specific cluster from being displayed. For example, as images for display, the selection unit 207 selects three face images from the cluster 1 as the core cluster, three face images from the cluster 4 as the cluster X, and three face images from the cluster 2 as the cluster Y. The selection unit 207 selects images corresponding to elements closer to the center of each cluster so that a representative image of each cluster will be displayed.
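The selection of representative images in step S311 might look like the following sketch, which picks, for each cluster, the k elements whose feature vectors are closest to the cluster centroid; the centroid criterion is the one stated above, while the function shape is an illustrative assumption.

```python
def select_for_display(cluster_features, k=3):
    """Return the indices of up to k elements closest to the cluster centroid,
    so that representative images of the cluster are displayed."""
    dim = len(cluster_features[0])
    centroid = [sum(f[d] for f in cluster_features) / len(cluster_features)
                for d in range(dim)]

    def dist2(f):
        # squared Euclidean distance to the centroid
        return sum((a - b) ** 2 for a, b in zip(f, centroid))

    return sorted(range(len(cluster_features)),
                  key=lambda i: dist2(cluster_features[i]))[:k]
```

Applying this per cluster with k = 3 yields three representative images each from the core cluster, the cluster X, and the cluster Y.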
In step S312, the display control unit 208 performs control to display the images for display selected in step S311 on the display device 115. After that, the processing in this flowchart ends.
If the NEXT button 704 is pressed, processing for transition to determination about the face image of the different human figure is started.
The above-described UI is merely an example, and the user's operation can be input by another means. For example, in consideration of work efficiency of the user, a configuration may be used that allows the determination about whether the face images are of the identical human figure to be input through keyboard operations alone.
As illustrated in
An up/down button 705 is also displayed on the display screen 700. When the user presses the up (triangle) button or the down (inverted triangle) button of the up/down button 705, the display control unit 208 increases or decreases the number of images to be displayed on the window 701. Specifically, when the user presses the up (triangle) button, the selection unit 207 increases the number of images to be selected for display from each cluster, and the display control unit 208 newly displays the additionally selected image(s) on the window 701. For example, assuming that three images for display are selected from each cluster, when the user presses the up (triangle) button, the number of images for display to be selected from each cluster is changed to four. While the cluster 4 is identified as the cluster X in the present exemplary embodiment, the cluster 4 includes only three elements, and thus the number of images for display selected from the cluster X remains three. Consequently, the number of images to be displayed on the window 701 is increased from nine to eleven.
According to the present exemplary embodiment as just described, when the image groups belonging to the clusters generated as a result of clustering are displayed, the image group with a middle degree of similarity to each of the other image groups is displayed together with the image group with the highest likelihood of being the target human figure and the image group with the lowest likelihood of being the target human figure. When the image group with the highest likelihood and the image group with the lowest likelihood of being the target human figure are compared, displaying the image group with the middle degree of similarity as well facilitates determination about whether these image groups are of an identical human figure, providing higher work efficiency in the operation of checking results of clustering.
As a modification of the present exemplary embodiment, when the cluster Y is not identified in step S308 (NO in step S308) in the flowchart in
A second exemplary embodiment will now be described. In the first exemplary embodiment, the description has been given of the display method used when a user is caused to determine whether the whole of the face images to be displayed are of a target human figure. In the second exemplary embodiment, a display method will be described that is used when a user is caused to determine whether each cluster or each image is of a target human figure. In the following description, the redundant description common to the first exemplary embodiment will be omitted, and differences from the first exemplary embodiment will be mainly described.
As described above, the present exemplary embodiment allows the user to determine whether each cluster of the displayed face images or each of the displayed face images is of the target human figure. Also in this case, similarly to the first exemplary embodiment, the image group with the middle degree of similarity to each of these image groups is displayed, together with the image group with the highest likelihood of being the target human figure and the image group with the lowest likelihood of being the target human figure. This provides a higher work efficiency in operation of checking whether each image is of the target human figure.
The present disclosure includes a case where a software program is installed directly or remotely in a system or an apparatus, and the functions of the exemplary embodiments described above are implemented by a computer of the system or the apparatus reading out and running codes of the installed program. In this case, the installed program is a computer-readable program implementing the flowcharts illustrated in the exemplary embodiments. Further, the functions of the above-described exemplary embodiments may be implemented by the computer running the readout program, or may be implemented by the computer in collaboration with an operating system (OS) running on the computer based on instructions of the program. In this case, the OS or the like performs part or all of the actual processing, whereby the functions of the above-described exemplary embodiments are implemented.
Embodiment(s) of the present disclosure can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc™ (BD)), a flash memory device, a memory card, and the like.
While the present disclosure has been described with reference to exemplary embodiments, it is to be understood that the present disclosure is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2022-087373, filed May 30, 2022, which is hereby incorporated by reference herein in its entirety.