The present invention relates to a data collection technique.
PLT 1 discusses a technique for extracting an image of an object related to a search keyword from a cluster including images that are highly likely to be of the object related to the keyword.
The present invention is directed to reducing noise included in a set of data related to a specific object. In order to solve the above-described issue, an information processing apparatus includes at least one memory storing instructions, and at least one processor that, upon execution of the stored instructions, is configured to acquire a plurality of elements, classify the acquired elements into a plurality of clusters based on a similarity between the elements, select a first cluster related to a predetermined object based on a first index calculated from elements included in the classified clusters, select a second cluster based on a second index calculated from the first cluster and the elements included in the classified clusters, specify a merge element to be merged, among the acquired elements, based on the second cluster and the first cluster, and output a third cluster including the merge element, an element included in the first cluster, and an element included in the second cluster.
Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
In existing hierarchical clustering, which is one method of grouping images of the same kind into a cluster, each element (image) is assigned to the cluster to which it most likely belongs. Therefore, for example, in a case where a cluster A indicates an object A and a cluster B indicates an object B, an element (image) similar to the object B but indicating the object A is included in the cluster B. Further, in the case of the hierarchical clustering, an element indicating the object B may be included in the cluster indicating the object A due to repeated integration with similar clusters. In the present invention, to eliminate the object B included in the cluster A and realize a data set of only the object A, another cluster is added to a cluster having the highest reliability.
In the following, among collected data, an element to be extracted is referred to as a signal element, and an element to be removed is referred to as a noise element. In the case of PLT 1, the image of the object related to the keyword is a signal element, and other images are noise elements.
In the following, preferred exemplary embodiments of the present invention are described with reference to accompanying drawings.
A central processing unit (CPU) 101 controls the whole of the information processing apparatus 100. A read only memory (ROM) 102 stores programs and parameters that do not require change. A random access memory (RAM) 103 temporarily stores programs and data supplied from an external apparatus and the like. An external storage device 104 is a storage device, such as a hard disk or a memory card, fixedly installed on the information processing apparatus 100. The external storage device 104 may include a flexible disc (FD), an optical disc such as a compact disc (CD), a magnetic or optical card, an integrated circuit (IC) card, a memory card, and the like that are attachable to and detachable from the information processing apparatus 100. Functions and processing of the information processing apparatus 100 described below are realized when the CPU 101 reads out the programs stored in the ROM 102 and the external storage device 104 and executes the programs.
An input interface (I/F) 105 is an interface with an input unit 109, such as a pointing device and a keyboard, to receive an operation of a user and input data. An output I/F 106 is an interface with a monitor (display device) 110 to display data held by the information processing apparatus 100 and supplied data. A data output method is not limited to the display device such as a monitor, and may be an output device, such as a speaker, that outputs audio. A system bus 108 is a transmission path communicably connecting the units 101 to 106.
The information processing apparatus 100 outputs a third cluster of extracted signal elements, and a result is output from an output unit 16. In order to actually implement the present invention, means for using the output result is necessary. It is assumed that the result is output to, for example, a display device or an analysis device for face authentication or the like, but description of the means for using the result is omitted because the application is not limited.
An acquisition unit 11 acquires data to be processed, namely, data related to a specific object. More specifically, the acquisition unit 11 acquires a large number of face images considered to be of the same person. For example, in a person search system using a monitoring camera, face images of persons resembling each other that are obtained by designating a query image and inputting a face image of a person to be searched for may be data to be processed. Alternatively, as discussed in PLT 1, data extracted by search using a keyword, such as a name, may be acquired from the Internet. The images (data) acquired by the acquisition unit 11 may also be referred to as elements in the subsequent processing.
In the present exemplary embodiment, face images are acquired as elements, but the data to be handled is not limited to face images; any set of elements (data) considered to be of the same kind can be handled. Other examples of elements considered to be of the same kind include photographs of a printed or handwritten document of a specific character (e.g., “A”), and images obtained by imaging a specific object (e.g., an automobile of a specific model). Images obtained by applying conversion, such as enlargement, contraction, rotation, or color correction, to these images and clipping regions of appropriate size from them may also be used as the elements. As described above, the data collection method is not limited, and the acquired data is a mixture of data including the specific object and other data (not including the specific object).
In other words, the elements are classified into signal elements and noise elements. More specifically, the signal elements are elements to be extracted and are face images of the same person. The noise elements are face images of persons different from the person of the signal elements, and the like. For example, in the above-described person search system, most of face images (elements) obtained by inputting “face image of Mr. A” are considered as “face images of Mr. A” (signal elements). Other images, namely, “face images of persons other than Mr. A (resembling Mr. A)”, face illustrations, and images other than face images are the noise elements.
A classification unit 12 classifies the acquired elements (images) into a plurality of clusters based on similarity among the elements. In other words, the classification unit 12 performs clustering on a set of predetermined elements acquired by the acquisition unit 11. Although details are described below, not all of the elements are clustered; only elements having a strong relationship are extracted as one cluster or a plurality of clusters.
A first cluster selection unit 13 selects a core cluster (first cluster) related to a predetermined object among the classified clusters, based on a first index calculated from the elements included in the clusters. A cluster that is highly likely to include only the elements related to the predetermined object, namely, the signal elements is referred to as a core cluster in the following. Among the clusters classified by the classification unit 12, one cluster considered to include only the signal elements is extracted as a core cluster. Details are described below.
A second cluster selection unit 14 specifies a merge cluster (second cluster) based on a second index calculated from the elements included in the core cluster (first cluster) and the elements included in the plurality of classified clusters. In other words, among the plurality of classified clusters, a cluster similar to the selected cluster is specified. The merge cluster is a cluster different from the core cluster and is a cluster to be merged with the core cluster. A cluster similar to the core cluster is specified as a merge cluster. Details are described below.
A merge element specification unit 15 specifies a merge element to be merged, among the acquired elements, based on the merge cluster (second cluster) and the core cluster (first cluster). In other words, among the elements included in neither the core cluster nor the merge cluster, an element having a similarity to the core cluster or the merge cluster that is equal to or greater than a predetermined value (third threshold) is specified as a merge element. Details are described below.
The output unit 16 outputs a cluster (third cluster) including the merge element, the elements included in the core cluster (first cluster), and the elements included in the merge cluster (second cluster). In other words, the elements of the core cluster, the elements of the merge cluster, and the merge element are output as a noise-removed element list. Details are described below.
A storage unit 17 is the RAM 103, and stores as appropriate information necessary for performing the above-described processing of the acquisition unit 11 to the output unit 16.
The processing to be performed by the information processing apparatus 100 is described below.
First, in step S300, the acquisition unit 11 acquires a plurality of pieces of data (elements). This flow will be described with reference to
In step S300, the acquisition unit 11 acquires a plurality of pieces of data (elements). A result of the acquisition is output as an element list O21. An element is an image itself, and is typically an image loaded on a memory or an image file name. In a case where information related to the elements is necessary, an element-related information list O22 is output. Examples of the element-related information include order information. The order information is a number 1, 2, . . . assigned to images in the order that is most likely to be of Mr. A when “face image of Mr. A” is acquired. Alternatively, the element-related information may be certainty information. The certainty information is a degree of certainty as “face image of Mr. A” represented by, for example, a numerical value between 0 and 1 both inclusive. The order information and the certainty information are output depending on a target for which a set of elements is acquired by the acquisition unit 11. Although the output information is different depending on the target, for example, in the case of the person search system, if a degree of resemblance to a face image of a person to be searched for is output, the degree of resemblance serves as the certainty information. Further, in a case where face images are output in descending order of the degree of similarity, the output order serves as the order information.
The present exemplary embodiment describes a case where the element-related information list includes the order information. In a case where only the certainty information is obtainable, the order information may be derived from it, for example, by numbering the elements from 1 in descending order of the certainty information. The certainty information and the order information are not necessarily unique, and a plurality of elements may have the same value. The element-related information is used in processing on a subsequent stage, but if the element-related information is not used, the element-related information list O22 need not be output. A diagram element starting with a letter “O” is data to be handled in input or output of each processing, and is stored in the storage unit 17.
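The derivation of order information from certainty information can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function name `order_from_certainty` is hypothetical, and the choice to give tied elements the same order number (dense ranking) is an assumption made here to reflect the statement that the order information need not be unique.

```python
def order_from_certainty(certainties):
    """Assign order numbers 1, 2, ... in descending order of certainty.

    Assumption: elements with equal certainty share one order number
    (dense ranking), so the resulting order information is not unique.
    """
    distinct = sorted(set(certainties), reverse=True)
    rank_of = {value: rank for rank, value in enumerate(distinct, start=1)}
    return [rank_of[value] for value in certainties]


# Hypothetical certainty values for four elements.
orders = order_from_certainty([0.9, 0.4, 0.9, 0.7])
print(orders)  # [1, 3, 1, 2]
```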
Refer back to the flowchart in
In step S3010, the classification unit 12 extracts a feature (first feature amount) from data (element) based on a first feature extraction method. The feature amounts are calculated for the respective elements in the element list O21, and are output as a feature amount list O41. The first feature extraction method may be an optional method. For example, a feature vector is acquired from a feature extraction trained model by using Deep Residual Learning for Image Recognition (ResNet) of a deep neural network.
In step S3011, the classification unit 12 acquires a similarity by comparing the features of the elements. More specifically, a similarity between two elements in the feature amount list O41 is calculated in a round-robin manner, and a similarity list O42 is output. Here, the similarity is a value within a range from −1 to 1, but is not limited thereto. As a method of calculating the similarity, for example, a cosine similarity between feature amounts can be used.
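As a rough sketch of step S3011, the round-robin similarity calculation could look like the following. The function names and the dictionary layout of the similarity list are hypothetical; the only element taken from the description is the use of a cosine similarity, which yields values from −1 to 1.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors, in the range [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat a zero vector as unrelated
    return dot / (norm_u * norm_v)


def similarity_list(features):
    """Compute the similarity for every unordered pair (i, j) with i < j."""
    return {
        (i, j): cosine_similarity(features[i], features[j])
        for i, j in combinations(range(len(features)), 2)
    }


# Hypothetical two-dimensional feature amounts for four elements.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
sims = similarity_list(features)
print(len(sims))  # 6 pairs for 4 elements
```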
In step S3012, the classification unit 12 extracts elements having a similarity that is equal to or greater than a predetermined threshold to generate a cluster. That is, clustering is performed based on the similarity list O42. In other words, two optional elements included in one cluster have a similarity that is equal to or greater than the threshold (TH2). A result of cluster classification is output as a cluster list O43.
Further, the classification unit 12 also counts the elements included in each of the clusters, and outputs the counts as a number-of-elements list O44. However, if the number-of-elements list O44 is not used in processing on the subsequent stage, the number-of-elements list O44 need not be output.
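One way to obtain clusters in which any two elements have a similarity equal to or greater than the threshold is a greedy grouping such as the sketch below. The description does not specify the clustering algorithm, so this greedy procedure, the function name, and the `min_size` parameter (used here to leave weakly related elements unclustered) are all assumptions for illustration.

```python
def greedy_threshold_clusters(n, sims, threshold, min_size=2):
    """Group element indices so that every pair inside a cluster has a
    similarity >= threshold. `sims` maps (i, j) with i < j to a similarity.

    Assumption: an element joins the first cluster it is compatible with;
    clusters smaller than `min_size` are discarded, so not every element
    ends up in a cluster.
    """
    def sim(i, j):
        return sims[(i, j)] if i < j else sims[(j, i)]

    clusters = []
    for i in range(n):
        for members in clusters:
            if all(sim(i, j) >= threshold for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    return [c for c in clusters if len(c) >= min_size]


# Hypothetical similarities for five elements (default 0.1 for weak pairs).
strong = {(0, 1): 0.8, (0, 2): 0.7, (1, 2): 0.9, (3, 4): 0.75}
sims = {(i, j): strong.get((i, j), 0.1) for i in range(5) for j in range(i + 1, 5)}

cluster_list = greedy_threshold_clusters(5, sims, threshold=0.6)
number_of_elements = [len(c) for c in cluster_list]
print(cluster_list)        # [[0, 1, 2], [3, 4]]
print(number_of_elements)  # [3, 2]
```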
Refer back to the flowchart in
Thus, in step S3020, the first cluster selection unit 13 calculates a first certainty for each cluster based on the similarity between the elements, the number of elements, and a degree of relevance. An example in which the certainty is calculated based on the cluster list, the similarity list, the number-of-elements list, and the element-related information list is described. In other words, the first certainty (first index) is a certainty indicating a possibility that a predetermined object is included. Not all of the policies are necessarily used, but a cluster conforming to a larger number of policies is more desirable. The first certainty of each of the clusters in the cluster list O43 is calculated according to the above-described policies 11 to 13. The first certainty (first index) has, for example, a value within a range from 0 to 1, and a larger value indicates that the cluster is more “apparently Mr. A”.
A method of calculating the first certainty is described below.
When
the following equations are established.
where, Sele is a set of subscripts of the elements included in the cluster, Sall is a set of subscripts of all elements, simij is a similarity between an element i and an element j, n(S) is the number of elements in a set S, n(Sele) is the number of elements in the cluster, n(Sall) is the number of all elements, nPk is a total number of permutations obtained by selecting k elements from n elements, and ri is the order information of the element i.
For example, a first certainty CC1 in the case of the policy 11, a first certainty CC2 in the case of the policy 12, and a first certainty CC3 in the case of the policy 13 are respectively represented by equations 1 to 3. The equations described here are illustrative, and equations are not specifically limited as long as the equations represent evaluation values of the policies 11 to 13.
A first certainty CC of each of the clusters is calculated from the values of CC1 to CC3. An example of calculation of the first certainty CC is represented by equation 4. Note that α1, β1, and γ1 are predetermined values. In the equation 4, the values of CC1 to CC3 based on the policies 11 to 13 are used; however, an index based on another policy may be added. The first certainties CC of the respective clusters are output as a first certainty list O71.
As an example, the first certainties CC of the cluster A61, the cluster B62, and the cluster C63 are determined using the above-described equations. To simplify the calculation, a similarity indicated by the thick line is set to 0.7, and a similarity indicated by the thin line is set to 0.3. In addition, to simplify the calculation, the search order information of 10 or larger is all handled as 10.
First, the first certainty CC of the cluster A61 is determined. The number of elements in the cluster A61 is 9, the number of thick lines connecting the elements in the cluster A61 is 16, and the number of thin lines is 1, and thus CC1=0.319, CC2=0.360, and CC3=0.571 are calculated. Further, when α1=0.3, β1=0.1, and γ1=0.6 are set, CC=0.474 (first certainty of cluster A61) is obtained.
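The figures above are consistent with combining CC1 to CC3 as a weighted sum, which the following sketch reproduces. That equation 4 is a weighted sum is an assumption inferred from the stated numbers, and the function name is hypothetical.

```python
def combine_first_certainty(cc1, cc2, cc3, alpha1, beta1, gamma1):
    """Assumed form of equation 4: CC = alpha1*CC1 + beta1*CC2 + gamma1*CC3."""
    return alpha1 * cc1 + beta1 * cc2 + gamma1 * cc3


# Values stated for the cluster A61.
cc_a61 = combine_first_certainty(0.319, 0.360, 0.571,
                                 alpha1=0.3, beta1=0.1, gamma1=0.6)
print(round(cc_a61, 3))  # 0.474
```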
Likewise, the first certainty CC of the cluster B62 is obtained as CC=0.263 (first certainty of cluster B62). The first certainty CC of the cluster C63 is obtained as CC=0.225 (first certainty of cluster C63).
In step S3021, the first cluster selection unit 13 selects, as a core cluster (first cluster), a cluster having the first certainty that is equal to or greater than the first threshold. Alternatively, the first cluster selection unit 13 selects a cluster having the greatest first index as a core cluster. The first cluster selection unit 13 selects, as a core cluster O72, the cluster having the greatest first certainty in the first certainty list O71. In a case where a plurality of clusters shares the greatest first certainty, one of the plurality of clusters or a union of the plurality of clusters is selected as the core cluster O72. In this example, the cluster A61 is selected because the cluster A61 has the greatest first certainty of 0.474.
In a case where the greatest first certainty does not exceed the first threshold, the processing may end without selecting a core cluster. In this case, the processing returns to the collection of elements or the extraction of features.
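The selection in step S3021, including the case where no core cluster is selected, might be sketched as below. The function name, the dictionary layout, and the threshold values in the usage lines are hypothetical; treating a certainty equal to the threshold as sufficient is also an assumption.

```python
def select_core_cluster(first_certainties, first_threshold):
    """Return the name of the cluster with the greatest first certainty,
    or None when even the greatest value falls below the first threshold.

    Assumption: ties are resolved by taking the first cluster encountered.
    """
    if not first_certainties:
        return None
    best = max(first_certainties, key=first_certainties.get)
    if first_certainties[best] < first_threshold:
        return None
    return best


# First certainties from the worked example; thresholds are hypothetical.
certainty_list = {"A61": 0.474, "B62": 0.263, "C63": 0.225}
print(select_core_cluster(certainty_list, first_threshold=0.3))  # A61
print(select_core_cluster(certainty_list, first_threshold=0.5))  # None
```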
Refer back to the flowchart in
It can be specified whether the cluster is a merge cluster, for example, based on the following criteria.
The above-described policies are examples for determining whether the cluster is similar to the core cluster, and another policy may be used. In other words, a merge cluster (second cluster) is selected by calculating a core cluster belongingness (second index) that is an index indicating a similarity between the core cluster (first cluster) and each of the clusters other than the core cluster. Since the second index for each of the clusters other than the core cluster is calculated, the clusters to be processed may also be referred to as target clusters. In the following specific example, the target clusters are a cluster B and a cluster C.
In step S3030, the second cluster selection unit 14 calculates a core cluster belongingness for each cluster, based on the elements included in each of the clusters and the elements included in the core cluster. The core cluster belongingness of each of the clusters in the cluster list O43 is calculated according to the above-described policies 21 and 22. The core cluster belongingness has, for example, a value within a range from 0 to 1, and a larger value indicates that the cluster is more “similar to the core cluster”.
For example, the core cluster belongingness corresponding to the policy 21 is denoted by CB1, and the core cluster belongingness corresponding to the policy 22 is denoted by CB2. The following equations 5 to 8 represent an example of a method of calculating the core cluster belongingness. In the equations, α2, β2, and TH3 are predetermined values. Note that equations are not specifically limited as long as the equations represent evaluation values of the policies 21 and 22.
where, Sele is a set of subscripts of the elements included in the cluster, Score is a set of subscripts of the elements included in the core cluster, simij is a similarity between an element i and an element j, n(S) is the number of elements in a set S, n(Sele) is the number of elements in the cluster, n(Score) is the number of elements in the core cluster, Nsim is the number of elements having a similarity that is equal to or greater than TH1 among the similarities of the elements of the cluster and the core cluster, U(x) is a step function (0 when x<0, 1 when x≥0), and CC is the core cluster certainty of the cluster.
A core cluster belongingness CB (second index) of each of the clusters is calculated from the values of CB1 and CB2. When the core cluster belongingness CB with respect to the core cluster A is calculated for the cluster B and the cluster C, CB=0.160 is calculated for the cluster B, and CB=0.023 is calculated for the cluster C. In the equation 8, the values of CB1 and CB2 based on the policies 21 and 22 are used; however, an index based on another policy may be added. The core cluster belongingness CB of the respective clusters are output as a core cluster belongingness list O101.
The processing in step S3030 is described in more detail with reference to the above-described equations and
The core cluster belongingness CB2 is equal to the first certainty of the cluster B62, and is 0.263. When α2=0.9 and β2=0.1 are set, 0.160 is obtained as the core cluster belongingness CB of the cluster B62. In other words, from the equation 8, CB=0.160 (core cluster belongingness of cluster B62) is calculated.
Subsequently, the core cluster belongingness CB of the cluster C63 is determined. No solid line connects the elements in the cluster A61 and the elements in the cluster C63, and thus Nsim=0 is calculated from the equation 5. Since the value of the unit step function in the equation 6 is zero, CB1=0 is calculated from the equation 6. The core cluster belongingness CB2 is equal to the first certainty of the cluster C63, and is 0.225 from the equation 7. Based on the foregoing, when the core cluster belongingness CB of the cluster C63 is determined using the equation 8, 0.023 is calculated. In other words, the core cluster belongingness of the cluster C with respect to the core cluster A is calculated as CB=0.023 (core cluster belongingness of cluster C63). The second cluster selection unit 14 puts together the core cluster belongingness values of the respective clusters into a list, and outputs the core cluster belongingness list O101. Since the cluster A is the core cluster, calculation of its core cluster belongingness can be omitted.
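The value for the cluster C63 can be checked under the assumption that equation 8 is a weighted sum of CB1 and CB2. This form is inferred from the stated numbers rather than taken from the equation itself, and the function name is hypothetical.

```python
def combine_core_cluster_belongingness(cb1, cb2, alpha2, beta2):
    """Assumed form of equation 8: CB = alpha2*CB1 + beta2*CB2."""
    return alpha2 * cb1 + beta2 * cb2


# Cluster C63: CB1 = 0 (no strong link to the core cluster), CB2 = 0.225.
cb_c63 = combine_core_cluster_belongingness(0.0, 0.225, alpha2=0.9, beta2=0.1)
print(cb_c63)  # approximately 0.0225, reported as 0.023 in the text
```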
In step S3031 illustrated in
When the threshold TH4=0.1 is set, only the cluster B62 is specified as a merge cluster from the above-described calculation results. Generally, a plurality of clusters can be specified as merge clusters; however, depending on the value of the core cluster belongingness, only one cluster or no cluster is specified in some cases. The cluster specified as a merge cluster is output as a merge cluster list O102.
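A minimal sketch of this selection follows. The function name and data layout are hypothetical, and comparing with "equal to or greater than" TH4 is an assumption, since the exact comparison is not stated here.

```python
def select_merge_clusters(belongingness_list, th4):
    """Return the target clusters whose core cluster belongingness is
    at least th4 (assumed comparison). The result may be empty."""
    return [name for name, cb in belongingness_list.items() if cb >= th4]


# Core cluster belongingness values from the worked example.
belongingness_list = {"B62": 0.160, "C63": 0.023}
print(select_merge_clusters(belongingness_list, th4=0.1))  # ['B62']
```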
Refer back to the flowchart illustrated in
The above-described policy is an example for determining whether the element is similar to the cluster of interest, and another policy may be used.
In step S3040, the merge element specification unit 15 determines a cluster of interest from the core cluster and the merge cluster. The cluster of interest is extracted from the core cluster O72 and the merge cluster list O102. Since the core cluster is the cluster A61, and the merge cluster is the cluster B62, these two clusters are transmitted to the subsequent processing. Processing in step S3041 is performed on each of the clusters of interest. First, processing is performed on the cluster A61.
In step S3041, the merge element specification unit 15 calculates a cluster belongingness of each of the elements included in neither the merge cluster nor the core cluster. The cluster belongingness has, for example, a value within a range from 0 to 1, and a larger value indicates that the element is more “similar to the cluster of interest”. For example, a cluster belongingness CBE1 corresponding to the policy 31 is represented by the following equations 9 to 11. In the equations, α3 and TH5 are predetermined values. Note that the described equations are illustrative, and equations are not specifically limited as long as the equations represent an evaluation value of the policy 31.
where, Iele is a subscript of the element, Stgt is a set of subscripts of the elements included in a target cluster, simij is a similarity between an element i and an element j, n(S) is the number of elements in a set S, n(Stgt) is the number of elements in the target cluster, Nesim is the number of similarities that are equal to or greater than TH1 among the similarities of the elements in the target cluster, and U(x) is a step function (0 when x<0, 1 when x≥0).
A cluster belongingness CBE is calculated from the value of CBE1. In the equation 11, only the value of CBE1 based on the policy 31 is used; however, an index based on another policy may be added. The cluster belongingness CBE of the respective elements with respect to the cluster of interest is output as a cluster belongingness list O131.
A state where a merge element is specified is described with reference to
Next, in step S3042, a merge element is specified from the result of the cluster belongingness list O131. If an element has the cluster belongingness with respect to any of the clusters of interest that is equal to or greater than a predetermined threshold TH6, the element is specified as a merge element. There may be a plurality of the elements satisfying the condition. When the threshold TH6=0.3 is set, four elements E1, E2, E3, and E4 are specified as merge elements (see
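The rule in step S3042 can be sketched as follows: an element becomes a merge element when its cluster belongingness with respect to any cluster of interest reaches TH6. The function name, the nested-dictionary layout, and the belongingness values (including the element name E5) are hypothetical and chosen only to illustrate the rule.

```python
def specify_merge_elements(cluster_belongingness, th6):
    """cluster_belongingness maps each remaining element to a dict of
    {cluster_of_interest: CBE}. An element qualifies when its CBE for at
    least one cluster of interest is equal to or greater than th6."""
    return [
        element
        for element, by_cluster in cluster_belongingness.items()
        if any(cbe >= th6 for cbe in by_cluster.values())
    ]


# Hypothetical belongingness values for elements outside both clusters.
belongingness = {
    "E1": {"A61": 0.45, "B62": 0.10},
    "E2": {"A61": 0.05, "B62": 0.35},
    "E5": {"A61": 0.10, "B62": 0.20},
}
print(specify_merge_elements(belongingness, th6=0.3))  # ['E1', 'E2']
```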
Refer back to
The operation is described with reference to
The elements of the core cluster are elements included in the core cluster. In
The elements of the merge cluster are elements included in the merge cluster. In
The merge element is an element to be merged to the core cluster that has been specified based on the third index, among the elements included in neither the merge cluster nor the core cluster. In
The extracted elements (16 elements) are collected into the noise-removed element list O171, and the noise-removed element list O171 is output.
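Assembling the output can be illustrated as a simple order-preserving union of the three groups. The function name and the element labels below are hypothetical; the sketch only shows that the third cluster consists of the core cluster elements, the merge cluster elements, and the merge elements.

```python
from itertools import chain


def build_noise_removed_list(core_elements, merge_clusters, merge_elements):
    """Concatenate the core cluster, every merge cluster, and the merge
    elements, dropping duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for element in chain(core_elements, *merge_clusters, merge_elements):
        if element not in seen:
            seen.add(element)
            result.append(element)
    return result


# Hypothetical element labels.
noise_removed = build_noise_removed_list(
    core_elements=["a1", "a2", "a3"],
    merge_clusters=[["b1", "b2"]],
    merge_elements=["e1", "e2"],
)
print(noise_removed)  # ['a1', 'a2', 'a3', 'b1', 'b2', 'e1', 'e2']
```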
Providing the second cluster selection unit has the effect of extracting signal elements that are not extracted by the existing technique. Further, a cluster of elements strongly similar to each other is created by the classification unit, which has the effect of reducing noise elements that are extracted by the existing technique. As a result, it is possible to reduce the noise included in a set of data related to a specific object.
In the first exemplary embodiment, the similarity used in clustering is used to calculate the core cluster belongingness (second index). However, in a case where, for example, as a result of the clustering, it is expected that the cluster A61 indicates “apparently Mr. A” and the cluster B62 indicates “apparently Mr. A wearing a mask”, use of a similarity calculated from the feature amount in a state of wearing a mask is more convenient. In other words, the similarity is desirably calculated using, in place of the feature amount used in clustering, a feature amount of a face wearing a mask, i.e., a feature amount insensitive to the texture of a mouth portion covered with a mask.
In other words, the feature amount is desirably acquired using two different calculation methods as the method of calculating the features of the elements. The second cluster selection unit 14 acquires the core cluster belongingness (second index) of each of the elements based on a feature amount acquired by a second method different from a first method. For example, when the first method is processing for extracting a feature related to a face from an entire face image, the second method may be processing for extracting a feature from an image obtained by performing predetermined conversion processing on a predetermined face region. In other words, the processing for calculating the similarity from the feature amount different from the feature amount used in clustering and calculating the core cluster belongingness may be performed by changing a part of the processing in the second cluster selection unit 14. The description is given with reference to
In step S3032, the second cluster selection unit 14 calculates a second feature amount for each of the elements, and outputs a second feature amount list O191.
In step S3010 illustrated in
In step S3033, the second cluster selection unit 14 calculates a similarity between the second feature amounts by using the second feature amount list O191, and outputs a second similarity list O192.
In the above-described manner, in a case where a particular tendency is expected in a cluster, it is possible to extract a signal element with high accuracy by specifying a merge cluster using a feature amount matching with the tendency. In other words, it is possible to reduce the noise included in a set of data related to a specific object.
In the second exemplary embodiment, the core cluster belongingness is calculated based on the feature amount acquired by the plurality of feature extraction methods. At this time, the feature amount insensitive to the texture of a mouth portion covered with a mask is calculated. In a third exemplary embodiment, a feature extraction processing method for acquiring the second feature amount different from the first feature amount with a simple method without preparing a different feature extraction method is described. In a case where a feature amount extractor is trained using deep learning, it generally takes a lot of time, and a memory amount occupied by the feature amount extractor itself during feature extraction is doubled. Therefore, an image in which a part including a mouth of a face image of each element is covered is combined, and a feature amount of the combined image is calculated by the processing same as the processing used in step S3030.
In other words, the second cluster selection unit 14 acquires the second feature amount different from the first feature amount by performing predetermined processing on each of the elements.
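As an illustration of such predetermined processing, the sketch below covers the lower part of an image so that the same feature extractor can then be applied to the covered image. The image representation (a list of pixel rows), the function name, and the `start_ratio` and `fill` parameters are assumptions; the description does not state how the region including the mouth is located.

```python
def cover_lower_face(image, start_ratio=0.5, fill=0):
    """Return a copy of `image` (a list of pixel rows) in which every row
    from `start_ratio` of the height downward is replaced with `fill`.

    Assumption: the mouth lies in the lower part of an aligned face crop.
    """
    height = len(image)
    start_row = int(height * start_ratio)
    return [
        list(row) if r < start_row else [fill] * len(row)
        for r, row in enumerate(image)
    ]


# Hypothetical 4x2 grayscale image.
face = [[1, 2], [3, 4], [5, 6], [7, 8]]
covered = cover_lower_face(face)
print(covered)  # [[1, 2], [3, 4], [0, 0], [0, 0]]
```

The covered image would then be passed to the existing feature extractor to obtain the second feature amount, so that no second extractor has to be trained or loaded.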
In the above-described manner, even in a case where a particular tendency is expected in a cluster, it is possible to extract a signal element with high accuracy. Further, since one feature amount extractor is used, it is possible to prevent a lot of time from being taken to create the feature amount extractor, and to prevent the memory amount necessary for execution from being increased.
In all of the above-described exemplary embodiments, the cases related to images are described; however, the effects of the present invention are not limited to images, and therefore, the scope of the present invention is not limited to images.
The present invention is also realized by performing the following processing. That is, software (program) for realizing the functions of the above-described exemplary embodiments is supplied to a system or an apparatus through a network for data communication or various kinds of storage media. Further, a computer (or CPU, micro processing unit (MPU), etc.) of the system or the apparatus reads out and executes the program. The program may be provided by being recorded in a computer-readable recording medium.
The present invention is not limited to the above-described exemplary embodiments, and can be changed and modified in various ways without departing from the spirit and the scope of the present invention. Therefore, to publicize the scope of the present invention, the following claims are attached.
Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2022-047586 | Mar 2022 | JP | national |
This application is a Continuation of International Patent Application No. PCT/JP2023/007959, filed Mar. 3, 2023, which claims the benefit of Japanese Patent Application No. 2022-047586, filed Mar. 23, 2022, both of which are hereby incorporated by reference herein in their entirety.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/JP2023/007959 | Mar 2023 | WO |
| Child | 18884804 | | US |