These above and/or other aspects and advantages of the present invention will become apparent and more readily appreciated from the following detailed description of the embodiments, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below to explain the present invention by referring to the figures.
The preprocessor 410 comprises a region division unit 411 and a feature extraction unit 412 to perform preprocessing operations of adaptively segmenting a region of an inputted photo through analyzing content of the photo and extracting a visual feature from the segmented region of the photo.
The region division unit 411 analyzes the content of the inputted photo and adaptively segments the region of the photo based on the analyzed content of the photo, as shown in FIG. 5.
As shown in
The region division unit 411 calculates a dominant edge and an entropy differential through analyzing the content of the inputted photo, and adaptively segments the region of the inputted photo based on the calculated dominant edge and the entropy differential.
The region division unit 411 also calculates edge elements for each possible division direction by analyzing the content of the inputted photo and segments the region of the photo in the direction of a dominant edge determined by analyzing the calculated edge elements. Namely, the region division unit 411 segments the region of the photo in the direction of the dominant edge when the maximum of the calculated edge elements is greater than a first threshold and the difference between the calculated edge elements is greater than a second threshold.
The case in which the content of the inputted photo is analyzed, the edge elements for each of the possible division directions are analyzed, and the region of the photo is segmented in the direction of the dominant edge is described as follows. The region division unit 411 compares a horizontal edge element and a vertical edge element, calculated as the edge elements for each of the possible division directions, and horizontally segments the region of the photo when the maximum edge element is the horizontal edge element, the horizontal edge element is greater than the first threshold, and the difference between the horizontal edge element and the vertical edge element is greater than the second threshold. Also, the region division unit 411 vertically segments the region of the photo when the maximum edge element is the vertical edge element, the vertical edge element is greater than the first threshold, and the difference between the vertical edge element and the horizontal edge element is greater than the second threshold.
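The dominant-edge test described above can be sketched as follows. This is a minimal illustration only: the threshold values and the way the edge strengths are obtained are assumptions, not values specified in this document.

```python
def dominant_edge_direction(h_edge, v_edge, th1=0.5, th2=0.2):
    """Pick the division direction from the dominant edge, or None.

    h_edge and v_edge are the edge elements for the horizontal and vertical
    division directions; th1 and th2 stand in for the first and second
    thresholds (illustrative values, not from this document).
    """
    max_edge = max(h_edge, v_edge)
    diff = abs(h_edge - v_edge)
    # No dominant edge: the caller falls back to the entropy-based test.
    if max_edge <= th1 or diff <= th2:
        return None
    return "horizontal" if h_edge > v_edge else "vertical"
```

For example, a strong horizontal edge with a weak vertical edge yields a horizontal segmentation, while two comparable edge strengths yield no decision.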
Conversely, the case in which the content of the inputted photo is analyzed, the edge elements for each of the possible division directions are analyzed, and the region of the photo is segmented by calculating entropy because the direction of the dominant edge cannot be determined is described as follows. When the dominant edge direction cannot be determined from the analysis of the edge elements calculated for each of the possible division directions, the region division unit 411 calculates entropy for each expected division region of the inputted photo and segments the region of the photo in the direction in which the difference between the calculated entropy values is greatest.
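The entropy fallback can be sketched as below. The histogram-based entropy and the half-and-half candidate split are assumptions for illustration; the document does not specify how the expected division regions are formed.

```python
import numpy as np

def entropy(region):
    """Shannon entropy (in bits) of a region's intensity histogram."""
    hist, _ = np.histogram(region, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def entropy_division_direction(image):
    """Choose the division direction whose two halves differ most in entropy."""
    h, w = image.shape
    horiz_diff = abs(entropy(image[: h // 2]) - entropy(image[h // 2 :]))
    vert_diff = abs(entropy(image[:, : w // 2]) - entropy(image[:, w // 2 :]))
    return "horizontal" if horiz_diff >= vert_diff else "vertical"
```

A photo whose top half is flat sky and whose bottom half is textured terrain, for instance, shows a large top/bottom entropy difference and is segmented horizontally.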
Namely, when an expected division direction is a vertical direction as shown in 610 of
For example, as shown in
When the photo 520 is inputted, the region division unit 411 analyzes the content of the inputted photo 520, segments an entirety of the photo 520 in a horizontal direction depending on a calculated possible division direction edge element or an entropy difference, analyzes the photo 520 segmented horizontally, and segments an upper part of the segmented photo 520 in a vertical direction. Accordingly, the photo 520 is segmented by the region division unit 411, into three regions 521, 522, and 523.
When the photo 530 is inputted, the region division unit 411 analyzes the content of the inputted photo 530, segments an entirety of the photo in a vertical direction depending on a calculated possible division direction edge element or an entropy difference, analyzes the photo 530 segmented vertically, and segments a right part of the segmented photo 530 in a horizontal direction. Accordingly, the photo 530 is segmented by the region division unit 411, into three regions 531, 532, and 533.
When the photo 540 is inputted, the region division unit 411 analyzes the content of the inputted photo 540, segments an entirety of the photo 540 in a vertical direction depending on a calculated possible division direction edge element or an entropy difference, analyzes the photo 540 segmented vertically, and segments a left part of the segmented photo 540 in a horizontal direction. Accordingly, the photo 540 is segmented by the region division unit 411, into three regions 541, 542, and 543.
When the photo 550 is inputted, the region division unit 411 analyzes the content of the inputted photo 550 and segments an entirety of the photo 550 in a horizontal direction depending on a calculated possible division direction edge element or an entropy difference. Accordingly, the photo 550 is segmented by the region division unit 411, into two regions 551 and 552.
When the photo 560 is inputted, the region division unit 411 analyzes the content of the inputted photo 560 and segments an entirety of the photo 560 in a vertical direction depending on a calculated possible division direction edge element or an entropy difference. Accordingly, the photo 560 is segmented by the region division unit 411, into two regions 561 and 562.
When the photo 570 is inputted, the region division unit 411 analyzes the content of the inputted photo 570 and segments an entirety of the photo 570 into a central region 571 and a peripheral region 572 depending on a calculated possible division direction edge element or an entropy difference. In this case, since the peripheral region 572 is not a rectangle, it is not easy to extract a visual feature. Therefore, the photo 570 is segmented into the central region 571 and an entire region including the central region. Accordingly, the photo 570 is segmented by the region division unit 411, into two regions 571 and 572.
The feature extraction unit 412 extracts a visual feature of each of the segmented regions of the photo. Namely, the feature extraction unit 412 extracts visual features from each of the segmented regions of the photo, such as a color histogram, an edge histogram, a color structure, a color layout, and a homogeneous texture descriptor. According to an embodiment of the present invention, the feature extraction unit 412 extracts the visual feature from each of the segmented regions by using various feature combinations, according to a tradeoff between time and precision of a system in the content-based image retrieval field. Accordingly, the feature extraction unit 412 extracts the visual feature through the various feature combinations from each of the segmented regions according to a category as defined by the present invention.
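As a rough sketch of per-region feature extraction, a simple normalized color histogram stands in below for the descriptors listed above; the MPEG-7-style descriptors themselves are not reimplemented here, and the box-tuple region format is an assumption.

```python
import numpy as np

def color_histogram(region, bins=8):
    """Normalized per-channel color histogram for one segmented region."""
    feats = []
    for ch in range(region.shape[2]):
        hist, _ = np.histogram(region[..., ch], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)

def extract_region_features(image, regions):
    """regions: (y0, y1, x0, x1) boxes produced by the region division step."""
    return [color_histogram(image[y0:y1, x0:x1]) for y0, y1, x0, x1 in regions]
```

Each segmented region then yields one fixed-length feature vector, which is what the classifier stage consumes.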
In the case of the photo 510, the feature extraction unit 412 extracts a visual feature from each of the regions 511, 512, and 513 segmented by the region division unit 411. In the case of the photo 520, the feature extraction unit 412 extracts a visual feature from each of the regions 521, 522, and 523 segmented by the region division unit 411. In the case of the photo 530, the feature extraction unit 412 extracts a visual feature from each of the regions 531, 532, and 533 segmented by the region division unit 411. In the case of the photo 540, the feature extraction unit 412 extracts a visual feature from each of the regions 541, 542, and 543 segmented by the region division unit 411.
As described above, unlike a conventional photo category classification system unconditionally dividing an inputted photo into at least one region, each region having 10 sub-regions, without considering the content of the photo as shown in
The classifier 420 comprises a local concept classification unit 421, a regression normalization unit 422, and a global concept classification unit 423 to classify a category of the inputted photo according to the visual feature extracted by the preprocessor 410.
The local concept classification unit 421 analyzes the visual feature extracted by the feature extraction unit 412 and models a local semantic concept included in the photo from the segmented region to classify a local concept. Namely, to model each local semantic concept, the local concept classification unit 421 prepares certain learning data in advance to extract visual features, learns via a pattern learner such as a support vector machine (SVM), and classifies a local concept via the pattern learner depending on the visual feature. Accordingly, the local concept classification unit 421 acquires confidence values for each of the local semantic concepts from each region as a result of classifying the local concept via the pattern learner. For example, the confidence value for each of the local concepts (see
The regression normalization unit 422 acquires a posterior probability value by normalizing the confidence values for each of the local concepts classified by the local concept classification unit 421 via regression analysis.
The global concept classification unit 423 classifies a global concept by modeling a global semantic concept, that is, a category concept included in the photo, through the posterior probability values for each of the local semantic concepts acquired by the regression normalization unit 422. Namely, the global concept classification unit 423 classifies, by using a pattern classifier, global concept models previously learned via the pattern learner to model the global semantic concept. Accordingly, the global concept classification unit 423 acquires confidence values for each category classified by the pattern classifier. The confidence values for each of the categories may be expressed as −0.3 in the case of architecture, 0.1 in the case of an interior, −0.5 in the case of a night view, 0.7 in the case of terrain, and 1.0 in the case of a human being, for example.
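The two-stage pipeline (local concepts, regression normalization, global concept) can be sketched as follows. A logistic sigmoid stands in for the regression normalization, and the concept names and category weights are invented placeholders, not learned values from this document.

```python
import math

def sigmoid(x):
    # Stands in for the regression normalization that maps raw SVM
    # confidences to posterior probability values.
    return 1.0 / (1.0 + math.exp(-x))

def classify_global(local_confidences, category_weights):
    """local_confidences: local concept -> raw classifier confidence.
    category_weights: category -> {local concept: weight} (illustrative)."""
    posteriors = {c: sigmoid(v) for c, v in local_confidences.items()}
    scores = {cat: sum(w.get(c, 0.0) * p for c, p in posteriors.items())
              for cat, w in category_weights.items()}
    return max(scores, key=scores.get), scores
```

A strong "sky" local concept, for instance, raises a terrain-like global category and lowers an interior one under these assumed weights.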
The postprocessor 430 estimates classification noise of a confidence value for the category of the photo, classified by the classifier 420, and performs a postprocessing operation of removing the estimated classification noise. The postprocessor 430 estimates a noise occurrence probability or a category existence probability and outputs a determined confidence value by filtering the confidence value for the category of photo, classified through the classifier 420. The postprocessor 430 clusters a situation by analyzing a plurality of photos, classifies scenes for the photos in the same cluster, calculates a noise probability for each scene category, and updates a confidence value for each scene to reduce the classification noise, by reflecting the calculated noise probability in the confidence value.
x=x′+η (an input+noise)
ŝi=Fc[gi]=Fc[si+ni]≈Fc[si]
To design a noise reduction filter (Fc) having excellent performance, the following two conditions must be satisfied.
1) Fc
2) Other aspects related to a precise classification result are not deteriorated and there is no unfavorable side effect with respect to Fc
Since an unexpected result value (n) is generated by noise, when there is prior knowledge of the noise probability density function, the unexpected result value (n) can be removed by filtering as shown in Equation 1.
ŝi=p(gi)(1−p(ni|ci)) [Equation 1]
In this case, the noise probability may be estimated by various methods as below.
As described above, the noise reduction method according to the present invention uses various situation information included in a photo, such as syntactic hints.
Generally, without prior knowledge of an input signal, it is difficult to distinguish between a signal and noise. Accordingly, a histogram is available for estimating the noise probability density function.
In the present invention, to acquire the histogram, situation-based groups, which are groups of photos whose image information is temporally similar, are considered. In this case, to readjust the confidence value that is the result of the category classification, temporal homogeneity, in which similar photos exist before and after a corresponding photo, is used.
In an embodiment of the present invention, the noise probability is estimated based on a fact that similar categories may exist in photos which are images sequentially photographed by the same user, and the classification noise is removed by the estimated noise probability.
The appearance frequency of each category in one situation group is calculated as shown in Equation 2.
For example, when the same situation-based group including a present photo is formed of 10 photos including 8 photos with respect to a terrain category and 2 photos with respect to an interior category, appearance frequency of the terrain category may be 8/10 and appearance frequency of the interior category may be 2/10.
The postprocessor 430 readjusts the confidence value by using the probability value acquired by the histogram method as shown in Equation 3, thereby removing the noise.
ŝi=p(gi)p(ci|m) [Equation 3]
For example, when the confidence value of a terrain category is 0.5 and the confidence value of an interior category is 0.8, the postprocessor 430 may readjust the confidence value for each of the categories by multiplying the confidence value 0.5 by the appearance frequency 8/10 of the terrain category and multiplying the confidence value 0.8 of the interior category by the appearance frequency 2/10 of the interior category.
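The appearance-frequency readjustment of Equations 2 and 3 can be sketched directly from the numeric example above; the function names are illustrative.

```python
from collections import Counter

def appearance_frequency(group_categories):
    """Per-category appearance frequency in one situation group (Equation 2)."""
    counts = Counter(group_categories)
    n = len(group_categories)
    return {cat: c / n for cat, c in counts.items()}

def readjust_confidences(confidences, frequencies):
    """Weight each category confidence by its appearance frequency (Equation 3)."""
    return {cat: v * frequencies.get(cat, 0.0) for cat, v in confidences.items()}
```

With 8 terrain photos and 2 interior photos in the group, a terrain confidence of 0.5 becomes 0.4 and an interior confidence of 0.8 becomes 0.16, matching the example in the text.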
As described above, the photo category classification system 400 reduces a confidence value of a category whose appearance frequency is low from the photos in the same situation-based group, thereby improving the confidence of the photo category classification.
Also, to estimate a more precise probability, the postprocessor 430 may integrate the posterior probability of the confidence value acquired by the classifier as shown in Equation 4.
In this case, C indicates a total number of categories to be classified, N indicates a total number of photos existing in a given situation m, and gij indicates a confidence value that is a result of an ith category from a jth photo, acquired by the pattern classifier.
As described above, when a plurality of photos are analyzed to be sequentially photographed images, the postprocessor 430 estimates the noise probability by using the fact that similar categories exist, and removes the classification noise of the photo by reflecting the estimated noise probability in the confidence value acquired through the global semantic concept modeling.
According to another embodiment of the present invention, noise is removed by modeling Exchangeable image file (Exif) metadata included in a photo file. Namely, classification noise may be removed based on a probability of belonging to a category, estimated by modeling Exif metadata probability. When the photo is acquired from a digital camera, the Exif metadata comprises various information related to the photo, for example, a flash use and an exposure time.
The postprocessor 430 models a situation probability density function with respect to the Exif metadata acquired by learning many training data, extracts the Exif metadata included in the photo file, calculates a situation probability with respect to the extracted Exif metadata, and removes the classification noise by reflecting the calculated situation probability in a category classification confidence value of the photo file.
For example, noise reduction filtering is performed by an interior/exterior classifier by using flash use (F) and exposure time (E) as metadata, as shown in Equation 5.
ŝi=p(gi)p(E|ci)p(F|ci) [Equation 5]
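A sketch of this Exif-based filtering follows. The likelihood tables p(E|c) and p(F|c) below are invented placeholder numbers, not values learned in this document; in practice they would be estimated from training data.

```python
# Illustrative likelihood tables: long exposures and flash use are assumed
# more likely indoors, short exposures and no flash more likely outdoors.
EXPOSURE_LIKELIHOOD = {"interior": {"long": 0.7, "short": 0.3},
                       "exterior": {"long": 0.2, "short": 0.8}}
FLASH_LIKELIHOOD = {"interior": {True: 0.8, False: 0.2},
                    "exterior": {True: 0.1, False: 0.9}}

def exif_filtered_confidence(confidence, category, exposure, flash_used):
    """Equation 5: weight the classifier confidence by the Exif likelihoods."""
    return (confidence
            * EXPOSURE_LIKELIHOOD[category][exposure]
            * FLASH_LIKELIHOOD[category][flash_used])
```

A long exposure with flash therefore boosts the interior confidence relative to the exterior one, under these assumed tables.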
As described above, the postprocessor 430 performs the postprocessing operations of estimating the probability of belonging to the category through probability modeling by analyzing the metadata with respect to the photo and removing the classification noise by reflecting the estimated probability in the confidence value acquired by modeling global semantic concepts.
According to yet another embodiment of the present invention, noise reduction is performed by filtering based on an update rule between categories. Namely, the filtering is performed by using a fact that categories having opposite concepts cannot simultaneously exist in one photo, as an estimation method based on a rule using correlation of a category group.
For example, with respect to an interior category, exterior categories such as terrain, waterside, sunset, snowscape, and architecture are categories having opposite concepts. Namely, since the interior category is opposite to the exterior categories, it is impossible for both to be in the same photo.
Filtering classification noise by using the correlation between the interior category and the exterior category is performed as shown in Equation 6.
As another example of categories having opposite concepts, there are the macro category and the other categories excluding the macro category. The postprocessor 430 may filter by distinguishing the macro photo from the result of the classified categories by using the fact that a macro photo is incompatible with any other category. Namely, when there are the macro category and the other categories as the result of category classification of the inputted photo, and a confidence value of the macro category is greater than the confidence values of the other categories, the postprocessor 430 may perform filtering to remove the other categories.
The postprocessor 430 filters the macro category and the interior category as shown in Equation 7.
To verify whether the inputted digital photo is a macro photo, the postprocessor 430 uses Exif information including macro information below.
1) a subject distance: generally less than 0.6 m;
2) subject distance ranges 0: unknown, 1: macro, 2: close view, and 3: distant view; and
3) macro information in a maker note.
When the inputted digital photo is a macro photo, the postprocessor 430 determines a probability value of the macro category to be 1 and determines a probability value of the interior category to be 0. Accordingly, when the inputted digital photo is the macro photo as shown in Equation 7, the postprocessor 430 reflects the probability value of the interior category in a classification confidence value of the inputted digital photo, thereby filtering the confidence value of the interior category, opposite to the macro photo category, to be 0.
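The macro/interior opposite-category filter can be sketched as below. The 0.6 m cutoff and the distance-range code follow the heuristics listed above, but the Exif key names are illustrative, not a fixed Exif schema.

```python
def is_macro(exif):
    """Heuristics from the text: range code 1 (macro), or distance < 0.6 m."""
    if exif.get("subject_distance_range") == 1:
        return True
    distance = exif.get("subject_distance")
    return distance is not None and distance < 0.6

def filter_opposite_categories(confidences, exif):
    """Set macro to 1 and zero the opposite interior confidence for macro shots."""
    out = dict(confidences)
    if is_macro(exif):
        out["macro"] = 1.0
        out["interior"] = 0.0
    return out
```

Photos whose Exif data does not indicate a macro shot pass through unchanged.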
As described above, when confidence values of mutually opposite categories exist as a result of analyzing the confidence values acquired by global semantic concepts modeling, the postprocessor 430 performs a postprocessing operation of removing a category whose confidence value is low.
Accordingly, the photo category classification system 400 classifies the category of the inputted photo and removes the classification noise from the confidence value of the classified category, thereby providing a more precise category classification result.
In sub-operation 910, the photo category classification system calculates edge elements for each of possible division directions by analyzing content of the inputted photo. Specifically, the photo category classification system analyzes the content of the inputted photo and calculates an edge element for a horizontal direction or an edge element for a vertical direction when the possible division direction is the horizontal direction or the vertical direction.
In sub-operation 920, the photo category classification system determines whether a maximum edge element MaxEdge of the calculated edge elements is greater than a first threshold Th1 and whether a difference Edge_Diff of the calculated edge elements is greater than a second threshold Th2.
For example, when the calculated edge elements are a horizontal direction edge element and a vertical direction edge element and the horizontal direction edge element is greater than the vertical direction edge element, the photo category classification system determines whether the horizontal direction edge element is greater than the first threshold Th1, and whether a difference between the horizontal direction edge element and the vertical direction edge element is greater than the second threshold Th2.
Also, for example, when the calculated edge elements are a horizontal direction edge element and a vertical direction edge element, and the vertical direction edge element is greater than the horizontal direction edge element, the photo category classification system determines whether the vertical direction edge element is greater than the first threshold Th1 and whether a difference between the vertical direction edge element and the horizontal direction edge element is greater than the second threshold Th2.
When the maximum edge element MaxEdge is greater than the first threshold Th1 and the difference between the edge elements is greater than the second threshold Th2, in sub-operation 925, the photo category classification system segments the region of the photo in the direction of the dominant edge, which is a direction of the maximum edge element MaxEdge.
For example, when the maximum edge element MaxEdge is the horizontal direction edge element, the horizontal direction edge element is greater than the first threshold Th1, and the difference between the horizontal direction edge element and the vertical direction edge element is greater than the second threshold Th2, in sub-operation 925, the photo category classification system segments the region of the photo in the horizontal direction that is the direction of the dominant edge.
For example, when the maximum edge element MaxEdge is the vertical direction edge element, the vertical direction edge element is greater than the first threshold Th1, and the difference between the vertical direction edge element and the horizontal direction edge element is greater than the second threshold Th2, in sub-operation 925, the photo category classification system segments the region of the photo in the vertical direction that is the direction of the dominant edge.
Conversely, when the maximum edge element MaxEdge is equal to or less than the first threshold Th1 and/or the difference between the edge elements is equal to or less than the second threshold Th2, in sub-operation 930, the photo category classification system calculates entropy of expected division regions of the photo.
In sub-operation 940, the photo category classification system determines whether a maximum value of entropy differences MaxEntropy
For example, when each of the expected division regions is expected to be segmented in the vertical direction and the horizontal direction as shown in
For example, when each of the expected division regions is expected to be segmented in the vertical direction and the horizontal direction as shown in
When the maximum value of the entropy differences MaxEntropy
When the maximum value of the entropy differences MaxEntropy
When the division level of the photo is 1, in sub-operation 955, the photo category classification system segments the photo 570 into the central region 571 and the peripheral region 572 as shown in
In sub-operation 960, the photo category classification system determines whether the division level of the photo is N. N may be 3 when the photo category classification system tries to segment the region of the photo into three regions.
When the division level of the photo is not N, in sub-operation 970, the photo category classification system selects a next segmented region by increasing the division level of the photo by 1 and performs the operations from sub-operation 910 again.
When the division level of the photo is N in sub-operation 960, when the division level of the photo is not 1, or after dividing the region of the photo into the central region 571 and the peripheral region 572, the photo category classification system finishes the operation of dividing the region of the photo based on the content of the photo.
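An illustrative driver for the division-level loop in sub-operations 910 through 970 is sketched below. The direction-choosing function is passed in (it would combine the edge and entropy tests), and the halving split and box-tuple format are assumptions for the sketch.

```python
def split(box, direction):
    """Halve a (y0, y1, x0, x1) box along the given direction (assumed split)."""
    y0, y1, x0, x1 = box
    if direction == "horizontal":
        mid = (y0 + y1) // 2
        return [(y0, mid, x0, x1), (mid, y1, x0, x1)]
    mid = (x0 + x1) // 2
    return [(y0, y1, x0, mid), (y0, y1, mid, x1)]

def segment_photo(photo_box, choose_direction, n_levels=3):
    """Divide up to division level N; choose_direction returns 'horizontal',
    'vertical', or None when no division direction can be determined."""
    regions = [photo_box]
    for level in range(1, n_levels):
        box = regions.pop(0)            # next region to subdivide
        direction = choose_direction(box, level)
        if direction is None:           # no dominant direction: stop dividing
            return regions + [box]
        regions += split(box, direction)
    return regions
```

With N = 3, a horizontal split at level 1 followed by a vertical split of one half at level 2 yields three regions, matching the three-region examples above.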
As described above, according to the photo category classification method, the region of the photo is segmented by calculating the possible division direction edge elements of the photo by analyzing the content of the photo and by calculating the entropy for each of the expected division regions of the photo, thereby reducing the number of segmented regions compared to a conventional method of simply dividing the region of the photo into at least one region with a plurality of sub-regions without reflecting the content of the photo.
Referring to
As described above, the photo category classification method may relatively reduce the amount of time for extracting the visual features due to the reduced number of segmented regions shown in
As described above, operations 810 and 820 are preprocessing operations for classifying the category of the photo in operations 830 through 850, which is a process of analyzing the content of the inputted photo, dividing the region of the photo based on the content of the photo, and extracting the visual feature from the segmented region of the photo.
In operation 830, the photo category classification system models local semantic concepts included in the photo according to the extracted visual feature. Specifically, to model each of the local semantic concepts, the photo category classification system extracts the visual features by previously preparing certain learning data, learns via the pattern learner, such as an SVM, and classifies local concepts via the pattern classifier, according to the extracted visual features.
In operation 840, the photo category classification system acquires a posterior probability value by normalizing via regression analysis with respect to confidence values acquired by local semantic concept modeling.
In operation 850, the photo category classification system models a global semantic concept included in the photo by using the posterior probability value for each of the local semantic concepts. Namely, to model the global semantic concept, the photo category classification system classifies, via the pattern classifier, global concept models previously learned via the pattern learner.
In operation 860, the photo category classification system removes classification noise with respect to a confidence value acquired by the global semantic concept modeling. Specifically, the photo category classification system analyzes a plurality of photos, estimates a noise probability by using the fact that similar categories are highly likely to exist in photos that are sequentially photographed images, and removes the classification noise by reflecting the estimated noise probability in the confidence value acquired by the global semantic concept modeling.
According to another embodiment of the present invention, in operation 860, the photo category classification system estimates a probability of belonging to a category through probability modeling by analyzing metadata with respect to the photo and removes the classification noise by reflecting the estimated probability in the confidence value acquired by the global semantic concept modeling, as postprocessing operations for improving classification confidence with respect to the category of the inputted photo.
According to still another embodiment of the present invention, in operation 860, the photo category classification system analyzes the confidence value acquired by the global semantic concept modeling and removes a category whose confidence value is low when confidence values with respect to mutually opposite categories exist.
In operation 1020, the photo category classification system classifies scenes in each situation cluster.
In operation 1030, the photo category classification system calculates a noise probability for each of the scene categories. Namely, when photos are images sequentially photographed in a time series by one user, the photo category classification system estimates the noise probability with respect to each of the scene categories based on the fact that similar categories may exist in photos which are images sequentially photographed by the same user.
In operation 1040, the photo category classification system updates the confidence value of the photo to reduce the classification noise. Specifically, the photo category classification system updates the confidence value of the photo by reflecting the estimated noise probability in the confidence value of the photo.
Also, according to another embodiment of the present invention, in operation 1040, the photo category classification system may update the classification confidence value of the photo by estimating a probability of belonging to the category, acquired by probability modeling of Exif metadata included in the photo, and removing the classification noise with respect to the confidence value of the photo based on the estimated probability.
Also, according to still another embodiment of the present invention, the photo category classification system may update the classification confidence value of the photo by filtering to remove the classification noise with respect to the classification confidence value of the photo by using the fact that categories of opposite concepts cannot exist simultaneously in one photo, as a rule-based estimation method using the correlation of a category group.
Comparing
Comparing
The photo category classification method according to the present invention may be embodied as program instructions capable of being executed via various computer units and may be recorded in a computer-readable recording medium. The computer-readable medium may include a program instruction, a data file, and a data structure, separately or cooperatively. The program instructions and the media may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those skilled in the computer software arts. Examples of the computer-readable media include magnetic media (e.g., hard disks, floppy disks, and magnetic tapes), optical media (e.g., CD-ROMs or DVDs), magneto-optical media (e.g., optical disks), and hardware devices (e.g., ROMs, RAMs, or flash memories) that are specially configured to store and perform program instructions. The media may also be transmission media, such as optical or metallic lines, wave guides, etc., including a carrier wave transmitting signals specifying the program instructions, data structures, etc. Examples of the program instructions include both machine code, such as that produced by a compiler, and files containing high-level language code that may be executed by the computer using an interpreter.
An aspect of the present invention provides a photo category classification method and system capable of reducing an amount of time used for classifying a category of a photo while minimally deteriorating category classification performance.
An aspect of the present invention also provides a photo category classification method and system improving category classification precision through removing classification noise with respect to a result value passing a category classifier.
Although a few embodiments of the present invention have been shown and described, it would be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2006-0064760 | Jul 2006 | KR | national |