This application claims the benefit of Korean Patent Application No. 10-2004-0027578, filed on Apr. 21, 2004, and Korean Patent Application No. 10-2005-0029960, filed on Apr. 11, 2005 in the Korean Intellectual Property Office, the disclosures of which are incorporated herein in their entirety by reference.
1. Field of the Invention
Embodiments of the present invention relate to digital albums, and more particularly, to apparatuses, media, and methods for detecting a situation change of a digital photo, and a method, medium, and apparatus for situation-based clustering in a digital photo album.
2. Description of the Related Art
Recently, the use of digital cameras has become widespread. This is attributed to advantages of the digital camera over analog cameras: it needs no film or film printing processes, and contents can be stored and deleted at any time by using a digital memory device. Since the performance of digital cameras has increased, all while their sizes have shrunk in line with the development of digital camera technologies, users can now essentially carry digital cameras and take photos any time, any place. With the development of digital image processing technologies, the quality of a digital camera image is approaching that of the analog camera, and users can share digital contents more freely because digital contents are easier to store and transmit than analog contents. Accordingly, digital camera usage is increasing, causing prices to fall, which in turn causes demand to increase even further.
In particular, with the recent development of memory technologies, high capacity memories are now widely used, and with the development of digital image compression technologies that do not compromise picture quality, users can now store hundreds to thousands of photos in one memory. As a result, many users are using digital albums to manage so many photos.
Generally, a digital photo album is used to transfer photos taken by a user from a digital camera or a memory card to a local storage apparatus, and to manage the photos conveniently. Using the photo album, users browse many photos in a time/date series or in order of events, or share the photos with other users.
However, many users find it inconvenient to manage photos with conventional digital photo albums. This is because most conventional digital albums leave the jobs of grouping and labeling photos to users. As the number of photos increases, clustering them one by one becomes ever more difficult and inconvenient for a user. Accordingly, a tool enabling users to find desired photos more easily and quickly, and to generate a desired group from a plurality of photos, is greatly needed.
In “Requirements for photoware” (ACM CSCW, 2002), David Frohlich investigated the photo album functions required by users through a survey of many users. Most interviewees considered storing photos of their lifetime in albums to be valuable. However, they felt the time and effort required for grouping many photos one by one to be inconvenient, and they experienced difficulties in sharing photos with other people.
In early related research and systems, photos were grouped by using only time/date information, i.e., the time/date when a photo was taken. A leading example of such research is Adrian Graham's “Time as essence for photo browsing through personal digital libraries” (ACM JCDL, 2002). As in this research, photos can be grouped roughly by using only the taken time/date. However, this method cannot be used when a photo is recorded without time/date information, or when time/date information is lost later during photo editing. In addition, undesired grouping results are highly probable if photos taken with many cameras in similar time/date bands but in different situations are grouped at one time.
In Kerry Rodden's “How do people manage their digital photographs?” (ACM CHI, 2002), a photo album with a function for sorting photos by time/date information was developed, and users were interviewed on the utility of the developed system. The study showed that even merely sorting photos in order of their respective taken time/dates helps users construct albums. However, the article added that, in order to satisfy the requirements of users more faithfully, a content-based search or event-based photo clustering function should be added.
One current approach to solving these problems of grouping photos by using only time/date information is to use content-based feature values of a photo. Research has been performed that uses time/date information of photos together with content-based feature values. However, in most cases only the color information of a photo is used as a content-based feature value. As the most representative method, Alexander C. Loui's “Automated event clustering and quality screening of consumer pictures for digital albuming” (IEEE Transactions on Multimedia, vol. 5, no. 3, pp. 390-401, 2003) suggests a method of clustering a series of photos based on events by using time/date and color information of photos. However, since only the color histogram information of a photo is used as a content-based feature value, the method is very sensitive to brightness changes, and it is difficult to sense changes in texture and shapes.
Today, most digital photo files comply with the exchangeable image file (Exif) format. Exif is a standard file format made by the Japan Electronic Industry Development Association (JEIDA). An Exif file stores photographing information, such as the time/date when a photo was taken, and camera status information, as well as the pixel information of the photo.
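By way of illustration, Exif records the taken date/time as a plain text string of the form "YYYY:MM:DD HH:MM:SS" (e.g., the DateTimeOriginal tag, 0x9003). A minimal sketch of turning such a string into a usable timestamp follows; the helper name is chosen here for illustration only:

```python
from datetime import datetime

# Exif stores dates as "YYYY:MM:DD HH:MM:SS" (DateTimeOriginal, tag 0x9003).
# Note the colons in the date part, unlike ISO 8601.
def parse_exif_datetime(value: str) -> datetime:
    return datetime.strptime(value, "%Y:%m:%d %H:%M:%S")
```

A timestamp parsed this way is what a time/date-based grouping step would sort on.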
Also, MPEG-7 of ISO/IEC JTC1/SC29/WG11 is standardizing element technologies required for content-based search, with description interfaces to express descriptors and the relations between descriptors and description schemes. Methods for extracting content-based feature values such as color, texture, shape, and motion are suggested as descriptors. In order to model contents, the description scheme defines the relations between two or more descriptors and description schemes, and defines how data is expressed.
Accordingly, if the variety of information that can be obtained from a photo file and content-based feature values are used together, more effective photo grouping and searching can be performed. Thus, a description scheme to integrally express this variety of information items, and a photo album providing photo grouping and searching using that structure, are needed.
Embodiments of the present invention provide a method, medium, and apparatus for detecting a situation change in a digital photo in order to cluster photos based on photographing situations, by using basic photo information stored in a photo file taken by a digital photographing apparatus, e.g., a digital camera, and a variety of content-based feature value information items extracted from the contents of the photos.
Embodiments of the present invention also provide a method, medium, and apparatus for situation-based clustering in a digital photo album in order to construct an album with photos taken by a digital photographing apparatus, for example, by clustering the photos based on photographing situations using a digital photo situation change detecting method, medium, and apparatus, so that users may easily store photo groups in an album and share the grouped photos with other users.
To achieve the above and/or other aspects and advantages, embodiments of the present invention set forth an apparatus for detecting a situation change in digital photos, including a photo sort unit sorting photos, desired to be situation-based clustered, in order of time, a time feature value obtaining unit obtaining predetermined time feature values from each of two contiguous photos among the sorted photos, a content-based feature value extraction unit extracting predetermined content-based feature values from each of the two contiguous photos, a dissimilarity measuring unit measuring dissimilarity between the two photos by making predetermined time feature value importances reflect respective time feature values, and by making predetermined content-based feature value importances reflect respective content-based feature values, and a situation change detection unit detecting a situation change by determining the situation change if an amount of the dissimilarity is equal to or greater than a predetermined threshold.
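The sort-measure-threshold sequence above can be sketched as follows. This is an illustrative sketch only: the weighted-sum combination, the L1 feature distance, the dictionary layout, and all function and field names are assumptions of the sketch, not the claimed implementation:

```python
def dissimilarity(t_a, t_b, f_a, f_b, w_time, w_content):
    # Weighted combination: a time-gap term plus an L1 distance over
    # content-based feature values, each scaled by its importance.
    time_term = w_time * abs(t_b - t_a)
    content_term = w_content * sum(abs(x - y) for x, y in zip(f_a, f_b))
    return time_term + content_term

def detect_boundaries(photos, w_time, w_content, threshold):
    # Sort by taken time, then flag a situation change wherever the
    # dissimilarity of two contiguous photos reaches the threshold.
    photos = sorted(photos, key=lambda p: p["time"])
    boundaries = []
    for i in range(1, len(photos)):
        d = dissimilarity(photos[i - 1]["time"], photos[i]["time"],
                          photos[i - 1]["features"], photos[i]["features"],
                          w_time, w_content)
        if d >= threshold:
            boundaries.append(i)  # a new situation starts at photo i
    return boundaries
```

Each boundary index splits the sorted sequence into situation-based clusters; the importances (`w_time`, `w_content`) correspond to the feature value importances supplied by the albuming tool description information.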
A predetermined content-based feature value may be generated based on pixel information of a photo, and include a visual descriptor including color, texture, and shape feature values, and an audio descriptor including a voice feature value.
A time feature value importance and a content-based feature value importance may be determined by referring to situation-based clustering hints including at least one of entire brightness information of a photo (Brightness), complexity information of the photo (Level of detail), homogeneous texture information of the photo (Homogeneous texture), edge information of the photo (Heterogeneous texture), information on whether the photo is monochrome (Monochromatic), information indicating a degree of colorfulness of a color expression of the photo (colorfulness), information indicating an entire color coherence shown in the photo (color coherence), information indicating a color temperature of a color of the photo (color temperature), information indicating whether a photo file of the photo includes taken time information (Taken time), information indicating that, if the photo and another photo are taken by different cameras in similar time bands and are clustered together, time information of the photo overlaps time information of the other photo and an importance of corresponding time information is lowered when the photo is situation-based clustered (Time overlap), information indicating whether voice information of a user is stored together with the photo when the photo was taken and is included with the photo as an audio clip file (Audio clip), and information indicating voice words and sentence strings recognized in an audio file of the photo (Speech recognition).
To achieve the above and/or other aspects and advantages, embodiments of the present invention set forth an apparatus for situation-based clustering of a digital photo album, including a photo description information generation unit generating photo description information describing a photo and including at least a photo identifier, an albuming tool description information generation unit generating albuming tool description information including a predetermined parameter for situation-based clustering of digital photos, an albuming tool performing photo albuming through situation-based clustering by using at least the photo description information and the albuming tool description information, a photo group information generation unit generating predetermined photo group description information from an output of the albuming tool, and a photo albuming information generation unit generating predetermined photo albuming information by using the photo description information and the predetermined photo group description information for situation-based clustering of the digital photo album.
Among the photo identifier, information on an author of the photo, photo file information, camera information, photographing information, and a content-based feature value, the photo description information may include at least the photo identifier, with the content-based feature value being generated by using pixel information of the photo, and including a visual descriptor including color, texture, and shape feature values, and/or an audio descriptor including a voice feature value.
The albuming tool description information generation unit may include at least one of a sort key generation unit generating items for sorting photos before clustering the photos, a situation-based clustering hint generation unit generating a situation-based clustering hint to help photo clustering, and an importance generation unit generating importances of information to be used in photo clustering.
The photo sort items of the sort key generation unit may include at least one of a file name, a photographing time, and a photo file creation time. In addition, the photographing time may include photographing date information and the photo file creation time includes photo file creation date information.
The situation-based clustering hint of the situation-based clustering hint generation unit may include at least one of entire brightness information of the photo (Brightness), complexity information of the photo (Level of detail), homogeneous texture information of the photo (Homogeneous texture), edge information of the photo (Heterogeneous texture), information on whether the photo is monochrome (Monochromatic), information indicating a degree of colorfulness of a color expression of the photo (colorfulness), information indicating an entire color coherence shown in the photo (color coherence), information indicating a color temperature of a color of the photo (color temperature), information indicating whether a photo file of the photo includes taken time information (Taken time), information indicating that, if the photo and another photo are taken by different cameras in similar time bands and are clustered together, time information of the photo overlaps time information of the other photo and an importance of corresponding time information is lowered when the photo is situation-based clustered (Time overlap), information indicating whether voice information of a user is stored together with the photo when the photo was taken and is included with the photo as an audio clip file (Audio clip), and information indicating voice words and sentence strings recognized in an audio file of the photo (Speech recognition).
In addition, the importances of the importance generation unit may be based on at least one of information (taken time) setting an importance of time information on a time when the photo is taken, and information (low-level feature) setting an importance of content-based feature value information of the photo.
The information (low-level feature) setting the importance of content-based feature value information of the photo may include information setting an importance of a moving picture experts group (MPEG)-7 Visual Descriptor, and information setting an importance of a MPEG-7 Audio Descriptor.
The albuming tool may include a situation-based photo clustering tool clustering digital photo data based on situations. Further, the situation-based photo clustering tool may include a photo sort unit sorting photos, desired to be situation-based clustered, in order of time, a time feature value obtaining unit obtaining, from the photo description information generation unit, time feature values from each of two contiguous photos among the sorted photos, a content-based feature value extraction unit extracting, from the photo description information generation unit, content-based feature values from each of the two contiguous photos, a dissimilarity measuring unit measuring dissimilarity between the two photos by making time feature value importances, obtained from the albuming tool description information generation unit, reflect respective time feature values obtained from the time feature value obtaining unit, and by making predetermined content-based feature value importances, obtained from the albuming tool description information generation unit, reflect respective content-based feature values extracted in the content-based feature value extraction unit, and a situation change detection unit detecting a situation change by determining the situation change based on an amount of the dissimilarity value.
The respective time feature value importances and the respective predetermined content-based feature value importances may be determined by referring to situation-based clustering hints of the albuming tool description information generation unit.
In addition, the photo group description information of the photo group information generation unit may include situation-based photo groups obtained by clustering photos based on situations, with a situation-based photo group including a situation identifier identifying a situation, a series of photos formed of a plurality of photos determined by photo identifiers, and a photo key identifier allowing identification of one or more representative photos among the photos in a photo group.
To achieve the above and/or other aspects and advantages, embodiments of the present invention set forth a method for detecting a situation change in digital photos, including sorting photos, desired to be situation-based clustered, in order of time, obtaining respective time feature values and respective predetermined content-based feature values from each of two contiguous photos among the sorted photos, measuring a dissimilarity between the two photos by making predetermined time feature value importances reflect respective time feature values, and by making predetermined content-based feature value importances reflect respective content-based feature values, and detecting a situation change by determining the situation change if an amount of the dissimilarity is equal to or greater than a predetermined threshold.
The detecting of the situation change may include determining the situation change if an amount of change, between a dissimilarity between one of the contiguous photos and a previous photo (not the same as the other one of the contiguous photos) and a dissimilarity between the other one of the contiguous photos and a subsequent photo, is greater than a threshold.
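Under this variant, a boundary is declared not when a single dissimilarity exceeds a fixed value, but when the change between adjacent dissimilarity values does. A minimal sketch over a precomputed dissimilarity sequence follows; the list layout, function name, and threshold are illustrative assumptions:

```python
def detect_by_change(dissims, change_threshold):
    # dissims[i] holds the dissimilarity between photo i and photo i + 1,
    # already measured with the importance-weighted time/content features.
    boundaries = []
    for i in range(1, len(dissims)):
        # A sharp rise relative to the previous contiguous pair signals
        # a new situation starting at photo i + 1.
        if dissims[i] - dissims[i - 1] > change_threshold:
            boundaries.append(i + 1)
    return boundaries
```

Comparing adjacent dissimilarities rather than absolute values makes the rule less sensitive to a globally chosen threshold when photographing intervals vary widely.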
A predetermined content-based feature value can be generated by using pixel information of respective photos, and include a visual descriptor including color, texture, and shape feature values, and/or an audio descriptor including a voice feature value.
In addition, a time feature value importance and a content-based feature value importance can be determined by referring to a situation-based clustering hint including at least one of entire brightness information of a photo (Brightness), complexity information of the photo (Level of detail), homogeneous texture information of the photo (Homogeneous texture), edge information of the photo (Heterogeneous texture), information on whether the photo is monochrome (Monochromatic), information indicating a degree of colorfulness of a color expression of the photo (colorfulness), information indicating entire color coherence shown in the photo (color coherence), information indicating a color temperature of a color of the photo (color temperature), information indicating whether a photo file of the photo includes taken time information (Taken time), information indicating that, if the photo and another photo are taken by different cameras in similar time bands and are clustered together, time information of the photo overlaps time information of the other photo and an importance of corresponding time information is lowered when the photo is situation-based clustered (Time overlap), information indicating whether voice information of a user is stored together with the photo when the photo was taken and is included as an audio clip file (Audio clip), and information indicating voice words and sentence strings recognized in an audio file of the photo (Speech recognition).
To achieve the above and/or other aspects and advantages, embodiments of the present invention set forth a method for situation-based clustering of a digital photo album, including generating photo description information by extracting at least one of camera information on a camera taking a photo, photographing information of the photo, and a content-based feature value of the photo, generating albuming tool description information including a predetermined parameter for situation-based clustering of digital photos, performing photo albuming through situation-based clustering by using at least the photo description information and the albuming tool description information, generating photo group description information by using a result of the situation-based clustering, and generating predetermined photo albuming information by using the photo description information and the photo group description information to situation-based cluster the digital photo album.
In the generating of the photo description information, among the photo identifier, information on an author of the photo, photo file information, camera information, photographing information, and content-based feature values, the photo description information may include at least the photo identifier, with the content-based feature value being generated by using pixel information of the photo, and including a visual descriptor including color, texture, and shape feature values, and/or an audio descriptor including a voice feature value. In addition, in the generating of the albuming tool description information, the albuming tool description information may include at least one of a sort key for sorting photos before clustering of the photos, a situation-based clustering hint to help photo clustering, and importances of information to be used in photo clustering. The sort key may include at least one of a file name, a photographing time, and a photo file creation time.
The situation-based clustering hint may include at least one of entire brightness information of the photo (Brightness), complexity information of the photo (Level of detail), homogeneous texture information of the photo (Homogeneous texture), edge information of the photo (Heterogeneous texture), information on whether the photo is monochrome (Monochromatic), information indicating a degree of colorfulness of a color expression of the photo (colorfulness), information indicating an entire color coherence shown in the photo (color coherence), information indicating a color temperature of a color of the photo (color temperature), information indicating whether a photo file of the photo includes taken time information (Taken time), information indicating that, if the photo and another photo are taken by different cameras in similar time bands and are clustered together, time information of the photo overlaps time information of the other photo and an importance of corresponding time information is lowered when the photo is situation-based clustered (Time overlap), information indicating whether voice information of a user is stored together with the photo when the photo was taken and is included with the photo as an audio clip file (Audio clip), and information indicating voice words and sentence strings recognized in an audio file of the photo (Speech recognition).
The importances may be based on at least one of information (taken time) setting an importance of time information on a time when the photo is taken, and information (low-level feature) setting an importance of content-based feature value information of the photo. Further, the information (low-level feature) setting the importance of content-based feature value information of the photo may include information setting an importance of a MPEG-7 Visual Descriptor, and information setting an importance of a MPEG-7 Audio Descriptor.
The performing of the photo albuming may include sorting photos, desired to be situation-based clustered, in order of time, obtaining time feature values and predetermined content-based feature values from each of two contiguous photos among the sorted photos, measuring a dissimilarity between the two photos by making predetermined time feature value importances reflect respective time feature values, and by making predetermined content-based feature value importances reflect respective content-based feature values, and detecting a situation change by determining the situation change based on an amount of the dissimilarity value. The time feature value importances and the content-based feature value importances may be determined by referring to situation-based clustering hints.
In the generating of the predetermined photo albuming information, the photo group description information may include situation-based photo groups obtained by clustering photos based on situations, with the situation-based photo group including a situation identifier identifying a situation, a series of photos formed of a plurality of photos determined by photo identifiers, and a photo key identifier allowing identification of one or more representative photos among the photos in a photo group.
To achieve the above and/or other aspects and advantages, embodiments of the present invention may be implemented through computer readable instructions on a medium.
Additional aspects and/or advantages of the invention will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the invention.
The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings in which:
Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below to explain the present invention by referring to the figures.
Referring to
The photo input unit 100 can receive an input of a series of photos from an internal memory apparatus of a digital camera or a portable memory apparatus, for example.
The photo description information generation unit 110 generates photo description information describing a photo and including at least a photo identifier.
More specifically, the photo description information generation unit 110 confirms from each of input photos whether or not there is camera information and photographing information stored in the respective photo files, and if the information items are in any of the photo files, the information items are extracted and expressed according to a photo description structure. At the same time, content-based feature values are extracted from pixel information of photos and expressed according to the photo description structure. The photo description information is then input to the photo albuming tool 130 for grouping photos.
In order to more efficiently retrieve and group photos by using the variety of generated photo description information items, the albuming tool description information generation unit 120 generates albuming tool description information including predetermined parameters for situation-based photo clustering.
The sort key generation unit 200 generates an item for sorting photos before the photos are clustered, and preferably, the photo sort items include at least one of a file name, a photographing date and/or time, and a photo file generation date and/or time. The situation-based clustering hint generation unit 220 generates a situation-based clustering hint to help photo clustering. The importance generation unit 240 generates the importance of information to be used for photo clustering.
The albuming tool 130 performs photo albuming through situation-based photo clustering by using at least the photo description information and the albuming tool description information, and includes a situation-based clustering tool 135. The situation-based clustering tool 135 is an apparatus for detecting a situation change in a digital photo, determining and detecting situation changes among the input photos.
The situation-based clustering tool 135 clusters digital photo data based on situations, and may include a photo sort unit 300, a time feature value obtaining unit 320 (where the time feature can be based on time information and/or date information), a content-based feature value extraction unit 340, a dissimilarity measuring unit 360, and a situation change detection unit 380, as shown in
The photo sort unit 300 sorts photos desired to be situation-based clustered, in order of taken time (with “time” being representative of time and/or date). The time feature value obtaining unit 320 obtains, from the photo description information generation unit 110, a time feature value for each of two contiguous photos among photos sorted in order of taken time. The content-based feature value extraction unit 340 extracts from the photo description information generation unit 110 a content-based feature value for each of the two contiguous photos. The dissimilarity measuring unit 360 measures the dissimilarity of the two photos by making the time feature value importance, obtained from the albuming tool description information generation unit 120, reflect the time feature value obtained from the time feature value obtaining unit 320, and by making the content-based feature value importance, obtained from the albuming tool description information generation unit 120, reflect the content-based feature value extracted from the content-based feature value extraction unit 340. The situation change detection unit 380 determines and detects a situation change by using the amount of change in the dissimilarity value.
The photo group information generation unit 140 generates predetermined photo group description information from the output of the albuming tool 130. The photo albuming information generation unit 150 generates predetermined photo albuming information by using the photo description information and the photo group description information.
As detailed items to express the photo file information 440 stored in a photo file, the photo file information 440 may include an item (File name) 442 expressing the name of the photo file, an item (File format) 444 expressing the format of the photo file, an item (File size) 446 expressing the capacity of the photo file in units of bytes, and an item (File creation date/time) 448 expressing the date and/or time (i.e., time information) when the photo file was created.
As detailed items to express the camera and photographing information 460 stored in a photo file, the camera and photographing information 460 may also include an item (IsExifInformation) 462 expressing whether or not a photo file includes Exif information, an item (Camera model) 464 expressing a camera model taking the photo, an item (Taken date/time) 466 expressing the date and/or time when the photo was taken, an item (GPS information) 468 expressing the location where the photo was taken, an item (Image width) 470 expressing the width information of the photo, an item (Image height) 472 expressing the height information of the photo, an item (Flash on/off) 474 expressing whether or not a camera flash is used to take the photo, an item (Brightness) 476 expressing the brightness information of the photo, an item (Contrast) 478 expressing the contrast information of the photo, and an item (Sharpness) 479 expressing the sharpness information of the photo.
Also, the information 480 expressing a content-based feature value extracted from a photo may include an item (Visual descriptor) 482 expressing feature values of color, texture, and shape extracted by using an MPEG-7 Visual Descriptor, and an item (Audio descriptor) 484 expressing a feature value of voice extracted by using the MPEG-7 Audio Descriptor.
In addition, in order to achieve a higher situation-based clustering performance, situation-based clustering hint information is defined, and according to the hint of each photo, the importance of feature information to be used in photo clustering can be adaptively set. As shown in
The item 500 sorting photos may include an item (File name) 502 sorting photos in order of name, an item (Taken date/time) 504 sorting photos in order of their respective taken date and/or time, and an item (File creation date/time) 506 sorting photos in order of file creation date and/or time.
Detailed items of the clustering hint item 520 expressing semantic information of a higher level concept of a photo may include an item (Brightness) 522 indicating information on the overall brightness of a photo, an item (Level of detail) 524 indicating the degree of complexity of the photo, an item (Homogeneous texture) 526 indicating information on homogeneous texture of the photo, an item (Heterogeneous texture) 528 indicating information on an edge of a photo, an item (Monochromic) 530 indicating whether or not the photo is monochrome, an item (Colorfulness) 532 indicating the degree of colorfulness of the color expression of the photo, an item (Color coherence) 534 indicating the overall color coherence shown in the photo, an item (Color temperature) 536 indicating the color temperature of the color of the photo, an item (Taken time) 538 indicating whether or not the photo file includes taken time information, an item (Time overlap) 540 indicating that, if photos taken by many cameras in similar time bands are clustered at the same time, the time information of a current photo overlaps the time information of photos taken by other cameras and that the importance of time information is lowered when the current photo is situation-based clustered, an item (Audio clip) 542 indicating whether or not voice information of a user is stored together with the photo when the photo is taken, e.g., included as an audio clip file, and an item (Recognized speech) 544 indicating voice words and sentence strings recognized in an audio file of the photo.
The value of the item (Brightness) 522 indicating the brightness of the entire photo can be measured by averaging the pixel intensity extracted from each pixel of a photo, and the value of the item (Level of detail) 524 indicating the degree of complexity of the photo can be estimated from an entropy measured from the pixel information of the photo or ‘an isopreference curve’ determining the actual complexity of each photo. The value of the item (Homogeneous texture) 526 indicating information on homogeneous texture of the photo can be measured by using regularity, direction, and scale of the texture from the feature value of the Texture Browsing descriptor among MPEG-7 visual descriptors. The value of the item (Heterogeneous texture) 528 indicating information on an edge of a photo can be measured by extracting edge information from a photo and normalizing the intensity of the extracted edge. The value of the item (Monochromic) 530, indicating whether or not the photo is monochrome, i.e., has no color information, can be determined by the number of bits allocated to each pixel of the photo. The value of the item (Colorfulness) 532 indicating the degree of colorfulness of the color expression of the photo can be measured by normalizing the height of the histogram of each color value from a color histogram and the distribution value of the entire color value. The value of the item (Color coherence) 534 indicating the overall color coherence shown in the photo can be measured by using a Dominant Color descriptor among MPEG-7 visual descriptors, and can be measured by normalizing the height of the histogram of each color value from a color histogram and the distribution value of the entire color value. The value of the item (Color temperature) 536 indicating the color temperature of the color of the photo can be measured by normalizing a color temperature value measured by using a Color Temperature descriptor among MPEG-7 visual descriptors.
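As a non-limiting illustration, the brightness and level-of-detail hints described above can be sketched, for example, as follows; the function names and the use of a plain list of grayscale pixel values are illustrative assumptions, not part of the described description structure:

```python
import math

def brightness_hint(pixels):
    # Average pixel intensity over the photo (pixels: grayscale values 0-255).
    return sum(pixels) / len(pixels)

def level_of_detail_hint(pixels):
    # Shannon entropy of the pixel-value histogram, used here as a
    # simple stand-in for the complexity (level of detail) estimate.
    hist = {}
    for p in pixels:
        hist[p] = hist.get(p, 0) + 1
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in hist.values())

flat = [128] * 100       # uniform photo: zero entropy, low level of detail
busy = list(range(100))  # varied photo: high entropy, high level of detail
```

A uniform photo yields zero entropy, while a photo whose pixels are widely distributed yields a higher entropy, matching the intended ordering of the hint.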
The item (Taken time) 538 indicating whether or not the photo file includes taken time information can be extracted from Exif information of the photo file. As for the item (Time overlap) 540 indicating that, if photos taken by many cameras in similar time bands are clustered at the same time, the time information of a current photo overlaps the time information of photos taken by other cameras and the importance of time information is lowered when the current photo is situation-based clustered, information on whether or not times of camera photos are overlapping can be obtained by placing a sliding window with an arbitrary length centered at the current photo and comparing camera model information of photos belonging to the window. The item (Audio clip) 542, indicating whether or not voice information of a user stored together with a photo when the photo is taken is included as an audio clip file, can be obtained by examining whether or not there is a file having the same file name as that of the photo and a different extension indicating a voice file, such as wav or mp2/3. As for the item (Recognized speech) 544 indicating voice words and sentence strings recognized in an audio file of a photo, a recognized voice can be obtained by using methods such as a hidden Markov model (HMM), a neural network, or dynamic time warping (DTW) for a voice feature value extracted by using LPC cepstrum, PLP cepstrum, filter bank energy, mel frequency cepstral coefficients (MFCC), and so on. Though this method is a preferred embodiment of a method for obtaining hint information, other methods can also be used.
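As a non-limiting illustration, the sliding-window time-overlap check and the audio clip check described above can be sketched as follows; the function names, tuple layout, and window size are illustrative assumptions:

```python
import os

def time_overlap_hint(photos, index, window=5):
    # photos: list of (taken_time, camera_model) sorted by taken time.
    # A sliding window of arbitrary length is centered on the current photo;
    # if any neighbor in the window comes from a different camera model,
    # the time bands of several cameras overlap.
    lo = max(0, index - window // 2)
    hi = min(len(photos), index + window // 2 + 1)
    model = photos[index][1]
    return any(photos[i][1] != model for i in range(lo, hi) if i != index)

def audio_clip_hint(photo_path, voice_exts=(".wav", ".mp2", ".mp3")):
    # True if a voice file with the same base name exists next to the photo.
    base, _ = os.path.splitext(photo_path)
    return any(os.path.exists(base + ext) for ext in voice_exts)
```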
Detailed items of the item 560 expressing the importance of information to be used in photo clustering include an item (Taken time) 562 setting the importance of taken time information and an item (Low-level feature) 566 setting the importance of information on a content-based feature value of a photo.
The item (Taken time) 562 setting the importance of taken time information includes an item (Importance value) 564 expressing a corresponding importance value. The item (Low-level feature) 566 setting the importance of information on a content-based feature value of a photo includes an item (Visual descriptor) 568 setting the importance of MPEG-7 Visual Descriptor and an item (Importance value) 570 expressing a corresponding importance value, and an item (Audio descriptor) 572 setting the importance of MPEG-7 Audio Descriptor and an item (Importance value) 574 expressing a corresponding importance value. The importance value can have a value in a range from 0.0 to 1.0, for example.
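As a non-limiting illustration, the importance description structure above can be sketched as the following fragment; the dictionary key names and the clamping helper are illustrative assumptions, with each importance value limited to the 0.0 to 1.0 range described above:

```python
def make_importance(taken_time=0.5, visual=0.5, audio=0.0):
    # Each importance value is clamped to the documented 0.0-1.0 range.
    clamp = lambda v: max(0.0, min(1.0, v))
    return {
        "TakenTime": {"ImportanceValue": clamp(taken_time)},
        "LowLevelFeature": {
            "VisualDescriptor": {"ImportanceValue": clamp(visual)},
            "AudioDescriptor": {"ImportanceValue": clamp(audio)},
        },
    }
```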
Also, each situation group may have a situation identifier (Situation ID) 6200. One or more representative photos (Key photo ID) 6300 among photos in the group can be set by the photo identifier.
The description structure expressing parameters required for effective photo clustering can be expressed in an XML format as the following, as an example.
This document contains visual tools defined in ISO/IEC 15938-3
Meanwhile,
An apparatus, medium, and method for situation-based clustering of digital photos can use the description information described above to effectively perform digital photo albuming of digital photo data. Accordingly, first, a digital photo is input through a photo input unit 100, in operation 1100, and photo description information describing the photo and including at least a photo identifier can be generated, in operation 1110.
Also, albuming tool description information including a predetermined parameter for digital photo clustering can be generated, in operation 1120. Then, the photo is situation-based clustered by using the photo description information and the albuming tool description information, in operation 1130. The result of the situation-based clustering is generated as predetermined photo group description information, in operation 1140. Predetermined photo albuming information is then generated by using the photo description information and the photo group description information, in operation 1150.
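Operations 1100 to 1150 described above can be sketched, as a non-limiting illustration, as the following pipeline; every function and field name below is a hypothetical stand-in for the units described in the text, and the clustering step is a placeholder:

```python
def album_photos(photos):
    # Operations 1110-1150 sketched as a simple pipeline.
    descriptions = [{"photo_id": i, "photo": p}                      # 1110
                    for i, p in enumerate(photos)]
    tool_description = {"sort_key": "Taken date/time",               # 1120
                        "importance": {}}
    groups = situation_based_clustering(descriptions,                # 1130
                                        tool_description)
    group_description = [{"situation_id": g_id, "photo_ids": ids}    # 1140
                         for g_id, ids in enumerate(groups)]
    return {"photos": descriptions,                                  # 1150
            "groups": group_description}

def situation_based_clustering(descriptions, tool_description):
    # Placeholder: a real implementation compares time and content-based
    # feature dissimilarities; here every photo falls into one group.
    return [[d["photo_id"] for d in descriptions]]
```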
Preferably, the albuming tool description information, in operation 1120, includes at least one of a sort key for sorting photos before clustering digital photos, a situation-based clustering hint to help clustering, and an importance setting the importance of information to be used in photo clustering, as shown in
The importance includes at least one of information (taken time) setting the importance of taken time information and information (low-level feature) setting the importance of information on a content-based feature value of a photo. The information (low-level feature) setting the importance of information on a content-based feature value of a photo includes information setting the importance of MPEG-7 Visual Descriptor and information setting the importance of an MPEG-7 Audio Descriptor.
Embodiments of the present invention provide a method and medium for more quickly and effectively albuming digital photos with a large amount of digital photo data by using the information described above, including a method and medium for automatically clustering digital photo data based on situations of taken photos.
In
First, when different N types of content-based feature values are extracted from the i-th photo, the content-based feature values of the i-th photo can be expressed as the following equation 1:
Fcontent(i)={F1(i),F2(i),F3(i), . . . ,FN(i)} (1)
Here, Fk(i), extracted from the i-th photo, denotes a feature value vector, such as a color, texture, or shape feature value.
The time feature value of the i-th photo is extracted in units of seconds, and can be expressed as the following equation 2:
Ftime(i)={fyear,fmonth,fday,fhour,fminute,fsecond} (2)
Here, fyear, fmonth, fday, fhour, fminute, and fsecond denote year, month, day, hour, minute, and second, respectively, of a time when a photo is taken.
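As a non-limiting illustration, extracting the time feature value of equation 2 from an Exif date/time string can be sketched as follows; the Exif "DateTimeOriginal" format "YYYY:MM:DD HH:MM:SS" is assumed:

```python
def time_feature(exif_datetime):
    # Parse an Exif-style date/time string into the tuple of equation 2:
    # (f_year, f_month, f_day, f_hour, f_minute, f_second).
    date, clock = exif_datetime.split(" ")
    f_year, f_month, f_day = (int(x) for x in date.split(":"))
    f_hour, f_minute, f_second = (int(x) for x in clock.split(":"))
    return (f_year, f_month, f_day, f_hour, f_minute, f_second)
```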
In embodiments of the present invention, in order to achieve a higher clustering performance, as described above, semantic information of a higher level concept included in a photo is expressed as situation-based clustering hint information and according to the hint of each photo, the importance of a feature value to be used for photo clustering can be adaptively set. The importance of each content-based feature value can be determined according to a given situation-based clustering hint and can be expressed as the following equation 3:
Vcontent(i)={v1(i),v2(i),v3(i), . . . ,vN(i)} (3)
Here, vk(i) denotes the importance of feature value Fk(i), can have a value in a range from 0.0 to 1.0, for example, and, according to a given situation-based clustering hint, can be expressed as the following equation 4:
vk(i)=functionk(situation-based clustering hint) (4)
Here, functionk(•) denotes the importance measurement function of feature value Fk(i), and has a function value with a situation-based clustering hint as a variable. A measurement function according to the type of a feature value is used.
Also, a value obtained by adding the content-based feature value importance and the time feature value importance can be made to be 1.0, for example. Accordingly, the importance of a time when a photo is taken can be set to a value satisfying the following equation 5:
Vtime(i)=1.0−{v1(i)+v2(i)+v3(i)+ . . . +vN(i)} (5)
The content-based feature value and time feature value reflecting the thus determined feature value importance can be expressed as the following equation 6:
F′content(i)={Fcontent(i),Vcontent(i)}={{F1(i),v1(i)},{F2(i),v2(i)},{F3(i),v3(i)}, . . . ,{FN(i),vN(i)}}, F′time(i)={Ftime(i),Vtime(i)} (6)
Next, in order to determine the dissimilarity of the i-th photo and the (i−1)-th photo, first, comparison of similarity of each feature value can be performed according to the following equations 7 and 8.
The comparison of similarity between time feature values can be performed according to the following equation 7:
Dtime(i)=Φ{F′time(i)−F′time(i−1)} (7)
Here, Φ is a function scaling a time difference to be more sensitive to a smaller time interval, and for this, a log function and the like can be used, for example. If time information is used without such scaling, the change in the difference value is insignificant for a small time interval between two photos, while with an increasing time interval the change in the difference value increases rapidly. Accordingly, scaling is needed.
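As a non-limiting illustration, the scaled time dissimilarity of equation 7 can be sketched with a log function as follows; the particular choice log(1 + Δt) is an illustrative assumption for Φ:

```python
import math

def time_dissimilarity(t_i, t_prev):
    # t_i, t_prev: taken times in seconds. The log scaling makes the
    # measure sensitive to small time intervals while saturating for
    # large ones, as the scaling function Phi is intended to do.
    return math.log(1.0 + abs(t_i - t_prev))
```

With this scaling, a 10-second change near zero moves the dissimilarity far more than the same 10-second change added to an already large interval.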
The comparison of similarity between content-based feature values can be performed according to the following equation 8:
Dcontent(i)={F′content(i)−F′content(i−1)}={D1(i),D2(i),D3(i), . . . ,DN(i)} (8)
The final dissimilarity between the i-th photo and the (i−1)-th photo can be obtained as the following equation 9, by weighting each dissimilarity according to the time importance and the content-based feature value importance:
Dtotal(i)=Vtime(i)×Dtime(i)+{v1(i)×D1(i)+v2(i)×D2(i)+ . . . +vN(i)×DN(i)} (9)
Here, Dtime(i) denotes the time feature value dissimilarity of equation 7, and Dk(i) denotes the dissimilarity of the k-th content-based feature value of equation 8.
Finally, whether or not a situation change occurs between the i-th photo and the (i−1)-th photo can be determined by using the dissimilarity value of the (i−1)-th photo and the (i−2)-th photo, the dissimilarity value of the i-th photo and the (i−1)-th photo, and the dissimilarity value of the (i+1)-th photo and the i-th photo, together.
Whether or not a situation change occurs between the i-th photo and the (i−1)-th photo can be determined by the amount of change between dissimilarity values of the neighboring photos.
As the example shown in
By applying this pattern, whether or not a situation change occurs between the i-th photo and the (i−1)-th photo can be determined by the following equation 10:
ΔDtotal(i)<β×Dtotal(i) subject to ΔDtotal(i−1)>0 and ΔDtotal(i+1)>0 (10)
Here, ΔDtotal(i)=Dtotal(i)−Dtotal(i−1)+Dtotal(i)−Dtotal(i+1), and β is a threshold value of a dissimilarity difference value to determine whether or not a situation change occurs.
The method for detecting the occurrence of a situation change, described in the equation 10, cannot detect a situation cluster formed with one photo. The situation cluster formed with one photo has a pattern shown in
ΔD′total(i)<γ×Dtotal(i) subject to ΔDtotal(i−1)>0 and ΔDtotal(i+1)<0 (11)
Here, ΔD′total(i)=Dtotal(i)−Dtotal(i−1) and γ is a threshold value of a dissimilarity difference value to determine whether or not a situation change in one photo occurs.
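As a non-limiting illustration, the situation-change pattern sought by equations 10 and 11 can be sketched as a simplified local-peak detection over the dissimilarity sequence; the prominence measure and boundary handling below are simplifying assumptions, not a literal transcription of the equations:

```python
def situation_changes(D, beta=0.2):
    # D[i]: dissimilarity Dtotal(i) between photo i and photo i-1.
    # A situation change between photo i and photo i-1 is signalled when
    # D[i] rises above both neighboring dissimilarities and the smaller
    # rise exceeds beta * D[i], mirroring the peak pattern that equations
    # 10 and 11 detect (assumption: min-rise used as the prominence).
    changes = []
    for i in range(1, len(D) - 1):
        rise, fall = D[i] - D[i - 1], D[i] - D[i + 1]
        if rise > 0 and fall > 0 and min(rise, fall) > beta * D[i]:
            changes.append(i)
    return changes
```

For example, a sequence with one sharp dissimilarity peak yields exactly one detected change at the peak index.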
Embodiments of the present invention can also be embodied as computer readable code(s) (or instruction(s)) on a medium or media, e.g., computer readable recording media. The medium can be any data storage/transferring device that can store/transfer data which can thereafter be read by a computer system. Examples of the media can include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, carrier waves, distributed networks, and the Internet, for example.
While embodiments of the present invention have been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims. The described embodiments should be considered in descriptive sense only and not for purposes of limitation. Therefore, the scope of the invention is defined not by the detailed description of the invention but by the appended claims, and all differences within the scope will be construed as being included in the present invention.
According to the present invention as described above, a description structure effectively describing information that can be extracted from a photo and parameters appropriately performing the function for situation-based clustering of photos are defined and an effective description structure describing the parameters is suggested.
Also, in addition to information items that can be basically obtained from a photo such as camera information and file information stored in the photo, by using content-based feature value information that can be obtained from the content of a photo such as color, texture, and shape, situation-based photo clustering is performed.
By doing so, with a large number of photos, an album can be constructed conveniently and easily by using information described in relation to digital photos, and a large capacity of photo data can be used to quickly and effectively form an album.
Although a few embodiments of the present invention have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2004-0027578 | Apr 2004 | KR | national |
10-2005-0029960 | Apr 2005 | KR | national |