This application is based on and claims priority under 35 USC 119 from Japanese Patent Application No. 2015-061202 filed Mar. 24, 2015.
The present invention relates to a user-profile generating apparatus, a movie analyzing apparatus, a movie reproducing apparatus, and a non-transitory computer readable medium.
According to an aspect of the invention, there is provided a user-profile generating apparatus including a first generating unit and a second generating unit. The first generating unit uses the degree of similarity among reference images to generate tree structure information describing a relationship among the reference images by using a tree structure. The degree of similarity is obtained from feature values of the reference images. The second generating unit uses feature values of target images owned by a user and the feature values of the reference images corresponding to leaf nodes in the tree structure to generate a user profile in which the degree of interest of the user is assigned to each node in the tree structure.
An exemplary embodiment of the present invention will be described in detail based on the following figures.
An exemplary embodiment of the present invention will be described in detail below with reference to the attached drawings.
A movie reproducing apparatus according to the exemplary embodiment will be described.
The movie reproducing apparatus 10 includes a CPU 14, a nonvolatile memory 20, and an input/output (I/O) interface 22.
An operation unit 24 operated by a user, a display 26 for displaying various types of information, and a communication unit 28 communicating with external apparatuses including an external server 30 are connected to the I/O interface 22.
The external server 30 is connected to the movie reproducing apparatus 10 via the communication unit 28. Many image files owned by multiple users are stored in the external server 30. These image files are transmitted to the external server 30 from multiple client terminals, including the movie reproducing apparatus 10.
For example, each image file stored in the external server 30 includes image information indicating the image itself and tag information 40B containing words that represent the photographed targets.
Assume that a specific word is specified as a keyword by using a client terminal when images stored in the external server 30 are viewed on the client terminal. In this case, the external server 30 refers to the tag information 40B to extract image files corresponding to the keyword, and transmits the extracted image files to the client terminal. Thus, on his/her client terminal, each user may view images corresponding to the desired keyword, whether the images are owned by the user or by other users.
Thus, many images owned by multiple users are stored in the external server 30. In the exemplary embodiment, feature values of multiple images owned by each user are analyzed, and a user profile reflecting the degree of interest of the user is generated. The generated user profile is applicable to various techniques in various fields.
The movie reproducing apparatus 10 according to the exemplary embodiment uses the degree of similarity among multiple images (reference images), which is obtained from feature values of the images, to generate tree structure information representing a relationship among the multiple reference images by using a tree structure. The movie reproducing apparatus 10 uses feature values of multiple images (target images) owned by the user and the feature values of the reference images corresponding to the leaf nodes in the tree structure to generate a user profile in which the degree of interest of the user is assigned to each node in the tree structure. Then, the movie reproducing apparatus 10 uses the generated user profile to reproduce a movie by using a reproducing method appropriate for each user.
The process flow executed when a user-profile generating process is performed by the CPU 14 of the movie reproducing apparatus 10 according to the exemplary embodiment will be described with reference to the accompanying flowchart.
In the exemplary embodiment, the program for the user-profile generating process is stored in advance in the nonvolatile memory 20, but the exemplary embodiment is not limiting. For example, the program for the user-profile generating process may be received via the communication unit 28 from an external apparatus and executed. Alternatively, the program for the user-profile generating process, stored in a recording medium such as a compact disc read-only memory (CD-ROM), may be read via the I/O interface 22 by using a CD-ROM drive or the like, whereby the user-profile generating process is performed.
For example, when an instruction to execute the program for the user-profile generating process is supplied by using the operation unit 24, the program is executed.
Alternatively, the program for the user-profile generating process may be executed at a timing at which an image file is transmitted to the external server 30.
In step S101, image information indicating multiple reference images is obtained. In the exemplary embodiment, a predetermined number (for example, 1000) of image files owned by multiple users are obtained from the external server 30 as the reference images.
In step S103, the degree of similarity among the multiple reference images indicated by the obtained pieces of image information is calculated. In the exemplary embodiment, the degree of visual similarity obtained from visual features, the degree of semantic similarity obtained from semantic features, and the degree of social similarity obtained from the relationship between the owners and the reference images are calculated. Then, the calculated degrees of similarity are combined in a weighted sum, whereby the degree of similarity between reference images is calculated.
The degree of visual similarity VS(Ii, Ij) between the ith reference image Ii and the jth reference image Ij is obtained by using Expression (1) described below, where Xi represents the feature values of the ith reference image, Xj represents the feature values of the jth reference image, and σ is the average of the differences between the feature values of the ith reference image and those of the jth reference image.
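Expression (1) itself is not reproduced in this text. For illustration only, the following is a minimal Python sketch of one common Gaussian-kernel form consistent with the description above; the function name and the exact form are assumptions, not the patented expression.

```python
import numpy as np

def visual_similarity(x_i, x_j, sigma):
    # Gaussian-kernel similarity between two feature vectors:
    # closer feature vectors yield values nearer to 1.
    # sigma plays the role of the averaged difference described above.
    d = np.linalg.norm(np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float))
    return float(np.exp(-d * d / (2.0 * sigma * sigma)))
```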
The degree of semantic similarity TS(Ii, Ij) between the reference image Ii and the reference image Ij is obtained as the average of the degrees of similarity Sim(ti, tj) obtained by using Expressions (2) and (3) described below. The parameters in Expressions (2) and (3) described below are obtained by applying, to WordNet, the words representing the photographed targets included in the tag information 40B of the reference image Ii and the reference image Ij.
WordNet is a known conceptual dictionary (semantic dictionary): a conceptual dictionary database using tree structure information having a hierarchical structure in which many words are defined in a relationship expressed with higher and lower levels. The expression lso(ti, tj) in Expression (2) described below represents the word serving as a parent node of both a word ti and a word tj in WordNet; hypo(t) represents the number of child nodes of a word t; and deep(t) represents the depth of the word t in the hierarchy. The symbol node_max represents the maximum number of nodes in WordNet, deep_max represents the maximum depth of the hierarchy, and k is a constant.
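For illustration only, the following Python sketch looks up the quantities named above through the NLTK interface to WordNet. Expressions (2) and (3) themselves are not reproduced in this text, so only their ingredients lso, hypo, and deep are computed; the helper name is hypothetical.

```python
# Requires: pip install nltk, then: python -m nltk.downloader wordnet
from nltk.corpus import wordnet as wn

def wordnet_ingredients(word_i, word_j):
    # Take the first (most common) synset for each tag word.
    s_i, s_j = wn.synsets(word_i)[0], wn.synsets(word_j)[0]
    # lso(ti, tj): a parent node (lowest common hypernym) of both words.
    lso = s_i.lowest_common_hypernyms(s_j)[0]
    # hypo(t): the number of (transitive) child nodes of a word.
    hypo = len(set(lso.closure(lambda s: s.hyponyms())))
    # deep(t): the depth of the word in the WordNet hierarchy.
    deep = lso.max_depth()
    return lso, hypo, deep
```

For example, for the tag words "cat" and "dog", the common parent found in this way is a higher-level concept such as the carnivore synset, together with its hyponym count and depth.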
In the exemplary embodiment, the case in which WordNet is used to calculate the degree of semantic similarity TS(Ii, Ij) is described, but the exemplary embodiment is not limiting. Any dictionary database using tree structure information having a hierarchical structure in which many words are defined in a relationship expressed with higher and lower levels may be used to calculate the degree of semantic similarity TS(Ii, Ij).
The degree of social similarity SS(Ii, Ij) between the reference image Ii and the reference image Ij is obtained by using Expressions (4) to (6) described below, where ei represents a unit vector; W represents a matrix representing a relationship between users and reference images; and c represents a constant.
The degree of similarity between the reference image Ii and the reference image Ij which is used in the exemplary embodiment is obtained by using Expression (7) described below, where α, β, and γ (α+β+γ=1) are constants.
Sim(Ii, Ij) = α·VS(Ii, Ij) + β·TS(Ii, Ij) + γ·SS(Ii, Ij) (7)
The constants α, β, and γ in Expression (7) are determined by using the value of the normalized discounted cumulated gain (nDCG), an index that takes a larger value as a more correct ranking is assigned to the targets to be evaluated. For example, combinations of the constants α, β, and γ are evaluated, and a combination yielding a high nDCG value is selected.
An nDCG value is obtained by using Expressions (8) and (9) described below, where k represents the maximum number of targets to be subjected to ranking, rel_i represents the degree of similarity of the target at the ith position in the ranking, and idealDCG represents the maximum value of the DCG.
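For illustration only, the following is a minimal Python sketch of one standard textbook form of DCG and nDCG consistent with the description; the exact Expressions (8) and (9) are not reproduced in this text.

```python
import math

def ndcg(rels, k=None):
    # rels: degrees of similarity rel_i, ordered by the ranking under
    # evaluation. DCG discounts each score by the log of its position;
    # nDCG divides by idealDCG, the DCG of the best possible ordering.
    k = k if k is not None else len(rels)
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal = sum(rel / math.log2(i + 2)
                for i, rel in enumerate(sorted(rels, reverse=True)[:k]))
    return dcg / ideal if ideal > 0 else 0.0
```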
In step S105, the calculated degree of similarity among the multiple reference images is used to perform hierarchical cluster analysis. As the method for performing the cluster analysis, a known technique, such as the nearest neighbor method, the furthest neighbor method, the group average method, or Ward's method, may be used.
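For illustration only, a minimal Python sketch of steps S105 and S107 using SciPy is given below, assuming a symmetric matrix sim of the degrees of similarity Sim(Ii, Ij) in [0, 1]; the group average method is chosen here, but any of the methods named above may be used.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import squareform

def build_reference_tree(sim):
    # Convert similarities in [0, 1] to distances and cluster.
    dist = 1.0 - np.asarray(sim, dtype=float)
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    # 'average' corresponds to the group average method.
    Z = linkage(condensed, method='average')
    # Return the root of the dendrogram; its leaves are the reference images.
    return to_tree(Z)
```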
In step S107, in accordance with the result of the hierarchical cluster analysis, tree structure information representing the relationship among the reference images is generated by using each reference image as a leaf node. For example, tree structure information is generated in which a root node n1 has an animal node n2 and a landscape node n3 as child nodes, the animal node n2 branches to a dog leaf node n4 and a cat node n5, the cat node n5 branches to a cat-A leaf node n6 and a cat-B leaf node n7, and the landscape node n3 branches to an Eiffel-Tower leaf node n8 and a town leaf node n9.
The cat-B leaf node n7 is associated with a reference image 42I; and the cat-A leaf node n6, with a reference image 42J. The dog leaf node n4 is associated with a reference image 42K; the Eiffel-Tower leaf node n8, with a reference image 42L; and the town leaf node n9, with a reference image 42M.
In step S109, image information indicating multiple target images is obtained. In the exemplary embodiment, a predetermined number (for example, 100) of image files owned by a user to be analyzed are obtained from the external server 30 as the target images.
In step S111, each of the target images is associated with a leaf node in the generated tree structure information. Specifically, each target image is associated with the leaf node whose reference image has feature values most similar to the feature values of the target image.
In step S113, the degree of interest of the user is assigned to each node. First, the degree of interest of the user is assigned to each leaf node. The degree of interest is set, for example, to the ratio of the number of target images associated with the reference image for the leaf node to the total number of target images. That is, the larger the number of target images associated with a leaf node is, the higher the degree of interest is. Then, the degree of interest is assigned to the parent node, that is, a higher node directly connected to a leaf node. Specifically, for example, a value obtained by adding the degrees of interest assigned to all of the child nodes of a parent node is assigned as the degree of interest for the parent node. By repeating this assignment up to the root node, the degree of interest is assigned to every node.
For example, assume that, of the target images, 40% are associated with the cat-A leaf node n6; 20%, with the cat-B leaf node n7; 15%, with the dog leaf node n4; 15%, with the Eiffel-Tower leaf node n8; and 10%, with the town leaf node n9.
In this case, 0.4 is assigned to the cat-A leaf node n6 as the degree of interest; 0.2, to the cat-B leaf node n7; 0.15, to the dog leaf node n4; 0.15, to the Eiffel-Tower leaf node n8; and 0.1, to the town leaf node n9. To the cat node n5, which has the cat-A leaf node n6 and the cat-B leaf node n7 as child nodes, 0.6, obtained by adding the degree of interest, 0.4, of the cat-A leaf node n6 to the degree of interest, 0.2, of the cat-B leaf node n7, is assigned as the degree of interest. To the animal node n2, which has the dog leaf node n4 and the cat node n5 as child nodes, 0.75, obtained by adding the degree of interest, 0.15, of the dog leaf node n4 to the degree of interest, 0.6, of the cat node n5, is assigned as the degree of interest.
To the landscape node n3, which has the Eiffel-Tower leaf node n8 and the town leaf node n9 as child nodes, 0.25, obtained by adding the degree of interest, 0.15, of the Eiffel-Tower leaf node n8 to the degree of interest, 0.1, of the town leaf node n9, is assigned as the degree of interest. To the root node n1, which has the animal node n2 and the landscape node n3 as child nodes, 1, obtained by adding the degree of interest, 0.75, of the animal node n2 to the degree of interest, 0.25, of the landscape node n3, is assigned as the degree of interest. Thus, it is found that the user owning these target images has a high degree of interest in cats, because the degree of interest for the cat node n5 is higher than that for the dog leaf node n4 and that for the landscape node n3.
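For illustration only, the following Python sketch reproduces the bottom-up assignment of step S113 on the example tree above; the Node class and the counts are illustrative.

```python
class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.interest = 0.0

def assign_interest(node, leaf_counts, total):
    # Leaf nodes: ratio of associated target images to all target images.
    # Internal nodes: sum of the degrees of interest of their children.
    if not node.children:
        node.interest = leaf_counts.get(node.name, 0) / total
    else:
        node.interest = sum(assign_interest(c, leaf_counts, total)
                            for c in node.children)
    return node.interest

# The example above, with counts out of 100 target images.
n5 = Node('cat', [Node('cat-A'), Node('cat-B')])
n2 = Node('animal', [Node('dog'), n5])
n3 = Node('landscape', [Node('Eiffel-Tower'), Node('town')])
root = Node('root', [n2, n3])
assign_interest(root, {'cat-A': 40, 'cat-B': 20, 'dog': 15,
                       'Eiffel-Tower': 15, 'town': 10}, 100)
# root.interest ≈ 1.0, n2.interest ≈ 0.75, n5.interest ≈ 0.6
```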
In step S115, the tree structure information in which the degree of interest is assigned to each node is stored as a user profile in the nonvolatile memory 20, and the execution of the program for the user-profile generating process is ended.
This user-profile generating process is performed for each user, and the generated user profile is stored, whereby it may be used in various situations. In the exemplary embodiment, a case in which the generated user profile is used to perform a movie reproducing process suitable for the user will be described.
The process flow for a movie reproducing process, performed when the CPU 14 of the movie reproducing apparatus 10 according to the exemplary embodiment receives an instruction to execute the process through the operation unit 24, will be described with reference to the accompanying flowchart.
In the exemplary embodiment, the program for the movie reproducing process is stored in advance in the nonvolatile memory 20, but the exemplary embodiment is not limiting. For example, the program for the movie reproducing process may be received via the communication unit 28 from an external apparatus and stored in the nonvolatile memory 20. Alternatively, the program for the movie reproducing process which is stored in a recording medium such as a CD-ROM may be read via the I/O interface 22 by using a CD-ROM drive or the like, whereby the movie reproducing process is performed.
In step S201, image information indicating a movie to be reproduced is obtained. In the exemplary embodiment, image information stored in the external server 30 is obtained via the communication unit 28.
In step S203, the movie indicated by the obtained image information is divided into movie sections in accordance with multiple time periods. As the method for dividing a movie, a known technique may be used, such as a method in which the movie is divided at predetermined time intervals, or a method in which scene switching is detected from a change in the feature values of the frames constituting the movie and the movie is divided into scenes.
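For illustration only, a minimal Python sketch of the scene-switching variant is given below, assuming one feature vector per frame; the function name and the threshold are illustrative.

```python
import numpy as np

def split_into_sections(frame_features, change_threshold):
    # A new movie section starts wherever the feature values of
    # consecutive frames change by more than change_threshold.
    boundaries = [0]
    for t in range(1, len(frame_features)):
        diff = np.linalg.norm(np.asarray(frame_features[t], dtype=float)
                              - np.asarray(frame_features[t - 1], dtype=float))
        if diff > change_threshold:
            boundaries.append(t)
    boundaries.append(len(frame_features))
    # Each section is a half-open range [start, end) of frame indices.
    return [(boundaries[i], boundaries[i + 1])
            for i in range(len(boundaries) - 1)]
```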
In step S205, feature values of each movie section are extracted. First, a similarity score S1 obtained from the degree of similarity between the user profile and a movie section is calculated. Specifically, the frames included in the movie section are used as target images to perform the user-profile generating process described above, whereby a profile for the movie section, in which a degree of interest is assigned to each node of the tree structure, is generated. The similarity score S1 is then obtained as the degree of similarity between the user profile and the profile of the movie section.
The degree of similarity using the cosine similarity takes a value from 0 to 1. A value closer to 1 indicates a higher degree of similarity. The degree of similarity cos(x, y) is obtained by using Expression (10) described below, where a vector V represents the degrees of interest which are set to nodes in the tree structure information, xi represents the ith degree of interest in the vector V in the user profile, and yi represents the ith degree of interest in the vector V in the movie section.
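Expression (10) is not reproduced in this text; the standard cosine-similarity form consistent with the description above is as follows. Because the degrees of interest are non-negative, the value falls in the range from 0 to 1.

```latex
\cos(x, y) \;=\; \frac{\sum_{i} x_i\, y_i}{\sqrt{\sum_{i} x_i^{2}}\;\sqrt{\sum_{i} y_i^{2}}}
```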
Then, a saliency score S2 obtained by using a saliency map of an image is calculated. The saliency map of an image is obtained by calculating a visual saliency for each pixel in the image. In the exemplary embodiment, for example, a known method is used to generate a saliency map for each frame included in a movie section. Although a saliency map for the movie section may be generated by averaging the corresponding pixels of the frame saliency maps, in the exemplary embodiment the saliency score S2 is obtained directly from the saliency maps for the individual frames. In this process, as described below, a Gaussian kernel is applied to each pixel in a saliency map so that noise is reduced.
The saliency score S2 is expressed by using two elements obtained from the generated saliency maps of a movie section. The first element is a weighted sum of the pixels in a saliency map. The weighted sum Sum(smap, Q) is obtained by using Expression (11) described below, where smap(i, j) represents a pixel value at coordinates (i, j) in the saliency map before the Gaussian kernel is applied, and Q(i, j) represents the Gaussian kernel which is set on the saliency map.
The second element is based on the fact that a human being tends to focus on the center of an image. The information amount D_KL, which represents the difference between an ideal distribution P (a normal distribution centered on the image) and the Gaussian kernel Q set on the saliency map, is obtained by using Expression (12) described below, that is, by using the Kullback-Leibler divergence (KLD), a known calculation method. In Expression (12), p(u) represents the distribution density of the ideal distribution P, and q(u) represents the distribution density of the Gaussian kernel Q.
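Expressions (11) and (12) are not reproduced in this text; the standard forms consistent with the definitions given above are:

```latex
\mathrm{Sum}(\mathit{smap}, Q) \;=\; \sum_{i,j} \mathit{smap}(i, j)\, Q(i, j)

D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{u} p(u) \log \frac{p(u)}{q(u)}
```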
By using the weighted sum Sum(smap, Q) of a saliency map, which is the first element, and the information amount D_KL, which represents the difference between the ideal distribution P and the Gaussian kernel Q and which is the second element, the saliency score AS(F) for a frame F is obtained by using Expression (13) described below. In Expression (13), smap_F represents the pixel values smap(i, j) in the frame F, and Q_F represents the Gaussian kernel Q for the frame F.
In the exemplary embodiment, the saliency score S2 is defined as the average of the saliency scores AS(F_i) over the frames F_i included in the movie section, which is obtained by using Expression (14) described below.
In step S207, the degree of similarity S between the user profile and each movie section is calculated. In the exemplary embodiment, a value obtained by adding the above-described similarity score S1 and the above-described saliency score S2 together is calculated as the degree of similarity S between the user profile and the movie section. The similarity score S1 and the saliency score S2 may be added together after at least one of the similarity score S1 and the saliency score S2 is weighted.
In step S209, reproduction priority for each movie section is set in accordance with the degree of similarity S between the user profile and the movie section: the higher the degree of similarity S is, the higher the reproduction priority is set.
For example, assume that a user owns many bridge images, such as an image 44G obtained by photographing a bridge. In this case, the user profile reflects a high degree of interest in bridges, and a high reproduction priority is set for movie sections in which bridges appear.
In step S211, the reproduction priority for each movie section is adjusted on the basis of the reproduction priorities for the adjacent movie sections before and after the movie section. In the exemplary embodiment, a first difference between the reproduction priority for the target movie section, which is a movie section to be adjusted, and the reproduction priority for the adjacent movie section before the target movie section is calculated. In addition, a second difference between the reproduction priority for the target movie section and the reproduction priority for the adjacent movie section after the target movie section is calculated. Then, when both the first difference and the second difference are equal to or more than a predetermined threshold, the reproduction priority for the target movie section is adjusted so that the first difference and the second difference become less than the threshold. The predetermined threshold is one for determining whether or not the movie to be reproduced is reproduced smoothly. For example, the threshold is a value that is smaller than the difference between the maximum and the minimum of the reproduction priorities set to the movie sections, and that is larger than half of that difference. In the adjustment, one of the reproduction priorities for the adjacent movie sections before and after the target movie section may be set as the reproduction priority for the target movie section. Alternatively, the average of the reproduction priorities for the adjacent movie sections before and after the target movie section may be set as the reproduction priority for the target movie section.
For example, in the case where "high" or "low" is set as the reproduction priority for each movie section, when a movie section has a high reproduction priority and both adjacent movie sections have a low reproduction priority, the reproduction priority for the movie section is adjusted to low. Conversely, when a movie section has a low reproduction priority and both adjacent movie sections have a high reproduction priority, the reproduction priority for the movie section is adjusted to high.
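For illustration only, the following Python sketch implements the averaging variant of the adjustment in step S211 for numeric priorities; the function name and the in-place policy are illustrative.

```python
def smooth_priorities(priorities, threshold):
    # When a section's priority differs from both neighbors by at least
    # `threshold`, replace it with the average of the two neighbors
    # (one of the adjustment policies described above).
    adjusted = list(priorities)
    for i in range(1, len(priorities) - 1):
        d_prev = abs(priorities[i] - priorities[i - 1])
        d_next = abs(priorities[i] - priorities[i + 1])
        if d_prev >= threshold and d_next >= threshold:
            adjusted[i] = (priorities[i - 1] + priorities[i + 1]) / 2.0
    return adjusted
```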
In step S213, the display 26 is controlled so that a reproduction screen for the movie is displayed. For example, the reproduction screen presents the movie together with the reproduction priority set for each movie section.
In the exemplary embodiment, when a user uses the operation unit 24 to input an instruction to change the reproduction priority for any movie section, the reproduction priority for the specified movie section is changed.
In step S215, whether or not an instruction to reproduce the movie is supplied is determined. In the exemplary embodiment, when a reproduction instruction is input by using the operation unit 24, it is determined that an instruction to reproduce the movie is supplied.
If it is determined that an instruction to reproduce the movie is supplied in step S215 (YES in step S215), the process proceeds to step S217. If it is determined that an instruction to reproduce the movie is not supplied in step S215 (NO in step S215), the process in step S215 is repeatedly performed until it is determined that an instruction to reproduce the movie is supplied.
In step S217, the movie is reproduced. In the exemplary embodiment, in the process of reproducing the movie, a movie section having a high reproduction priority is reproduced at a normal reproduction speed, and a movie section having a low reproduction priority is reproduced at a reproduction speed faster than the normal reproduction speed. Thus, a movie section presumed to be video in which the user has little interest is automatically fast-forwarded.
In the exemplary embodiment, the case in which the reproduction speed of each movie section is changed in accordance with the reproduction priority for the movie section is described, but the exemplary embodiment is not limiting. For example, only movie sections having a high reproduction priority may be reproduced. Thus, only movie sections presumed to be video which the user likes are automatically selected and reproduced.
The related-art method A is a method in which the degree of similarity between each frame included in a movie section and the target images is calculated by using pattern matching or the like, and in which the reproduction priority for each movie section is set on the basis of the calculated degrees of similarity.
In the exemplary embodiment, the case in which the generated user profile is applied to a movie reproducing process is described, but the exemplary embodiment is not limiting. For example, the generated user profile may be applied to various techniques in various fields, such as multimedia, recommendation for image search, personalized video summarization, artificial intelligence, human-computer interaction, and affective computing.
In the exemplary embodiment, the case in which the movie reproducing apparatus 10 performs both the user-profile generating process and the movie reproducing process is described, but the exemplary embodiment is not limiting. For example, part or all of these processes may be performed by an external apparatus such as the external server 30.
The foregoing description of the exemplary embodiment of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations will be apparent to practitioners skilled in the art. The embodiment was chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to understand the invention for various embodiments and with the various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.