A scene information extraction method, scene extraction method and scene extraction apparatus according to an embodiment of the invention will be described in detail with reference to the accompanying drawings.
Firstly, the outline of the embodiment will be described.
Users now communicate with each other by attaching comment information to the time-series data of video content through a bulletin board function or chat function. In the embodiment, a zone coherent in meaning is extracted from video content, based on words included in comment information related to the time-series data of the video content, thereby anticipating scene information of the content and realizing addition of meta-data.
Since comment information reflects how users felt when they viewed certain video content, a zone coherent in meaning can be extracted from the video content, based on the comment information. Further, comment information corresponds to upsurges of conversation that content providers did not expect when they provided the video content. Namely, the comment information enables users to accelerate their communications through the content. Also, comment information reflects the thoughts and ideas of users and hence changes continually. For instance, if the number of comments saying "That's interesting" increases for a certain video content zone previously labeled "Cool", the label of that zone can be changed to "Interesting". Thus, the embodiment can follow dynamic changes in scene information caused by changes in the thoughts of users.
The scene information extraction method, scene extraction method and scene extraction apparatus according to the embodiment can accurately extract scene information and scenes.
Referring to
The scene information extraction apparatus extracts a zone coherent in meaning from video content, based on comment information related to the time-series data of the video content. Scene information contained in the video content is anticipated from words included in the related comment information, and addition of meta-data is realized.
As shown, the scene information extraction apparatus comprises a comment information database (DB) 101, comment information acquisition unit 102, morpheme analysis unit 103, morpheme database (DB) 104, computation unit 105, user database (DB) 106, estimated-word-value assignment unit 107, scene information extraction unit 108 and scene information database (DB) 109. The computation unit 105 includes a comment-character-string-length computation unit 110, comment-word-number computation unit 111, return-comment determination unit 112, return-comment-number computation unit 113, word-value computation unit (estimated-word-value acquisition unit) 114 and user search unit 115. The scene information extraction unit 108 includes an estimated-value-distribution normalization unit 116 and estimated-value-distribution change rate computation unit 117.
The comment information database 101 stores comment information. The comment information is formed of, for example, meta-data and a comment. The meta-data includes a comment identifier, parent comment identifier, user identifier, comment posting time, content identifier, start time and end time. The comment information will be described later with reference to
The comment information acquisition unit 102 acquires comment information items one by one from the comment information database 101. Specifically, the comment information acquisition unit 102 acquires comment information in units of, for example, comment identifiers, and transfers it to the morpheme analysis unit 103 in units of, for example, comment identifiers.
The morpheme analysis unit 103 subjects the comment included in the acquired comment information to morpheme analysis, and acquires words and the articles of the words from the comment information in units of, for example, comment identifiers. The morpheme analysis unit 103 outputs a table showing the correspondence between each word, its article and the comment identifier (or comment identifiers) that indicates the corresponding comment, as shown in
The morpheme database 104 is used to compute the estimated value of each word. The estimated value of each word is used to extract an important word for extracting scene information. The more important the word is, the higher the estimated value the word should have. The morpheme database 104 stores words, the part of speech of each word, the frequency of occurrence of each word, and the estimated value of each word. The morpheme database 104 will be described later in detail with reference to
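A minimal sketch of the kind of record the morpheme database 104 might hold follows. The field names and the sample rows are assumptions for illustration; the embodiment only specifies that the database stores each word, its part of speech, its frequency of occurrence, and its estimated value.

```python
from dataclasses import dataclass

@dataclass
class MorphemeRecord:
    """One row of the morpheme database: a word, its part of speech,
    its frequency of occurrence, and its estimated value."""
    word: str
    part_of_speech: str
    frequency: int
    estimated_value: float

# Hypothetical rows: content words receive higher estimated values than
# function words, consistent with the weighting discussed later.
morpheme_db = {
    "mountain": MorphemeRecord("mountain", "noun", 3, 1.0),
    "in": MorphemeRecord("in", "preposition", 10, 0.1),
}
```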
The computation unit 105 computes the estimated value of each word utilizing the correspondence table output from the morpheme analysis unit 103. The specific computation method of the computation unit 105 will be described later with reference to
The user database 106 stores the estimated value of each user that indicates whether the comments of each user are important to scene information extraction. The user database 106 also stores, for example, user identifiers, user names and the number of statements of each user. Particulars concerning the user database 106 will be described later with reference to
Whenever video content related to the comments of a user, and the zone(s) of the video content related to the comments are acquired, the estimated-word-value assignment unit 107 assigns, to the acquired zone(s), the estimated value of each word computed by the computation unit 105, thereby acquiring a histogram as the estimated value distribution of each word. Further, the estimated-word-value assignment unit 107 relates each word, a comment identifier (or comment identifiers) corresponding thereto, and a histogram (or histograms) corresponding thereto. An example of correspondence will be described later with reference to
The scene information extraction unit 108 performs content zone extraction based on the estimated value distribution of each word generated by the estimated-word-value assignment unit 107. Particulars concerning the scene information extraction unit 108 will be described later with reference to
The scene information database 109 stores information concerning scenes corresponding to zones of video content extracted by the scene information extraction unit 108. Specifically, the scene information database 109 stores, for example, scene labels as words symbolizing the respective scenes, content identifiers, and the start and end times of the scenes. The scene information database 109 will be described later in detail with reference to
Referring back to
Firstly, the comment information acquisition unit 102 initializes a table that includes, as data of each row, a word, its article and its comment identifier(s) (step S201).
Subsequently, the comment information acquisition unit 102 acquires comment information items one by one from the comment information database 101. If it is determined at step S202 that all comment information acquired from the comment information database 101 has already been subjected to morpheme analysis, the morpheme analysis unit 103 proceeds to step S205. In contrast, if there is comment information not yet subjected to morpheme analysis, the morpheme analysis unit 103 proceeds to step S203. Whenever the morpheme analysis unit 103 acquires comment information from the comment information acquisition unit 102, it performs morpheme analysis on the comment information. If it is determined at step S203 that the unanalyzed comment information contains no morphemes, the program returns to step S202, whereas if it contains a morpheme, the program proceeds to step S204. At step S204, the morpheme analysis unit 103 updates the table by adding, to the table, the analysis result concerning the newly analyzed morpheme. The table is stored in, for example, a memory (not shown).
After the comments of all comment information are subjected to morpheme analysis, the computation unit 105 computes or estimates the value of each word, utilizing the table output from the morpheme analysis unit 103. Firstly, the estimated-word-value assignment unit 107, for example, initializes a table that includes, as data of each row, a word, content identifier and estimated value distribution (step S205).
The computation unit 105 acquires a word in units of rows from the table “words, articles, comment identifiers” at step S206. If it is determined at step S206 that the acquired word is not yet estimated, the program proceeds to step S207, whereas if all words in the table “words, articles, comment identifiers” are already estimated, the program proceeds to step S211.
The word-value computation unit 114 incorporated in the computation unit 105 searches the morpheme database 104 to acquire (compute) the estimated value of the word. After that, the computation unit 105 computes the degree of correction concerning the estimated value of the word in units of comments that contain it, based on the length of the comments, the attribute of the comments, and the estimated value corresponding to a user who has posted the comments (step S207). The estimated values corresponding to users are acquired from the user database 106.
At step S208, the computation unit 105 refers to the comment information database 101 to acquire video content related to the comment identifier(s) corresponding to the word and shown in the table “words, articles, comment identifiers”, and the zone of the content related to the comments indicated by the comment identifier(s) (i.e., the start and end times of the content related to the comments).
Whenever the video content related to comments and the zone of the content related thereto are acquired based on the comments, the estimated-word-value assignment unit 107 assigns, to the zone, the estimated word value acquired by the computation unit 105 (step S209). Namely, the estimated-word-value assignment unit 107 adds the estimated value determined at step S207 to the estimated value distribution defined by the start and end times. At step S210, the estimated-word-value assignment unit 107 updates the table “words, content identifiers, estimated value distributions”, and returns to step S206 to acquire the next word.
If it is determined at step S206 that there is no word which is not yet subjected to estimation, the scene information extraction unit 108 extracts a content zone (or content zones), i.e., scene information, for which the word acquired at step S206 should be labeled, based on the estimated value distribution generated in units of words by the estimated-word-value assignment unit 107 (step S211).
Referring to
In
Further, in
Referring to
If the morpheme analysis unit 103 receives, for example, comment information 1, it divides the comment "This mountain appears also in that movie" into portions, such as "This: adjective", "mountain: noun", "appears: verb", "also: adverb", "in: preposition", "that: adjective" and "movie: noun." These combinations of "words and articles" divided by the morpheme analysis unit 103 are added to the table of
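The division above can be sketched as follows. A real embodiment would use a morpheme analyzer; here a toy part-of-speech lookup stands in for it, with tags copied from the example in the text, and the table layout (word, part of speech, comment identifiers) follows the description of the morpheme analysis unit 103.

```python
# Toy part-of-speech lookup standing in for a real morpheme analyzer;
# the tags follow the example in the text.
POS = {
    "this": "adjective", "mountain": "noun", "appears": "verb",
    "also": "adverb", "in": "preposition", "that": "adjective",
    "movie": "noun",
}

def analyze(comment: str, comment_id: int, table: dict) -> None:
    """Split a comment into (word, part of speech) pairs and record,
    for each pair, the comment identifiers it occurs in."""
    for token in comment.lower().rstrip(".").split():
        pos = POS.get(token, "unknown")
        table.setdefault((token, pos), set()).add(comment_id)

table = {}  # (word, part of speech) -> set of comment identifiers
analyze("This mountain appears also in that movie", 1, table)
```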
Referring to
After morpheme analysis is performed on the comments of all comment information, the computation unit 105 acquires, by computation, the estimated value of each word using the table generated by the morpheme analysis unit 103. Various word estimation methods are possible. The embodiment employs a method for correcting the estimated values of words, using comment information that contains the words.
Firstly, the computation unit 105 acquires a combination of a word, article and comment identifier(s) from the table generated by the morpheme analysis unit 103 (step S601). Subsequently, the word-value computation unit 114 searches the morpheme database 104 to acquire (compute) the estimated value of the word (step S602).
It is considered that higher estimated values should be imparted to words, such as nouns and verbs, which are detected at a lower frequency and carry a greater quantity of information than words such as prepositions and pronouns. In light of this, different estimated values are preset for different articles. Alternatively, estimated values may be preset in units of words, based on the meaning of each word and the character string length of each word. Yet alternatively, instead of directly using an estimated value set for each word, the estimated value of a word may be divided by the detection frequency of the word (for example, if a certain word appears twice in a certain comment, the estimated value of the word is set to ½), or the estimated value of each word may be updated based on the total detection frequency (to reduce the estimated values of frequently used words so that infrequently used words are not buried among them). Thus, the estimated value of each word may be determined from its detection frequency.
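The frequency-based adjustment above can be sketched as follows. The per-part-of-speech base values are illustrative assumptions; only the division by in-comment frequency (a word appearing twice counting as ½) comes directly from the text.

```python
# Hypothetical per-part-of-speech base values: content words (nouns,
# verbs) carry more information than function words.
BASE_VALUE = {"noun": 1.0, "verb": 0.8, "adverb": 0.3, "preposition": 0.1}

def estimated_value(pos: str, count_in_comment: int) -> float:
    """Base value for the word's part of speech, divided by how many
    times the word occurs in the comment: a word appearing twice in
    one comment contributes half of its base value per occurrence."""
    return BASE_VALUE.get(pos, 0.1) / count_in_comment
```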
After that, the computation unit 105 computes the degree of correction of the estimated value of each word in units of comments that contain it, based on the length of comments, the attribute of the comments, or the estimated value corresponding to the user who has posted the comments (steps S603, S604 and S605).
The reason why correction is performed based on the length of a comment is that the estimated value of "mountain" contained in a long comment full of knowledge, such as "This mountain erupted in 19xx, and . . . in 19xx", should be discriminated from that of "mountain" contained in a short comment, such as "That's a mountain!" The length of a comment is, for example, the length of its character string, or the number of words it includes. The comment-character-string-length computation unit 110 measures the character string length of the comment, and the comment-word-number computation unit 111 counts the number of words included in the comment (step S603). Assuming that the character string length is L and the number of words in the comment is N1, correction utilizing the length of the comment can be performed using, for example, the expression αL+βN1 (α and β being appropriate coefficients). Based on this expression, the computation unit 105 performs correction.
The reason why correction is performed based on the attribute of a comment is that a return comment reflects the content of its parent comment, and that a comment with a large number of return comments is considered to greatly influence other comments. Whether or not a comment is a return comment, or the number of its return comments, can be regarded as an attribute. The return-comment determination unit 112 determines whether the comment is a return comment, and the return-comment-number computation unit 113 computes the number of return comments (step S604). Assume here that R indicates whether the comment is a return comment (if R is 1, the comment is determined to be a return comment, whereas if R is 0, it is determined not to be one), and that the number of return comments is N2. In this case, the degree of correction based on the attribute of the comment can be expressed using the expression γR+δN2 (γ and δ being appropriate coefficients). Based on this expression, the computation unit 105 performs correction. Further, correction may be performed by attaching, to a comment, comment attribute information that indicates whether the comment relates to a "question", "answer", "exclamation", "storage of information" or "spoiler", when a user posts the comment.
The reason why correction is performed based on estimated values corresponding to users is that the estimated value of a word in comments posted by a junior user with few utterances should be discriminated from that of a word in comments posted by a senior user with many utterances. The user search unit 115 searches the user database 106 to compute the degree of correction using the estimated value corresponding to the user (step S605). For instance, the computation unit 105 reduces the estimated value of a word in comments posted by a junior user with few utterances, and increases that of a word in comments posted by a senior user with many utterances.
After that, the computation unit 105 performs one or more of the above-described corrections, thereby acquiring a corrected estimated value.
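The three corrections of steps S603 to S605 can be combined as sketched below. The additive length term αL+βN1 and attribute term γR+δN2 come from the text; treating the user's estimated value as a multiplicative factor, and all coefficient values, are assumptions for illustration.

```python
def corrected_value(base: float, length: int, n_words: int,
                    is_return: bool, n_returns: int,
                    user_value: float,
                    alpha=0.01, beta=0.05, gamma=0.2, delta=0.1) -> float:
    """Correct a word's estimated value for one comment.

    length correction:    alpha*L + beta*N1   (character count, word count)
    attribute correction: gamma*R + delta*N2  (return flag, return count)
    user correction:      scale by the posting user's estimated value
    All coefficient values here are illustrative assumptions.
    """
    length_corr = alpha * length + beta * n_words
    attr_corr = gamma * (1 if is_return else 0) + delta * n_returns
    return (base + length_corr + attr_corr) * user_value
```

With these coefficients, a long, frequently answered comment raises the word's value relative to a short one, and a senior user's comment weighs more than a junior user's.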
Referring to
The user database 106 may set an estimated value in units of groups to which each user belongs, or may update an estimated value for each user in accordance with the frequency of their utterances. Alternatively, the user database 106 may update an estimated value for a certain user in light of the votes (such as "acceptable", "unacceptable", "useful" and "useless") of other users who have read the utterances of the certain user. In
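A vote-driven update of a user's estimated value might look like the following sketch. The embodiment only states that the value may be updated in light of other users' votes; the linear update rule, the rate, and the non-negativity clamp are all assumptions.

```python
def update_user_value(current: float, useful_votes: int,
                      useless_votes: int, rate: float = 0.05) -> float:
    """Nudge a user's estimated value up for each 'useful' vote and
    down for each 'useless' vote, keeping it non-negative.
    The update rule and rate are illustrative assumptions."""
    return max(0.0, current + rate * (useful_votes - useless_votes))
```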
Referring to
Firstly, the estimated-word-value assignment unit 107 initializes the table “words, content identifiers, estimated value distributions” shown in
Based on the comments acquired at step S901, the estimated-word-value assignment unit 107 acquires video content related to the comments, and the zone(s) of the video content related to the comments (step S902). For instance, in the table of
Whenever the estimated-word-value assignment unit 107 acquires video content related to the comments, and the zone(s) of the video content related to the comments, it assigns, to the zone(s), the estimated value of each word acquired by the computation unit 105, and updates the table “words, content identifiers, estimated value distribution” (step S903). Assuming that all words in all comment information have an estimated value of 1 for facilitating the description, the estimated value distributions concerning the word “mountain” in the video content X are set to 1 in the 00:01:30 to 00:02:00 zone and 00:04:30 to 00:05:00 zone, to 2 in the 00:02:00 to 00:03:30 zone and 00:04:00 to 00:04:30 zone, and to 3 in the 00:03:30 to 00:04:00 zone, referring to
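The accumulation just described can be sketched as follows. The three comment zones chosen here are hypothetical, picked only so that their overlaps reproduce the distribution stated in the text (1, 2, 3, 2, 1 across the successive sub-zones); the 30-second bin width is likewise an assumption.

```python
def to_seconds(t: str) -> int:
    """Convert 'HH:MM:SS' to seconds."""
    h, m, s = map(int, t.split(":"))
    return 3600 * h + 60 * m + s

def accumulate(zones, value=1.0, step=30):
    """Add `value` to every `step`-second bin covered by each comment
    zone, yielding the word's estimated value distribution."""
    hist = {}
    for start, end in zones:
        for t in range(to_seconds(start), to_seconds(end), step):
            hist[t] = hist.get(t, 0.0) + value
    return hist

# Hypothetical comment zones for "mountain" in content X, chosen to be
# consistent with the distribution described in the text.
zones = [("00:01:30", "00:04:00"),
         ("00:02:00", "00:04:30"),
         ("00:03:30", "00:05:00")]
hist = accumulate(zones)
```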
Referring then to
Whenever the estimated-word-value assignment unit 107 generates an estimated value distribution for a word, the scene information extraction unit 108 extracts, from video content, a zone (zones) for which the word should be labeled. Namely, the scene information extraction unit 108 generates, for example, the table formed of content identifiers, start times, end times and scene labels, shown in
To extract word zones, a method for extracting a zone in which the estimated value distribution exceeds a preset threshold value (see
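The threshold-based extraction method can be sketched as follows, reusing the distribution from the preceding sketch; the bin width and threshold values are assumptions.

```python
def extract_zones(hist: dict, threshold: float, step: int = 30):
    """Extract contiguous zones whose estimated value distribution
    exceeds `threshold`, returned as (start, end) second pairs."""
    times = sorted(t for t, v in hist.items() if v > threshold)
    zones, start = [], None
    for t in times:
        if start is None:
            start, prev = t, t
        elif t == prev + step:
            prev = t
        else:
            zones.append((start, prev + step))
            start, prev = t, t
    if start is not None:
        zones.append((start, prev + step))
    return zones

# The "mountain" estimated value distribution (bin start time in
# seconds -> value), matching the example in the text.
hist = {90: 1.0, 120: 2.0, 150: 2.0, 180: 2.0, 210: 3.0,
        240: 2.0, 270: 1.0}
```

A higher threshold yields a tighter scene: with a threshold of 1.5 the extracted "mountain" zone spans 00:02:00 to 00:04:30, and with 2.5 only the 00:03:30 to 00:04:00 peak survives.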
As described above, in the embodiment, a zone coherent in meaning can be extracted. Further, scene information of video content can be anticipated and meta-data can be attached thereto, by extracting a zone coherent in meaning. In addition, the embodiment can follow a dynamic change in scene information due to a change in the interest of users. Accordingly, the embodiment can accurately extract scene information and scenes.
Additional advantages and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents.
Number | Date | Country | Kind
---|---|---|---
2006-086035 | Mar 2006 | JP | national