This application is based on and claims priority under 35 USC 119 from Japanese Patent Application No. 2016-043051 filed Mar. 7, 2016.
The present invention relates to a video editing apparatus, a video editing method, and a non-transitory computer readable medium.
According to an aspect of the invention, there is provided a video editing apparatus including a storing unit, an input unit, a segment selection unit, and a generation unit. The storing unit stores video data along with video attribute information indicating, for each concept, a confidence score that the concept is included in each of segments into which the video data has been divided. The input unit inputs, as preference information, a coefficient of each concept which is desired to be included in summary information and a coefficient of a superordinate concept of the concept which is desired to be included in the summary information. The segment selection unit selects, based on the preference information input by the input unit, at least one segment that matches the preference information, from among plural segments of the video data stored in the storing unit. The generation unit generates, based on video of the at least one segment selected by the segment selection unit, summary information representing contents of the video.
Exemplary embodiments of the present invention will be described in detail based on the following figures, wherein:
Exemplary embodiments of the present invention will be described in detail with reference to drawings.
A travel information providing system according to an exemplary embodiment of the present invention includes, as illustrated in
The terminal apparatuses 22 and 23 are personal computers of general users A and B, respectively, and are configured to allow the users to access the server apparatus 10 via the network 30 and browse travel video.
Furthermore, the terminal apparatus 21 is installed at a travel information providing site operated by, for example, a travel information provider or the like. The terminal apparatus 21 is a video editing apparatus which selects video matching preference information of the users A and B from among travel information video provided by the server apparatus 10, edits the selected video into summary information such as digest video, digest images, or the like in accordance with preference of the users A and B, and provides the summary information to the users A and B.
In
A hardware configuration of the terminal apparatus 21 which functions as a video editing apparatus in a travel information providing system according to an exemplary embodiment is illustrated in
The terminal apparatus 21 includes, as illustrated in
The CPU 11 performs a predetermined process based on a control program stored in the memory 12 or the storing device 13, and controls an operation of the terminal apparatus 21. In this exemplary embodiment, an explanation is provided in which the CPU 11 reads a control program stored in the memory 12 or the storing device 13 and executes the read control program. However, the program may be stored in a storing medium such as a compact disc-read only memory (CD-ROM) and provided to the CPU 11.
The terminal apparatus 21 according to this exemplary embodiment includes, as illustrated in
The video data acquisition unit 31 acquires, via the network 30, for example, video data such as travel information video provided by the server apparatus 10.
The preference information input unit 32 inputs, as preference information, a coefficient of each concept which is desired to be included in summary information and a coefficient of a superordinate concept of each concept which is desired to be included in summary information.
In this exemplary embodiment, video to be edited from which summary information is generated is travel information video. Therefore, for example, various items including golf, tennis, horse riding, strawberry picking, ramen, soba, sushi, castles, shrines, temples, and world heritage sites are set as concepts.
Furthermore, as a superordinate concept of each concept, for example, an item “activities” is set as a superordinate concept of concepts such as golf, tennis, horse riding, and strawberry picking, an item “dining” is set as a superordinate concept of concepts such as ramen, soba, and sushi, and an item “sightseeing spots” is set as a superordinate concept of concepts such as castles, shrines, temples, and world heritage sites.
Details of preference information will be described later.
The video vector information calculation unit 33 calculates, based on video data acquired by the video data acquisition unit 31, video vector information (video attribute information) indicating, for each concept representing the contents of video data, the confidence score that (the degree to which) the concept is included in the video data.
Specifically, the video vector information calculation unit 33 calculates video vector information by dividing video data into plural segments according to the contents of the video data, performing image processing for each of the divided segments, and calculating the confidence score that each concept is included in each of the divided segments.
The video vector information storing unit 34 stores video data acquired by the video data acquisition unit 31 as well as video vector information calculated by the video vector information calculation unit 33.
The number-of-selected-segments designation unit 37 designates the number of segments to be selected by the segment selection unit 35 for generating summary information from video data. For example, in the case where video data is divided into eight segments, the number-of-selected-segments designation unit 37 inputs a value “3” as the number of segments to be selected from the eight segments.
The segment selection unit 35 selects, based on preference information input by the preference information input unit 32, a number of segments matching the preference information, the number being designated by the number-of-selected-segments designation unit 37, from among plural segments of video data stored in the video vector information storing unit 34.
The summary information generation unit 36 generates, based on the video of the segments selected by the segment selection unit 35, summary information representing the contents of the video.
For example, the summary information generation unit 36 may generate, as summary information, digest video (summary video) which includes a series of connected segments selected by the segment selection unit 35. Furthermore, the summary information generation unit 36 may generate, as summary information, plural digest images (summary images) which include frame images extracted from the segments selected by the segment selection unit 35.
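Once segments have been selected, digest generation reduces to selecting and ordering segment ranges. The following sketch assumes hypothetical segment boundaries in seconds and hypothetical selected indices; actual cutting and concatenation of video, or frame extraction for digest images, would be done with a media-processing library.

```python
# Hypothetical sketch: digest video as a concatenation of selected segments.
# `segment_bounds` holds (start, end) times in seconds for each segment;
# both the boundaries and the selected indices are illustrative.

def digest_ranges(segment_bounds, selected):
    """Return the (start, end) ranges to concatenate, in segment order."""
    return [segment_bounds[i] for i in sorted(selected)]

# Eight segments, as in the exemplary embodiment; boundaries are made up.
bounds = [(0, 30), (30, 55), (55, 90), (90, 120),
          (120, 150), (150, 180), (180, 200), (200, 230)]
print(digest_ranges(bounds, [6, 0, 5]))  # [(0, 30), (150, 180), (180, 200)]
```

For digest images, the same ranges could instead be used to pick representative frames from each selected segment.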
Various methods are available as a method for selecting a segment that matches preference information of a user from among plural segments into which video data has been divided.
For example, the segment selection unit 35 calculates, for each of the divided segments, a score for a subordinate concept (a matching degree of subordinate concepts), based on the coefficient of each concept in the preference information input by the preference information input unit 32 and the confidence score of each concept in the video vector information. The segment selection unit 35 also calculates, for each segment, a score for a superordinate concept (a matching degree of superordinate concepts), based on the coefficient of each superordinate concept in the preference information and the maximum value among values each obtained by multiplying the coefficient of a concept included in that superordinate concept by the confidence score of the concept in the video vector information. Then, the segment selection unit 35 calculates, for each segment, a video score (matching degree) with respect to the preference information, based on the score for the subordinate concept and the score for the superordinate concept, and selects the segment with the maximum video score. After the selection, the segment selection unit 35 reduces the coefficient corresponding to any concept having a large confidence score in the selected segment, as well as the coefficient of the superordinate concept including that concept, and then sequentially calculates the matching degree with respect to the preference information for the individual remaining segments, thereby selecting segments that match the input preference information (first segment selection method).
Furthermore, in the case where another segment selection method is used, the segment selection unit 35 may randomly select a predetermined number of segments from among the plural segments into which the video data has been divided, and generate integrated video vector information by taking, for each concept, the maximum value of the confidence scores of the concept in the randomly selected segments and defining that maximum value as the confidence score of the concept. The segment selection unit 35 then calculates a score for a superordinate concept (a matching degree of superordinate concepts), based on the coefficient of each superordinate concept in the preference information and the maximum value among values each obtained by multiplying the coefficient of a concept included in that superordinate concept by the confidence score of the concept in the integrated video vector information. By repeating the processing for randomly selecting a predetermined number of segments, the segment selection unit 35 selects the combination of segments that exhibits the maximum score for a superordinate concept as the segments that match the preference information (second segment selection method).
Next, an operation of the terminal apparatus 21 in the travel information providing system according to this exemplary embodiment will be described in detail with reference to drawings.
First, a process for calculating video vector information by the video vector information calculation unit 33 will be described with reference to a flowchart of
The video vector information calculation unit 33 analyzes the contents of the video acquired by the video data acquisition unit 31 and divides the video into plural segments according to the contents (step S101).
Next, the video vector information calculation unit 33 detects each concept included in each of the divided segments of the video, using a method such as object detection, image recognition, scene recognition, and motion analysis, and calculates video vector information for each segment (step S102).
For concept detection, each segment is further divided into sub-segments, and concept detection processing is performed for each of the sub-segments. Then, the maximum value of detection values of all the sub-segments is defined as the final detection value of the segment. In this case, sub-segments may overlap.
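The max-pooling over sub-segments described above can be sketched as follows. The sub-segment confidence scores are hypothetical values; the actual detectors (object detection, scene recognition, and so on) are outside the scope of this sketch.

```python
# Sketch of per-segment concept detection by max-pooling over sub-segments:
# the maximum detection value among all sub-segments becomes the segment's
# final value for each concept. All scores here are illustrative.

def segment_vector(subsegment_scores):
    """Max-pool each concept's confidence score over a segment's sub-segments."""
    n_concepts = len(subsegment_scores[0])
    return [max(sub[k] for sub in subsegment_scores) for k in range(n_concepts)]

# One segment divided into 3 (possibly overlapping) sub-segments, N = 4 concepts.
subs = [
    [0.10, 0.30, 0.05, 0.20],
    [0.196, 0.179, 0.195, 0.412],
    [0.05, 0.10, 0.02, 0.15],
]
print(segment_vector(subs))  # [0.196, 0.3, 0.195, 0.412]
```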
Furthermore, in such concept detection, structure analysis is performed for each frame in a segment, and the detection result obtained at the moment at which the best composition appears is defined as the final detection value of the segment.
Such concept detection may be performed by analyzing a foreground and a background, performing object detection for the foreground, and performing scene recognition for the background.
In the case where there are N concepts for which confidence scores are to be detected, N-dimensional video vector information is calculated.
A specific example of N-dimensional video vector information calculated as described above is illustrated in
In
The N concepts whose confidence scores are to be detected are set as follows: concept 1 (sushi), concept 2 (soba), concept 3 (scuba diving), concept 4 (golf), concept 5 (horse riding), . . . , and concept N (castles).
The confidence scores of the N concepts are values each representing the likelihood that the corresponding concept is included in the video; the larger the value, the more likely it is that the concept is included in the video.
N-dimensional video vector information is calculated for each of the eight segments. For example, for segment 1, N-dimensional video vector information (0.196, 0.179, 0.195, 0.412, 0.134, . . . , and 0.312) is generated.
That is, the video vector information indicates a confidence score that each concept is included in each of the divided segments.
Referring to
Next, an example of the relationship between superordinate concepts and subordinate concepts (concepts) in the travel information providing system according to this exemplary embodiment will be described with reference to
In the example illustrated in
Plural concepts need not necessarily be set as subordinate concepts of a single superordinate concept. As in the case of the superordinate concept 3 (shopping) and the concept 6 (shopping), only one concept may be set for a single superordinate concept. Furthermore, a concept may be included in each of plural superordinate concepts. For example, setting may be performed such that a concept “castles” is included in both a superordinate concept “sightseeing spots” and a superordinate concept “history”.
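A minimal sketch of such a superordinate/subordinate mapping, using item names from this exemplary embodiment (the exact grouping is hypothetical):

```python
# Sketch of the superordinate/subordinate concept mapping. A superordinate
# concept may have one or many subordinate concepts, and a concept such as
# "castles" may belong to more than one superordinate concept.
hierarchy = {
    "dining": ["sushi", "soba", "ramen"],
    "activities": ["scuba diving", "golf", "horse riding"],
    "shopping": ["shopping"],            # a single subordinate concept is allowed
    "sightseeing spots": ["castles", "shrines", "temples"],
    "history": ["castles"],              # "castles" appears under two superordinates
}
```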
In this example, w11, w12, w23, w24, w25, w36, . . . , and wMN are coefficients representing the degree of preference of a user for the concepts 1 to N. Furthermore, W1, W2, W3, . . . , and WM are coefficients representing the degree of preference of a user for the superordinate concepts 1 to M.
That is, by setting a large value for the coefficient corresponding to a concept which is desired to be included in summary information, among the coefficients w11, w12, w23, w24, w25, w36, . . . , and wMN of the concepts, summary information of digest video or the like including the concept is generated. Furthermore, by setting a large value for the coefficient corresponding to a superordinate concept which is desired to be included in summary information, among the coefficients W1, W2, W3, . . . , and WM of the superordinate concepts, the likelihood that summary information including a concept belonging to the superordinate concept is generated increases.
Next, examples of an input screen displayed when the preference information input unit 32 inputs preference information of the users A and B through the terminal apparatuses 22 and 23 or the like will be described with reference to
For example, a case will be described in which the travel preferences of the users A and B are investigated through a questionnaire presented at the time of user registration with a travel information providing site, and preference information is generated from the answers.
First, the preference information input unit 32 displays a screen illustrated in
Next, the preference information input unit 32 displays the screen illustrated in
Furthermore, in a similar manner, the preference information input unit 32 displays the screen illustrated in
Then, questionnaires for the other items of superordinate concepts are sequentially presented to the user, and preference information of the user is obtained.
The preference information input unit 32 displays the screens illustrated in
An example of preference information obtained as described above through the screen examples of
In
Furthermore, in
Instead of inputting preference information based on the contents input by the user as described above, the preference information input unit 32 may automatically obtain preference information based on contents that the user has posted to a social networking service (SNS), and input a coefficient of each superordinate concept and a coefficient of each concept.
Next, processing for calculating a score for a subordinate concept, a score for a superordinate concept, and a video score by the segment selection unit 35 in the case where the preference information illustrated in
First, a method for calculating a score for a subordinate concept and a score for a superordinate concept will be described with reference to
As illustrated in
Furthermore, the segment selection unit 35 calculates a score for a superordinate concept, based on the N-dimensional video vector information (S1, S2, S3, . . . , and SN), the coefficients w11, w12, w23, . . . , and wMN of the concepts illustrated in
Then, the segment selection unit 35 calculates, based on the score for the subordinate concept and the score for the superordinate concept, a video score representing the matching degree of video data and preference information.
Specific calculation expressions for calculating a score for a subordinate concept, a score for a superordinate concept, and a video score are illustrated in
First, the score for the subordinate concept is obtained by multiplying the video vector information (S1, S2, S3, . . . , and SN) by the coefficients w11, w12, w23, . . . , and wMN of individual concepts and obtaining an accumulated value of the results, as represented by expression (1) of
Specifically, the score for the subordinate concept is obtained by calculating S1·w11+S2·w12+S3·w23+ . . . +SN·wMN.
Then, the score for the superordinate concept is obtained by taking, for each superordinate concept, the maximum value among the values each obtained by multiplying the confidence score of a concept in the video vector information by the coefficient of that concept, multiplying that maximum value by the coefficient of the superordinate concept, and then accumulating the resulting values over the individual superordinate concepts, as represented by expression (2) of
For example, for the superordinate concept 1 (dining), the maximum value of S1·w11 and S2·w12 is obtained based on max(S1·w11, S2·w12). For example, in the case where S1·w11 is maximum, W1·S1·w11, which is obtained by multiplying the value by the coefficient W1 of the superordinate concept 1, is defined as a value for the superordinate concept 1. Then, such a value is obtained for each superordinate concept, and a value obtained by accumulating the values is defined as a score for a superordinate concept.
Furthermore, the video score is calculated by multiplying the score for the subordinate concept and the score for the superordinate concept by p and (1−p), respectively, and adding the obtained results, as represented by expression (3) of
A calculation example in which a score for a subordinate concept, a score for a superordinate concept, and a video score are specifically calculated by substituting the value for the segment 1 of an example of video vector information illustrated in
As represented by expression (1) of
Furthermore, as represented by expression (2) of
Then, as represented by expression (3) of
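Expressions (1) to (3) can be sketched as follows. The confidence scores reuse the first four segment 1 values shown earlier; the coefficient values, the grouping of concepts under superordinate concepts, and the value of p are hypothetical.

```python
# Sketch of expressions (1)-(3). `groups` maps each superordinate concept
# to the indices of its subordinate concepts; all coefficients are made up.

def video_score(S, w, W, groups, p=0.5):
    """S: per-concept confidence scores; w: concept coefficients;
    W: superordinate-concept coefficients keyed like `groups`."""
    # Expression (1): accumulate products of confidences and concept coefficients.
    sub_score = sum(s * c for s, c in zip(S, w))
    # Expression (2): for each superordinate concept, take the maximum weighted
    # confidence among its member concepts, scale by W, and accumulate.
    sup_score = sum(W[m] * max(S[k] * w[k] for k in ks)
                    for m, ks in groups.items())
    # Expression (3): blend the two scores with weight p.
    return p * sub_score + (1 - p) * sup_score

S = [0.196, 0.179, 0.195, 0.412]          # first four values of segment 1
w = [0.9, 0.2, 0.7, 0.4]                  # hypothetical concept coefficients
W = {"dining": 0.8, "activities": 0.5}    # hypothetical superordinate coefficients
groups = {"dining": [0, 1], "activities": [2, 3]}
score = video_score(S, w, W, groups, p=0.5)
```

Here the subordinate score is 0.5135, the superordinate score is 0.22352, and the blended video score is 0.36851.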
Then, the segment selection unit 35 calculates the above video score for each segment of travel information video obtained by the video data acquisition unit 31, and selects a segment to be included in summary information, based on the calculated value.
Next, a segment selection method performed by the segment selection unit 35 as described above will be described with reference to a flowchart.
First, a first segment selection method for calculating a video score for each segment, selecting a segment with a maximum video score, changing a coefficient of a concept and a coefficient of a superordinate concept, and sequentially selecting the next segment, will be described with reference to a flowchart of
First, the segment selection unit 35 calculates, for each segment to be selected, a score for a superordinate concept and a score for a subordinate concept (step S201), and calculates a video score of each segment, based on the calculated score of the superordinate concept and the calculated score for the subordinate concept (step S202).
Then, the segment selection unit 35 selects a segment with a maximum video score (step S203).
Next, the segment selection unit 35 selects a coefficient to be changed, from among the coefficients of the individual concepts and the coefficients of the individual superordinate concepts which are used for calculation of the video score, and reduces the selected coefficient by, for example, multiplying the value of the coefficient by 0.5 (step S204).
Specifically, for example, the segment selection unit 35 changes a coefficient corresponding to a concept with a high confidence score in the selected segment and a coefficient of a superordinate concept of the concept, among coefficients of individual concepts and coefficients of individual superordinate concepts.
A specific example for changing a coefficient as described above is illustrated in
In the case where there is an unselected segment (Yes in step S205), the segment selection unit 35 repeats the processing of steps S201 to S204 for the unselected segment. For example, in the case where the segment 1 is selected, the segment selection unit 35 performs similar processing for the remaining segments 2 to 8.
Then, when processing for all the segments is completed (No in step S205), the segment selection unit 35 outputs the selection order of the segments (step S206), and selects segments in predetermined ranks in the selection order as segments that match preference information (step S207).
A specific example of segments rearranged in the selection order as described above is illustrated in
For example, in the case where top three segments are selected as segments to be included in summary information, the segments 1, 6, and 7 are selected by the segment selection unit 35.
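The greedy procedure of the flowchart can be sketched as follows. The video-score function restates expressions (1)-(3); the segment vectors, coefficients, concept grouping, and the decay factor of 0.5 (mirroring the multiplication in step S204) are all hypothetical.

```python
# Sketch of the first segment selection method: pick the segment with the
# maximum video score, then halve the coefficient of its dominant concept and
# of that concept's superordinate concept before scoring the rest.

def video_score(S, w, W, groups, p=0.5):
    sub = sum(s * c for s, c in zip(S, w))                      # expression (1)
    sup = sum(W[m] * max(S[k] * w[k] for k in ks)               # expression (2)
              for m, ks in groups.items())
    return p * sub + (1 - p) * sup                              # expression (3)

def select_segments(segments, w, W, groups, p=0.5, top=3, decay=0.5):
    """Return the indices of the `top` segments in greedy selection order."""
    w, W = list(w), dict(W)            # work on copies; coefficients decay
    order, remaining = [], set(range(len(segments)))
    while remaining:
        best = max(remaining,
                   key=lambda i: video_score(segments[i], w, W, groups, p))
        order.append(best)
        remaining.remove(best)
        # dominant concept of the selected segment, and its superordinate
        k_star = max(range(len(w)), key=lambda k: segments[best][k])
        w[k_star] *= decay
        for m, ks in groups.items():
            if k_star in ks:
                W[m] *= decay
    return order[:top]

segments = [
    [0.196, 0.179, 0.195, 0.412],
    [0.050, 0.300, 0.100, 0.150],
    [0.400, 0.050, 0.050, 0.100],
    [0.100, 0.100, 0.450, 0.050],
]
w = [0.9, 0.2, 0.7, 0.4]
W = {"dining": 0.8, "activities": 0.5}
groups = {"dining": [0, 1], "activities": [2, 3]}
selected = select_segments(segments, w, W, groups, top=3)
```

Because the dominant concept's coefficients shrink after each pick, consecutive selections tend to cover different concepts rather than repeating the strongest one.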
A generation example of summary information generated by the summary information generation unit 36 in the case where such segments are selected is illustrated in
In
Furthermore, in
Next, a second segment selection method for selecting a segment to be included in summary information by randomly selecting plural segments from among plural segments into which video data has been divided, generating integrated video vector information from video vector information of a combination of plural segments, and finding a combination of plural segments which exhibits a large score for a superordinate concept of the integrated video vector information, will be described with reference to a flowchart of
First, the segment selection unit 35 randomly selects a predetermined number of segments from among the plural segments (step S301). For example, in the case where three is designated by the number-of-selected-segments designation unit 37 as the number of segments to be included in summary information, three segments are selected from the eight segments.
Next, the segment selection unit 35 generates integrated video vector information by selecting a maximum value of confidence scores of concepts for the selected three segments (step S302).
For example, as illustrated in
Then, the segment selection unit 35 calculates a score for a superordinate concept, based on the generated integrated video vector information (step S303). The method for calculating a superordinate concept score is the same as the calculation method represented by expression (2) of
Then, in the case where the calculated value of the superordinate concept score is larger than the previously stored maximum value, the segment selection unit 35 stores the value together with information on the combination of segments (step S304). In contrast, in the case where the calculated value of the superordinate concept score is equal to or smaller than the stored maximum value, the segment selection unit 35 discards the value of the superordinate concept score and the information of the combination of segments.
Then, the processing of steps S301 to S304 is repeated a predetermined number of times, for example, 100 times (step S305). After that, the stored information of the combination of segments is output (step S306).
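The steps above can be sketched as follows. Integrated video vector information is formed by max-pooling the vectors of a randomly sampled combination of segments, and the combination with the largest superordinate-concept score is kept. The segment vectors, coefficients, and grouping are illustrative.

```python
# Sketch of the second segment selection method: repeated random sampling of
# segment combinations, scored on the superordinate-concept score only.
import random

def superordinate_score(S, w, W, groups):
    # expression (2): per superordinate concept, the maximum weighted
    # confidence among its member concepts, scaled by W and accumulated
    return sum(W[m] * max(S[k] * w[k] for k in ks) for m, ks in groups.items())

def select_random_combination(segments, w, W, groups,
                              n_select=3, trials=100, seed=0):
    rng = random.Random(seed)
    best_combo, best_score = None, float("-inf")
    for _ in range(trials):
        combo = rng.sample(range(len(segments)), n_select)
        # integrated vector: per-concept maximum over the sampled segments
        integrated = [max(segments[i][k] for i in combo) for k in range(len(w))]
        s = superordinate_score(integrated, w, W, groups)
        if s > best_score:                 # keep the best combination so far
            best_score, best_combo = s, sorted(combo)
    return best_combo, best_score

segments = [
    [0.9, 0.1, 0.1, 0.1],   # strong in concept 0 (under "dining")
    [0.8, 0.1, 0.1, 0.1],   # also strong in concept 0
    [0.1, 0.1, 0.9, 0.1],   # strong in concept 2 (under "activities")
    [0.1, 0.1, 0.1, 0.9],   # strong in concept 3 (under "sightseeing")
]
w = [0.9, 0.2, 0.7, 0.8]
W = {"dining": 0.8, "activities": 0.5, "sightseeing": 0.6}
groups = {"dining": [0, 1], "activities": [2], "sightseeing": [3]}
combo, score = select_random_combination(segments, w, W, groups,
                                         n_select=3, trials=200)
```

With this data the best combination covers three different superordinate concepts (segments 0, 2, and 3) rather than pairing the two segments that are both strong in the same concept.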
In the second segment selection method illustrated in the flowchart of
In particular, for calculation of a score for a superordinate concept, only a maximum value of values each obtained by multiplying a confidence score of a concept by a coefficient of the concept is selected for each superordinate concept. Therefore, a combination of segments having high confidence scores of concepts belonging to different superordinate concepts has a larger score for a superordinate concept.
For example, as illustrated in
Therefore, in the case where the combination of segments illustrated in
As a result, by randomly selecting plural segments, generating integrated video vector information by taking only the maximum values from the video vector information of the selected segments, calculating a score for a superordinate concept, and finding a combination of segments that exhibits a large superordinate concept score, it is possible to select a combination of segments having large confidence scores for a variety of concepts with large coefficients in the preference information of the user, rather than segments in which only the confidence scores of the same concept are large.
In the foregoing exemplary embodiment, a case where summary information is generated by dividing video data of travel information into plural segments and selecting a segment that matches preference information from among the divided plural segments has been described. However, the present invention is not limited to this. The present invention may also be applied to a case where summary information is generated from video data different from travel information video.
The foregoing description of the exemplary embodiments of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations will be apparent to practitioners skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to understand the invention for various embodiments and with the various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.
Number | Date | Country | Kind
---|---|---|---
2016-043051 | Mar 2016 | JP | national

Number | Name | Date | Kind
---|---|---|---
20010056427 | Yoon | Dec 2001 | A1
20020157095 | Masumitsu et al. | Oct 2002 | A1
20110243529 | Oryoji et al. | Oct 2011 | A1
20170255699 | Uchihashi | Sep 2017 | A1

Number | Date | Country
---|---|---
2002-259720 | Sep 2002 | JP
2004-126811 | Apr 2004 | JP
2011-217209 | Oct 2011 | JP

Entry
---
Boreczky et al., “Comparison of video shot boundary detection techniques,” Journal of Electronic Imaging, Apr. 1996, vol. 5, No. 2, pp. 122-128.

Number | Date | Country
---|---|---
20170256284 A1 | Sep 2017 | US