The invention relates to the field of automatic video segmenting.
In order for a user to quickly browse a video content, some solutions already exist. They typically consist in segmenting a video into sub-segments. A first known segmenting solution consists in detecting shots in the video content, determining similarities between the detected shots, and merging some of the detected shots so as to obtain a reasonable number of segments for the video content. The steps of detecting shots and determining similarities usually take some hours; they are therefore typically performed only once, which leads to a semantic but fixed segmenting.

A second known segmenting solution is proposed in US2010/0162313, consisting in segmenting a video content according to time intervals entered by the user. While this is an interesting solution to quickly obtain a segmented video, the segmenting is somewhat arbitrary because it relies only on the time intervals and does not take the semantics of the video into account, as the first known segmenting solution does. This second known segmenting solution has, however, the advantage of allowing the user to choose the granularity of the segments of the video content to be segmented without waiting too long, contrary to the first described solution, which leads to a fixed segmenting.
While the two described solutions are both valuable, their respective advantages appear to be contradictory, in that they cannot be combined in a single solution. It is therefore an object of the invention to overcome the limits of the present state of the art by providing such a solution, allowing the user to quickly segment a video in a semantic manner.
A method is proposed for segmenting an audiovisual content, comprising: retrieving (11) pre-segmentation data (21) representative of similarity values between a plurality of shots of the audiovisual content (20); clustering (12) contiguous shots having a similarity value smaller than an initial threshold (TS1) similarity value into clustered segments; selecting one clustered segment; detecting a new plurality of shots in the clustered segment; determining similarity values between the shots of the new plurality of shots; and clustering (12) the contiguous shots belonging to the new plurality of shots and having a similarity value smaller than a second threshold (TS2) similarity value, the second threshold (TS2) similarity value being smaller than the initial threshold (TS1) similarity value.
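For illustration purposes only, the two-level clustering just described may be sketched as follows. This is a minimal sketch in Python, assuming the similarity value between each pair of successive shots is already available; the re-detection of shots inside the selected segment is simplified to reusing the already detected shots, and all names and numeric values are illustrative assumptions, not part of the claimed method.

```python
# Minimal sketch of the two-level clustering, assuming the similarity
# value between each pair of successive shots is already known.
# All names and numeric values are illustrative, not part of the claims.

def cluster_contiguous(shots, similarities, threshold):
    """Group contiguous shots whose similarity is smaller than `threshold`.

    `shots` is a list of shot indices; `similarities[i]` is the similarity
    value between shots[i] and shots[i + 1].
    """
    segments = [[shots[0]]]
    for i, sim in enumerate(similarities):
        if sim < threshold:
            segments[-1].append(shots[i + 1])  # merge into current segment
        else:
            segments.append([shots[i + 1]])    # start a new segment
    return segments

# Level 1: coarse clustering with the initial threshold similarity value TS1.
TS1 = 0.6
shots = list(range(15))
sims = [0.2, 0.5, 0.9, 0.1, 0.3, 0.8, 0.4, 0.2, 0.7, 0.1, 0.5, 0.9, 0.3, 0.2]
level_1_segments = cluster_contiguous(shots, sims, TS1)

# Level 2: select one clustered segment and re-cluster its shots with the
# second threshold TS2 < TS1, yielding finer sub-segments.
TS2 = 0.4
selected = level_1_segments[0]
selected_sims = sims[: len(selected) - 1]  # similarities inside the segment
sub_segments = cluster_contiguous(selected, selected_sims, TS2)
```

With the second threshold TS2 smaller than TS1, fewer contiguous shots are merged during the second pass, so the selected segment is divided into finer sub-segments.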
As the step of generating the data representative of similarity values between a plurality of shots of the audiovisual content is time consuming, it is advantageously performed offline. These data are then retrieved for use, and the clustering, which requires much less time and computing power, is performed at the level of the end user. As a result, the user may browse sub-segments of a segment, the sub-segments having a meaningful semantic content.
The data may be retrieved from a server or from a physical medium.
The method may also comprise a step of modifying the initial threshold similarity value before the clustering step.
This way, the user has the possibility to personalize the segmentation of the audiovisual content, in that he can intuitively approach the segment granularity he wishes.
Advantageously, the method comprises a step of modifying the second threshold similarity value.
This way, the granularity of the sub-segments of a segment may be adjusted according to the segment granularity chosen by the user.
The invention also relates to an apparatus for segmenting an audiovisual content comprising a module adapted to retrieve data representative of similarity values between a plurality of shots of the audiovisual content, and a processor for clustering contiguous shots having a similarity value smaller than an initial threshold similarity value into clustered segments.
For a better understanding, the invention shall now be explained in more detail in the following description with reference to the figures. It is understood that the invention is not limited to the described embodiments and that specified features can also expediently be combined and/or modified without departing from the scope of the present invention as defined in the appended claims.
The pre-segmentation data 21 may be stored on a server and retrieved 11 upon request by a receiver 24 via a network. The pre-segmentation data 21 may also be written to a physical medium, such as a Blu-ray disc for example.
Pre-segmentation data 21 representative of similarity values between a plurality of shots of an audiovisual content 20 are first generated 10 as follows: a shot detection is first performed on the audiovisual content 20 by computing the luminance histogram difference between two successive images. When this difference is above a predefined threshold, a transition between two shots has been detected. Once all the shots have been detected, a module extracts determined features as well as a key frame for each detected shot. This key frame possesses features which are characteristic of the detected shot. There are many ways to extract a key frame: one way is to extract the middle frame of the shot. A similarity matrix MS(i,j) related to the detected shots is then determined. The elements of this similarity matrix are coefficients representing a distance between shots. The distance between two shots is a value representative of the distance between features relative to each of the shots. These features are, for example, color features and edge features associated with the extracted key frame. Suppose fifteen shots have been detected: then MS(3,13) represents the distance between detected shot 3 and detected shot 13. An example of distance is:
MS(3,13) = Distance(3,13) = (1/(wc + we)) * (wc * color_dist + we * edge_dist)

where wc and we are weighting coefficients respectively related to the color and edge features, and color_dist and edge_dist respectively represent the color distance and the edge distance between shot 3 and shot 13.
Similarity between two shots is defined as the inverse of the distance between two shots.
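For illustration purposes only, the distance and similarity computation described above may be sketched as follows, assuming the color and edge features of each key frame have already been extracted; the feature vectors, the weights wc and we, and all helper names are illustrative assumptions, not part of the claimed method.

```python
# Illustrative computation of the matrix MS described above, whose
# elements represent distances between detected shots. The color/edge
# feature vectors, the weights wc and we, and all helper names are
# assumptions made for the sake of the example.

def feature_distance(f1, f2):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(f1, f2)) ** 0.5

def shot_distance(shot_i, shot_j, wc=0.5, we=0.5):
    """Weighted color/edge distance between the key frames of two shots:
    Distance(i, j) = (1 / (wc + we)) * (wc * color_dist + we * edge_dist)."""
    color_dist = feature_distance(shot_i["color"], shot_j["color"])
    edge_dist = feature_distance(shot_i["edge"], shot_j["edge"])
    return (wc * color_dist + we * edge_dist) / (wc + we)

def shot_similarity(shot_i, shot_j):
    """Similarity between two shots, defined as the inverse of their distance."""
    d = shot_distance(shot_i, shot_j)
    return float("inf") if d == 0.0 else 1.0 / d

# Key-frame features of three detected shots (purely illustrative values).
shots = [
    {"color": [0.10, 0.40], "edge": [0.30, 0.20]},
    {"color": [0.15, 0.42], "edge": [0.28, 0.21]},
    {"color": [0.80, 0.90], "edge": [0.70, 0.60]},
]
# Matrix MS(i, j) of distances over the detected shots.
MS = [[shot_distance(a, b) for b in shots] for a in shots]
```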
Optionally, a chaptering is also generated outside of the receiver 24 and is comprised in the pre-segmentation data 21. A chaptering designates a set of temporal indexes associated with frames of the audiovisual content 20. A classic way to generate a chaptering is to first cluster 12 the contiguous detected shots which have a similarity value smaller than a pre-defined similarity value, and then to generate an identifier for each cluster. Each cluster is characterized by at least two temporal indexes, one representing the beginning of the cluster and the other representing the end of the cluster.
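For illustration purposes only, this chaptering generation may be sketched as follows; the (start, end) shot times, the threshold value and all names are illustrative assumptions, not part of the claimed method.

```python
# Sketch of the chaptering generation: cluster contiguous shots whose
# similarity is smaller than a pre-defined value, then emit one chapter
# per cluster with an identifier and begin/end temporal indexes.

def generate_chaptering(shot_times, similarities, pre_defined_similarity):
    """`shot_times[i]` is the (start, end) time of shot i in seconds;
    `similarities[i]` is the similarity between shot i and shot i + 1."""
    chapters = []
    begin = shot_times[0][0]
    for i, sim in enumerate(similarities):
        if sim >= pre_defined_similarity:      # cluster boundary reached
            chapters.append({"id": f"chapter-{len(chapters)}",
                             "begin": begin,
                             "end": shot_times[i][1]})
            begin = shot_times[i + 1][0]
    chapters.append({"id": f"chapter-{len(chapters)}",
                     "begin": begin,
                     "end": shot_times[-1][1]})
    return chapters

shot_times = [(0, 4), (4, 9), (9, 15), (15, 22)]
chapters = generate_chaptering(shot_times, [0.2, 0.7, 0.3], 0.5)
# -> two chapters: (0, 9) and (9, 22), each with its own identifier
```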
At the level of the receiver 24, an audiovisual content 20 is retrieved 11, as well as the data representative of similarity values between a plurality of shots of the audiovisual content 20. A step of detection of contiguous shots is performed at the level of the receiver 24. The detected contiguous shots with a similarity value smaller than an initial threshold similarity value TS1 are clustered 12. This allows a chaptering of level 1 to be generated at the level of the receiver 24. Identifiers, for example key frames, are then generated to identify the generated clusters and are displayed to the user on a display. Optionally, this chaptering of level 1 may also be generated outside the receiver 24 and retrieved 11 by the receiver 24.
As the step of generating the semantic pre-segmentation is time consuming and typically takes some hours, but needs to be done only once, it is advantageous to perform it offline, before the receiver 24, for example a set-top box with limited computing power, retrieves 11 the pre-segmentation data 21.
Optionally, an obtained chapter may itself be chaptered.
This process can of course be generalized: a chaptering of level (i+1) may be obtained from a chaptering of level i by performing a clustering 12 of contiguous shots with a threshold similarity value TS(i+1) smaller than the threshold similarity value TS(i). The chaptering levels may then easily be ascended or descended, allowing intuitive and quick browsing. A sketch of this generalization is given below.
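For illustration purposes only, a recursive sketch of this generalization follows; the tree representation, names and numeric values are illustrative assumptions, not part of the claimed method.

```python
# Hypothetical generalization: a chaptering of level (i + 1) is obtained by
# re-clustering the shots of each segment of level i with a smaller
# threshold similarity value.

def chaptering_at_level(shots, sims, thresholds, level=0):
    """Recursively build a tree of chapters.

    `thresholds` is a decreasing sequence TS(1) > TS(2) > ... controlling
    the granularity of each chaptering level; `sims[i]` is the similarity
    between shots[i] and shots[i + 1].
    """
    if level >= len(thresholds):
        return shots  # leaf: a plain list of shot indices
    segments, segment_sims = [[shots[0]]], [[]]
    for i, sim in enumerate(sims):
        if sim < thresholds[level]:
            segments[-1].append(shots[i + 1])  # sim links two shots inside
            segment_sims[-1].append(sim)       # the same segment: keep it
        else:
            segments.append([shots[i + 1]])    # boundary: drop the similarity
            segment_sims.append([])            # linking the two segments
    # Each segment of level i may itself be chaptered at level i + 1.
    return [chaptering_at_level(s, ss, thresholds, level + 1)
            for s, ss in zip(segments, segment_sims)]

tree = chaptering_at_level(list(range(8)),
                           [0.2, 0.6, 0.1, 0.7, 0.3, 0.2, 0.8],
                           thresholds=[0.65, 0.4])
# -> [[[0, 1], [2, 3]], [[4, 5, 6]], [[7]]]
```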
Number | Date | Country | Kind
---|---|---|---
12305337.3 | Mar 2012 | EP | regional

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/EP2013/054190 | 3/1/2013 | WO | 00