Claims
- 1. A method of constructing and/or browsing a hierarchical representation of a content of a video, comprising:
providing a content hierarchy module representing relationships between segments, sub-segments and shots of a video; providing a visual content of a segment of current interest module representing visual information for a selected segment within the hierarchy; providing a visual overview of a sequential content structure module; providing a unified interaction module for coordinating the function of the content hierarchy module, the visual content of a segment module, and the visual overview of a sequential content structure module, and propagating any request of a module to others; and providing a graphical user interface (GUI) screen for simultaneously showing the content hierarchy module, the visual content of a segment module, and the visual overview of a sequential content structure module.
- 2. Method, according to claim 1, wherein the content hierarchy module comprises a tree view of a video, and further comprising:
displaying a root segment and any number of its child and grandchild segments in the tree view of a video; selecting a current segment in the tree view and adding a short textual explanation to each segment; associating a metadata description with each segment, said metadata description comprising at least one of the title, start time and duration of the segment; and associating a key frame image with the segment.
- 3. Method, according to claim 2, wherein a selected one of the key frames for the sub-segments is selected as the key frame for the current segment.
- 4. Method, according to claim 2, further comprising:
providing a symbol on the key frames for the sub-segments indicating whether: the key frame has been selected as the key frame for the current segment; and the sub-segment associated with the key frame has some number of its own sub-segments.
- 5. Method, according to claim 1, wherein the visual content of a segment (sub-hierarchy) of current interest module comprises a list view of a current segment, and further comprising:
displaying key frame images each of which represents a sub-segment of the current segment in the list view; displaying at least two types of key frames in the list view, a first type being a plain key frame indicating to the user that the associated sub-segment has no further sub-segments and a second type being a marked key frame indicating to the user that the associated sub-segment is further subdivided into sub-sub-segments; in response to the user selecting a marked key frame, the selected marked key frame becomes the current segment key frame, its metadata is displayed, and key frame images for its associated sub-segments are displayed in the list view; and providing a set of buttons for modeling operations, said modeling operations comprising at least one of group, ungroup, merge, split, and change key frame.
- 6. Method, according to claim 1, wherein the visual overview of a sequential content structure module comprises a visual rhythm of the video, and further comprising:
displaying at least a portion of the visual rhythm; providing a shot marker at each shot boundary, adjacent the visual rhythm; navigating through the visual rhythm display by forwarding or reversing to display another portion of the visual rhythm; and controlling the horizontal scale factor of the visual rhythm display by adjusting the time resolution of the portion of the visual rhythm being displayed.
- 7. Method, according to claim 6, wherein the portion of the visual rhythm being displayed in the view of visual rhythm is a virtual representation of the visual rhythm.
- 8. Method, according to claim 6, wherein the visual overview of a sequential content structure module comprises a visual rhythm of the video, and further comprising:
displaying at least a portion of the visual rhythm; and displaying an audio waveform in parallel with, and synchronized to, the visual rhythm display according to a time line by adjusting the time scale of the visual rhythm.
- 9. Method, according to claim 6, further comprising:
synchronizing the audio waveform with the visual rhythm by adding extra lines into or dropping selected lines from the visual rhythm.
- 10. Method, according to claim 9, further comprising:
providing a simplified representation of the hierarchical tree structure emphasizing the relative durations and temporal positions of the segments that lie in the path from a root segment to a current segment with multiple bar segments; wherein each bar segment has a length corresponding to the relative duration of the corresponding video segment, and each bar segment is visually distinct from adjacent bar segments; displaying information about the temporal and hierarchical locations of a selected video segment; navigating the video hierarchy to locate specific video segments or shots of interest; in response to selecting a position along the hierarchical status bar, highlighting the video segment associated with that position in both the tree view and visual rhythm view; and providing user information on the nested relationships, relative durations, and relative positions of related video segments, graphically.
- 11. A graphical user interface (GUI) for constructing and/or browsing a hierarchical representation of a content of a video, comprising:
means for showing a status of a content hierarchy, by which a user is able to see a current graphical tree structure of the hierarchical representation being built, and to visually check the content of a video segment of current interest as well as the contents of the segment's sub-segments; means for showing the status of the video segment of current interest; means for showing the status of a visual overview of a sequential content structure, including a visual pattern of the sequential structure, for providing both shot contents and positional information of shot boundaries, and for providing time scale information implicitly through the widths of the visual pattern, and for quickly verifying the video content, segment-by-segment, without repeatedly playing each video segment, and for finding a specific part of interest or identifying separate semantic units in order to define the video segments and their sub-segments by quickly skimming through the video content without playback; means for displaying a visual representation of a nested relationship of the video segments and their relative temporal positions and durations, and for providing the user with an intuitive representation of a nested structure and related temporal information of the video segments; and means for displaying results of a content-based key frame search.
- 12. A GUI, according to claim 11, wherein the list view of a current segment comprises interfaces for the modeling operations to manipulate the hierarchical structure of the video content, and further comprising:
before performing one of the modeling operations, selecting input segments from the list of key frames representing the sub-segments of the current segment in the list view of a current segment; and invoking the modeling operation by clicking on one of the corresponding control buttons for the modeling operation; wherein the modeling operations involve: in the group operation, taking a set of sibling nodes as an input, creating a new node and inserting it as a child node of the siblings' parent node, and making the new node parent to the sibling segments which are grouped; in the ungroup operation, removing a node and making its child nodes child to its parent node; in the merge operation, given a set of adjacent sibling nodes as an input, creating a new node that is a child node of the siblings' parent node, then making the new node parent to all the child nodes under the sibling nodes; in the split operation, taking a node whose children can be divided into two disjoint sets of child nodes and decomposing the node into two new nodes, each of which has a portion of child segments as its child segments; and in the change key frame operation, for a given segment, replacing the key frame of the parent of the given segment with the key frame of the given segment; wherein the parent, child and sibling nodes represent segments or sub-segments in the hierarchical structure of the video content.
- 13. A GUI, according to claim 11, wherein the view of visual rhythm comprises interfaces for the shot verification/validation operations, and further comprising:
designating shot boundaries by locating a shot marker at each boundary, adjacent the visual rhythm; providing a cursor on the visual rhythm that points to a specific frame or point of current interest for applying the set shot marker operation; and specifying a single shot marker or multiple successive shot markers for applying the delete shot marker or delete multiple shot markers operations; wherein the shot verification/validation operations involve: in the set shot marker operation, manually dividing a shot into two adjacent shots by placing a new shot marker at a corresponding point along the visual rhythm; in the delete shot marker operation, manually combining two adjacent shots into a single shot by deleting a designated shot marker between the two shots at a corresponding point along the visual rhythm; and in the delete multiple shot markers operation, manually combining more than three adjacent shots into a single shot by deleting successive designated shot markers between the shots at corresponding points along the visual rhythm.
- 14. A GUI, according to claim 11, wherein the list view of key frame search comprises interfaces for the semantic clustering, and further comprising:
adjusting a similarity threshold value for another content-based key frame search by clicking on a slide bar of the value; triggering the search by clicking on a corresponding control button; repeating the adjusting and the triggering of the search until a user gets a desired search result; triggering iterative groupings by clicking on a corresponding control button; wherein the semantic clustering involves: specifying a clustering range by selecting any segment in a current hierarchy being constructed; selecting a recurring shot that occurs repetitively from a list of shots of a video within the clustering range; using a key frame of the selected shot as a query frame, performing a content-based image search in the list of shots within the specified clustering range in order to search for all recurring shots whose key frames exhibit visual similarities to the query frame; listing the retrieved recurring shots in temporal order; and with the temporally ordered list of the retrieved recurring shots, replacing a current sub-hierarchy of the selected segment with a new semantic hierarchy by iteratively grouping intermediate shots between each pair of two adjacent recurring shots into a single sub-segment of the selected segment.
- 15. A method for constructing or editing a hierarchical representation of a content of a video, said video comprising a plurality of shots, comprising:
providing automatic semantic clustering; providing manual modeling operations; providing manual shot verification/validation operations; and interleaving the manual and automatic methods in any order, and applying them as many times as a user wants.
- 16. Method, according to claim 15, said semantic clustering further comprising:
specifying a clustering range by selecting any segment in a current hierarchy being constructed; selecting a recurring shot that occurs repetitively from a list of shots of a video within the clustering range; using a key frame of the selected shot as a query frame, performing a content-based image search in the list of shots within the specified clustering range in order to search for all recurring shots whose key frames exhibit visual similarities to the query frame; listing the retrieved recurring shots in temporal order; and with the temporally ordered list of the retrieved recurring shots, replacing a current sub-hierarchy of the selected segment with a new semantic hierarchy by iteratively grouping intermediate shots between each pair of two adjacent recurring shots into a single sub-segment of the selected segment.
- 17. Method, according to claim 16, further comprising:
adjusting a similarity threshold value for the search.
- 18. Method, according to claim 16, further comprising:
replacing a current sub-hierarchy of the selected segment with a new semantic hierarchy by iteratively grouping intermediate shots between pairs of adjacent detected shots into a single segment.
- 19. Method, according to claim 16, further comprising:
if the semantic clustering does not lead to a well-defined semantic hierarchy, triggering the search operation with different similarity threshold values until most of the similar key frames are retrieved.
- 20. Method, according to claim 16, further comprising:
looking deeper into the key frames to see if the current hierarchy made so far reflects the content of the video well and, if not, using modeling operations to transform the hierarchy until a desirable one is constructed.
- 21. Method, according to claim 16, said modeling operations further comprising:
in the group operation, taking a set of sibling nodes as an input, creating a new node and inserting it as a child node of the siblings' parent node, and making the new node parent to the sibling segments which are grouped; in the ungroup operation, removing a node and making its child nodes child to its parent node; in the merge operation, given a set of adjacent sibling nodes as an input, creating a new node that is a child node of the siblings' parent node, then making the new node parent to all the child nodes under the sibling nodes; in the split operation, taking a node whose children can be divided into two disjoint sets of child nodes and decomposing the node into two new nodes, each of which has a portion of child segments as its child segments; and in the change key frame operation, for a given segment, replacing the key frame of the parent of the given segment with the key frame of the given segment; wherein the parent, child and sibling nodes represent segments or sub-segments in the hierarchical structure of the video content.
- 22. Method, according to claim 15, further comprising:
selecting a clustering range which is a portion of the entire video, said clustering range comprising one or more segments of the video; repetitively grouping visually similar consecutive shots based on the similarities of their key frames at the request of a user; and if recurring shots are present, repetitively grouping consecutive shots between each pair of two adjacent recurring shots at the request of a user.
- 23. Method, according to claim 15, wherein there already exists a table of contents (TOC) tree for a reference video, comprising:
performing template-based segmentation on a current video using the TOC template from the reference video to construct a TOC tree for the current video; and repeating the process of template-based segmentation at lower levels of the hierarchy.
- 24. Method, according to claim 15, said shot verification/validation operations further comprising:
in the set shot marker operation, taking a shot as an input and dividing the shot into two adjacent shots; in the delete shot marker operation, taking a set of two adjacent shots as inputs and combining the two shots into a single shot; and in the delete multiple shot markers operation, taking a set of more than three adjacent shots as inputs and combining the shots into a single shot.
- 25. Method, according to claim 15, wherein the video comprises a plurality of story units each of which has leading title shots and their own recurring shots, further comprising:
detecting shots and automatically generating an initial two-level hierarchy structure of all the shots grouped as nodes under a root node, each shot having a key frame associated therewith; identifying story units with their leading title shots; performing the group modeling operation for each identified story unit starting with the title shot, to create a new hierarchy structure having a third level of nodes between the nodes and the root node; and executing semantic clustering using one of the recurring shots as a query frame for each grouped story unit.
- 26. Method, according to claim 15, further comprising:
dividing the video stream into shots and selecting key frames of the detected shots; grouping the detected shots into a single root segment, resulting in an initial two-level hierarchy; and repeatedly performing at least one of the following modeling processes: shot verification, defining story unit, clustering, and editing hierarchy.
- 27. Method, according to claim 26, wherein the modeling processes involve:
in the shot verification process, performing at least one of the following operations: set shot marker, delete shot marker, delete multiple shot markers; in the defining story unit process, checking to determine if there are leading title segments and, if so, grouping all shots between two adjacent title segments into a single segment by manually applying the group operation to the shots; in the clustering process, choosing between performing no clustering, performing semantic clustering and performing syntactic clustering; and in the editing hierarchy process, manually editing the current hierarchy with one of the following operations: group, ungroup, merge, split, change key frame.
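The modeling operations recited in claims 12 and 21 (group, ungroup, merge, split, change key frame) can be illustrated with a small tree sketch. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Segment` class, its field names, the helper `_detach`, and the choice to give a new node the key frame of its first child are all hypothetical. The sketch also assumes the input nodes are adjacent siblings and are never the root.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # eq=False: nodes are compared by identity, not by content
class Segment:
    """Hypothetical node: a segment, sub-segment, or shot in the hierarchy."""
    title: str
    key_frame: Optional[str] = None
    children: List["Segment"] = field(default_factory=list)
    parent: Optional["Segment"] = None

    def add(self, child: "Segment", index: Optional[int] = None) -> "Segment":
        """Attach child under this node, at the end or at a given position."""
        child.parent = self
        self.children.insert(len(self.children) if index is None else index, child)
        return child


def _detach(nodes: List[Segment]) -> Tuple[Segment, int]:
    """Remove sibling nodes from their parent; return the parent and the first slot."""
    parent = nodes[0].parent
    index = parent.children.index(nodes[0])
    for node in nodes:
        parent.children.remove(node)
    return parent, index


def group(siblings: List[Segment], title: str) -> Segment:
    """Insert a new node under the siblings' parent and make it their parent."""
    parent, index = _detach(siblings)
    new = parent.add(Segment(title, siblings[0].key_frame), index)
    for sibling in siblings:
        new.add(sibling)
    return new


def ungroup(node: Segment) -> None:
    """Remove a node and make its child nodes children of its parent."""
    children = list(node.children)
    parent, index = _detach([node])
    node.children.clear()
    for offset, child in enumerate(children):
        parent.add(child, index + offset)


def merge(siblings: List[Segment], title: str) -> Segment:
    """Replace adjacent siblings with one node that owns all of their children."""
    grandchildren = [c for s in siblings for c in s.children]
    parent, index = _detach(siblings)
    new = parent.add(Segment(title, siblings[0].key_frame), index)
    for child in grandchildren:
        new.add(child)
    return new


def split(node: Segment, at: int) -> Tuple[Segment, Segment]:
    """Decompose a node into two nodes holding children[:at] and children[at:]."""
    parts = (node.children[:at], node.children[at:])  # requires 0 < at < len(children)
    parent, index = _detach([node])
    halves = []
    for offset, part in enumerate(parts):
        new = parent.add(Segment(f"{node.title}.{offset + 1}", part[0].key_frame), index + offset)
        for child in part:
            new.add(child)
        halves.append(new)
    return halves[0], halves[1]


def change_key_frame(segment: Segment) -> None:
    """Use the given segment's key frame as the key frame of its parent."""
    segment.parent.key_frame = segment.key_frame
```

In this sketch, group followed by ungroup on the same node restores the original child order, which matches the way the two operations are described as inverses of one another.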
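The semantic clustering steps of claims 14 and 16 (query by a recurring shot's key frame, list the matches in temporal order, then group the shots lying between adjacent matches) can be sketched as follows. The claims do not name a similarity measure, so histogram intersection over normalized key frame histograms is used here purely as an assumption. The function names and the output encoding are likewise hypothetical: an integer is an ungrouped shot index and a nested list is a new sub-segment. The sketch leaves the recurring shots themselves ungrouped, and leaves any shots outside the first and last recurring shot untouched.

```python
from typing import List, Sequence, Union

Hierarchy = List[Union[int, List[int]]]


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Histogram intersection of two normalized histograms (assumed measure)."""
    return sum(min(x, y) for x, y in zip(a, b))


def find_recurring(features: Sequence[Sequence[float]], query: int, threshold: float) -> List[int]:
    """Indices of shots whose key frame is similar to the query key frame, in temporal order."""
    return [i for i, f in enumerate(features) if similarity(features[query], f) >= threshold]


def semantic_cluster(features: Sequence[Sequence[float]], query: int, threshold: float) -> Hierarchy:
    """Group the shots between each pair of adjacent recurring shots into one sub-segment."""
    recurring = find_recurring(features, query, threshold)
    if not recurring:  # nothing matched, so the flat list of shots is kept
        return list(range(len(features)))
    hierarchy: Hierarchy = list(range(recurring[0] + 1))
    for left, right in zip(recurring, recurring[1:]):
        between = list(range(left + 1, right))
        if between:
            hierarchy.append(between)
        hierarchy.append(right)
    hierarchy.extend(range(recurring[-1] + 1, len(features)))
    return hierarchy


# Toy clustering range: shots 0, 3 and 5 share one key frame appearance.
A, B = [1.0, 0.0], [0.0, 1.0]
print(semantic_cluster([A, B, B, A, B, A, B], query=0, threshold=0.9))
# [0, [1, 2], 3, [4], 5, 6]
```

Lowering or raising `threshold` and calling `semantic_cluster` again corresponds to the repeated search with different similarity threshold values described in claims 17 and 19.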
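The shot verification/validation operations of claims 13 and 24 amount to editing a sorted list of shot boundaries. A minimal sketch, assuming each marker is stored as the frame index at which a new shot begins and that frame 0 carries no explicit marker; the function names and this representation are illustrative assumptions, not taken from the claims.

```python
import bisect
from typing import List, Tuple


def set_shot_marker(markers: List[int], frame: int) -> None:
    """Divide one shot into two by placing a marker at the cursor frame."""
    i = bisect.bisect_left(markers, frame)
    if i == len(markers) or markers[i] != frame:  # ignore an existing marker
        markers.insert(i, frame)


def delete_shot_marker(markers: List[int], frame: int) -> None:
    """Combine two adjacent shots by deleting the marker between them."""
    markers.remove(frame)


def delete_multiple_shot_markers(markers: List[int], first: int, last: int) -> None:
    """Combine several adjacent shots by deleting all markers in [first, last]."""
    lo = bisect.bisect_left(markers, first)
    hi = bisect.bisect_right(markers, last)
    del markers[lo:hi]


def shots(markers: List[int], total_frames: int) -> List[Tuple[int, int]]:
    """Return the (start, end) frame range of each shot; end is exclusive."""
    return list(zip([0] + markers, markers + [total_frames]))


markers = [100, 250, 400]
set_shot_marker(markers, 180)
print(markers)               # [100, 180, 250, 400]
delete_multiple_shot_markers(markers, 100, 250)
print(shots(markers, 500))   # [(0, 400), (400, 500)]
```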
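Claim 9 recites synchronizing the visual rhythm with the audio waveform by adding extra lines into, or dropping selected lines from, the visual rhythm. The claim does not say which lines are added or dropped; the sketch below assumes uniformly spaced nearest-neighbor resampling, where each element of `lines` stands for one vertical line of the visual rhythm and `target` is the number of lines needed to match the waveform's time scale. Both names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def resample_lines(lines: Sequence[T], target: int) -> List[T]:
    """Stretch or shrink a visual rhythm to exactly `target` lines.

    When target > len(lines) some lines are repeated (extra lines are added);
    when target < len(lines) some lines are skipped (lines are dropped).
    """
    n = len(lines)
    if n == 0 or target <= 0:
        return []
    return [lines[i * n // target] for i in range(target)]


rhythm = ["a", "b", "c", "d"]
print(resample_lines(rhythm, 6))  # ['a', 'a', 'b', 'c', 'c', 'd']
print(resample_lines(rhythm, 2))  # ['a', 'c']
```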
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation-in-part of U.S. patent application Ser. No. 09/911,293 filed Jul. 23, 2001, which is a non-provisional of:
[0002] provisional application No. 60/221,394 filed Jul. 24, 2000;
[0003] provisional application No. 60/221,843 filed Jul. 28, 2000;
[0004] provisional application No. 60/222,373 filed Jul. 31, 2000;
[0005] provisional application No. 60/271,908 filed Feb. 27, 2001; and
[0006] provisional application No. 60/291,728 filed May 17, 2001.
[0007] This application is a continuation-in-part of PCT Patent Application No. PCT/US01/23631 filed Jul. 23, 2001 (Published as WO 02/08948, 31 Jan. 2002), which claims priority of the five provisional applications listed above.
[0008] This application is a continuation-in-part of U.S. Provisional Application No. 60/359,567 filed Feb. 25, 2002.
Provisional Applications (6)

| Number | Date | Country |
| --- | --- | --- |
| 60221394 | Jul 2000 | US |
| 60221843 | Jul 2000 | US |
| 60222373 | Jul 2000 | US |
| 60271908 | Feb 2001 | US |
| 60291728 | May 2001 | US |
| 60359567 | Feb 2002 | US |
Continuation in Parts (2)

| Relation | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 09911293 | Jul 2001 | US |
| Child | 10368304 | Feb 2003 | US |
| Parent | PCT/US01/23631 | Jul 2001 | US |
| Child | 10368304 | Feb 2003 | US |