The invention relates generally to the field of digital image processing, and in particular to a method for associating and viewing related video and still images.
The proliferation of digital image and video capture devices has led to multiple modalities of capture being present at any picture-taking occasion. For example, it is possible to have both videos and still images, since most digital cameras now support capture of video clips and digital camcorders can capture still images. At an important family event or a public event, such as a wedding or a sports match, there are usually multiple still and video capture devices capturing the scene simultaneously. This scenario results in videos and stills that overlap in time. For instance, multiple stills may be captured during the duration of a video clip, and multiple video sequences may overlap to various degrees. Current consumer image management software, such as Google Picasa, Adobe Photo Album and Kodak EasyShare, displays still images and videos in chronological order with no ability to indicate overlapping captures. In some cases, the date/time of file creation (not capture date/time) is used for video, which effectively removes video clips from the natural timeline and places them at one end of a batch of media transferred from capture device to storage device. In the best cases, videos are inserted at the point in the timeline indicated by the start of capture. Still images or video captured during the duration of a longer video clip appear after its thumbnail representation, with no indication of possible overlap, where overlaps could be in time or in another relevant concept such as location or event.
This mode of display makes it difficult to pick the best representation of a given moment, choose between different modalities, or create composites of different modalities. An alternative is to provide browsing mechanisms that explicitly show overlaps between captures of one or more modalities and also allow the user to switch between them on a UI display.
In U.S. Pat. No. 6,950,989, Rosenzweig et al describe a timeline-based browsing view for image collections. The images in the collection can be viewed at different time granularity (year-by-year, month-by-month etc), and also along other metadata such as location taken and people in picture. However, it is assumed that all media in the collection can be placed in order on the timeline, and overlaps in time between media are not handled.
A few patents discuss some aspects of media overlaps in time or media captured at the same event, but in very limited circumstances, and in contexts other than browsing a consumer image collection. In U.S. Pat. No. 6,701,014, Syeda-Mahmood describes a way to associate slides (say in Microsoft PowerPoint) to slides that are being shown on a screen in a video of the presentation. In U.S. Pat. No. 7,102,644, Hoddie et al describe a way to embed movies within a movie, in cases where there is overlap in content between them. The intention is to allow video editors to edit all the related clips at the same time, so that any changes made in one stream can be reflected in the other related ones. In U.S. Pat. No. 7,028,264, Santoro et al describe an interface that shows multiple sources on the same screen, but these sources are not related to each other and are not linked in any way. For example, the sources could be different television channels covering the news, sports, weather and stocks. In U.S. Pat. No. 6,978,047, Montgomery describes storing multiple views of the same event for surveillance applications, but in this case, the video cameras are synchronized. This system does not provide means for relating the asynchronous captures that occur at consumer events, and there is no browsing interface provided. In U.S. Pat. No. 7,158,689, Valleriano et al handle asynchronously captured images of an event, but the event type is a special case of a timed event such as a race, and contestants are tracked at various fixed stations. These methods are specific to the applications being described, and provide no framework for handling the generalized problem of browsing multiple sources of media captured asynchronously at the same event.
In accordance with one aspect of the present invention there is provided a method for organizing digital content records including: receiving a first set of digital content records captured from a first digital-content capture device; receiving a second set of digital content records captured from a second digital-content capture device; ordering the first set of digital content records and the second set of digital content records along a common capture timeline; and storing results of the ordering step in a processor-accessible memory system.
In accordance with another aspect of the present invention there is provided a method for organizing digital content records including:
receiving a first set of digital content records captured from a first digital-content capture device, each digital content record in the first set having associated therewith time/date of capture information defining when the associated digital content record was captured, wherein the capture information associated with a particular digital content record from the first set defines that its associated digital content record was captured over a contiguous span of time; receiving a second set of digital content records captured from a second digital-content capture device, each digital content record in the second set having associated therewith time/date of capture information defining when the associated digital content record was captured; ordering the first set of digital content records and the second set of digital content records along a common capture timeline based at least upon the time/date of capture information, or a derivative thereof associated with each of the digital content records in the first and second sets, wherein the ordering step causes the particular digital content record and at least one other digital content record to be associated with a same time/date within the span of time in the capture timeline; and storing results of the ordering step in a processor-accessible memory system.
In accordance with another aspect of the present invention there is provided a method for displaying digital content records including: receiving a set of digital content records organized along a timeline, each digital content record being associated with a point on or segment of the timeline based at least upon its time/date of capture and, optionally, span of capture, at least two digital content records being associated with at least a same point on the timeline;
identifying a current point on the timeline; displaying a digital content record of the set of digital content records as a focus record, the focus record associated with the current point on the timeline and being displayed prominently on a display; displaying first other digital content records of the set of digital content records on the display, the first other digital content records having time/dates of capture or spans of capture temporally adjacent to the current point on the timeline and being displayed less prominently than the focus record on the display; and
displaying second other digital content records of the set of digital content records on the display, the second other digital content records having a time/date of capture or a span of capture equal to or including the current point on the timeline.
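By way of a non-limiting illustration, the selection of records for such a display may be sketched as follows. The sketch assumes that each record has already been reduced to a start and end position (in seconds) on the common timeline, with a still image having equal start and end. The names `TimelineRecord` and `layout_for`, the `n_adjacent` parameter, and the rule of taking the first covering record as the focus record are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TimelineRecord:
    name: str
    start: float  # seconds on the common timeline
    end: float    # equals start for a still image


def layout_for(records, current, n_adjacent=2):
    """Split records into (focus, adjacent, overlapping) for a current point.

    focus       : one record whose span includes the current point
    adjacent    : the n_adjacent nearest records that do not include it
    overlapping : the remaining records whose span includes the current point
    """
    covering = [r for r in records if r.start <= current <= r.end]
    if not covering:
        raise ValueError("no record is associated with the current point")

    # Assumption: the first covering record is shown as the focus record.
    focus, overlapping = covering[0], covering[1:]

    def distance(r):
        # Records here lie entirely before or entirely after the current point.
        return current - r.end if r.end < current else r.start - current

    others = [r for r in records if r not in covering]
    adjacent = sorted(others, key=distance)[:n_adjacent]
    return focus, adjacent, overlapping


records = [
    TimelineRecord("video_V", 0, 60),
    TimelineRecord("still_A", 10, 10),
    TimelineRecord("still_B", 30, 30),
    TimelineRecord("still_C", 90, 90),
    TimelineRecord("still_D", 200, 200),
]
focus, adjacent, overlapping = layout_for(records, current=30)
```

With the current point at 30 seconds, the video is selected as the focus record, the still captured at the same instant is returned as an overlapping record, and the two nearest non-overlapping stills are returned as the temporally adjacent records.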
In accordance with yet another aspect of the present invention there is provided a method for presenting digital content records including: instructing presentation of a first digital content record on an output device, wherein the first digital content record is a video or audio digital content record; identifying a second digital content record having an association with the first digital content record, wherein the association is based at least upon adjacency in time, a common object represented therein, a common event during which the first and second digital content records were captured, or a common location at which the digital content records were captured; and instructing presentation of the second digital content record on the output device while the first digital content record is being presented.
In accordance with a further aspect of the present invention there is provided a system for indexing media from different sources including:
a means for receiving a first set of digital content records captured from a first digital-content capture device; a means for receiving a second set of digital content records captured from a second digital-content capture device; a means for ordering the first set of digital content records and the second set of digital content records along a common capture timeline; and a means for storing results of the ordering step in a processor-accessible memory system.
These and other aspects, objects, features and advantages of the present invention will be more clearly understood and appreciated from a review of the following detailed description of the preferred embodiments and appended claims and by reference to the accompanying drawings.
The present invention will be more readily understood from the detailed description of exemplary embodiments presented below considered in conjunction with the attached drawings, of which:
It is to be understood that the attached drawings are for purposes of illustrating the concepts of the invention and may not be to scale.
The data processing system 110 includes one or more data processing devices that implement the processes of the various embodiments of the present invention, including the example processes of
The processor-accessible memory system 140 includes one or more processor-accessible memories configured to store information, including the information needed to execute the processes of the various embodiments of the present invention, including the example processes of
The phrase “processor-accessible memory” is intended to include any processor-accessible data storage device, whether volatile or nonvolatile, electronic, magnetic, optical, or otherwise, including but not limited to, floppy disks, hard disks, Compact Discs, DVDs, flash memories, ROMs, and RAMs.
The phrase “communicatively connected” is intended to include any type of connection, whether wired or wireless, between devices, data processors, or programs in which data may be communicated. Further, the phrase “communicatively connected” is intended to include a connection between devices or programs within a single data processor, a connection between devices or programs located in different data processors, and a connection between devices not located in data processors at all. In this regard, although the processor-accessible memory system 140 is shown separately from the data processing system 110, one skilled in the art will appreciate that the processor-accessible memory system 140 may be stored completely or partially within the data processing system 110. Further in this regard, although the peripheral system 120 and the user interface system 130 are shown separately from the data processing system 110, one skilled in the art will appreciate that one or both of such systems may be stored completely or partially within the data processing system 110.
The peripheral system 120 may include one or more devices configured to provide digital content records to the data processing system 110. For example, the peripheral system 120 may include digital video cameras, cellular phones, regular digital cameras, or other data processors. The data processing system 110, upon receipt of digital content records from a device in the peripheral system 120, may store such digital content records in the processor-accessible memory system 140.
The user interface system 130 may include a mouse, a keyboard, another computer, or any device or combination of devices from which data is input to the data processing system 110. In this regard, although the peripheral system 120 is shown separately from the user interface system 130, the peripheral system 120 may be included as part of the user interface system 130.
The user interface system 130 also may include a display device, a processor-accessible memory, or any device or combination of devices to which data is output by the data processing system 110. In this regard, if the user interface system 130 includes a processor-accessible memory, such memory may be part of the processor-accessible memory system 140 even though the user interface system 130 and the processor-accessible memory system 140 are shown separately in
The main steps in automatically indexing media from different sources are shown in
Referring to
Next, step 230 is to place the digital content record on a common capture timeline. Media from digital sources contain the time of capture as part of the metadata associated with the digital content record. The digital content records from different sources are aligned according to the time of capture as shown in
Alignment of the digital content may also be based on user-provided annotations at user-defined points along the common timeline. The user-provided annotations may include text data, audio data, video data, or graphical data, where the text data includes text messages, web links, or web logs.
Automated time alignment of the capture devices based on image similarity is another alternative. A method for aligning media streams when the capture date-time information is unavailable is described in commonly assigned US Patent Application 20060200475 entitled “Additive clustering of images lacking individual date/time information.”
The digital content records are then ordered chronologically based on their relative positions on the common timeline, wherein the ordering step causes the particular digital content record and at least one other digital content record to be associated with a same time/date within the span of time in the capture timeline. For video clips, the start time of the video is used for the ordering step. Note that the end time of a video clip, if not available in the metadata inserted by the capturing device, can be computed by dividing the total number of frames by the frame rate of the capture device and adding the resulting duration to the known start time. The end time is needed to determine the time difference from the next digital content record, as described later.
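The ordering step and the end-time computation described above may be illustrated by the following minimal sketch. It assumes that capture times have already been placed on a common clock and that the frame count and frame rate of each video clip are known. The names `Record`, `order_on_timeline` and `gap_to_next`, as well as the example records, are hypothetical and are not taken from any particular implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Record:
    name: str
    source: str                         # capture device the record came from
    start: datetime                     # capture time (start time for video)
    frame_count: Optional[int] = None   # None for still images
    frame_rate: Optional[float] = None  # frames per second; None for stills

    @property
    def end(self) -> datetime:
        # A still image is treated as instantaneous.
        if self.frame_count is None or not self.frame_rate:
            return self.start
        # End time = start time + (number of frames / frame rate).
        return self.start + timedelta(seconds=self.frame_count / self.frame_rate)


def order_on_timeline(*record_sets):
    """Merge sets from several devices and order them by start time."""
    merged = [r for records in record_sets for r in records]
    return sorted(merged, key=lambda r: r.start)


def gap_to_next(ordered):
    """Seconds from each record's end to the next record's start.

    A negative value means the next record was captured before the
    current one ended, i.e. the two records overlap in time.
    """
    return [(b.start - a.end).total_seconds() for a, b in zip(ordered, ordered[1:])]


camera = [
    Record("still_1", "camera", datetime(2008, 9, 8, 14, 0, 20)),
    Record("still_2", "camera", datetime(2008, 9, 8, 14, 2, 0)),
]
camcorder = [
    Record("clip_1", "camcorder", datetime(2008, 9, 8, 14, 0, 0),
           frame_count=1800, frame_rate=30.0),
]
ordered = order_on_timeline(camera, camcorder)
gaps = gap_to_next(ordered)
```

In this example the clip of 1800 frames at 30 frames per second lasts 60 seconds, so the first still, captured 20 seconds after the clip starts, falls within the span of the clip and yields a negative gap.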
Referring to
Referring to
In the scenario of this invention, the “image” set that is provided to the event clustering algorithm (described in U.S. Pat. No. 6,606,411) includes still images as well as key-frames from video clips (along with their computed time of capture) from all sources combined. For example, referring to
Referring to
Referring to
In another embodiment, the links between digital content records are created based on semantic object matches. For example, links are generated between images containing a particular person and video segments that contain the same person. This allows a user to view still images taken of people that appear in videos, or view a video clip of what a person is doing or saying at the instant a particular still image was taken. In commonly assigned patent application Ser. No. 11/559,544, filed Nov. 14, 2006, entitled “User Interface for Face Recognition”, Gallagher et al describe a method for clustering faces into groups of similar faces that are likely to represent distinct individuals using available face recognition technology. Since all the digital content records in our application are from the same event, further refinement of people recognition is possible as described in commonly assigned patent application Ser. No. 11/755,343, filed May 30, 2007 by Lawther et al, entitled “Composite person model from image collections”. In this application, clothing and other contextual information that are likely to remain the same during the event are used to improve recognition of individuals.
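The creation of links from person matches may be sketched as follows. The sketch does not perform any face recognition; it assumes that an earlier step has already assigned a set of person labels to each digital content record. The function name `link_by_person`, the record names and the person labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def link_by_person(people_in_record):
    """Link every pair of records that share at least one person label.

    people_in_record maps a record name to the set of person labels
    assumed to have been produced by a separate recognition step.
    Returns a dict mapping an unordered record pair to the shared labels.
    """
    # Invert the mapping: person label -> records containing that person.
    records_by_person = defaultdict(list)
    for record, people in people_in_record.items():
        for person in people:
            records_by_person[person].append(record)

    links = defaultdict(set)
    for person, records in records_by_person.items():
        for a, b in combinations(sorted(records), 2):
            links[frozenset((a, b))].add(person)
    return dict(links)


people = {
    "still_1": {"person_A"},
    "clip_1": {"person_A", "person_B"},
    "still_2": {"person_B"},
    "still_3": set(),  # no recognized people, so no links
}
links = link_by_person(people)
```

Here the video clip is linked to each of the two stills through a different person, while the still with no recognized people receives no links.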
Another example of links based on semantic objects is to link images and video frames in which similar objects are present in the background, indicating that the two captures were taken against the same backdrop. This allows the user to view still images captured of the same scene that is seen in a video clip, or view the same scene captured from different viewpoints. In commonly assigned application Ser. No. 11/960,800, filed Dec. 20, 2007, entitled “Grouping images by location”, a method for determining groups of images captured at the same location is described. This method uses SIFT features, described by Lowe in International Journal of Computer Vision, Vol. 60, No. 2, 2004, to match image backgrounds after filtering the features to retain only those that correspond to potentially unique objects in the image.
The present invention also embodies a method for displaying digital content records. Related media may be displayed based on its current location along a timeline. Referring now to
Referring now to
It is to be understood that the exemplary embodiment(s) is/are merely illustrative of the present invention and that many variations of the above-described embodiment(s) can be devised by one skilled in the art without departing from the scope of the invention. It is therefore intended that all such variations be included within the scope of the following claims and their equivalents.
This application is a divisional of commonly-assigned U.S. Ser. No. 12/206,319 filed Sep. 8, 2008 now abandoned, entitled “Method and Interface for Indexing Related Media From Multiple Sources” by Madirakshi Das et al, the disclosure of which is incorporated herein.
Number | Name | Date | Kind |
---|---|---|---|
6204886 | Yoshimura | Mar 2001 | B1 |
6307550 | Chen | Oct 2001 | B1 |
6606411 | Loui et al. | Aug 2003 | B1 |
6701014 | Syeda-Mahmood | Mar 2004 | B1 |
6901207 | Watkins | May 2005 | B1 |
6910191 | Segerberg | Jun 2005 | B2 |
6950989 | Rosenzweig | Sep 2005 | B2 |
6978047 | Montgomery | Dec 2005 | B2 |
7028264 | Santoro et al. | Apr 2006 | B2 |
7102644 | Hoddie et al. | Sep 2006 | B2 |
7158689 | Valleriano et al. | Jan 2007 | B2 |
7423660 | Ouchi | Sep 2008 | B2 |
7650626 | Suh | Jan 2010 | B2 |
8276098 | Fagans | Sep 2012 | B2 |
20010005442 | Ueda | Jun 2001 | A1 |
20020033899 | Oguma | Mar 2002 | A1 |
20020118952 | Nakajima | Aug 2002 | A1 |
20030231198 | Janevski | Dec 2003 | A1 |
20040028382 | Choi | Feb 2004 | A1 |
20040064207 | Zacks | Apr 2004 | A1 |
20050091597 | Ackley | Apr 2005 | A1 |
20050108643 | Schybergson | May 2005 | A1 |
20050257166 | Tu | Nov 2005 | A1 |
20060090141 | Loui | Apr 2006 | A1 |
20060123357 | Okamura | Jun 2006 | A1 |
20060200475 | Das | Sep 2006 | A1 |
20070071323 | Kontsevich | Mar 2007 | A1 |
20070192729 | Downs | Aug 2007 | A1 |
20070200925 | Kim | Aug 2007 | A1 |
20080044155 | Kuspa | Feb 2008 | A1 |
20080112621 | Gallagher et al. | May 2008 | A1 |
20080298643 | Lawther et al. | Dec 2008 | A1 |
20080307307 | Ciudad | Dec 2008 | A1 |
20080309647 | Blose | Dec 2008 | A1 |
20090161962 | Gallagher et al. | Jun 2009 | A1 |
20090172543 | Cronin | Jul 2009 | A1 |
20090265647 | Martin et al. | Oct 2009 | A1 |
20100281370 | Rohaly | Nov 2010 | A1 |
Number | Date | Country |
---|---|---|
1 927 928 | Jun 2008 | EP |
Entry |
---|
Final Rejection on U.S. Appl. No. 12/206,319, mailed Mar. 29, 2012. |
International Preliminary Report on Patentability for PCT/US2009/004990, issued Mar. 8, 2011. |
International Search Report and Written Opinion for PCT/US2009/004990, mailed Feb. 5, 2010. |
Non-Final Office Action on U.S. Appl. No. 12/206,319, mailed Aug. 2, 2011. |
Non-Final Office Action on U.S. Appl. No. 12/206,319, mailed Dec. 12, 2011. |
Graham Adrian et al.: “Time as Essence for Photo Browsing Through Personal Digital Libraries”, JCDL 2002. Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, Portland, OR, Jul. 14-18, 2002; [Proceedings ACM/IEEE-CS Joint Conference on Digital Libraries], New York, NY, ACM, US, vol. Conf. 2, Jul. 14, 2002, pp. 326-335, XP-002383768, ISBN: 978-1-58113-513-8, Section 2; Figures 1a, 1b, Section 3. |
Calic Janko et al, “Efficient Key-Frame Extraction and Video Analysis”, IEEE International Conference on Information Technology: Coding and Computing 2002. |
Lowe David G., “Distinctive Image Features From Scale-Invariant Keypoints”, International Journal of Computer Vision, vol. 60, No. 2, 2004. |
Number | Date | Country |
---|---|---|
20120260175 A1 | Oct 2012 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 12206319 | Sep 2008 | US |
Child | 13463183 | | US |