The disclosed embodiments relate generally to the management and display of annotations for video intervals.
The proliferation of video sharing through video hosting websites provides numerous opportunities to collaborate and experience videos in online communities. Video hosting websites allow users to upload, view, comment on and rate videos. Users browsing a video hosting website can locate videos of interest by, for example, searching for videos, browsing directories, or sorting by ratings.
Comments provide a way to complement video with useful information. Comments can be of various data types, including text, audio, graphics, or other forms. However, comments have been used to provide information about an entire video, rather than a specific portion. If a user wants to direct others to a specific portion of the video, the user has to enter the time offset for that point in the comments, such as “see stunt at 1:48.” Other users would then have to traverse through the subject video to the 1 minute, 48 second mark and then view from there to understand the comment.
In addition, the content contained in comments may be unreliable. Difficulty arises in ascertaining the trustworthiness of the author of the comments. Also, a large number of comments may hinder understanding of the information to be conveyed through such comments. Moreover, it is difficult to know which comments associated with a video are related. For example, unless all of the comments are associated with the identical time-elapsed place in the video, there is uncertainty as to whether the comments refer to the same portion of a video.
Further, users may want to create their own comments to highlight certain aspects of a video. Personalized comments may raise security concerns and pose challenges in determining how and with whom such comments should be shared. In addition, if personalized comments are examined in isolation, they provide only minimal meaning to related groups of users that also have comments. Such personalized comments are also difficult to retrieve and locate, both by the user and by those persons with whom the comments have been shared.
The present invention includes systems and methods for managing annotations in videos in a video hosting website. Users submit annotations of intervals within various videos stored in the video hosting website. For example, annotations can be associated with spatial portions of a video frame, with a particular moment in a video, or with a scene of a video. For any given video, there may be a large number of annotations, each associated with some interval of the video, and these intervals may overlap. Thus it is desirable to organize the annotations for one or more intervals of the video into groups, and then determine a clip of the video to associate with each group of annotations. Each group includes annotations for intervals of the video, where the intervals are similar to each other. A group having related annotations is identified, and an annotated clip of the video is formed based upon the intervals in the group. This process can be repeated to determine any number of groups having related annotations within a single video, forming the respective annotated clips, and can also identify and organize annotated clips within a large number of different videos.
A synoptic annotation can be determined based on the related annotations. For instance, a synoptic annotation can include a summary of the content of related annotations or excerpts from the related annotations.
Groups can be formed in a variety of ways. For example, annotations can be clustered into groups based on a timestamp associated with each annotation. Also, annotations can be clustered based on the start times and the end times of the intervals associated with the annotations. Groups can be associated with identified scenes or features in the video. In addition, groups can be formed based on a determination of the maximum number of indications of annotated clips that can be visually distinguished on a timeline corresponding to the video. The amount of time between intervals can also be taken into account in forming the groups.
The content of the annotations can be considered to determine a group having related annotations; for example, a group may be formed from annotations all having one or more keywords in common. Annotations can also be examined to identify annotations containing a search query term in one or many videos.
A computer system manages annotations associated with a video via a number of server modules. An extraction module extracts a plurality of annotations associated with one or more intervals of the video. The grouping module forms a plurality of groups, each including annotations for similar intervals of the video. The annotation determination module determines a group having related annotations, and the annotated clip generation module forms an annotated clip of the video based upon the intervals in the group.
The features and advantages described in this summary and the following detailed description are not all-inclusive. Many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims presented herein.
The figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
A client 130 executes a browser 132, and connects to the front end server 124 via a network 105, which is typically the Internet, but may also be any network, including but not limited to a LAN, a MAN, a WAN, a mobile, wired or wireless network, a private network, or a virtual private network. While only a single client 130 and browser 132 are shown, it is understood that very large numbers (e.g., millions) of clients are supported and can be in communication with the website 108 at any time. The client 130 may include a variety of different computing devices. Examples of client devices 130 are personal computers, digital assistants, personal digital assistants, cellular phones, mobile phones, smart phones or laptop computers. As will be obvious to one of ordinary skill in the art, the present invention is not limited to the devices listed above.
A user views, authors, and edits annotations using a client 130. An annotation includes annotation content, which is any data which can usefully supplement a media file. For example, annotation content can include an audio or textual commentary, metadata, translation, advertisement or summary, rating on a predetermined scale (1-5 stars), or a command for how the media file should be displayed. An annotation can optionally include a spatial definition, which specifies the area of the frame with which an annotation is associated. An annotation can also include video content. The clients 130 include software and hardware for displaying video.
For example, a client 130 can be implemented as a television, a personal computer, a digital video recorder (DVR), a personal digital assistant (PDA), a cellular telephone, or another device having or connected to a display device; software includes any video player adapted to decode video files, such as MPEG-2, MPEG-4, QuickTime, VCD, or any other current or future video format. Other examples of clients will be apparent to one of skill in the art without departing from the scope of the present invention. An example of a graphical user interface used by the client 130 according to one embodiment is described herein with reference to the accompanying figures.
In some embodiments, the browser 132 includes an embedded video player 134 such as, for example, the Flash™ player from Adobe Systems, Inc. or any other player adapted for the video file formats used in the video hosting website 108. A user can access a video from the video hosting website 108 by browsing a catalog of videos, conducting searches on keywords, reviewing play lists from other users or the system administrator (e.g., collections of videos forming channels), or viewing videos associated with a particular user group (e.g., communities).
Video server 126 receives uploaded media content from content providers and allows content to be viewed by client 130. Content may be uploaded to video server 126 via the Internet from a personal computer, through a cellular network from a telephone or PDA, or by other means for transferring data over network 105 known to those of ordinary skill in the art. Content may be downloaded from video server 126 in a similar manner; in one embodiment media content is provided as a file download to a client 130; in an alternative embodiment, media content is streamed to client 130. The means by which media content is received by video server 126 need not match the means by which it is delivered to client 130. For example, a content provider may upload a video via a browser on a personal computer, whereas client 130 may view that video as a stream sent to a PDA. Note also that video server 126 may itself serve as the content provider.
Users of clients 130 can also search for videos based on keywords, tags or other metadata. These requests are received as queries by the front end server 124 and provided to the video server 126, which is responsible for searching the video database 128 for videos that satisfy the user queries. The video server 126 supports searching on any fielded data for a video, including its title, description, tags, author, category and so forth. Responsive to a request from a client 130 for an annotation associated with a particular media file, the video server 126 sends one or more annotations associated with the media file to the client 130 through the network 105. Responsive to a submission by the client 130 of one or more annotations associated with a media file, the video server 126 stores the one or more annotations in association with the media file in user database 140.
Information about received annotations is stored in the user database 140. The user database 140 is responsible for maintaining a record of all users viewing videos on the website. Each individual user is assigned a user ID. The user ID can be based on any identifying information, such as the user's IP address, user name, or the like. The user database may also contain information about the reputation of the user in both the video context, as well as through other applications, such as the use of email or text messaging.
Users of the clients 130 and browser 132 can upload content to the video hosting website 108 via network 105. The uploaded content can include, for example, video, audio or a combination of video and audio. The uploaded content is processed and stored in the video database 128. This processing can include format conversion (transcoding), compression, metadata tagging, and other data processing. An uploaded content file is associated with the uploading user, and so the user's account record is updated in the user database 140 as needed.
For purposes of convenience and the description of one embodiment, the uploaded content will be referred to as “videos”, “video files”, or “video items”, but no limitation on the types of content that can be uploaded is intended by this terminology. Each uploaded video is assigned a video identifier when it is processed.
The video database 128 is used to store the received videos. The video database 128 stores video content and associated metadata, provided by their respective content owners. The video files have metadata associated with each file such as a video ID, artist, video title, label, genre, and time length.
A video access log 129 within video database 128 stores each instance of video access. Annotations can be submitted by clicking on an indicator or on a portion of a time line associated with the video. Users may also click and drag on the time line to specify an annotation for a longer interval of video. Users may also submit annotations via a digital video recorder (DVR) or with a device providing similar functionality, such as by using a remote control configured to allow entry of annotations through a user interface associated with the device. Each entry in the access log 129 identifies a video being accessed, a time of access, an IP address of the user, a user ID if available, cookies, search queries, data identifying the type of interaction with the video, and the time of every interaction with the video. Interaction types can include any user interactions in the user interface of the website 108, such as playing, pausing, rewinding, forwarding and submitting annotations or ratings for a video.
Turning now to
Users can submit an annotation for an interval of video in various ways. For example, users can click the “B” button 306 when they view an interval of video on which they wish to provide an annotation. Users can also click and hold the “B” button 306 to indicate an interval longer than one click. As another option, users can click the “B” button 306 to mark the start time of an interval and click the “B” button 306 again to indicate the end of an interval for which they are providing an annotation. The depiction in
Another example of a user interface for receiving annotations for video intervals is depicted in
The modules 110-120 of
The grouping module 112 forms 220 groups containing annotations for similar intervals of the video. The groups of annotations within a block of annotations may be formed by a variety of methods. For example, the grouping module 112 forms a plurality of groups by clustering annotations based on a timestamp associated with each annotation. If the annotations have timestamps within a specified time limit of each other, the grouping module 112 determines that they relate to similar intervals of the video and clusters them together. For example, annotations that have timestamps within 5 seconds of each other are determined to refer to similar intervals. This accommodates differences in how users mark an event: some users may timestamp an action at its beginning, some at its climax, and some immediately after it finishes. Using this technique, these annotations would be grouped together.
In another embodiment, the annotations are clustered based on the start times and the end times of the intervals. For example, if there is sufficient overlap (e.g., 25%) between intervals (bounded by the start times and the end times of the intervals with which the annotations are associated), the grouping module 112 determines that the annotations relate to similar intervals of the video. This allows annotations to be grouped even where the intervals with which they are associated are not identical.
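By way of illustration only, the following Python sketch shows one way such an overlap test could be computed, assuming intervals are given as (start, end) pairs in seconds and that the 25% threshold is measured against the shorter interval; both assumptions are illustrative, as the embodiment does not fix a particular formula.

    def overlap_fraction(a, b):
        """Fraction of the shorter interval covered by the overlap of intervals a and b."""
        overlap = min(a[1], b[1]) - max(a[0], b[0])
        shorter = min(a[1] - a[0], b[1] - b[0])
        if overlap <= 0 or shorter <= 0:
            return 0.0
        return overlap / shorter

    def similar_intervals(a, b, threshold=0.25):
        return overlap_fraction(a, b) >= threshold

    # Example: 1:58-2:02 and 1:50-2:10 overlap by 4 seconds, i.e. all of the shorter interval.
    print(similar_intervals((118, 122), (110, 130)))  # True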
Various methods for clustering the annotations may be used. Some examples of well-known clustering methods include k-means or k-center clustering.
In another embodiment, the grouping module 112 forms 220 groups by determining an amount of time between the intervals with which the annotations are associated. If a sufficient amount of time exists between intervals (for example, 30 seconds), grouping module 112 forms a new group for the annotations associated with the intervals. For instance, if only 5 seconds separate two intervals in a 10-minute video, grouping module 112 could decline to form a new group for those intervals.
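A minimal sketch of this gap-based grouping follows, under the assumptions that intervals are (start, end) pairs in seconds and that a new group is started whenever the gap between one interval's end and the next interval's start is at least a threshold (30 seconds here); the helper name is hypothetical.

    def group_by_gap(intervals, min_gap=30.0):
        """Split (start, end) intervals into groups separated by gaps of at least min_gap seconds."""
        groups = []
        current_end = None
        for start, end in sorted(intervals):
            if current_end is None or start - current_end >= min_gap:
                groups.append([])                    # large gap (or first interval): start a new group
                current_end = end
            else:
                current_end = max(current_end, end)  # small gap: keep extending the current group
            groups[-1].append((start, end))
        return groups

    # Only 5 seconds between the first two intervals keeps them together; the third starts a new group.
    print(group_by_gap([(110, 130), (135, 140), (300, 310)]))
    # [[(110, 130), (135, 140)], [(300, 310)]]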
Another way grouping module 112 forms 220 groups is by identifying a plurality of scenes or features in a video and associating each group with one or more scenes or features.
A feature is a succinct representation of the content of one or more frames of video that are similar. For example, the grouping module 112 may group the frames into logical units, such as scenes or shots. The grouping module 112 may use scene detection algorithms to group the frames automatically. One scene detection algorithm is described in Naphade, M. R., et al., “A High-Performance Shot Boundary Detection Algorithm Using Multiple Cues”, 1998 International Conference on Image Processing (Oct. 4-7, 1998), vol. 1, pp. 884-887, which is incorporated by reference herein, though there are many scene detection algorithms known in the art that can be used equally as well.
Thus, the grouping module 112 can compute one feature set for all frames that belong to the same scene. The feature can be, for example, a description of a characteristic in the time, spatial, or frequency domains. For example, annotations can be associated with a specific frame, and can describe that frame by its time, position, and frequency domain characteristics. The grouping module 112 can use any technique for determining features of video, such as those described in Zabih, R., Miller, J., and Mai, K., “A Feature-Based Algorithm for Detecting and Classifying Scene Breaks”, Proc. ACM Multimedia 95, San Francisco, Calif. (November 1995), pp. 189-200; Arman, F., Hsu, A., and Chiu, M-Y., “Image Processing on Encoded Video Sequences”, Multimedia Systems (1994), vol. 1, no. 5, pp. 211-219; Ford, R. M., et al., “Metrics for Shot Boundary Detection in Digital Video Sequences”, Multimedia Systems (2000), vol. 8, pp. 37-46, all of the foregoing being incorporated by reference herein. One of ordinary skill in the art would recognize various techniques for determining features of video.
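As one illustration only, the sketch below flags candidate shot boundaries by comparing gray-level histograms of consecutive frames; this is a generic technique rather than the specific algorithms described in the cited references, and the frames are assumed to be available as NumPy arrays of pixel intensities.

    import numpy as np

    def shot_boundaries(frames, bins=64, threshold=0.4):
        """Return indices where the histogram difference between consecutive frames is large."""
        boundaries = []
        prev_hist = None
        for i, frame in enumerate(frames):
            hist, _ = np.histogram(frame, bins=bins, range=(0, 255))
            hist = hist / max(hist.sum(), 1)                 # normalize to a probability distribution
            if prev_hist is not None:
                diff = 0.5 * np.abs(hist - prev_hist).sum()  # total variation distance in [0, 1]
                if diff > threshold:
                    boundaries.append(i)                     # frame i starts a new shot
            prev_hist = hist
        return boundaries

    # Tiny synthetic example: two dark frames followed by two bright frames -> boundary at index 2.
    frames = [np.full((4, 4), 10), np.full((4, 4), 10), np.full((4, 4), 240), np.full((4, 4), 240)]
    print(shot_boundaries(frames))  # [2]

Frames falling within the same detected shot could then share one feature set, such as the mean histogram of that shot.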
In another embodiment, the grouping module 112 forms 220 groups by determining a maximum number of indications of annotated clips that can be visually distinguished on a timeline corresponding to the video. For example, a long video may have a large number of annotations associated with a variety of intervals. Indications of annotated clips might be difficult to visually distinguish on the timeline due to limitations on the image size and resolution. In some circumstances, more groups may be needed for a longer video than for a shorter video. In light of the difficulty of visually distinguishing large numbers of indications of annotated clips on a timeline, grouping module 112 can set a maximum number of groups that it will form based on what can be visually distinguished. Thus, even though there may be more than, for example, 10 annotated clips, grouping module 112 may limit the indications displayed to the 10 most-annotated clips in a given video. In addition, grouping module 112 can also limit an action-packed short video to a maximum number of annotated clips to ease visual distinction of the indications on the timeline.
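A minimal sketch of capping the timeline display at the N most-annotated clips; the clip representation (an interval paired with its list of annotations) and the cap of 10 are assumptions taken from the example above.

    def clips_to_display(clips, max_indications=10):
        """clips: list of (interval, annotations) pairs; keep the most-annotated ones."""
        ranked = sorted(clips, key=lambda clip: len(clip[1]), reverse=True)
        return ranked[:max_indications]

    clips = [((110, 130), ["a", "b", "c"]), ((300, 310), ["d"]), ((400, 420), ["e", "f"])]
    print(clips_to_display(clips, max_indications=2))
    # [((110, 130), ['a', 'b', 'c']), ((400, 420), ['e', 'f'])]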
For a given video, the annotation determination module 114 determines 230 a group having related annotations in a variety of ways. One of ordinary skill will recognize that the grouping of annotations can be executed using various information retrieval techniques, such as stemming, expansion with related words, vector analysis, and sub-string similarity, as well as natural language processing/computational linguistics methods. For example, annotation determination module 114 determines the first group of related annotations based at least in part on a comparison of the content within each annotation. Thus, the same or similar words within different annotations can be used to determine that the annotations are related within a group (e.g., annotations with the words “New York City” and “New York” would be related because they contain the same first eight characters).
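A minimal sketch of one such content comparison, grouping annotations that share at least one normalized keyword; the stop-word list and tokenization are illustrative assumptions, and a real system would add stemming, expansion with related words, or vector analysis as noted above.

    import re

    STOP_WORDS = {"the", "a", "an", "and", "of", "at", "in"}   # illustrative only

    def keywords(text):
        return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}

    def related(annotation_a, annotation_b):
        """Two annotations are considered related if they share at least one keyword."""
        return bool(keywords(annotation_a) & keywords(annotation_b))

    print(related("Fonzie jumps the shark", "that is jumping the shark"))  # True ("shark")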
In another example, annotation determination module 114 assigns a weight to each annotation based on whether the annotation was provided by a unique user and determines the group based on the assigned weights of the annotations. Thus, the group may be determined to have related annotations based on the weight assigned to each annotation (e.g. annotations submitted by the same user have a lesser weight, and are therefore considered less likely to be related for the determination of a group).
Annotation determination module 114 may also assign a weight to each annotation based on the reputation score associated with the annotation. An annotation may be associated with a reputation score, for example, based on whether the annotation was submitted by a new or unrecognized user, the usefulness of the annotations previously submitted by the user, the number of annotations by the user that are approved by others, or other information about the user within user database 140.
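A minimal sketch combining the two weighting signals just described, i.e. whether the submitting user is unique and the user's reputation score; the particular factors and the reputation scale are illustrative assumptions, not values specified by the embodiment.

    def annotation_weight(annotation, seen_users, reputation_scores):
        """Weight an annotation lower for repeat submitters and scale it by the author's reputation."""
        weight = 1.0
        if annotation["user_id"] in seen_users:
            weight *= 0.5                      # repeat submissions by the same user count less
        seen_users.add(annotation["user_id"])
        # Reputation assumed to lie in [0, 1]; new or unrecognized users get a conservative default.
        weight *= reputation_scores.get(annotation["user_id"], 0.2)
        return weight

    annotations = [{"user_id": "user42"}, {"user_id": "user42"}, {"user_id": "newbie"}]
    seen = set()
    print([annotation_weight(a, seen, {"user42": 0.9}) for a in annotations])  # [0.9, 0.45, 0.2]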
The clip generation module 116 is adapted to form 240 a clip of a video based on the intervals in a given group. There are various ways to form clips. In one embodiment, the clip generation module 116 examines only the start times of the intervals that have received annotations. Thus, all of the considered intervals will start at some time instant within the video and continue to the end of the video. Alternatively, clip generation module 116 may consider both start and end times for intervals that have received annotations. Clip generation module 116 can then use these times to determine the earliest (or latest) point of any interval in the group, and may optionally round these times to the start of the scene just before (or after) it.
In another embodiment, clip generation module 116 projects the contribution of each of the intervals in the group onto a time line, such as by adding the weight of the annotations for each interval, or the logarithm of the number of annotations for each time instant. The clip generation module 116 then fits a probabilistic model (e.g., a Gaussian distribution) to the resulting distribution by standard statistical methods, and selects the mean as the center of the clip. The clip generation module 116 can then select a certain number (e.g., three) of standard deviations to either side of the mean as the clip boundaries, optionally rounding the resulting start and end times to scene boundaries.
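A minimal sketch of this approach, assuming each annotation contributes its interval with unit weight to a per-second timeline; the use of NumPy for the mean and standard deviation, and the omission of scene-boundary rounding, are illustrative simplifications.

    import numpy as np

    def clip_from_intervals(intervals, num_std=3.0):
        """Fit a Gaussian to the per-second annotation mass; return mean +/- num_std standard deviations."""
        # Project each interval onto a per-second timeline (weight 1 per annotation per second).
        samples = np.concatenate([np.arange(start, end + 1) for start, end in intervals])
        mean, std = samples.mean(), samples.std()
        return float(max(0.0, mean - num_std * std)), float(mean + num_std * std)

    # Intervals from the shark-jump example below, in seconds.
    print(clip_from_intervals([(118, 122), (120, 120), (110, 130)]))
    # approximately (103.9, 136.1) for these intervals, before any rounding to scene boundaries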
In another embodiment, the clip generation module 116 examines traffic traces to determine how much each instant of the video was watched by different users, which may include those who have not submitted any annotations. This information can also be used in conjunction with the above methods to determine where a clip should begin and end. In one embodiment, the instants of the video that have received the most traffic within a given time period are considered to be part of the same clip and are used in determining the length of the clip.
For example, assume a three-minute video depicts an actor on a motorcycle performing a stunt, such as jumping over a shark in the water, with the apex of the jump taking place at 2:00. One annotation might be for an interval from two seconds before the jump to two seconds after it (e.g., 1:58-2:02); another annotation might be for the apex of the jump (e.g., 2:00); a third annotation might be for the interval lasting from before the jump until after the motorcycle has safely landed (e.g., 1:50-2:10). Based on these intervals, clip generation module 116 forms the annotated clip (e.g., 1:54-2:06). In this example, clip generation module 116 forms the annotated clip by averaging the times within the intervals associated with the three annotations.
Synoptic annotation module 117 forms a synoptic annotation for the first annotated clip of video based on the related annotations in the first group. In one embodiment, synoptic annotation module 117 creates a synoptic annotation by summarizing the content of the related annotations in the first group. One of ordinary skill will recognize that a summary of annotations can be executed using various techniques such as concatenating the annotations or using “snippet” generation methods, as in web search interfaces. Another technique for summarizing annotations is using string similarity, such as various edit distances between strings to determine the 1-center (the annotation that has the minimum of the maximum distance to all the other annotations). In another embodiment, a summary annotation could be created based on common subsequence analysis (as in Computational Biology where genomic sequences are analyzed).
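A minimal sketch of the 1-center selection using Levenshtein edit distance; the dynamic-programming distance below is a standard textbook version and is illustrative only, not the specific method used by synoptic annotation module 117.

    def edit_distance(a, b):
        """Levenshtein distance between strings a and b."""
        previous = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            current = [i]
            for j, cb in enumerate(b, start=1):
                current.append(min(previous[j] + 1,                  # deletion
                                   current[j - 1] + 1,               # insertion
                                   previous[j - 1] + (ca != cb)))    # substitution
            previous = current
        return previous[-1]

    def one_center(annotations):
        """Pick the annotation whose maximum distance to all other annotations is smallest."""
        return min(annotations,
                   key=lambda a: max(edit_distance(a, b) for b in annotations if b is not a))

    notes = ["Fonzie jumps the shark", "Fonz takes off",
             "Shorts and a leather jacket: that is jumping the shark"]
    print(one_center(notes))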
In one example, synoptic annotation module 117 creates a synoptic annotation by providing excerpts of the related annotations. As an example, suppose three annotations are submitted: (1) “Fonzie jumps the shark” (2) “Fonz takes off” and (3) “Shorts and a leather jacket: that is jumping the shark.” A synoptic annotation that summarizes the content of the three annotations might read: “Fonz, in shorts and a leather jacket, takes off and jumps the shark.” A synoptic annotation that excerpts the related annotations might read: “Fonzie jumps the shark . . . takes off . . . shorts and a leather jacket.”
The ranking module 120 ranks annotated clips based on the number of annotations in each group. The ranking module 120 also ranks annotated clips across multiple videos. As can be appreciated, the various modules can determine the number of annotations for each clip in any number of videos to identify the most annotated clips overall.
The ranking module 120 can be used in conjunction with video searching as well, such that videos that are determined to be responsive to a search query can be ranked based on the annotations for groups for each responsive video. In one embodiment, the ranking module 120 determines the rank of the videos based on the number of annotations for the most annotated interval in each video (e.g., the highest ranked video would be the video containing a clip that received the highest number of annotations). In another embodiment, the ranking module 120 determines the rank of the videos based on the total number of annotations received for all groups within each video (e.g. the highest ranked video would be the video that received the most annotations across all clips within that video).
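A minimal sketch of these two ranking policies over responsive videos; the input shape (a mapping from video ID to the list of per-clip annotation counts) is an assumption made for illustration.

    def rank_videos(group_counts, by="max"):
        """group_counts: {video_id: [annotation count per annotated clip]}; rank videos in descending order."""
        score = max if by == "max" else sum       # "max": most-annotated clip; otherwise total annotations
        return sorted(group_counts, key=lambda vid: score(group_counts[vid]), reverse=True)

    counts = {"video_a": [12, 3], "video_b": [7, 7, 7]}
    print(rank_videos(counts, by="max"))    # ['video_a', 'video_b']  (12 > 7)
    print(rank_videos(counts, by="total"))  # ['video_b', 'video_a']  (21 > 15)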
The display module 118 provides for the display of an indication of annotated clips on a timeline associated with the videos. This allows a user to efficiently understand and access the annotated clips in one or more videos. The display module 118 can also create an annotated highlights segment or trailer of a given video by forming an annotated excerpt of a video that includes a first annotated clip and a second annotated clip and displaying the annotated excerpt.
The video database 128 stores lists of the videos with annotated clips. The lists may be grouped by genre, rating, or any other property. The lists of related videos are updated hourly, in one example, by performing an analysis of annotation activity from user database 140. Once the lists of the most annotated clips have been generated, the video server 126 extracts the videos from the video database 128 based on the lists, as well as the annotations from user database 140, and provides the annotated videos to users for viewing.
In
Referring now to
For the purposes of illustration, this discussion refers to a video as being composed of frames. Video is sometimes stored or transmitted as blocks of frames, fields, macroblocks, or in sections of incomplete frames. When reference is made herein to video being composed of frames, it should be understood that during intermediate steps video may in fact be stored as any one of various other forms. The term “frame” is used herein for the sake of clarity, and is not limiting to any particular format or convention for the storage or display of video.
Some of the frames have annotations associated with them as provided by a particular user. In the example illustrated, frame 601 is drawn in greater detail to illustrate some of its associated annotations. As shown in the figure, annotations can be associated with a particular spatial location of a frame, or they can be associated with an entire frame. For example, annotation 1 is associated with a rectangular box in the upper-left corner of frame 601. In contrast, annotation 4 is associated with the entire frame.
Annotations can also be associated with overlapping spatial locations. For example, annotation 1 is associated with a rectangular box overlapping a different rectangular box associated with annotation 2. In one embodiment, annotations can be associated with a spatial location defined by any closed form shape. For example, as shown in
Annotation list 680 maintains associations between the spatial definition of annotations and the content of annotations. Annotation 1, associated with a rectangular box in frame 601, includes the text “Vice President.” Annotation 1 is an example of an annotation useful for highlighting or adding supplemental information to particular portions of a frame. Annotation 4 is associated with the entire frame 601 and contains the text “State of the Union.” Annotation 4 is an example of an annotation used to summarize the content of a frame. Annotation 5 is associated with the entire frame 601 and contains some audio, which, in this case, is a French audio translation. Annotation 5 is an example of an annotation used to provide supplemental audio content.
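As an illustration only, the following Python sketch shows one way entries like those in annotation list 680 could be represented; the field names, the rectangle coordinates, and the use of None to denote the entire frame are assumptions rather than the actual data model.

    from dataclasses import dataclass
    from typing import Optional, Tuple

    @dataclass
    class AnnotationEntry:
        annotation_id: int
        # Spatial definition as (x, y, width, height) within the frame; None means the entire frame.
        spatial_definition: Optional[Tuple[int, int, int, int]]
        # Annotation content: text, or a reference to audio or other supplemental media.
        content: str
        content_type: str = "text"

    annotation_list = [
        AnnotationEntry(1, (10, 10, 120, 80), "Vice President"),
        AnnotationEntry(4, None, "State of the Union"),
        AnnotationEntry(5, None, "french_translation.mp3", content_type="audio"),
    ]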
Annotations can also have temporal associations with a media file or any portion thereof. For example, an annotation can be associated with a specific frame, or a specific range of frames. In
During playback of a media file, the client 130 is adapted to display the annotations associated with the frames of the file. Annotations can be displayed, for example, as text superimposed on the video frame, as graphics shown alongside the frame, or as audio reproduced simultaneously with video; annotations may also appear in a separate window or frame proximate to the video. Annotations can also include commands for how the media file with which they are associated is to be displayed. Displaying command annotations can include displaying video as instructed by the annotation. For example, responsive to an annotation, the client 130 might skip to a different place in a video, display a portion of the video in slow motion, or jump to a different video altogether.
The client 130 is capable of displaying a subset of the available annotations. For example, a user watching the video of
Users can also search for annotations, and retrieve associated video based on the results of the annotation search.
Certain annotations can be given a priority that does not allow a user to prevent them from being displayed. For example, annotations can include advertisements, which may be configured so that no other annotations are displayed unless the advertisement annotations are also displayed. Such a configuration would prevent users from viewing certain annotations while avoiding paid advertisement annotations. In addition, certain annotations could be provided by the content provider, such as “tags” that contain brief snippets of content to facilitate navigation of the video. The distinction of a content provider's “tag” could indicate to the user that the annotation is from a reputable source.
A method for determining which annotations to display is described herein with reference to
Optionally, the client receives changes to the annotation from the user. For example, a user can edit text, re-record audio, modify metadata included in the annotation content, or change an annotation command. The client transmits the modified annotation to the video server, or, alternatively, transmits a description of the modifications to the video server. The video server receives the modified annotation, which is stored in the user database.
For example, a user viewing the annotations shown in
The annotation list 680 is shown in
As also described herein with reference to
Referring now to
The video player graphical user interface 702 presents a frame of video. Shown along with the frame of video is an annotation definition 704. The annotation definition 704 graphically illustrates the spatial definition and/or the temporal definition of an annotation. For example, the annotation definition 704 shown in
The annotation definition 704 can be displayed in response to a user selection, or as part of the display of an existing annotation. For example, the user can use an input device to select a region of the frame with which a new annotation will be associated, and in response to that selection the video player graphical user interface 702 displays the annotation definition 704 created by the user. As another example, the video player graphical user interface 702 can display video and associated annotations, and can display the annotation definition 704 in conjunction with displaying an associated annotation.
The video player graphical user interface 702 also includes annotation control buttons 706, which allow the user to control the content and display of annotations. For example, the video player graphical user interface 702 can include a button for searching annotations. In response to the selection of the search annotations button, the client searches for annotations associated with the annotation definition 704 (or a similar definition), or for annotations associated with a keyword. The results of the search can then be displayed on the video player graphical user interface 702. As another example, the video player graphical user interface 702 can include a button for editing annotations. In response to the selection of the edit annotations button, the video player graphical user interface 702 displays one or more annotations associated with the annotation definition 704 and allows the user to modify the one or more annotations. As yet another example, the video player graphical user interface 702 can include a button for creating a new annotation. In response to the selection of the create new annotation button, the video player graphical user interface 702 displays options such as those shown in
Referring now to
The entering of new annotation text 708 has been shown as an example of the authoring of annotation content. The video player graphical user interface 702 can be adapted to receive other types of annotation content as well. For example, annotation content can include audio, and the video player graphical user interface 702 can include a button for starting recording of audio through a microphone, or for selecting an audio file from a location on a storage medium. Other types of annotations and similar methods for receiving their submission by a user will be apparent to one of skill in the art without departing from the scope of the invention.
Turning now to
In another embodiment, users can access annotated clips for videos using a DVR or a device providing similar functionality. By using a remote control or a viewing default, users of a device can access annotated clips within a single video and across multiple videos. This would allow users to view the highlights of a given video or set of videos (such as the sports highlights for a given time period).
Turning now to
The ranked list of annotated clips for all videos is stored in the video database 128. The ranked list of video clips is updated on an hourly basis, according to one embodiment. This ensures that the most up-to-date, relevant videos are presented to users. The ranked list may also be updated on a daily basis. These update intervals are merely illustrative of the times at which an appropriate update can take place; the update can occur at any suitable time set by the administrator of the video hosting website 108.
The client 130 receives 1002 an annotation. The client determines 1004 if the annotation is high-priority. A high-priority annotation is displayed regardless of user settings for the display of annotations. High-priority annotations can include, for example, advertisements, emergency broadcast messages, or other communications whose importance should supersede local user settings.
If the client 130 determines 1004 that the annotation is high-priority, the client displays 1012 the annotation. If the client 130 determines 1004 that the annotation is not high-priority, the client determines 1006 if annotations are enabled. Annotations can be enabled or disabled, for example, by a user selection of an annotation display mode. If the user has selected to disable annotations, the client 130 does not display 1010 the annotation. If the user has selected to enable annotations, the client 130 determines 1008 if the annotation matches user-defined criteria.
As described herein, the client 130 allows the user to select annotations for display based on various criteria. In one embodiment, the user-defined criteria can be described in the request for annotations, limiting the annotations sent by the video server 126. In another embodiment, the user-defined criteria can be used to limit which annotations to display once annotations have been received at the client 130. User-defined criteria can specify which annotations to display, for example, on the basis of language, annotation content, particular authors or groups of authors, or other annotation properties.
If the client 130 determines 1008 that the annotation satisfies the user-defined criteria, the client 130 displays 1012 the annotation. If the client 130 determines 1008 that the annotation does not satisfy the user-defined criteria, the client 130 does not display 1010 the annotation.
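A minimal sketch of this decision flow on the client; the attribute names and the criteria-matching predicate are assumptions made for illustration.

    def should_display(annotation, annotations_enabled, criteria):
        """Mirror of the flow above: high-priority always shows; otherwise honor settings and criteria."""
        if annotation.get("high_priority"):      # e.g. advertisements or emergency broadcast messages
            return True
        if not annotations_enabled:              # user has disabled annotation display
            return False
        return all(annotation.get(key) == value for key, value in criteria.items())

    ad = {"high_priority": True, "language": "en"}
    note = {"high_priority": False, "language": "fr", "author": "user42"}
    print(should_display(ad, annotations_enabled=False, criteria={}))                   # True
    print(should_display(note, annotations_enabled=True, criteria={"language": "en"}))  # False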
The case of a video server and a client is but one example in which the present invention may be usefully employed for the management of annotations for video. It will be apparent to one of skill in the art that the methods described herein will have a variety of other uses without departing from the scope of the present invention. For example, the features described herein could be used in an online community in which users can author, edit, review, publish, and view annotations collaboratively. Such a community would allow for open-source style production of annotations without infringing the copyright protections of the video with which those annotations are associated.
As an added feature, a user in such a community could also accumulate a reputation, for example based on other users' review of the quality of that user's previous authoring or editing. A user who wants to view annotations could have the option of ignoring annotations from users with reputations below a certain threshold, or to search for annotations by users with reputations of an exceedingly high caliber. As another example, a user could select to view annotations only from a specific user, or from a specific group of users.
As described herein, annotations can also include commands describing how video should be displayed, for example, commands that instruct a display device to skip forward in that video, or to jump to another video entirely. A user could author a string of jump-to command annotations, effectively providing a suggestion for the combination of video segments into a larger piece. As an example, command annotations can be used to create a new movie from component parts of one or more other movies.
The present invention has applicability to any of a variety of hosting models, including but not limited to peer-to-peer, distributed hosting, wiki-style hosting, centralized serving, or other known methods for sharing data over a network.
The annotation framework described herein presents the opportunity for a plurality of revenue models. As an example, the owner of the video server 126 can charge a fee for including advertisements in annotations. The video server 126 can target advertisement annotations to the user based on a variety of factors. For example, the video server 126 could select advertisements for transmission to the client based on the title or category of the video that the client is displaying, known facts about the user, recent annotation search requests (such as keyword searches), other annotations previously submitted for the video, the geographic location of the client, or other criteria useful for effectively targeting advertising.
Access to annotations could be provided on a subscription basis, or annotations could be sold in a package with the video content itself. For example, a user who purchases a video from an online video store might be given permission for viewing, editing, or authoring annotations, either associated with that video or with other videos. An online video store might have a promotion, for example, in which the purchase of a certain number of videos in a month gives the user privileges on a video server 126 for that month.
These examples of revenue models have been given for the purposes of illustration and are not limiting. Other applications and potentially profitable uses will be apparent to one of skill in the art without departing from the scope of the present invention.
In addition, methods of spam control would help ensure the security of the sharing of the annotations.
Reference in the specification to “one embodiment” or to “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
It should be noted that the process steps and instructions of the present invention can be embodied in software, firmware or hardware, and when embodied in software, can be downloaded to reside on and be operated from different platforms used by a variety of operating systems.
The present invention also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, each coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any references below to specific languages are provided for disclosure of enablement and best mode of the present invention.
While the invention has been particularly shown and described with reference to a preferred embodiment and several alternate embodiments, it will be understood by persons skilled in the relevant art that various changes in form and details can be made therein without departing from the spirit and scope of the invention.
Finally, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.
This application is a continuation of U.S. patent application Ser. No. 12/033,817 filed on Feb. 19, 2008, which is incorporated by reference herein in its entirety.