The present disclosure relates to a video classification and search system to support customizable video highlights.
The proliferation of media data captured by audio-visual devices in daily life has become immense, creating significant problems in the management and review of such data. Individuals often capture so many videos in their daily lives that editing those videos for meaningful later review becomes too burdensome. And, while some devices attempt to classify videos at a coarse level, prior techniques typically assign quality scores monolithically to videos. For example, a video may be classified as “good” without further granularity. If a video contains several potentially desirable content elements (e.g., content representing several family members and a pet), designating the video as “good” may not be appropriate for all possible uses.
Embodiments of the present disclosure overcome disadvantages of the prior art by providing a video classification, indexing, and retrieval system that classifies and retrieves video along multiple indexing dimensions. A search system may field queries identifying desired parameters of video, search an indexed database for videos that match the query parameters, and create clips extracted from responsive videos that are provided in response. In this manner, different queries may cause different clips to be created from a single video, each clip tailored to the parameters of the query that is received.
The training system 110 may include an analytics unit 112 and storage 114. When new videos are presented to the system 100, the analytics unit 112 may analyze and/or classify the video according to predetermined classifications. For example, the analytics unit 112 may analyze video to detect and classify, among other things, the people, pets, and objects that appear in the video, the scene and event types represented, human actions, and camera and object motion.
The analytics unit 112 may generate metadata to be stored in storage 114 with the video identifying, with respect to a temporal axis of the video, the results of the different analyses. The metadata may be represented as text, scores, or feature vectors that form a basis of search. In an embodiment, machine learning algorithms may be applied to perform the respective detections and classifications. Machine learning algorithms often generate results with fuzzy outcomes; in such cases, the detection and classification metadata may include score values representing degrees of confidence for the respective detections and classifications.
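By way of illustration only, per-detection metadata of this kind might be represented as sketched below; the record layout, field names, and example values are assumptions made for the sketch and are not prescribed by the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DetectionRecord:
    """One analytics result, localized on the video's temporal axis."""
    label: str                    # e.g., "person:dad", "action:skiing", "scene:beach"
    start_sec: float              # start of the range over which the detection holds
    end_sec: float                # end of that range
    confidence: float             # degree-of-confidence score from the classifier
    feature: Optional[List[float]] = None   # optional feature vector usable for similarity search

@dataclass
class VideoMetadata:
    video_id: str
    detections: List[DetectionRecord] = field(default_factory=list)

# Example: hypothetical analytics output for one stored video.
meta = VideoMetadata("vid_0001", [
    DetectionRecord("person:dad", 2.0, 31.5, 0.93),
    DetectionRecord("action:skiing", 4.0, 28.0, 0.88),
])
print(meta)
```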
Stored video metadata also may include playback properties of the video, including, for example, the video's duration, playback window size, orientation (e.g., whether it is in portrait or landscape mode), the playback speed, camera motion during video capture, and (if provided) an indicator of whether the video is looped. These playback properties may be provided with the video as it is imported into the system 100 or, alternatively, may be developed by the analytics unit 112.
Stored video metadata also may include metadata developed via user interaction 140 with stored video. For example, users may assign “likes” or other ratings to stored video. Users may edit stored videos or export them to applications (not shown) within the system 100, which may indicate that the user prefers those videos to other stored videos with which the user has not yet interacted. Users may build new media assets from stored videos by integrating them with other media assets (e.g., combining recorded video with a music asset), in which case classification information relating to the other media asset(s) (the music) may be associated with the stored video. And, of course, users may tag video with identifiers of people, pets, and other objects through direct interaction 140. In an embodiment, the analytics unit 112 may generate user importance scores from such user interaction 140.
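One possible way to fold such interaction signals into a user importance score is a weighted sum over interaction counts, as sketched below; the signal names and weights are illustrative assumptions, not values taken from the disclosure.

```python
# Hypothetical interaction signals and weights; neither is specified by the disclosure.
INTERACTION_WEIGHTS = {
    "liked": 3.0,            # user assigned a "like" or rating
    "edited": 2.0,           # user edited the video
    "exported": 2.0,         # user exported it to another application
    "tagged": 1.5,           # user tagged people/pets/objects directly
    "used_in_project": 2.5,  # user combined it with other media assets
}

def user_importance(signals: dict) -> float:
    """Combine per-video interaction counts into a single importance score."""
    return sum(INTERACTION_WEIGHTS.get(name, 0.0) * count
               for name, count in signals.items())

print(user_importance({"liked": 1, "tagged": 2}))   # -> 6.0
```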
As a result of the output of the analytics unit 112, the playback properties, and/or the user interaction, stored video may have a multidimensional array of classification metadata stored therewith. The metadata may be integrated into a search index and thereby provide the basis for searches by the search system 120.
The search system 120 may receive a query from an external requestor 130, perform a search among the videos in storage 114, and return a response that provides responsive videos. Search queries may contain parameter(s) that identify characteristics of desired videos. In one embodiment, the search system 120 may provide clips, extracted from responsive videos, that correspond to the query parameters, which may cause different clips from a single video to be served in response to different queries.
The system 100 may receive queries from other elements of an integrated computer system (not shown). In one embodiment, the system 100 may be provided as a service within an operating system of a computer device, where it may field queries from other elements of the operating system. In another embodiment, the system 100 may field queries from an application that executes on a computer device. In yet a further embodiment, the system 100 may be disposed on a first computer system (for example, a media server) and may field queries from a separate computer system (a media client) over a communication network (not shown).
The example of
The example of
Application of the system 100 of
Exemplary applications of the system 100 are presented below.
As an example, the system 100 may be applied in a device 100 (
In this example, search queries may be applied that search by person and action type (e.g., “dad” AND “skiing” or “cat” AND “jumping”). The search system 120 may return a response that includes clips extracted from stored videos that are tagged by metadata associated with the person and action type requested. A requestor 130 may further process the clips for presentation on the device 100 as desired. For example, the clips may be concatenated into a larger video presentation and (optionally) accompanied by an audio presentation selected by the requestor 130.
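As an illustration only, the sketch below shows how such a conjunctive person-and-action query could be evaluated against a label index; the index layout, label spellings, and the search function are hypothetical and do not represent the system's actual interface.

```python
from typing import Dict, List, Set

# Hypothetical index: video_id -> set of labels that hold somewhere in the video.
INDEX: Dict[str, Set[str]] = {
    "vid_0001": {"person:dad", "action:skiing"},
    "vid_0002": {"person:dad", "action:cooking"},
    "vid_0003": {"pet:cat", "action:jumping"},
}

def search(required_labels: Set[str]) -> List[str]:
    """Return ids of videos whose metadata carries every requested label."""
    return [vid for vid, labels in INDEX.items() if required_labels <= labels]

# "dad" AND "skiing"
print(search({"person:dad", "action:skiing"}))   # -> ['vid_0001']
```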
In another example, again, the system 100 may be applied in a device 100 (
In this example, search queries may be applied that search by person and a desired duration (e.g., “dad” AND 25 seconds). The search system 120 may return a response that includes clips extracted from stored videos that are tagged by metadata associated with the person and meet the desired duration parameter within a tolerance threshold. A requestor 130 may further process the clips for presentation on the device 100, as desired.
This example may find application where extracted clips are to be concatenated into a larger video presentation and time-aligned with an audio presentation selected by the requestor 130. The audio presentation may have different temporal intervals of significance (example: a song in which verses last for 45 seconds, choruses last for 25 seconds, etc.). The requestor 130 may issue queries for desired content that identify the durations of the audio intervals to which clips are to be aligned. When responsive clips are provided by the search system 120, the requestor 130 may compile a concatenated video by aligning, with the verses, the clips whose durations coincide with the verses' duration and by aligning, with the choruses, the clips whose durations coincide with the choruses' duration.
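The sketch below illustrates one way a requestor might perform this duration-based alignment, assuming it already holds responsive clips and knows the section durations of the selected song; the data shapes and tolerance value are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Clip:
    video_id: str
    duration_sec: float

def assign_clips_to_sections(clips: List[Clip], sections: List[dict],
                             tolerance: float = 2.0) -> List[Tuple[str, str]]:
    """Pair each audio section with the first unused clip whose duration
    matches the section's duration within a tolerance."""
    timeline, used = [], set()
    for section in sections:
        for i, clip in enumerate(clips):
            if i in used:
                continue
            if abs(clip.duration_sec - section["duration"]) <= tolerance:
                timeline.append((section["name"], clip.video_id))
                used.add(i)
                break
    return timeline

clips = [Clip("vid_A", 44.0), Clip("vid_B", 25.5), Clip("vid_C", 46.0)]
sections = [{"name": "verse 1", "duration": 45.0},
            {"name": "chorus", "duration": 25.0},
            {"name": "verse 2", "duration": 45.0}]
print(assign_clips_to_sections(clips, sections))
# -> [('verse 1', 'vid_A'), ('chorus', 'vid_B'), ('verse 2', 'vid_C')]
```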
In yet another example, again, the system 100 may be applied in a device 100 (
In this example, search queries may be applied that search by event, a desired duration, and a classification of motion flow (e.g., “wedding”+25 seconds+highly active). The search system 120 may return a response that includes clips extracted from stored videos that are tagged by metadata classifying the video as a wedding, that meet the desired duration within a tolerance threshold, and that exhibit the requested level of motion flow. A requestor 130 may further process the clips for presentation on the device 100, as desired.
This example may find application where extracted clips are to be concatenated into a larger video presentation and time-aligned with an audio presentation having different properties. Again, the audio presentation may have different temporal intervals of significance (example: verses that last for 45 seconds, choruses that last for 25 seconds, etc.) and different levels of activity associated with it (e.g., high tempo vs. low tempo). The requestor 130 may issue queries for desired content that identify desired motion flow and the durations of the audio intervals to which clips are to be aligned. When responsive clips are provided by the search system 120, the requestor 130 may compile a concatenated video by, for example, aligning high motion flow clips with portions of audio classified as high tempo and aligning low motion flow clips with portions of audio classified as low tempo.
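A minimal sketch of this activity-based alignment appears below, assuming clips have already been grouped by their motion-flow classification and that audio sections carry a tempo classification; the grouping scheme and section format are hypothetical.

```python
from typing import Dict, List, Tuple

def align_by_activity(clips_by_motion: Dict[str, List[str]],
                      audio_sections: List[dict]) -> List[Tuple[str, str]]:
    """Pair each audio section with a clip whose motion-flow class matches
    the section's tempo class (high tempo -> high motion, low -> low)."""
    tempo_to_motion = {"high": "high_motion", "low": "low_motion"}
    pools = {k: list(v) for k, v in clips_by_motion.items()}  # copy so clips can be consumed
    timeline = []
    for section in audio_sections:
        pool = pools.get(tempo_to_motion[section["tempo"]], [])
        if pool:
            timeline.append((section["name"], pool.pop(0)))
    return timeline

clips = {"high_motion": ["clip_dance", "clip_toss"], "low_motion": ["clip_vows"]}
sections = [{"name": "intro", "tempo": "low"},
            {"name": "drop", "tempo": "high"},
            {"name": "outro", "tempo": "high"}]
print(align_by_activity(clips, sections))
# -> [('intro', 'clip_vows'), ('drop', 'clip_dance'), ('outro', 'clip_toss')]
```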
In a further example, the system 100 may be applied in a device 100 (
In this example, search queries may identify desired clips by the scenes, actors, and locations represented in another data file. For example, a storyboard data file may identify a progression of scenes and actors that are to appear in a produced video. Queries may be received by the search system 120 that identify desired clips by scene and/or actor, which may be furnished in response. A requestor 130 may assemble, from the clips so extracted, an editable video that matches the progression of scenes represented in the storyboard file. The editable video may be presented to editing personnel for review and assembly.
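The sketch below illustrates one possible storyboard-driven assembly, assuming a simple storyboard format of ordered scene/actor requirements and a catalog of clips already tagged by the analytics unit 112; the file format and field names are assumptions made for the example.

```python
from typing import List

# Hypothetical storyboard: an ordered list of (scene, actor) requirements.
STORYBOARD = [
    {"scene": "city street", "actor": "lead"},
    {"scene": "rooftop", "actor": "lead"},
    {"scene": "rooftop", "actor": "antagonist"},
]

# Hypothetical catalog of clips already tagged by scene and actor.
CLIPS = [
    {"id": "take_12", "scene": "rooftop", "actor": "lead"},
    {"id": "take_03", "scene": "city street", "actor": "lead"},
    {"id": "take_27", "scene": "rooftop", "actor": "antagonist"},
]

def assemble(storyboard: List[dict], clips: List[dict]) -> List[str]:
    """Return clip ids ordered to follow the storyboard's scene/actor progression."""
    ordered = []
    for shot in storyboard:
        match = next((c["id"] for c in clips
                      if c["scene"] == shot["scene"] and c["actor"] == shot["actor"]), None)
        if match:
            ordered.append(match)
    return ordered

print(assemble(STORYBOARD, CLIPS))   # -> ['take_03', 'take_12', 'take_27']
```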
The foregoing examples are just that, examples. In use, it is anticipated that far more complex queries may be presented to the system 100 that include any combination of metadata generated by the analytics unit 112 that indexes the videos in storage 114. Queries further may contain parameters that identify, for example: desired playback properties of video, such as playback window size, orientation (e.g., landscape or portrait orientation), playback speed, and/or whether video is looped; compositional elements of desired video, such as scene type, camera motion type and magnitude, human action type, action magnitude, object motion pattern, the number of people or pets recognized in video, and/or the sizes of people or pets represented in video; and/or directed user interaction properties, such as videos tagged with specific person/pet identifiers, user-liked videos, user-edited video, user preferred styles, and the like. The multi-dimensional analytics unit 112 provides a wide array of search indicia that can be applied in search queries.
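For illustration, a multi-dimensional query of the kind described above might be represented along the lines of the following sketch; the field names and defaults are assumptions made for the example and do not reflect an actual query schema of the system.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VideoQuery:
    """Illustrative multi-dimensional query; all fields are optional and combinable."""
    # playback properties
    orientation: Optional[str] = None        # "landscape" or "portrait"
    looped: Optional[bool] = None
    max_playback_speed: Optional[float] = None
    # compositional elements
    scene_type: Optional[str] = None         # e.g., "wedding", "beach"
    action_type: Optional[str] = None        # e.g., "skiing"
    min_people: Optional[int] = None
    motion_flow: Optional[str] = None        # e.g., "highly active"
    # directed user-interaction properties
    person_tags: List[str] = field(default_factory=list)
    liked_only: bool = False
    # clip constraints
    target_duration_sec: Optional[float] = None
    duration_tolerance_sec: float = 2.0

q = VideoQuery(scene_type="wedding", motion_flow="highly active",
               target_duration_sec=25.0)
print(q)
```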
As discussed, the search system 120 (
In one embodiment, the search system 120 may provide all responsive clips in search results. In another embodiment, the search system 120 may provide a capped number of clips according to the clips' respective matching scores. In a further embodiment, the search system 120 may provide search results that summarize different scenes detected in responsive videos.
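Capping results by matching score can be as simple as a top-k selection, as in the sketch below; the clip identifiers and score values are illustrative only.

```python
import heapq

# Hypothetical (clip_id, matching_score) pairs returned by a search.
results = [("clip_a", 0.91), ("clip_b", 0.74), ("clip_c", 0.88), ("clip_d", 0.63)]

def cap_results(results, k=2):
    """Keep only the k clips with the highest matching scores."""
    return heapq.nlargest(k, results, key=lambda r: r[1])

print(cap_results(results))   # -> [('clip_a', 0.91), ('clip_c', 0.88)]
```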
In a further embodiment, search results may include suggested playback properties that a requestor 130 may use when processing responsive clips. For example, search results may identify spatial sizes of detected people, animals, or objects within clips, which may be used as cropping values (either a fixed crop window or a moving window) during clip processing. Alternatively, search results may include playback zoom factors, stabilization parameters, slow-motion ramping values, and the like, which a requestor 130 may use when rendering clips or integrating them into other media presentations.
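As one hypothetical example of deriving a cropping value from a detected subject's spatial size, the sketch below computes a fixed crop window of a requested aspect ratio around the subject; the normalized-coordinate convention and padding parameter are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Normalized [0, 1] rectangle: top-left corner plus width and height."""
    x: float
    y: float
    w: float
    h: float

def crop_window(subject: Box, frame_aspect: float, target_aspect: float,
                padding: float = 0.1) -> Box:
    """Derive a fixed (non-tracking) crop rectangle of the target aspect ratio
    centered on a detected subject; aspect values are width:height ratios."""
    w = min(1.0, subject.w * (1.0 + 2.0 * padding))   # pad the subject horizontally
    h = min(1.0, w * frame_aspect / target_aspect)    # height that yields the target aspect
    cx, cy = subject.x + subject.w / 2.0, subject.y + subject.h / 2.0
    x = min(max(cx - w / 2.0, 0.0), 1.0 - w)          # keep the window inside the frame
    y = min(max(cy - h / 2.0, 0.0), 1.0 - h)
    return Box(x, y, w, h)

# Subject detected in a 16:9 frame, cropped for a 9:16 (portrait) rendition.
print(crop_window(Box(0.40, 0.20, 0.15, 0.50), frame_aspect=16 / 9, target_aspect=9 / 16))
```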
Search results further may identify content properties such as scene types, camera motion types, camera orientation, frame quality scores, people/animal identifiers, and the like, which a requestor may integrate into its processing decisions.
In another embodiment, the system 100 (
In an embodiment, when a new video is presented for importation, the method 300 may apply analytics to the new video (box 310) as discussed above. As discussed, the analytics may generate metadata results for the new video, from which the method 300 may build a search index (box 320) as the video is stored.
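The import path of boxes 310-320 might be sketched as follows, with the analytics stage stubbed out; the index layout (an inverted index from label to time-ranged video references) is an assumption made for the example, not the method's required structure.

```python
from collections import defaultdict

def analyze(video):
    """Stand-in for the analytics unit 112; a real system would run its classifiers here."""
    return video.get("precomputed", [])

def build_index(videos):
    """Run (stubbed) analytics on each new video and fold the resulting labels into
    an inverted index: label -> list of (video_id, start_sec, end_sec)."""
    index = defaultdict(list)
    for video in videos:
        for det in analyze(video):                                            # box 310
            index[det["label"]].append((video["id"], det["start"], det["end"]))  # box 320
    return index

videos = [{"id": "vid_0001",
           "precomputed": [{"label": "person:dad", "start": 2.0, "end": 31.5},
                           {"label": "action:skiing", "start": 4.0, "end": 28.0}]}]
print(dict(build_index(videos)))
```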
In an embodiment, when a query is presented, the method 300 may run a search on the index utilizing search parameters provided in the query (box 330). For responsive videos, the method 300 may determine range(s) within the video that correspond to the search parameters (box 340). The method 300 may build clips from the responsive videos based on the ranges (box 350) and furnish the clips to a requestor in a query response (box 360).
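A minimal sketch of the query path of boxes 330-360 appears below, assuming the inverted index of the previous sketch and a conjunctive query; determining the responsive range as the overlap of the matched labels' time ranges is one possible policy, not necessarily the one used by the method 300.

```python
def run_query(index, labels, min_overlap=1.0):
    """Boxes 330-360: find videos matching every label, intersect the labels'
    time ranges, and emit (video_id, start_sec, end_sec) clip descriptors."""
    per_label = [{vid: (s, e) for vid, s, e in index.get(lbl, [])} for lbl in labels]
    if not per_label:
        return []
    candidates = set(per_label[0])                   # box 330: videos matching the query
    for ranges in per_label[1:]:
        candidates &= set(ranges)
    clips = []
    for vid in candidates:
        start = max(r[vid][0] for r in per_label)    # box 340: overlapping range
        end = min(r[vid][1] for r in per_label)
        if end - start >= min_overlap:
            clips.append((vid, start, end))          # box 350: clip descriptor
    return clips                                     # box 360: returned to the requestor

index = {"person:dad": [("vid_0001", 2.0, 31.5)],
         "action:skiing": [("vid_0001", 4.0, 28.0)]}
print(run_query(index, ["person:dad", "action:skiing"]))   # -> [('vid_0001', 4.0, 28.0)]
```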
The device 400 may possess a transceiver system 430 to communicate with other system components, for example, requestors 130 (
The device also may include display(s) and/or speaker(s) 440, 450 to render video retrieved from storage 114 according to the techniques described in the examples hereinabove.
Although the system 100 (
Several embodiments of the disclosure are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the disclosure are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the disclosure. The present specification describes components and functions that may be implemented in particular embodiments, which may operate in accordance with one or more particular standards and protocols. However, the disclosure is not limited to such standards and protocols. Such standards periodically may be superseded by faster or more efficient equivalents having essentially the same functions. Accordingly, replacement standards and protocols having the same or similar functions are considered equivalents thereof.
The present disclosure benefits from priority of U.S. application Ser. No. 63/347,784, filed Jun. 1, 2022 and entitled “Video Classification and Search System to Support Customizable Video Highlights,” the disclosure of which is incorporated herein in its entirety.