Internet searching has become increasingly common in recent years. Search engines conventionally receive a user keyword or other search query and return a search results page including links to identified search results. Initially, search result pages primarily included links to relevant text found on web pages. As audio and video clips have become more commonly included in web pages, search engines have correspondingly begun identifying relevant audio and video clips and including the identified clips in search result pages.
Audio and video clip search results may also include a clip preview that enables a user to quickly assess the relevance of the clip to the user's search query. Conventionally, however, a single preview is generated for each audio or video clip regardless of the user's search query. Two different queries seeking different information that both identify a particular clip as relevant will thus both include the same clip preview. Depending on the search query, this “one-size-fits-all” approach to audio and video clip previews may not provide a user with an informative clip preview.
Embodiments of the present invention relate to systems, methods, and computer media for providing query-dependent audio and video clip previews. Using the systems and methods described herein, an identification of an audio or video clip relevant to a user search query is received. The user search query has one or more keywords. Occurrences of the keywords and the locations of the occurrences are identified in a transcription of the identified audio or video clip. One or more clip segments are extracted from the audio or video clip. Each extracted clip segment includes an identified keyword occurrence. A query-dependent clip preview is created that includes at least one extracted clip segment including a keyword occurrence.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
The present invention is described in detail below with reference to the attached drawing figures, wherein:
Embodiments of the present invention are described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms “step” and/or “block” or “module” etc. might be used herein to connote different components of methods or systems employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.
Embodiments of the present invention relate to systems, methods, and computer media for providing query-dependent audio and video clip previews. As discussed above, conventionally, only one clip preview for an audio or video clip is generated for use in search results, regardless of the keywords used to identify the clip as relevant. Thus, in a conventional system, if a first user searches for a first keyword and a video clip is returned in a first search results page, and a second user searches for a second keyword and the same video clip is returned in a second search results page, the accompanying video clip preview included in both search results pages is the same preview.
Clip previews generally include multiple portions of a clip stitched together. Even though an audio or video clip as a whole may be relevant to a user search query, the clip preview itself may not contain any portions of the clip that are relevant. Thus, depending on what portions of a clip are included in the corresponding preview, a clip preview may be informative for some search queries and not informative for others.
In accordance with embodiments of the present invention, a query-dependent clip preview can be provided along with search results for a user search query to give a user a clip preview that is specific to the user's query.
In one embodiment of the present invention, an identification of an audio or video clip relevant to a user search query is received. The user search query has one or more keywords. One or more keyword occurrences of at least one of the one or more keywords are identified in a transcription of the identified audio or video clip. The locations of the one or more keyword occurrences in the transcription are also identified. One or more clip segments are extracted from the audio or video clip. Each extracted clip segment includes an identified keyword occurrence. A query-dependent clip preview is created that includes at least one of the one or more extracted clip segments that each include an identified keyword occurrence.
In another embodiment, a clip identification component receives an identification of an audio or video clip relevant to a user search query. The user search query has one or more keywords. A transcription analysis component identifies (1) one or more keyword occurrences of at least one of the one or more keywords in a transcription of the identified audio or video clip and (2) the locations of the one or more keyword occurrences in the transcription. A clip segment extraction component extracts one or more clip segments from the audio or video clip. Each extracted clip segment includes an identified keyword occurrence. A preview generation component creates a query-dependent clip preview that includes at least one of the one or more extracted clip segments that each include an identified keyword occurrence.
In still another embodiment, an identification of an audio or video clip relevant to a user search query is received. The user search query has one or more keywords. One or more keyword occurrences of at least one of the one or more keywords are identified in a transcription of the identified audio or video clip. The locations of the one or more keyword occurrences in the transcription are also identified. A plurality of clip segments are extracted from the audio or video clip. Each extracted clip segment includes an identified keyword occurrence. Using a processor of a computing device, a query-dependent clip preview is created that includes at least two of the plurality of extracted clip segments that each include an identified keyword occurrence. The location of one of the one or more identified keyword occurrences is selected as a linked clip start point. A user selection of a search result page link to the relevant audio or video clip causes the relevant audio or video clip to begin play at the linked clip start point. A search results page is provided for the user search query that includes both a link to the relevant audio or video clip and the query-dependent clip preview.
Having briefly described an overview of some embodiments of the present invention, an exemplary operating environment in which embodiments of the present invention may be implemented is described below in order to provide a general context for various aspects of the present invention. Referring initially to
Embodiments of the present invention may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program modules, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program modules including routines, programs, objects, components, data structures, etc., refer to code that performs particular tasks or implements particular abstract data types. Embodiments of the present invention may be practiced in a variety of system configurations, including hand-held devices, consumer electronics, general-purpose computers, more specialty computing devices, etc. Embodiments of the present invention may also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network.
With reference to
Computing device 100 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computing device 100 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 100.
Communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave. The term “modulated data signal” refers to a propagated signal that has one or more of its characteristics set or changed to encode information in the signal. By way of example, and not limitation, communication media includes wired media, such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, radio, microwave, spread-spectrum, and other wireless media. Combinations of the above are included within the scope of computer-readable media.
Memory 112 includes computer storage media in the form of volatile and/or nonvolatile memory. The memory may be removable, nonremovable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard drives, optical-disc drives, etc. Computing device 100 includes one or more processors that read data from various entities such as memory 112 or I/O components 120. Presentation component(s) 116 present data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, etc.
I/O ports 118 allow computing device 100 to be logically coupled to other devices including I/O components 120, some of which may be built in. Illustrative components include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, etc.
As discussed previously, embodiments of the present invention relate to systems, methods, and computer media for providing query-dependent audio or video clip previews. Embodiments of the present invention will be discussed with reference to
In response to receiving user search query 202, search system 204 identifies relevant web pages and/or resources, including relevant audio and/or video clips. Clip identification component 206 receives an identification of an audio or video clip relevant to user search query 202. Clip identification component 206 communicates the identification to a transcription analysis component 208. Transcription analysis component 208 analyzes a transcript of the identified audio or video clip relevant to user search query 202 by searching for keyword occurrences of the keywords comprising search query 202. Transcription analysis component 208 identifies both keyword occurrences of the keywords comprising search query 202 as well as the locations of the keyword occurrences in the transcription. The words “transcript” and “transcription” are used interchangeably in this document.
In some embodiments, transcription of an audio or video clip is performed by search system 204. In such embodiments, audio and video clips are identified by search system 204 during the crawling process. Through various processes known in the art, for example, by analyzing the URL pattern, domain, title of the clip or web page, or other features, it can be determined if an audio or video clip is speech-based. Speech recognition technology is applied to speech-based clips to generate a transcript. Each word recognized above a predetermined confidence threshold is tagged with a timestamp corresponding to the location (time) the word was said in the clip. A reverse index is then generated such that a first column includes recognized words in the clip, and a second column includes a list of timestamps for each word where the word appears in the clip. The reverse index is stored with the audio or video clip or data about the audio or video clip in the search engine's index. In other embodiments, transcription analysis component 208 may perform transcription after identification of a relevant clip is received. In still other embodiments, transcription of identified clips may be performed dynamically by search system 204.
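By way of example, and not limitation, the reverse index described above might be sketched as follows. The recognizer output format (word, timestamp, confidence), the function names, and the 0.6 threshold are assumptions made for illustration only; the description does not prescribe any of them.

```python
from collections import defaultdict

def build_reverse_index(recognized_words, min_confidence=0.6):
    """Map each confidently recognized word to the sorted times it was said.

    recognized_words: iterable of (word, timestamp_seconds, confidence)
    tuples, a hypothetical speech-recognizer output format.
    """
    index = defaultdict(list)
    for word, timestamp, confidence in recognized_words:
        # Keep only words recognized above the confidence threshold.
        if confidence > min_confidence:
            index[word.lower()].append(timestamp)
    for timestamps in index.values():
        timestamps.sort()
    return dict(index)

def find_occurrences(index, query):
    """Return {keyword: [timestamps]} for each query keyword in the index."""
    return {kw: index[kw] for kw in query.lower().split() if kw in index}

recognized = [
    ("budget", 12.4, 0.93),
    ("vote", 13.1, 0.88),
    ("budget", 95.0, 0.41),   # below the threshold, so not indexed
    ("Budget", 301.7, 0.90),
]
index = build_reverse_index(recognized)
print(find_occurrences(index, "budget vote"))
# {'budget': [12.4, 301.7], 'vote': [13.1]}
```

Storing timestamps per word means that the keyword lookup at query time does not need to rescan the transcript.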
Identified keyword occurrences and the locations at which the occurrences are found in the clip are provided to a clip segment extraction component 210. Clip segment extraction component 210 extracts one or more clip segments from the audio or video clip. Each extracted clip segment includes an identified keyword occurrence. Clip segments can be determined in a variety of ways. In one embodiment, clip segments are a pre-determined length—for example, 10 seconds. In other embodiments, clip segments are long enough to include desirable information. For example, if 12 occurrences of a keyword are present in a 6-second span and no or few occurrences are present in the 10 seconds on either side of the 6-second span, a clip segment may be identified and extracted that includes the entire 6-second span and as little additional time on either side of the span as is possible to make a clean segment. In one embodiment, segments begin and end during a brief silence or pause in speech so as to provide a natural transition.
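One possible way to determine segment boundaries is sketched below, assuming a fixed target length and a list of detected pause times. The function names, the pause list, and the 2-second snapping tolerance are hypothetical and are not taken from the description.

```python
def snap(target, candidates, tolerance):
    """Return the candidate nearest to target if within tolerance, else target."""
    best = min(candidates, key=lambda c: abs(c - target), default=None)
    if best is not None and abs(best - target) <= tolerance:
        return best
    return target

def extract_segment(occurrence, clip_length, pauses,
                    segment_length=10.0, tolerance=2.0):
    """Return (start, end) in seconds for a segment containing the occurrence.

    pauses: times (seconds) at which a brief silence was detected; how they
    are detected is outside the scope of this sketch.
    """
    half = segment_length / 2
    start = max(0.0, occurrence - half)
    end = min(clip_length, occurrence + half)
    # Move each boundary to a nearby pause, but never past the occurrence,
    # so the keyword always stays inside the segment.
    start = snap(start, [p for p in pauses if p <= occurrence], tolerance)
    end = snap(end, [p for p in pauses if p >= occurrence], tolerance)
    return (start, end)

print(extract_segment(62.0, 300.0, [55.8, 58.9, 66.2, 70.0]))
# (55.8, 66.2)
```

In this sketch a boundary is left at its nominal position when no pause lies within the tolerance, so a segment is always produced.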
Each clip segment extracted from the relevant audio or video clip by clip segment extraction component 210 includes at least one keyword occurrence of at least one keyword. Clip segment extraction component 210 provides one or more extracted clip segments to preview generation component 212. Preview generation component 212 creates a query-dependent clip preview that includes at least one of the one or more extracted clip segments that each include an identified keyword occurrence. Thus, preview generation component 212 creates a clip preview customized for received user search query 202 by creating the preview from one or more clip segments that each mention at least one keyword included in query 202. The query-dependent clip preview created by preview generation component 212 is much more likely to provide useful information to a user than a standard, query-independent clip preview that may not be relevant to any of the user's search terms.
For example, a news clip may contain information related to multiple stories. There may be a headline story and a few minor stories. If the news clip is identified as relevant to a user's search for a minor story, a standard clip preview that is created may focus on the headline story and may not include information about the minor story. Such a preview is not helpful to the user. By implementing the components of system 200, a query-dependent clip preview is generated by combining clip segments that are relevant to the user's particular query—in this case, the query-dependent clip preview would include clip segments that discuss the minor news story.
In some instances, a query keyword may only appear once in a clip. In such cases, the clip preview may be selected as a 30-second or other pre-determined time window centered around or otherwise including the one occurrence. In other instances, one or more query keywords may appear a large number of times. In such cases, occurrences may be ranked, and the clip segments extracted by clip segment extraction component 210 include occurrences that are highly ranked. The query-dependent clip preview created by preview generation component 212 may include a predetermined number of the extracted clip segments that include highly ranked keyword occurrences. The ranking could be based on, for example, the number of other occurrences near the occurrence or the proximity of an occurrence of each query term.
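The two cases above can be illustrated with a minimal sketch. The 10-second neighbourhood radius, the tie-breaking rule, and the function names are assumptions; the description only states that ranking could be based on the number of other occurrences near an occurrence.

```python
def rank_occurrences(timestamps, radius=10.0):
    """Order occurrences by how many other occurrences lie within radius seconds."""
    scored = []
    for i, t in enumerate(timestamps):
        nearby = sum(
            1 for j, u in enumerate(timestamps)
            if j != i and abs(u - t) <= radius
        )
        scored.append((nearby, t))
    # Most neighbours first; the earlier timestamp breaks ties.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored]

def single_occurrence_window(t, clip_length, window=30.0):
    """Return a window of up to `window` seconds that includes t.

    The window is centered on t where possible and shifted inward at the
    start or end of the clip.
    """
    start = max(0.0, t - window / 2)
    end = min(clip_length, start + window)
    start = max(0.0, end - window)
    return (start, end)

print(rank_occurrences([5.0, 120.0, 124.0, 127.0, 400.0]))
# [120.0, 124.0, 127.0, 5.0, 400.0]
print(single_occurrence_window(590.0, 600.0))
# (570.0, 600.0)
```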
In some embodiments, time segments prior to extraction or extracted clip segments may be ranked in addition to or instead of occurrences. For example, based on clustering of keyword occurrences or the number of keyword occurrences of various keywords in a particular time period, a time segment may be identified. Additional time segments may then be identified. The time segments may then be ranked, for example by the number of keyword occurrences in the time segment or by the ranking of the keyword occurrences in the time segment, and the highest-ranked segments may be extracted as clip segments.
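As an illustration of ranking time segments rather than individual occurrences, the sketch below groups occurrences into clusters and ranks each cluster by its occurrence count. The gap-based clustering rule, the 5-second gap, and the function names are assumptions chosen for simplicity; other clustering methods could equally be used.

```python
def cluster_occurrences(timestamps, max_gap=5.0):
    """Group occurrences whose consecutive gaps are at most max_gap seconds."""
    clusters = []
    for t in sorted(timestamps):
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters

def rank_time_segments(timestamps, max_gap=5.0, top_n=3):
    """Return up to top_n (start, end, count) time segments, densest first."""
    segments = [
        (cluster[0], cluster[-1], len(cluster))
        for cluster in cluster_occurrences(timestamps, max_gap)
    ]
    # Highest occurrence count first; the earlier segment breaks ties.
    segments.sort(key=lambda s: (-s[2], s[0]))
    return segments[:top_n]

occurrences = [3.0, 40.0, 42.5, 44.0, 200.0, 203.0]
print(rank_time_segments(occurrences, top_n=2))
# [(40.0, 44.0, 3), (200.0, 203.0, 2)]
```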
Occurrences may also be ranked more highly if they are part of a key phrase. A key phrase is a group of words that often appear together. Key phrases may be identified using a pre-determined list or by analyzing transcribed audio and video clips or web pages and determining groups of words that frequently appear together. Thus, if a user has searched for two words that comprise a key phrase, occurrences of both words together may be more relevant and informative than occurrences of either word individually. In one embodiment, user search query 202 includes a plurality of keywords, and when the plurality of keywords includes a key phrase, clip segment extraction component 210 ranks occurrences of the key phrase in the transcription higher than individual occurrences of the plurality of keywords.
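A minimal sketch of key-phrase ranking follows, assuming the transcript is available as an ordered list of (word, timestamp) pairs and the key phrases as a pre-determined set of word tuples. The scores of 2 and 1, the data layout, and the function name are hypothetical.

```python
def rank_with_key_phrase(transcript, keywords, key_phrases):
    """Return (score, timestamp, text) tuples, key-phrase occurrences first.

    transcript: list of (word, timestamp_seconds) in spoken order.
    key_phrases: set of word tuples treated as key phrases (assumed given).
    """
    keywords = [k.lower() for k in keywords]
    words = [w.lower() for w, _ in transcript]
    n = len(keywords)
    scored = []
    covered = set()  # transcript positions already counted within a phrase
    if tuple(keywords) in key_phrases:
        for i in range(len(words) - n + 1):
            if words[i:i + n] == keywords:
                scored.append((2, transcript[i][1], " ".join(keywords)))
                covered.update(range(i, i + n))
    for i, word in enumerate(words):
        if word in keywords and i not in covered:
            scored.append((1, transcript[i][1], word))
    # Higher score first; the earlier timestamp breaks ties.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored

transcript = [
    ("the", 0.0), ("climate", 0.4), ("is", 0.9), ("changing", 1.1),
    ("climate", 30.0), ("change", 30.5), ("policy", 31.0), ("change", 80.2),
]
print(rank_with_key_phrase(transcript, ["climate", "change"],
                           {("climate", "change")}))
# [(2, 30.0, 'climate change'), (1, 0.4, 'climate'), (1, 80.2, 'change')]
```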
In some embodiments, clip segment extraction component 210 extracts a plurality of clip segments from an audio or video clip, and the query-dependent clip preview created by preview generation component 212 includes at least two of the extracted clip segments. In other embodiments, the keyword occurrences identified by transcription analysis component 208 are ranked, and the query-dependent clip preview created by preview generation component 212 includes three extracted clip segments that include highly ranked keyword occurrences, with each extracted clip segment included in the query-dependent clip preview being approximately 10 seconds long.
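The assembly of a preview from three highly ranked occurrences might be sketched as follows. Ordering the segments chronologically and merging segments that overlap are assumptions added for this illustration and are not stated in the description; the function name is likewise hypothetical.

```python
def assemble_preview(ranked_occurrences, clip_length,
                     max_segments=3, segment_length=10.0):
    """Return chronologically ordered (start, end) segments for the preview.

    ranked_occurrences: occurrence times in seconds, best-ranked first.
    """
    segments = []
    for t in ranked_occurrences[:max_segments]:
        start = max(0.0, t - segment_length / 2)
        end = min(clip_length, start + segment_length)
        segments.append((start, end))
    segments.sort()
    merged = []
    for start, end in segments:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous segment: extend it instead of repeating.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

print(assemble_preview([120.0, 400.0, 124.0, 5.0], clip_length=600.0))
# [(115.0, 129.0), (395.0, 405.0)]
```

Merging keeps the preview from replaying the same seconds twice when two highly ranked occurrences fall close together, at the cost of a preview with fewer, longer segments.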
In still other embodiments, the length of the query-dependent clip preview is proportional to the length of the audio or video clip. For example, if a video clip is an hour long, preview generation component 212 may create a longer preview than if the clip were five minutes long. The proportionality can be according to a predetermined ratio or can be a rough proportionality involving general classifications of “long,” “standard,” and “short,” for example.
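Both forms of proportionality can be sketched briefly. Every numeric value below (the ratio, the minimum and maximum lengths, the class boundaries, and the per-class preview lengths) is an assumed placeholder, as are the function names; the description gives no specific values.

```python
def preview_length_by_ratio(clip_length, ratio=0.02,
                            min_length=10.0, max_length=90.0):
    """Preview length as a fixed fraction of the clip, clamped to sane bounds."""
    return max(min_length, min(max_length, clip_length * ratio))

def preview_length_by_class(clip_length):
    """Preview length from a rough 'short' / 'standard' / 'long' classification."""
    if clip_length < 5 * 60:       # under five minutes: "short"
        return 15.0
    if clip_length < 30 * 60:      # under thirty minutes: "standard"
        return 30.0
    return 60.0                    # otherwise: "long"

for seconds in (300.0, 3600.0):
    print(seconds, preview_length_by_ratio(seconds),
          preview_length_by_class(seconds))
```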
Search results page 214 may be generated by search system 204 and includes a link to the relevant audio or video clip along with the query-dependent clip preview.
The functionality of the various components of system 200 may be embodied on one or many physical devices, and various embodiments may not include all components shown in
In some embodiments, when a user is navigated to the web page where the full video is hosted, the video begins play or is positioned to begin play at the location of an identified keyword occurrence. In this way, a user is directed straight to a relevant portion of the clip. In such embodiments, the occurrence is identified as a linked clip start point. In other embodiments, when the full clip is viewed, the locations of additional keyword occurrences in the audio or video clip are displayed with the audio or video clip. For example, markers may be included in a video or audio player indicating additional keyword occurrences. A user selection of one of the displayed additional keyword occurrence locations causes the audio or video clip to play the portion of the clip corresponding to the location of the keyword occurrence—that is, the clip will skip forward or backward to the location of the selected additional occurrence.
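By way of illustration only, a linked clip start point and the remaining marker locations might be derived as shown below. Choosing the earliest occurrence as the start point, and expressing it as a `#t=` temporal fragment in the style of the W3C Media Fragments URI syntax, are assumptions; whether a given hosting page or player honors such a fragment is not addressed by the description. The URL and function name are hypothetical.

```python
def build_clip_link(clip_url, occurrences):
    """Return (link, markers) for a clip and its keyword occurrence times.

    The earliest occurrence becomes the linked clip start point; the other
    occurrences are returned as marker positions for the player.
    """
    ordered = sorted(occurrences)
    start, markers = ordered[0], ordered[1:]
    # Whole seconds keep the fragment simple; sub-second precision is dropped.
    link = f"{clip_url}#t={int(start)}"
    return link, markers

link, markers = build_clip_link("https://example.com/clip.mp4",
                                [95.0, 12.4, 301.7])
print(link)     # https://example.com/clip.mp4#t=12
print(markers)  # [95.0, 301.7]
```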
In some embodiments, the location of one of the one or more keyword occurrences is selected as a linked clip start point in step 412. A user selection of the link to the relevant audio or video clip causes the relevant audio or video clip to begin play at the linked clip start point. In some embodiments, step 414 is performed. In step 414, a search results page for the user search query is provided that includes both a link to the relevant audio or video clip and the query-dependent clip preview.
In other embodiments, the keyword occurrences identified in step 406 are ranked, and the query-dependent clip preview created in step 410 includes a predetermined number of the extracted clip segments that include at least one highly ranked keyword occurrence. In one particular embodiment, the predetermined number of extracted clip segments is three, and each extracted clip segment included in the created query-dependent clip preview is approximately 10 seconds long. In another embodiment, the query-dependent clip preview is approximately 30 seconds long. In still another embodiment, upon determining that the plurality of keywords includes a key phrase, occurrences of the key phrase in the transcription are ranked higher than individual occurrences of the plurality of keywords.
In one embodiment, the plurality of extracted clip segments are ranked, rather than or in addition to ranking the occurrences. In such an embodiment, the query-dependent clip preview includes a predetermined number of the extracted clip segments ranked the highest.
Various embodiments may not include all steps shown in
The present invention has been described in relation to particular embodiments, which are intended in all respects to be illustrative rather than restrictive. Alternative embodiments will become apparent to those of ordinary skill in the art to which the present invention pertains without departing from its scope.
From the foregoing, it will be seen that this invention is one well adapted to attain all the ends and objects set forth above, together with other advantages which are obvious and inherent to the system and method. It will be understood that certain features and sub-combinations are of utility and may be employed without reference to other features and sub-combinations. This is contemplated by and is within the scope of the claims.