1. Field of the Invention
The present invention relates to a method and system for associating keywords with objects in a video file so that the keywords are searchable on the internet by existing search engines.
2. Description of the Related Art
It is desirable to be able to locate videos that a user wants to watch based on a keyword search. It is also desirable to be able to search for video clips that contain certain objects. Methods of searching for videos in a large database or on the internet using keywords are well known in the art.
U.S. Pat. No. 6,925,474 to McGrath discloses a system for searching a database for videos containing certain objects and attempting to isolate only the portions of the videos that contain the object. The method searches the audio track, e.g., by searching the closed caption text or by first converting the audio track to text using a voice recognition program, e.g., DRAGON NATURALLY SPEAKING® or the like. McGrath does not track objects, nor does it attach hyperlinks to objects within the video. Moreover, McGrath searches existing words in the video, not keywords for objects. Further, the audio track text will not necessarily correspond to the portion of the video showing a desired object. The object may be mentioned only once, and not necessarily at the beginning and ending frames containing the object. In fact, the object may not be mentioned at all in the audio text.
There have also been patents on tracking objects throughout video clips and hyperlinking them to further information about the object, such as U.S. Pat. No. 6,642,940 to Dakss. However, no method of searching for such an object is disclosed. A complete system that enables an intranet or internet user to link to the object by keyword searching is needed.
In one embodiment, videos are first reviewed frame by frame, and the objects that are to be made searchable are identified. Each such object is tracked, by an object tracking software application, through each frame in which it appears throughout the length of the content. Optionally, hyperlinks may be applied to any number of objects as desired. Each object is then described by an "object descriptor," which is used to generate keywords or keyword phrases concerning the object. The keywords or phrases (or a link to them), data concerning the location of the video, the web server and the object within the video, and any desired external link(s) are stored, along with the object descriptor, in an XML file or other file format capable of storing such data. The keywords or keyword phrases are, if necessary, converted to and stored in a format searchable by existing search engines. (As used in this application, "keywords" includes or may include keyword phrases.)
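By way of illustration only, the following sketch (in Python) shows one possible per-object record of the kind such an XML or equivalent file might hold; the field names, identifiers and URLs are assumptions made for this example and are not prescribed by the method.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ObjectRecord:
    """Illustrative per-object record; all field names are assumed, not prescribed."""
    object_id: str                  # unique identifier for the tracked object
    object_descriptor: str          # e.g., "blue shirt"
    keywords: List[str]             # generated keywords / keyword phrases
    video_url: str                  # location of the video on the video server
    web_server_url: str             # server hosting the (phantom) web page
    start_timecode: str             # first frame in which the object appears
    end_timecode: str               # last frame in which the object appears
    external_links: List[str] = field(default_factory=list)  # optional links from the object

# Hypothetical example values, matching the "blue shirt" example used below.
record = ObjectRecord(
    object_id="obj-001",
    object_descriptor="blue shirt",
    keywords=["blue shirt", "shirt", "blue", "clothing"],
    video_url="http://videoserver.example.com/catalog/spring.mp4",
    web_server_url="http://phantom.example.com",
    start_timecode="00:01:12:05",
    end_timecode="00:01:47:20",
    external_links=["http://retailer.example.com/blue-shirt"],
)
```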
A web page is created, potentially for each object and keyword (including keyword phrases), and the keywords are posted thereto along with the data in the XML file or a link to the XML file. In this way, a search engine's crawling function will find the web page and the keywords posted thereto. Preferably, the web page is a phantom page, i.e., a web page that never appears to the internet user yet may be found by search. That is, the information returned to the user in the search results provides, from the XML file, a link directly to the video server where the video corresponding to the keyword(s) searched by the user is stored. The information stored on or associated with the web page associates the keywords with that video and with the XML file containing the object segment, i.e., the start and stop frames of the video where the object appears, any external (further) links from that object, and the location of the video server containing the video.
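Purely as an illustration, a minimal phantom page might be generated as sketched below; the HTML layout, the meta keywords tag, and the start/end query parameters used to cue the video player are assumptions, not requirements of the system.

```python
def build_phantom_page(descriptor, keywords, video_url, start_tc, end_tc, xml_url):
    """Return minimal HTML carrying the keywords so that a standard search engine
    can crawl and index them; the page itself need never be shown to a user.
    The cueing query parameters (start/end) are assumed for illustration."""
    keyword_text = ", ".join(keywords)
    cued_link = f"{video_url}?start={start_tc}&end={end_tc}"
    return f"""<!DOCTYPE html>
<html>
  <head>
    <title>{descriptor}</title>
    <meta name="keywords" content="{keyword_text}">
  </head>
  <body>
    <p>{keyword_text}</p>
    <!-- the XML file holding the full object data -->
    <a href="{xml_url}">object data</a>
    <!-- link directly to the video server, cued to the object's segment -->
    <a href="{cued_link}">{descriptor}</a>
  </body>
</html>"""

# Hypothetical usage with the example values introduced above.
page = build_phantom_page(
    "blue shirt",
    ["blue shirt", "shirt", "blue"],
    "http://videoserver.example.com/catalog/spring.mp4",
    "00:01:12:05", "00:01:47:20",
    "http://phantom.example.com/objects/obj-001.xml",
)
```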
Preferably, there is one web page for each searchable object within a video, although multiple XML files for multiple objects and/or for multiple videos could be stored on a web page by segmenting the keywords for one video object segment from the keywords for another video object segment.
When a user searches the internet or a database by entering keywords, an object appearing in a video associated with those keywords will have its link appear within the search results, along with a description of the link. The links shown are cued to the first frame in which the object appears and end at the last frame in which the object appears. The entire video need not be downloaded or viewed, but may be if the user so desires.
In a more preferred embodiment, the user can click on the objects within the video and be hyperlinked to further information about the object, to a related web site, and/or to further video, or any linkable item.
In one preferred embodiment of the invention, the method includes identifying one or more objects in a video, and for each such object, tracking the object to determine the initial and final frames in which each such object appears, such as by time code data, associating such objects with object link descriptors, generating a set of keywords from the link descriptors, and storing this object data (video server location for the video, object first and last frame data for all appearances, object link descriptor, and keywords, or keyword link data) in a file, preferably an XML file. Optionally, links may be attached to the objects by e.g., also storing the linking data in the XML file in association with the object. Preferably, the keywords are posted to a web site in a language (e.g., HTML, CSS, etc.) which is searchable by a standard search engine. Such a web site may be a phantom web site, never seen by the user, or an existing or new site. A user searching on the search engine for words, phrases or terms that are within the set of keywords associated with the object will be able to link directly to the video, cued to the segment containing the object, by clicking on the search results link or links.
The invention is described with respect to the internet. However, the method may be applied to an intranet or a database.
It should also be noted that in the method and system described herein, the word "object" in a video refers to a table, chair, person, article of clothing or other visible thing in a video frame. It is not the same as the word "object" as sometimes used in searchable database terminology, where it refers to an item of data stored in the database, such as a document or image, for which one may enter different fields describing different aspects of the document or image (author name, subject matter, date of creation, a keyword, topic, etc.) and then enter a search criterion (or criteria) on such aspects to pull up all the documents and/or images that correspond to the search criterion. In the case of such a database, the various fields or aspects sometimes contain entries created from a drop-down menu, and other times contain entries manually determined and entered by the user.
With reference to the figures, the system of one embodiment includes a computer 2.
There is a memory 14 which may be any type of memory capable of storing software applications and data of the type disclosed herein, and may be one memory or multiple memories. The software applications are preferably an embedder application 15, a keyword generator application 16, an XML application 16a, an interpreter application 17, and a web server application 18. The applications 15, 16, 16a, 17 and 18 generate data, as explained below, which may be stored in a data memory section 19. These various sections of memory 14 are described to assist in a conceptual understanding of this embodiment of the invention, and any memory architecture that supports such applications and data may be used. In addition, the applications are separately described, but could be provided as one application having multiple modules, as several individual applications, or in any other suitable arrangement.
The computer 2 provides an assembly system for identifying objects in videos and generating searchable keywords with links to the videos in which the objects appear, cued to the segments of the video in which they appear. Optionally, hyperlinks from the objects to further web pages, information, videos, or other linkable items may be added.
Assembly Phase
Steps 31 to 36 make up the assembly phase. At step 31, which may be performed, e.g., by the embedder application 15, a video is selected and one or more objects in the video are identified.
As a part of this step or as a preparatory step, a client or the assembler selects a video or multiple videos for processing objects therein to make such objects internet or intranet searchable. In the case of a clothing retailer, the retailer might want to make all shirts searchable, and so would select all videos containing shirts from its database, video server, or archives. Therefore, a video or videos, e.g., from a video server, are manually or automatically selected. Exemplary methods of automatic selection include selecting from all videos available on a designated server or servers in alphabetical order or sequential order, or selecting all videos of a particular category, e.g., all videos concerning "clothing," "shirts," or "safety."
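A minimal sketch of such automatic selection by category follows; the catalog structure, tags and URLs are invented solely for illustration.

```python
# Hypothetical video catalog: each entry records where a video is stored
# and the categories it concerns. None of these values come from the system itself.
catalog = [
    {"url": "http://videoserver.example.com/v/spring.mp4", "tags": ["clothing", "shirts"]},
    {"url": "http://videoserver.example.com/v/safety1.mp4", "tags": ["safety"]},
    {"url": "http://videoserver.example.com/v/fall.mp4", "tags": ["clothing", "slacks"]},
]

def select_videos(catalog, category):
    """Return the locations of all videos tagged with the requested category."""
    return [entry["url"] for entry in catalog if category in entry["tags"]]

shirt_videos = select_videos(catalog, "shirts")   # -> only the spring.mp4 entry
```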
The data for where the video or videos are located is stored, e.g., in the data memory 19.
After selection of the video(s), a first object is selected or identified in the video. At this time, a first object identifier may be stored in data memory 19 as a mechanism to uniquely identify the object and store the data in association therewith. Preferably, the data is stored first in connection with the video location data, which will be unique. Then, any particular object can be described uniquely by time code and object descriptor, described below, or by assigning a unique object identifier, such as simple numbering of the object. Time code or time codes as used in this application may be represented or replaced with frame number or any other description of a frame location within a video.
The object, once selected, may be tracked using object tracking software to determine where the object first appears in the video, where it last appears, and all other appearances. The time codes corresponding to the first appearance and last appearance (the object location) in the video are recorded and stored, e.g., also in the data memory 19, in association with the video location and the first object. Determining this time code, or other data identifying the video segment in which the object appears, may be done manually, but is preferably automated, e.g., by a process disclosed in U.S. patent application Ser. No. 10/443,301, filed May 21, 2003, and published as U.S. Published Patent Application No. 2004/0233233, both incorporated by reference herein. This particular step of tracking the object may occur now or at another time in the process, before moving on to the next object in the video that is to be identified and processed as described herein.
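By way of example only, assuming the object tracker can report, frame by frame, whether the object is present, the first and last frames (and corresponding time codes) might be derived as sketched below; the tracker interface and the frame rate are assumptions.

```python
def object_segment(presence_by_frame):
    """Given (frame_number, present) pairs from an object tracker (interface assumed),
    return the first and last frame numbers in which the object appears."""
    frames = [f for f, present in presence_by_frame if present]
    if not frames:
        return None                      # object never appears in this video
    return min(frames), max(frames)

def frame_to_timecode(frame, fps=30):
    """Convert a frame number to an HH:MM:SS:FF time code (frame rate assumed)."""
    seconds, ff = divmod(frame, fps)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"

# Example: suppose the tracker finds the object visible from frame 2165 through 3230.
detections = [(f, 2165 <= f <= 3230) for f in range(0, 5000)]
start, end = object_segment(detections)
segment = (frame_to_timecode(start), frame_to_timecode(end))  # ("00:01:12:05", "00:01:47:20")
```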
In step 32, which may also be performed by the embedder 15 or by the keyword generator 16, the identified object(s) are provided with an object link descriptor, which is preferably manually assigned and entered into the software. The link descriptor is a word or phrase describing the object, such as “blue shirt,” if the object is a blue shirt, or the particular brand model name of a car, if the object is a car. Additional descriptors, such as “car,” “sedan,” etc. may be manually added. The object descriptor is also preferably stored in the data memory in association with the object identifier.
At step 33, keywords are generated corresponding to each object descriptor, e.g., using the keyword generator 16.
The keyword generator would take the input text from the object descriptor stored in data storage 19 and generate a list of keywords and keyword phrases relating to the object, e.g., for the descriptor "blue shirt," keywords such as "blue," "shirt," and "blue shirt."
The list could be further increased by substituting "azure," "royal," "navy" and/or other names for or shades of blue. If during use it were found that users frequently searched for "rugby shirt," or a sponsor wanted to promote rugby shirts, then "blue rugby shirt" might be added manually or automatically to the list. The keywords or keyword phrases may also be stored in the data memory 19, in association with the object descriptor or unique identifier for the object. A link descriptor, a link to the video, a link to the object in the video (by virtue of the time codes or equivalent means of cueing the video), and any links from the object, i.e., external links, are also stored in the data memory 19.
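A minimal keyword-generator sketch, assuming a small synonym table (which could equally be a thesaurus file or a manually maintained list), is shown below; the synonym entries are illustrative only.

```python
# Hypothetical synonym table, invented purely for illustration.
SYNONYMS = {"blue": ["azure", "royal", "navy"], "shirt": ["rugby shirt", "top"]}

def generate_keywords(descriptor, synonyms=SYNONYMS):
    """Expand an object descriptor into a keyword list: the full descriptor,
    its individual words, and synonym-substituted variants of the phrase."""
    words = descriptor.lower().split()
    keywords = {descriptor.lower()}          # the full descriptor, e.g. "blue shirt"
    keywords.update(words)                   # each word, e.g. "blue", "shirt"
    for word in words:
        for alt in synonyms.get(word, []):
            keywords.add(alt)                                        # e.g. "azure"
            keywords.add(descriptor.lower().replace(word, alt))      # e.g. "blue rugby shirt"
    return sorted(keywords)

print(generate_keywords("blue shirt"))
# ['azure', 'azure shirt', 'blue', 'blue rugby shirt', 'blue shirt', ...]
```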
At step 34, the link descriptor, keywords or keyword phrases, object location (time codes for the beginning and ending frames of the object's appearance in the video), location of the video containing the object, and any external link data are processed by the XML application 16a, which converts or simply stores this data, preferably in one XML file or any other format capable of handling this data. At step 35, the interpreter application 17 converts the keywords, including the link descriptor, into a searchable file format such as HTML.
The keywords or keyword phrases, however, could have previously been generated in (as opposed to having to be converted into) a file format such as HTML or CSS (cascading style sheets), which is searchable by existing search engines. In any case, the XML file must be associated with the searchable keywords file.
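As one possible illustration of step 34, the object data might be assembled into an XML file as sketched below; the element and attribute names are assumptions, and any format capable of holding the data would serve.

```python
import xml.etree.ElementTree as ET

def build_object_xml(descriptor, keywords, video_url, start_tc, end_tc, external_links=()):
    """Assemble the object data into a single XML document.
    Element and attribute names are illustrative only."""
    root = ET.Element("objectData")
    ET.SubElement(root, "objectDescriptor").text = descriptor
    kw = ET.SubElement(root, "keywords")
    for word in keywords:
        ET.SubElement(kw, "keyword").text = word
    video = ET.SubElement(root, "video", {"url": video_url})
    ET.SubElement(video, "segment", {"start": start_tc, "end": end_tc})
    links = ET.SubElement(root, "externalLinks")
    for url in external_links:
        ET.SubElement(links, "link", {"href": url})
    return ET.tostring(root, encoding="unicode")

# Hypothetical usage with the example values introduced earlier.
xml_text = build_object_xml(
    "blue shirt",
    ["blue shirt", "shirt", "blue"],
    "http://videoserver.example.com/catalog/spring.mp4",
    "00:01:12:05", "00:01:47:20",
    ["http://retailer.example.com/blue-shirt"],
)
```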
At step 36, the interpreter application 17 or the (phantom) web server application 18 contains well-known software to post or attach the data, i.e., the XML file and the keyword file in searchable text, to a web page. The phantom web server 18 posts the web page on the internet (or an intranet). The web page may then be "crawled," i.e., existing search engines, e.g., YAHOO!®, GOOGLE® and others, may send out "spiders" which "crawl" through each web site, including the home web page, any secondary web pages, and any hyperlinks to other sites, information or other hyperlinked items, so that the search engine can provide links for these pages or items in response to search requests by users of the search engine.
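Posting, at step 36, might be as simple as writing the keyword page and the XML file into the phantom web server's document root, as in the following sketch; the directory path and file-naming scheme are assumptions made for illustration.

```python
from pathlib import Path

def publish(object_id: str, html_text: str, xml_text: str,
            doc_root: str = "/var/www/phantom") -> str:
    """Write the generated keyword page and XML file where the (phantom) web
    server can serve them to crawlers. Paths and names are illustrative only."""
    root = Path(doc_root)
    root.mkdir(parents=True, exist_ok=True)
    (root / f"{object_id}.html").write_text(html_text, encoding="utf-8")
    (root / f"{object_id}.xml").write_text(xml_text, encoding="utf-8")
    return f"/{object_id}.html"        # path under which the page is served
```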
At this stage, the assembly is complete.
The assembly process is also depicted schematically in the figures.
The embedder 15 then outputs the linking data L.D., shown in the figures.
Then, the file location data W.P. is posted on the web page denoted phantom web page 20 generated by the assembler's software at its phantom web site server 20, or an existing web page on the client's web server (client server 22), as desired.
Web Search Phase
In step 37, existing search engines crawl the web and therefore would find the phantom web page 20 (or the existing client server web site 22) containing the keywords and XML file. The keywords and XML file (with the object descriptor, video location, object time codes, and any external links) are then available for searches conducted by internet users, shown, e.g., as having personal computers (PCs) 26-28.
At step 38, one or more internet users (PCs 26-28) connect over the internet to a search engine page, e.g., as shown by browser screen 61, and enter one or more keyword search terms.
In step 38, the search results are returned to the user's internet browser screen, e.g., as shown on results screen 61a.
The screen 61a showing the search results 70 normally contains multiple results. The user may click on one of the result links and be linked to the video, cued to the segment in which the object appears, as described above.
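For illustration, the cued link placed in the search results could be reconstructed from the stored XML as sketched below, assuming the illustrative XML layout and player query parameters used in the earlier sketches.

```python
import xml.etree.ElementTree as ET

def result_link(xml_text: str) -> str:
    """Rebuild the search-result link from the stored XML: it points at the
    video server and is cued to the object's segment. The XML layout and the
    start/end query parameters follow the illustrative sketches above."""
    root = ET.fromstring(xml_text)
    video = root.find("video")
    segment = video.find("segment")
    return f"{video.get('url')}?start={segment.get('start')}&end={segment.get('end')}"
```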
In step 40, the user may link from the object in the video to further web pages (e.g., describing rugby shirts), to information such as purchase information (e.g., how to purchase a blue rugby shirt), and/or to another video (e.g., about rugby shirts or showing rugby shirts).
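The mechanism for resolving a click on an object is not prescribed here; purely as an illustration, if each tracked object also carried a rough bounding region for the frames in which it appears, a click could be resolved to the object's external link as sketched below (the region data and screen coordinates are assumptions).

```python
# Hypothetical per-object click data: frames of appearance, an assumed bounding
# region (x0, y0, x1, y1), and the external link attached to the object.
objects = [
    {"descriptor": "blue shirt",
     "frames": range(2165, 3231),
     "region": (100, 80, 260, 300),
     "external_link": "http://retailer.example.com/blue-shirt"},
]

def resolve_click(frame, x, y, objects=objects):
    """Return the external link of the object clicked at (x, y) in `frame`, if any."""
    for obj in objects:
        x0, y0, x1, y1 = obj["region"]
        if frame in obj["frames"] and x0 <= x <= x1 and y0 <= y <= y1:
            return obj["external_link"]
    return None

resolve_click(2500, 150, 200)   # -> "http://retailer.example.com/blue-shirt"
```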
As shown in the figures, the video 80 may be displayed on a screen that provides additional features.
Also, a new search may be conducted, using search terms inputted to box 86 by the user, and clicking the new search button 88. The new search, however, is preferably limited to the sponsor's or client's server, or the assembler's server. Another subscreen 90 may be provided having keyword(s) in a box 91, an image and/or text with a hyperlink in box 92, and a caption or text box 93. For example, if the object in the video 80 is a blue shirt, the keyword(s) 91 may be “clothing,” the image 92 may be a matching pair of slacks, and the text 93 may be “these casual slacks would go well with a blue rugby shirt,” or the like.
Although the invention has been described using specific terms, devices, and/or methods, such description is for illustrative purposes of the preferred embodiment(s) only. Changes may be made to the preferred embodiment(s) by those of ordinary skill in the art without departing from the scope of the present invention, which is set forth in the following claims. In addition, it should be understood that aspects of the preferred embodiment(s) generally may be interchanged in whole or in part.
Number | Name | Date | Kind |
---|---|---|---|
5517605 | Wolf | May 1996 | A |
5659742 | Beattie et al. | Aug 1997 | A |
5729741 | Liaguno et al. | Mar 1998 | A |
5751286 | Barber et al. | May 1998 | A |
5794249 | Orsolini et al. | Aug 1998 | A |
5819286 | Yang et al. | Oct 1998 | A |
5893110 | Weber et al. | Apr 1999 | A |
5987454 | Hobbs | Nov 1999 | A |
6067401 | Abecassis | May 2000 | A |
6070161 | Higashio | May 2000 | A |
6161108 | Ukigawa et al. | Dec 2000 | A |
6397181 | Li et al. | May 2002 | B1 |
6457018 | Rubin | Sep 2002 | B1 |
6493707 | Dey et al. | Dec 2002 | B1 |
6603921 | Kanevsky et al. | Aug 2003 | B1 |
6642940 | Dakss et al. | Nov 2003 | B1 |
6697796 | Kermani | Feb 2004 | B2 |
6741655 | Chang et al. | May 2004 | B1 |
6819797 | Smith et al. | Nov 2004 | B1 |
6859799 | Yuen | Feb 2005 | B1 |
6925474 | McGrath et al. | Aug 2005 | B2 |
6990448 | Charlesworth et al. | Jan 2006 | B2 |
7003156 | Yamamoto et al. | Feb 2006 | B1 |
7020192 | Yamaguchi et al. | Mar 2006 | B1 |
7024020 | Lee et al. | Apr 2006 | B2 |
7032182 | Prabhue et al. | Apr 2006 | B2 |
7054812 | Charlesworth et al. | May 2006 | B2 |
7240075 | Nemirofsky et al. | Jul 2007 | B1 |
20010018771 | Walker et al. | Aug 2001 | A1 |
20020069218 | Sull et al. | Jun 2002 | A1 |
20020087530 | Smith et al. | Jul 2002 | A1 |
20020163532 | Thomas et al. | Nov 2002 | A1 |
20030122860 | Ino | Jul 2003 | A1 |
20040215660 | Ikeda | Oct 2004 | A1 |
20050022107 | Dey et al. | Jan 2005 | A1 |
20050044056 | Ray et al. | Feb 2005 | A1 |
20050044105 | Terrell | Feb 2005 | A1 |
20050128318 | Leow et al. | Jun 2005 | A1 |
20050182759 | Yuen | Aug 2005 | A1 |
20050223031 | Zisserman et al. | Oct 2005 | A1 |
20050223034 | Kaneko et al. | Oct 2005 | A1 |
20050271304 | Retterath et al. | Dec 2005 | A1 |
20060082662 | Isaacson | Apr 2006 | A1 |
20060271594 | Haberman | Nov 2006 | A1 |
20070044010 | Sull et al. | Feb 2007 | A1 |
20070199031 | Nemirofsky et al. | Aug 2007 | A1 |