Method and apparatus for providing search capability and targeted advertising for audio, image, and video content over the internet

Abstract
The present invention provides an apparatus and method for extracting the content of a video, image, and/or audio file or podcast, analyzing the content, and then providing a targeted advertisement, search capability and/or other functionality based on the content of the file or podcast.
Description

BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a diagram of the basic hardware system used in the preferred embodiment.



FIG. 2 is a flowchart of the basic method used in one embodiment for extracting text from and creating descriptive data for audio data.



FIG. 3 is a flowchart of the basic method used in another embodiment for creating descriptive data for image data or video data.





DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments implementing the present invention are described with reference to FIGS. 1-3. FIG. 1 shows the basic components of the hardware of one embodiment. Typically, a user will operate a computing device 10 to connect to a network 12, such as the Internet. Computing device 10 can be any device with a processor and memory, and includes PCs, laptops, mobile phones, PDAs, servers, etc. Computing device 10 preferably includes a display device and a media player. The connection to network 12 can be through any type of network connection, cellular network, mobile phone network, etc. The network 12 will connect a plurality of users and a plurality of servers and communicate data/content between the servers and the users. In one embodiment, Server A 14 will provide video, image, and/or audio content over the network 12, such as through a podcast or download to other computing devices connected to the network. Server B 16 will be able to access that content through the network 12. Server B 16 can include (or can be coupled to other devices containing) a database 18, storage device 20, search engine 22, and advertising engine 24. The database 18 typically comprises a database software program running on a server or other computer. The storage device 20 typically comprises magnetic or optical storage devices such as hard disk drives, RAID devices, DVD drives, or other storage devices. The storage device 20 typically stores the software run by the server and other associated computers as well as the underlying data and database structures for database 18. The search engine 22 typically comprises a software program running on a server or other computer that is capable of identifying relevant data records in database 18 based on a search request entered on computing device 10 by a user. 
The advertising engine 24 typically comprises a software program running on a server or other computer that is capable of identifying advertising data that is relevant to the search request entered on computing device 10. Database 18, storage device 20, search engine 22, and advertising engine 24 are well-known in the art and may all be contained on a single server (such as Server B 16) or on multiple servers.



FIG. 2 illustrates an embodiment relating to audio content or to the audio portion of a file or podcast that includes both video and audio. The method illustrated in FIG. 2 is preferably implemented on a server or other computing device. Server B 16 will first download the audio data offered by Server A 14 over the network 12 (step 30). Server B 16 will then automatically process the data, including the step of performing speech-to-text conversion on that audio data and/or creating descriptive data (step 32). Speech-to-text conversion is well-known in the art. Creating descriptive data involves processing the text data to determine descriptive data that falls within certain predetermined database fields (e.g., a field indicating the general realm of the audio content, such as stock market information or movie news). Such processing essentially creates metadata that describes the content of the audio podcast. For instance, the database could include a field called “genre” that describes the general realm of the content. The entry that is placed into that field would be based on the content itself. As an example, if the extracted textual data includes the words “foreign policy” and “President,” then an entry of “politics” could be placed in the genre field. That metadata would then be associated with that particular audio content. In this manner, audio content can be indexed (and later searched). The text, the descriptive data, and/or the audio data are imported into a database (step 34).
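By way of illustration only, the creation of descriptive data from extracted text might be sketched as follows. The sketch assumes the speech-to-text conversion has already produced a transcript string. The keyword table `GENRE_KEYWORDS`, the functions `derive_genre` and `build_record`, and the record layout are hypothetical names introduced for this example; the specification does not prescribe any particular matching rule.

```python
import re
from typing import Optional

# Hypothetical keyword rules. Only the "politics" example comes from the
# description; the other entries are illustrative assumptions.
GENRE_KEYWORDS = {
    "politics": {"foreign policy", "president"},
    "finance": {"stock market"},
    "entertainment": {"movie"},
}


def derive_genre(transcript: str) -> Optional[str]:
    """Return the genre with the most keyword matches, or None if none match."""
    text = transcript.lower()
    best_genre, best_hits = None, 0
    for genre, keywords in GENRE_KEYWORDS.items():
        # Count how many of this genre's keywords occur as whole words/phrases.
        hits = sum(
            1 for kw in keywords
            if re.search(r"\b" + re.escape(kw) + r"\b", text)
        )
        if hits > best_hits:
            best_genre, best_hits = genre, hits
    return best_genre


def build_record(audio_url: str, transcript: str) -> dict:
    """Assemble the row that would be imported into the database (step 34)."""
    return {
        "audio_url": audio_url,
        "text": transcript,
        "genre": derive_genre(transcript),
    }
```

A simple keyword count is used here only because it mirrors the “foreign policy” and “President” example; any other classification technique could be substituted.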


Referring still to FIG. 2, a user will then input a search request (e.g., “lawnmowers”) on computing device 10, such as through an Internet search engine run by Server B 16. That request will be received by Server B 16 over network 12 (step 36). Server B 16 and/or search engine 22 will then execute the search within the database 18 that includes the extracted textual data and/or descriptive metadata that previously was generated for the audio data (step 38). If the search implicates the extracted textual data or descriptive metadata, then Server B 16 and/or advertising engine 24 optionally: (i) will identify a relevant advertisement based on the descriptive metadata, and that advertisement will be sent to computing device 10 for display (step 40), and/or (ii) will provide the audio data (which it previously obtained from Server A 14 and stored) or a link to the audio data stored on Server A 14 to the user (step 42). Server B 16 and/or advertising engine 24 optionally can format the advertisement to fit the display and graphics parameters of the display device of computing device 10 prior to transmitting the advertisement to computing device 10.
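As a non-limiting sketch, steps 36 through 42 might be expressed as follows, using an in-memory list in place of database 18. The `Record` class, the `ADS_BY_GENRE` table, and the functions `search` and `respond` are hypothetical names assumed for this example, and a plain substring match stands in for whatever search technique search engine 22 actually employs.

```python
from dataclasses import dataclass


@dataclass
class Record:
    audio_url: str  # link to the audio data (e.g., on the originating server)
    text: str       # extracted textual data
    genre: str      # descriptive metadata


# Hypothetical advertisement inventory keyed by descriptive metadata.
ADS_BY_GENRE = {"gardening": "Ad: spring sale on lawnmowers"}


def search(records, query):
    """Step 38: return records whose text or metadata contains the query."""
    q = query.lower()
    return [r for r in records if q in r.text.lower() or q in r.genre.lower()]


def respond(records, query):
    """Steps 40 and 42: collect links to matching audio and relevant ads."""
    hits = search(records, query)
    ads = []
    for r in hits:
        ad = ADS_BY_GENRE.get(r.genre)
        if ad is not None and ad not in ads:
            ads.append(ad)
    return {"links": [r.audio_url for r in hits], "ads": ads}
```

In this sketch the advertisement is selected from the descriptive metadata of the matching records rather than from the raw query, which corresponds to option (i) above.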



FIG. 3 illustrates an embodiment that relates to images, video content, or to the video portion of a file or podcast that includes both video and audio. The method illustrated in FIG. 3 is preferably implemented on a server or other computing device. Server B 16 will first download the image data or video data offered by Server A 14 over the network 12 (step 50). Server B 16 will then automatically process the image data or video data, including the step of performing image recognition on that image or video data (step 52). Image recognition involves comparing one or more frames of the video data to a set of previously stored, known images, such as images of famous politicians, pop icons, etc. Image recognition is well-known in the art. The step of image recognition will generate recognition data (e.g., the name of a famous politician that shows up in Frame X of the video data) (step 52). Server B 16 will then import the image data, video data and/or the recognition data into database 18. The recognition data can be further processed and the resulting descriptive data and/or the recognition data itself stored in certain database fields (e.g., a field indicating the names of persons who appear in the video) (step 54). Steps 52 and 54 essentially create metadata that describes the content of the image or video data. For instance, the database could include a field called “genre” that describes the general realm of the content. The entry that is placed into that field would be based on the recognition data. As an example, if the recognition data includes “Abraham Lincoln” (because the prior step of image recognition had created that data based on an image in the video data) then an entry of “politics” could be placed in the genre field. 
The underlying image or video content will then be associated with the recognition data (“Abraham Lincoln”) generated as a result of the image recognition step as well as descriptive data (“politics”) generated through processing the recognition data. In this manner, video content can be indexed.
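Purely as an illustrative sketch, the further processing of recognition data into descriptive data (step 54) might look like the following. The image recognition step itself is not shown; its output is assumed to be a list of (frame index, name) pairs. The lookup table `KNOWN_PERSON_GENRES` and the function `describe_video` are hypothetical names introduced for this example.

```python
# Hypothetical mapping from recognized persons to a genre entry. Only the
# "Abraham Lincoln" -> "politics" pairing comes from the description.
KNOWN_PERSON_GENRES = {"Abraham Lincoln": "politics"}


def describe_video(recognitions):
    """Build database fields from recognition data.

    recognitions: list of (frame_index, name) tuples produced by an
    image recognition step that is assumed to have run already.
    """
    # Field listing the names of persons who appear in the video.
    persons = sorted({name for _, name in recognitions})
    # Genre field derived from the recognition data.
    genres = sorted(
        {KNOWN_PERSON_GENRES[n] for n in persons if n in KNOWN_PERSON_GENRES}
    )
    return {"persons": persons, "genres": genres}
```

Both the recognition data (“persons”) and the derived descriptive data (“genres”) are kept, so that either can later be searched.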


Referring again to FIG. 3, a user will then input a search request on computing device 10. That request will be received by Server B 16 over network 12 (step 56). Server B 16 and/or search engine 22 will then execute the search within the database 18 that includes the recognition data and/or descriptive data that previously was created for the video data (step 58). If the search implicates the recognition data and/or descriptive data, then Server B 16 and/or advertising engine 24 optionally: (i) will generate an advertisement based on the recognition data and/or descriptive data, and that advertisement will be sent to computing device 10 for display (step 60), and/or (ii) will provide the image or video data (which it previously obtained from Server A 14 and stored) or a link to the image or video data stored on Server A 14 to the user (step 62).


With both audio and video downloads and podcasts, the timing of the advertisements can be synchronized with the audio and video content after the text data, descriptive data and/or recognition data has been created as discussed above. For example, if it has been determined that a certain video podcast contains a news segment on lawnmowers, an advertisement on lawnmowers can be integrated into the podcast to appear at the very moment when the news segment on lawnmowers begins, or even when the word “lawnmower” is spoken. Thus, after the user downloads the podcast and watches the news segment, the advertisement will appear on his or her screen at precisely the right moment. This is yet another benefit of converting audio, image, and video content into a text form that can be indexed, searched, and analyzed.
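The synchronization described above might be sketched as follows, on the assumption (not stated in the description) that the speech-to-text step also yields a timestamp for each recognized word. The function `schedule_ads`, the timed-word format, and the advertisement identifiers are hypothetical.

```python
def schedule_ads(timed_words, ad_keywords):
    """Return (seconds, ad_id) pairs marking when each ad should appear.

    timed_words: list of (seconds, word) tuples in playback order.
    ad_keywords: dict mapping a lowercase keyword to an advertisement id.
    Each advertisement is scheduled once, at the first time its keyword
    is spoken.
    """
    schedule = []
    placed = set()
    for seconds, word in timed_words:
        # Normalize case and strip surrounding punctuation before matching.
        key = word.lower().strip(".,!?;:\"'")
        ad_id = ad_keywords.get(key)
        if ad_id is not None and ad_id not in placed:
            schedule.append((seconds, ad_id))
            placed.add(ad_id)
    return schedule
```

A media player could then consult the returned schedule during playback and display each advertisement when the corresponding time offset is reached.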


While the foregoing description has been made with reference to particular embodiments of the invention, it will be appreciated by those skilled in the art that changes in these embodiments may be made without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims.

Claims
  • 1. A method of converting audio data into a searchable form, comprising the steps of: receiving audio content; performing speech-to-text conversion on said audio content; importing the text created by said conversion into a database; receiving a search request from a user over a network; and executing the search request, wherein the search is performed in a database that includes said text or portions thereof.
  • 2. The method of claim 1, wherein the step of receiving audio content comprises the step of downloading said content over a network.
  • 3. The method of claim 2, wherein the method further comprises the step of providing said content or a link to said content to said user over said network.
  • 4. The method of claim 1, wherein the step of receiving audio content comprises receiving said content over a network through a podcast.
  • 5. The method of claim 4, wherein the method further comprises the step of providing said content or a link to said content to said user over said network.
  • 6. A method of converting audio data into a searchable form, comprising the steps of: receiving audio content; performing speech-to-text conversion on said audio content; processing the text created by said conversion to create descriptive data that describes the content of at least some of said text; importing one or both of at least some of said text and at least some of said descriptive data into a database; receiving a search request from a user over a network; and executing the search request, wherein the search is performed in a database that includes one or both of at least some of said text and at least some of said descriptive data.
  • 7. The method of claim 6 wherein the method further comprises the step of: generating advertisements based upon the search request.
  • 8. The method of claim 7 wherein the method further comprises the step of: transmitting the advertisements to said user over said network.
  • 9. A method of converting image or video data into a searchable form, comprising the steps of: receiving image content or video content; performing image recognition on said image or video content; importing the result of the image recognition step into a database; receiving a search request from a user over a network; and executing the search request, wherein the search is performed in a database that includes said result of the image recognition step.
  • 10. The method of claim 9, wherein the step of receiving image content or video content comprises the step of downloading said content over a network.
  • 11. The method of claim 10, wherein the method further comprises the step of providing said content or a link to said content to said user over said network.
  • 12. The method of claim 9, wherein the step of receiving image content or video content comprises receiving said content over a network through a podcast.
  • 13. The method of claim 12, wherein the method further comprises the step of providing said content or a link to said content to said user over said network.
  • 14. A method of converting image or video data into a searchable form, comprising the steps of: receiving image content or video content; performing image recognition on said image or video content; processing the result of the image recognition step to create descriptive data that describes the content of said result; importing said result and at least some of said descriptive data into a database; receiving a search request from a user over a network; and executing the search request, wherein the search is performed in a database that includes one or both of said result and at least some of said descriptive data.
  • 15. The method of claim 14 wherein the method further comprises the step of: generating advertisements based upon the search request.
  • 16. The method of claim 15 wherein the method further comprises the step of: transmitting the advertisements to said user over said network.
  • 17. A system for converting audio data into a searchable form, comprising: a first computing device for receiving audio content over a network and performing speech-to-text conversion on said audio content; a database for storing the resulting text; and a second computing device for receiving a search request from a user over the network wherein the second computing device is capable of executing the search in a database that includes said resulting text or portions thereof.
  • 18. The system of claim 17 wherein the first computing device and the second computing device are the same device.
  • 19. The system of claim 17 wherein the system further comprises: a third computing device for issuing the search request over said network.
  • 20. A system for converting image or video data into a searchable form, comprising: a first computing device for receiving image or video content over a network and performing image recognition on said image or video content; a database for storing the result of the image recognition; and a second computing device for receiving a search request from a user over the network wherein the second computing device is capable of executing the search in a database that includes said result.
  • 21. The system of claim 20 wherein the first computing device and the second computing device are the same device.
  • 22. The system of claim 20 wherein the system further comprises: a third computing device for issuing the search request over said network.
  • 23. A computing system, comprising: means for receiving audio content; means for performing speech-to-text conversion on said audio content; means for importing the text created by said conversion into a database; means for receiving a search request from a user over a network; and means for executing the search request, wherein the search is performed in a database that includes said text or portions thereof.
  • 24. A computing system, comprising: means for receiving image content or video content; means for performing image recognition on said image or video content; means for importing the result of the image recognition step into a database; means for receiving a search request from a user over a network; and means for executing the search request, wherein the search is performed in a database that includes said result of the image recognition step.
  • 25. A computing system for executing a set of instructions, wherein the instructions comprise: instructions for receiving audio content; instructions for performing speech-to-text conversion on said audio content; instructions for importing the text created by said conversion into a database; instructions for receiving a search request from a user over a network; and instructions for executing the search request, wherein the search is performed in a database that includes said text or portions thereof.
  • 26. A computing system for executing a set of instructions, wherein the instructions comprise: instructions for receiving image content or video content; instructions for performing image recognition on said image or video content; instructions for importing the result of the image recognition step into a database; instructions for receiving a search request from a user over a network; and instructions for executing the search request, wherein the search is performed in a database that includes said result of the image recognition step.