1. Field of the Invention
The present invention relates to television, and more particularly, to a system and method for the manual and/or automatic generation of interactive content related to television programming and advertisements.
2. Related Art
Interactive television (TV) has already been deployed in various forms. The electronic program guide (EPG) is one example, where the TV viewer is able to use the remote control to control the display of programming information such as TV show start times and durations, as well as brief synopses of TV shows. The viewer can navigate around the EPG, sort the listings, or select a specific show or genre of shows to watch or tune to at a later time. Another example is the WebTV interactive system produced by Microsoft, wherein web links, information about the show or story, shopping links, and so on are transmitted to the customer premise equipment (CPE) through the vertical blanking interval (VBI) of the TV signal. Other examples of interactive TV include television delivered via the Internet Protocol (IP) to a personal computer (PC), where true interactivity can be provided, but where typically only a subset of full interactivity is implemented. For the purposes of this patent application, full interactivity is defined as fully customizable screens and options that are integrated with the original television display, with interactive content being updated on the fly based on viewer preferences, demographics, the interactions of other, similar viewers, and the programming content being viewed. The user interface for such a fully interactive system should also be completely flexible and customizable.
No current interactive TV system intended for display on present-day analog or digital televisions provides this type of fully interactive and customizable interface and interactive content. The viewer is presented with either a PC screen that is displayed using the TV as a monitor, or the interactive content on the television screen is identical for all viewers. It is therefore desirable to have a fully interactive system for current and future television broadcasting where viewers can interact with the programming in a natural manner and the interactive content is customized to the viewer's preferences and past history of interests, as well as to the interests of other, similar viewers.
A key problem limiting the ability to deliver such fully interactive content coupled to today's analog or digital TV programming is the lack of a system for quickly generating this fully interactive content, either off-line or in real or near-real time. Currently, authoring tools are used for generation of the content with no input from the TV viewer, either off-line or in real time. A system that generates fully interactive and dynamically defined content that is personalized for each viewer, using a combination of authoring tools, automatic generation based on programming material, and feedback from viewers themselves, is described in this patent application.
The present invention is directed to a method and system for generating interactive content for interactive TV that is customizable and dynamically altered in response to the TV programming and advertising, the viewer's preferences, viewer usage history, and other viewer inputs. In order to automatically generate this interactive content, a system for processing a variety of data related to the TV programming is described, with examples being existing data sent in the vertical blanking interval (including closed caption text and current interactive data packets), web sites related to the TV program or advertisements, inputs from the viewers (including remote control selections, speech, and eye movements), text in the TV screen image such as banners, titles, and information sent over similar channels (such as other news channels if a news channel is currently being watched). The interactive content generation system may be located at a central site, or at the customer premise, or both.
In one aspect of the present invention there is provided a system for capturing and processing the closed caption text data that is frequently transmitted along with television broadcasts. The entire closed caption text is processed to identify keywords that can be used by later algorithms for identifying and re-purposing data available from packet-switched networks for interactive television applications. The processing algorithms include using word frequency-of-occurrence lists with associated dynamic occurrence thresholds to filter out the least important and most commonly occurring words from the closed caption text, using grammatical rules and structure to identify candidate keywords, using manual generation of keywords related to the genre of the TV program being watched and selecting those closed caption keywords which are conceptually similar or lexicographically related to the manually generated words, or any combination of these algorithms. The resulting keywords are combined with keywords that indicate a particular viewer's preferences or profile, and the combined keywords are used to generate interactive content related to what is happening in the television program at that moment by searching data available from packet-switched networks or contained on a local network. If closed caption text is unavailable in a particular program, a speech recognition system is used to generate text from the audio portion of the television broadcast.
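The frequency-threshold filtering of closed caption text described above can be sketched as follows. This is an illustrative sketch only: the stop-word list, the threshold value, and the function name are assumptions for demonstration, not part of the disclosed system.

```python
from collections import Counter
import re

# Illustrative stop-word list; a deployed system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it", "that", "on"}

def extract_keywords(caption_text, max_share=0.1):
    """Filter closed caption text down to candidate keywords.

    Words that are stop words, or whose share of all words exceeds the
    dynamic occurrence threshold `max_share`, are discarded as too
    common to be distinctive.  Returns the survivors, sorted.
    """
    words = re.findall(r"[a-z']+", caption_text.lower())
    counts = Counter(words)
    total = len(words)
    return sorted(
        w for w, c in counts.items()
        if w not in STOP_WORDS and c / total <= max_share
    )
```

For example, captions from a cooking segment would yield content words such as "chef" and "coffee" while dropping articles and conjunctions; the surviving keywords then feed the later search and matching stages.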
In another aspect, there is provided a method in which web sites related to the television program are searched and processed in order to generate additional interactive content for interactive TV. Keywords relating to the program and known ahead of time, as well as keywords provided by the closed caption text or by the viewer himself when interacting with the system, are used to process candidate web sites for useful links that can be integrated into the television programming.
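One simple way to realize the link-selection step above is to score each candidate link by how many program or viewer keywords appear in its descriptive text. The function name, data shapes, and URLs below are hypothetical illustrations, not the patented implementation.

```python
def score_links(candidate_links, keywords):
    """Rank candidate web links by overlap between each link's
    anchor or summary text and the program/viewer keyword set.

    `candidate_links` maps a URL to its descriptive text.  Returns
    the URLs with at least one keyword hit, best match first.
    """
    kw = {k.lower() for k in keywords}
    scored = []
    for url, text in candidate_links.items():
        hits = sum(1 for w in text.lower().split() if w.strip(".,!?") in kw)
        scored.append((hits, url))
    scored.sort(reverse=True)  # highest hit count first
    return [url for hits, url in scored if hits > 0]
```

Links that match no keywords are dropped entirely, so only material relevant to the current program is offered for integration.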
In another aspect, there is provided a method using image capture and optical character recognition to recognize additional text which is displayed on the screen, process that text and generate additional interactive content for the viewer. This system is also used with pattern recognition to identify objects in the television image that may become subjects of interactive applications.
In another aspect, there is provided a method using MPEG 4 and/or MPEG 7 encoding of the television broadcast in order to highlight and recognize objects in the TV image using the arbitrary shape compression feature of MPEG 4, for example. Other embodiments may use wavelet techniques, or other edge detection schemes to highlight and identify objects in a television image.
In another aspect, the interactive content is generated and customized for each viewer using the results of the aforementioned aspects, combined with viewer inputs, demographic data, viewer preferences, viewer profiles which contain keywords to be combined with the keywords determined from processing the television program itself, inputs and preferences of other viewers, advertiser goals and/or inputs, and similar data derived from other television programs or channels which relates to the currently viewed program via lexicography, related terms, definitions, concepts, personal interest areas, and other relationships. Importantly, either the existing two-way communications channel to the customer premises, or a separate two-way communications channel to the interactive television integration device, may be used for sending data to, and receiving it from, the television viewer. Two techniques for customizing the interactive content are described. The first uses computer processing of the data from the aforementioned aspects, combined with design goals and algorithms for the provision of interactive TV. This technique is used for generation of customized interactive content that is specific to individual viewers, as well as content that is common to all viewers. The second technique requires a human being to review the data produced from the aforementioned aspects and to select the most desirable links and interactive content to embed into the television broadcast. The human-based system generates interactive content that is common to all viewers, or at least to large groups of viewers, and also generates interactive content that is driven by advertiser or other sponsor goals.
Further features and advantages of the invention, as well as the structure and operation of various embodiments of the invention, are described in detail below with reference to the accompanying drawings. It is noted that the invention is not limited to the specific embodiments described herein. Such embodiments are presented herein for illustrative purposes only. Additional embodiments will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein.
The present invention will be described with reference to the accompanying drawings. The drawing in which an element first appears is typically indicated by the leftmost digit(s) in the corresponding reference number.
FIG. 7a depicts algorithms used for the generation of interactive television content when there is no access to a stored copy of the television program prior to its broadcast, and
The interactive content generator uses information contained in the television program, information previously stored in the interactive content libraries, and information from other content providers 108 to develop and synchronize candidate interactive television content to the television program. If the interactive content must be purchased by the viewer, and/or if the interactive content contains opportunities for purchases based on the content, then the transaction management server 109 coordinates the billing and purchases of viewers, and also provides other customer fulfillment functions such as providing coupons, special discounts and promotions to viewers. During actual broadcast or playing of the interactive television program, the interactive content selector 110 uses information from other content providers such as interactive television program sponsors, and viewer preferences, history, and group viewer preferences to select the specific interactive content which is to be associated with the television program. This interactive content can be customized for each viewer based on his or her preferences, selections during the program, or demographics. The interactive content chosen by the content selector is transmitted to the individual viewers via the packet switched network 114 and the customers' choices, preferences, and purchase particulars are also retained in the transaction management server and may be transmitted in part or in whole to interactive content providers 108 for the purpose of customer preference tracking, rewards, and customer fulfillment functions.
At the customer premise, the video reception equipment 116a receives the conventional television program, while the Internet equipment 118a receives the interactive content designed for the television program and customized for each individual viewer. The conventional video and interactive content are then integrated by the interactive TV integrator 120a for display on the customer's TV 122a and for interaction with the customer's interactive TV remote control 124. The interactive TV network simultaneously connects in this manner to a plurality of customer premises, from one to n, as indicated by the customer premise equipment 116n through 124n. Thus, the interactive network shown in
FIG.3 shows a block diagram of the image content generation subsystem 202. The input baseband video is sent to a hybrid partial MPEG4/MPEG7 encoder 302 that is used to separate the input video into objects such as background and sprites (moving objects) within that background. Unlike MPEG2 encoding, MPEG4 performs its compression based on arbitrary shapes that represent individual objects in the image. Present-day MPEG4 encoders merely isolate the objects for individual encoding. But this capability is inherently suited to the automatic isolation, recognition, and classification of objects in the image for the purposes of interactive television applications. Going beyond the mere isolation of objects, the system of the present invention accepts the isolated object shapes output by the hybrid MPEG 4/7 encoder 302 and processes the objects in a shape movement generator 304 and a shape outline generator 308. The shape movements are determined via analysis of the motion compensation and prediction elements of the encoder such as B and P frames, and this analysis is performed in the movement recognition block 306. Likewise, the actual objects in the image such as coffee cups or cars are recognized in the shape recognition block 310.
To supplement the image object and movement recognition, an additional set of processing blocks is provided which uses conventional image recognition techniques on digitally captured images. The baseband video is also sent to a periodic image capture system 312, after which image pattern recognition is performed in block 314 using algorithms specific to image object pattern recognition. The captured image is also sent to a movement/action pattern recognition block 316 where actions such as drinking, running, driving, exercising, and so on are recognized.
Since the television image often also contains text characters such as news banners which flow across the bottom of the screen, news titles and summaries, signs and labels, corporate logos, and other text, the image capture system also outputs its frames to an optical character recognition system 318 which recognizes the characters, parses them, and provides them to the text and sound interactive generation system 208 as shown in
Several algorithms can be used for the detection and recognition processing performed in blocks 306, 310, 314, 316, and 318, and for the correlation and correction of objects in each stream, and from one stream generation system to the other, performed in block 320. Conventional pattern recognition methods can be used for initial image classification, for example: neural network systems using the least mean squares method, interval arithmetic method, or feed-forward method; fuzzy logic networks; statistical decision theory methods; successive iterative calculation methods; linear discriminant analysis methods; flexible discriminant methods; tree-structured methods; Bayesian belief networks; and deterministic methods such as the wavelet transform method and other methods that are scale invariant. For correlating and correcting detections across the image and audio systems, contextual methods such as object-based representations of context and a rule-based expert system can be applied, where a rule set describing human behavior with respect to typical purchasable objects is one example; statistical object detection using joint probability distributions of objects within a scene is another such method. Graph methods can also be used.
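The rule-based contextual correction mentioned above can be illustrated with a minimal sketch. The rule table, confidence threshold, and object names here are invented for illustration; a real expert system would hold a far richer rule base.

```python
# Hypothetical behavior rules: if the action stream reports "drinking"
# but the object detector weakly reports "vase", the object is more
# plausibly a cup.  Keys are (action, detected object) pairs.
CONTEXT_RULES = {
    ("drinking", "vase"): "cup",
    ("driving", "boat"): "car",
}

def correct_detection(action, obj, obj_confidence, threshold=0.8):
    """Cross-correct an object detection using the action stream.

    A contextual rule overrides the object label only when the
    detector's own confidence falls below `threshold`.
    """
    if obj_confidence < threshold and (action, obj) in CONTEXT_RULES:
        return CONTEXT_RULES[(action, obj)]
    return obj
```

This captures, in miniature, how block 320 reconciles the image, movement, and audio streams: a confident detection is left alone, while a weak one is revised to agree with the surrounding context.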
The baseband audio is also input to the system via a sampler 406, and the samples are sent to a speech recognition block 408 and to a music and other sound recognition block 410. The speech recognition block 408 permits speech in the television program to be detected and packetized in case the closed captioning data is absent or contains errors. The music and sound recognition block 410 recognizes and classifies music and other non-speech sounds in the television program that can be used for interactive television purposes. For example, if music is detected, the interactive system can provide music ordering options to the viewer. For the centralized implementation of the interactive television content generator, the music artist and title can be detected as well. On the other hand, if certain sounds such as explosions or gunshots are detected, the viewer can be provided with options for action/adventure games, or options to suppress violent portions of the television program.
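The mapping from recognized sound classes to viewer options described above can be sketched as a simple lookup. The class names and option strings are illustrative assumptions; the disclosed system does not specify this table.

```python
# Hypothetical mapping from recognized sound classes to the
# interactive options offered to the viewer.
SOUND_ACTIONS = {
    "music": ["order this track", "view artist info"],
    "gunshot": ["browse action games", "suppress violent scenes"],
    "explosion": ["browse action games", "suppress violent scenes"],
}

def options_for_sounds(detected_classes):
    """Collect, without duplicates, the interactive options
    triggered by the sound classes detected in the program audio."""
    options = []
    for cls in detected_classes:
        for opt in SOUND_ACTIONS.get(cls, []):
            if opt not in options:
                options.append(opt)
    return options
```

Because gunshots and explosions map to the same options, a scene containing both still presents each option once.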
The audio information detected from the television program is combined with Optical Character Recognition (OCR) text from the image processing block 202 and all sound related interactive information is correlated and corrected in block 412. The words, sounds, and music detected in the television program are then parsed and encoded in block 414 for interactive stream output and for providing feedback to the image stream generation block 202.
The method by which interactive content stored in 112 is ranked and selected for viewers is shown in block 110. Individual viewer preferences and past history of interactions are stored in block 510 for purposes such as those just described, in order to select the optimum advertising content for viewers. These preferences and history data are derived from the interactive television integrator 120 in
Commonly desired actions 514 are also used for ranking and selection of interactive television content, such as ‘more info,’ ‘shop,’ ‘surf,’ ‘chat,’ and other actions taken by viewers when experiencing interactive television. Just as the viewer preferences and history are used to rank interactive content for display to viewers, when multiple choices exist for interactive content, the content associated with the most frequent viewer actions, such as shopping, can be ranked more highly and presented first to viewers. And of course advertiser and/or product vendor goals 516 are also used in order to rank and select interactive content to be presented or made available to viewers.
The interactive content ranking processor 518 is the method by which the plurality of candidate interactive content is ranked and selected for transmission to the user. As with many current systems, an individual viewer can request content, and that request goes into the viewer's preferences and history block 510 with an immediate status, such that the content is pulled from the library 112 and made available to the viewer. But unlike present interactive systems, the interactive content ranking processor 518 also provides a predictive capability, as previously described for the viewer who had no preference or history for a particular content, but who nonetheless had an association with that content via a viewer group. Thus the interactive content ranking processor 518 provides the capability for interactive television viewers to receive both fully individualized content and content that is more general, but that is still highly relevant to the individual. As an example of the ranking processor, the viewer profile can be represented as a list of keywords indicating the interests of that viewer. These keywords can be selected from a larger list by the viewer himself, or determined by monitoring the viewing behaviors of the viewer. As the viewer navigates through the interactive content, he will be choosing content related to specific keywords in his profile; the more often a particular profile keyword is used, the higher the ranking given to subsequent interactive content that is related to, or derived from, that profile keyword. The highest-ranking content can be presented as the default interactive content for a viewer, to streamline the presentation of interactive content if desired.
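The profile-keyword ranking example above can be sketched as follows. The data shapes and content identifiers are hypothetical; the patent does not prescribe a particular representation.

```python
def rank_content(candidates, profile_usage):
    """Rank candidate interactive content by viewer profile keyword use.

    `candidates` maps a content id to the set of keywords it relates
    to; `profile_usage` maps each profile keyword to how often the
    viewer has chosen content related to it.  Each candidate is
    scored by summing the usage counts of its matching keywords, so
    content tied to frequently used keywords ranks higher; the top
    item can then be presented as the default.
    """
    def score(item):
        _, keywords = item
        return sum(profile_usage.get(k, 0) for k in keywords)
    ordered = sorted(candidates.items(), key=score, reverse=True)
    return [cid for cid, _ in ordered]
```

A viewer who repeatedly follows golf-related content will therefore see a golf promotion ranked above, say, a cooking clip, even if both match some keyword in the profile.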
The interactive content ranked and selected by the ranking processor is then distributed to viewers via the real time interactive content metadata generator 520. This generator uses the content ranking and selections of the ranking processor and the interactive content itself stored in the library 112 to package the content for delivery to viewers via their interactive TV integrator 120.
The RF video and audio are converted to baseband by the first tuner 602 and the second tuner 604 for passing to the switch 606. Alternately, the baseband video and audio may be input to the system directly and fed to the switch 606. Next time tags are generated from the video and audio by a time tag generator 608. The time tags are input along with the video and audio to a digital video recorder 610 for recording the television program along with time tags. The recorded digital video is provided to the interactive content generator 612, the content selector 614, and the interactive content integrator 622. The content generator works similarly to block 106 of
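The time tag generator 608 can be illustrated with a minimal sketch that stamps each frame with its offset from program start. The class name, frame-rate default, and tag format (seconds, rounded to milliseconds) are illustrative assumptions; the disclosure does not fix a tag format.

```python
class TimeTagGenerator:
    """Generate monotonically increasing time tags for synchronizing
    the recorded video/audio with interactive content."""

    def __init__(self, fps=30):
        self.fps = fps      # assumed frame rate of the input video
        self.frame = 0      # frames emitted since program start

    def next_tag(self):
        """Return the time tag, in seconds, for the next frame."""
        tag = self.frame / self.fps
        self.frame += 1
        return round(tag, 3)
```

Because the recorder stores these tags alongside the program, the content generator, selector, and integrator downstream can all refer to the same time base when attaching interactive content to a moment in the broadcast.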
The viewer controls the interactive television integrator via the electronic receiver 618, which may use RF, IR, WiFi, or any combination thereof for signaling between the remote control and the interactive television integrator. The interactive television integrator can then process viewer inputs and transmit them back to centrally located transaction management servers, interactive content selectors, and/or other content providers. This two way interactive communication channel can be used for viewer commands, voice or video telecommunications or conferencing, or for setting up viewer preferences and profiles.
The processed viewer commands are then sent to a user interface block 620 which controls the digital video recorder, the interactive content selector, and an interactive content integrator 622. The content integrator is where packet based interactive content generated locally or remotely and selected by the content selector is merged with the television programming and presented to the viewer either via baseband video and audio output, or via video and audio wireless IP streaming to a remote control, or both.
FIGS. 7a and 7b depict algorithms used for the generation of interactive content. These algorithms can be employed either in the centralized content generator, the local generator, or both.
FIG. 7b depicts the algorithms used for generation of interactive television content when the television program is available, either prior to broadcast or during broadcast. Following selection of the television program 710, the previously developed interactive content is accessed 712 from the interactive television libraries 112. Next, the synchronized interactive content is generated 714 by the interactive television content generator 106. This content and its associations, such as links and tags, are updated and modified 716 based on new information on viewer preferences, history, advertiser goals, and so on. Finally, the updated interactive content and associations are output 718.
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
This application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Patent Application Ser. No. 60/526,257 for “System and Method for Generation of Interactive TV Content,” which was filed Dec. 2, 2003, and which is incorporated herein by reference in its entirety.