The present invention relates generally to a system and method for providing interactive overlays for video presented on touch-screen devices. More particularly, the invention relates to a system and method for providing, in a multimedia container, video with metadata that signals supported interactions to take place in an overlay layer.
Not Applicable
Not Applicable
When children watch videos on a touch screen device, their instinct is to touch the screen while the video is playing, and they are disappointed when nothing happens when they do. Examples of such touch screen devices are a tablet computer (e.g., the iPad, by Apple, Inc. of Cupertino, Calif.), or a smartphone (e.g., the iPhone, also by Apple, or those based on the Android operating system by Google Inc., of Mountain View, Calif.); these and similar devices will be referred to herein as a “touch screen device”.
The present invention relates generally to a system and method for providing interactive overlays for video. More particularly, the invention relates to a system and method for providing, in a multimedia container, video with metadata that signals supported interactions to take place in an overlay layer.
The interactions and overlays may be customized and personalized for each child.
The invention makes use of multimedia comprising a video (generally with accompanying audio) and metadata that describes which interactions can occur during which portions of the video. The video and metadata may be packaged in a common multimedia container, e.g., MPEG4, which may be provided as a stream or may exist as a local or remote file.
The child may use a touch screen to interact, or in some cases the invention can employ a range of other input sensors available on the touch screen device, such as a camera, microphone, keypad, joypad, accelerometer, compass, GPS, etc.
Tags are inserted into the metadata of an MP4 or similar multimedia container, which the “game” engine (application) reads to determine, sometimes in combination with data about the child stored in a remote database, which interactive overlay graphics are available during specific intervals of video content. Interactive overlay content can be further contextualized by allowing different animated graphics to be triggered within a specific time segment, within a specific area of the screen, and/or via a specific input sensor.
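Purely by way of illustration, and not as the exact schema of any embodiment described herein, such metadata and the portion of a player application that reads it might look like the following minimal sketch (the tag names, attribute names, and zone format shown are assumptions for illustration):

```python
# A minimal sketch, not the schema of the described embodiments: hypothetical overlay
# metadata expressed as XML, and a reader that learns which interactions are available
# during which intervals of the video.
import xml.etree.ElementTree as ET

SAMPLE_METADATA = """
<overlay_metadata>
  <default touch_response="stars"/>
  <interval start="00:10:00" end="00:25:00">
    <touch_response graphic="smoke"/>
  </interval>
  <interval start="01:05:00" end="01:20:00">
    <touch_response graphic="counting" zone="0,0,50,100"/>
    <blow_response graphic="paint_spatter"/>
  </interval>
</overlay_metadata>
"""

def load_intervals(xml_text):
    """Return a list of (start, end, [interaction elements]) read from the metadata."""
    root = ET.fromstring(xml_text)
    intervals = []
    for interval in root.findall("interval"):
        interactions = list(interval)   # child elements, e.g. touch/blow responses
        intervals.append((interval.get("start"), interval.get("end"), interactions))
    return intervals

for start, end, interactions in load_intervals(SAMPLE_METADATA):
    print(f"{start}-{end}: {[el.tag for el in interactions]}")
```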
The graphics that are generated by a child's touch can have the following behaviors:
A single type of animated graphic is generated per time segment and/or screen location, which then travels around and/or off the screen.
A single type of animated graphic is generated per time segment and/or screen location, which then fades out or dissipates in some similar manner from the screen.
A series of animated graphics, such as a series of numbers or letters of the alphabet, is generated based upon the length of the child's swipe, a skill level of the child, or prior experience of the child with a particular interaction (one possible heuristic for choosing how many graphics to generate is sketched after these examples). These animated graphics can then either fade out and/or travel.
The color of the animated graphic generated could be modified based upon the time segment and/or screen location.
The size of the animated graphic could be modified based upon the time segment and/or screen location.
The suggested interactions above and those described in detail below are by way of example, and not of limitation.
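As one illustration of the series-of-graphics behavior above, the number of items produced by a swipe could be derived from the swipe length and the child's skill; the particular scaling and function names below are assumptions for illustration, not part of any described embodiment:

```python
# A minimal sketch of one possible heuristic for the series-of-graphics behavior:
# a longer swipe, or a more experienced child, yields more letters or numbers.
import string

def series_for_swipe(swipe_length_px, skill_level, prior_uses, max_items=10):
    """Choose how many items to animate for one swipe (heuristic, illustrative only)."""
    base = swipe_length_px // 100            # roughly one item per 100 pixels of swipe
    bonus = min(skill_level + prior_uses // 5, 4)
    count = max(1, min(base + bonus, max_items))
    return list(string.ascii_uppercase[:count])

print(series_for_swipe(swipe_length_px=320, skill_level=1, prior_uses=7))  # ['A'..'E']
```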
These and other aspects of the present invention will be apparent upon consideration of the following detailed description taken in conjunction with the accompanying drawings, in which like referenced characters refer to like parts throughout, and in which:
While the invention will be described and disclosed in connection with certain preferred embodiments and procedures, it is not intended to limit the invention to those specific embodiments. Rather it is intended to cover all such alternative embodiments and modifications as fall within the spirit and scope of the invention.
Referring to
CPU 101, directed by player application 102, is provided with access to multimedia container 110 comprising the video to be played and the metadata for overlay interactions (one example embodiment described in greater detail in conjunction with
For video to play, CPU 101 directs video decoder 111 to play the video from container 110. In response, video decoder 111 renders the video, frame by frame, into video plane 112. CPU 101 must also configure video display controller 130 to transfer each frame of video from the video plane 112 to the display 131.
For video to play with a graphic overlay, CPU 101 directs graphics processor 121 to an appropriate graphic overlay (e.g., an image, or a graphic rendering display list, neither shown). For the present embodiment, the graphic overlay is an interactive overlay 120, known to application 102, for which, through CPU 101, application 102 can issue interactive control instructions (e.g., by passing parameters in real time derived from input received from touchscreen 103 or sensor 104, or as a function of time, or both), thereby causing the overlay graphics to appear responsive to the input.
The output of the graphics processor is rendered into overlay plane 122. CPU 101 is further responsible for configuring video display controller 130 to composite the image data in overlay plane 122 with that in video plane 112 and present the composite image on display 131 for viewing by the user. Generally, the transparent touchscreen input device 103 physically overlays display 131, and the system is calibrated so that the positions of touch inputs on touchscreen 103 are correlated to known pixel positions in display 131.
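Conceptually, the compositing performed by video display controller 130 amounts to a per-pixel alpha blend of the overlay plane over the video plane, with touch positions mapped into the same pixel coordinate space. The sketch below is only an illustration of that arithmetic (the normalized touch coordinates and pixel formats are assumptions), not the controller's actual hardware path:

```python
# A minimal sketch: alpha-composite an RGBA overlay pixel over an RGB video pixel,
# and map a normalized touchscreen coordinate to a pixel position in the display.
def composite_pixel(video_rgb, overlay_rgba):
    """Blend one overlay pixel (r, g, b, a with a in 0..255) over one video pixel."""
    r, g, b, a = overlay_rgba
    alpha = a / 255.0
    return tuple(round(alpha * o + (1.0 - alpha) * v)
                 for o, v in zip((r, g, b), video_rgb))

def touch_to_pixel(touch_x, touch_y, display_w, display_h):
    """Map a touch position given in 0..1 coordinates to display pixel coordinates."""
    return (int(touch_x * (display_w - 1)), int(touch_y * (display_h - 1)))

print(composite_pixel((0, 0, 255), (255, 0, 0, 128)))   # half-transparent red over blue
print(touch_to_pixel(0.5, 0.25, 1024, 768))             # -> (511, 191)
```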
Timecode 350 in image 231 indicates where in the current video this scene is located, in the format MM:SS:FF, representing a count of minutes, seconds, and frames from the beginning of the video. Timecode would not generally be appropriate for a child user, or for most audiences; it is more appropriate to video production personnel and system developers. However, for the purpose of explaining the present invention, timecode 350 is shown here because of a correspondence with the example metadata in
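For clarity, an MM:SS:FF timecode converts to and from a simple frame count once the frame rate is known; the sketch below assumes a 25 fps rate purely for illustration:

```python
# A minimal sketch of MM:SS:FF timecode handling; the 25 fps rate is an assumption
# for illustration, not a property of the described embodiments.
FPS = 25

def frames_to_timecode(total_frames, fps=FPS):
    minutes, rem = divmod(total_frames, 60 * fps)
    seconds, frames = divmod(rem, fps)
    return f"{minutes:02d}:{seconds:02d}:{frames:02d}"

def timecode_to_frames(timecode, fps=FPS):
    minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return (minutes * 60 + seconds) * fps + frames

print(frames_to_timecode(timecode_to_frames("01:05:10")))   # round-trips to "01:05:10"
```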
In a similar interaction illustrated in
Again,
In
For the video shown in the examples above, there was corresponding metadata that defined which interactive graphic overlays were appropriate to which intervals within the video.
Metadata 1100 includes default touch response tag 1120, which specifies the stars interaction shown in
Between the start and end tag pairs defining each interval element, there are one or more overlay interaction elements, defined by tags 1131, 1141, 1151, 1152, 1161, and 1162.
Overlay interaction element 1131 (shown as a “touch_response” tag) specifies the smoke response of
Overlay interaction element 1141 is responsible for the counting interaction shown in
In the interval element starting with tag 1150, there are two overlay interaction elements, 1151, and 1152. These correspond to each of the pictures used to personalize the video of
Thus, in
In this embodiment, as a design decision, the caption 820 remains until the interval expires or for three seconds, whichever is longer. Another design decision is how to handle subsequent touches that may trigger other overlay interactions within the same interval element, for example, tag 1152. An implementation may choose to allow only the first interaction triggered to operate for the duration of the interval, or the choice may be to allow a subsequent trigger to cancel the prior interaction and begin a new one, or an implementation may allow multiple interactions to proceed in parallel. In another embodiment, an alternative choice of units for zones might be used, e.g., display pixels or video source pixels.
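The three alternatives just described could be captured as a small policy setting in the player application. The sketch below is one hypothetical way to structure that choice, not the implementation of any particular embodiment:

```python
# A minimal sketch of the design choices described above for a second trigger that
# arrives while an interaction from the same interval is already running.
from enum import Enum, auto

class RetriggerPolicy(Enum):
    FIRST_ONLY = auto()          # ignore later triggers for the rest of the interval
    CANCEL_AND_REPLACE = auto()  # stop the running interaction, start the new one
    PARALLEL = auto()            # let multiple interactions run at the same time

def handle_new_trigger(running, new_interaction, policy):
    """Return the list of interactions that should be active after a new trigger."""
    if not running:
        return [new_interaction]
    if policy is RetriggerPolicy.FIRST_ONLY:
        return running
    if policy is RetriggerPolicy.CANCEL_AND_REPLACE:
        return [new_interaction]
    return running + [new_interaction]       # RetriggerPolicy.PARALLEL

print(handle_new_trigger(["caption"], "stars", RetriggerPolicy.CANCEL_AND_REPLACE))
```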
In the interval element starting with tag 1160, there are two overlay interaction elements 1161 and 1162, of which touch_response tag 1161 is responsible for the finger-painting interaction in
Additionally, the final interval in metadata 1100 includes a non-touch-based overlay interaction element in the form of “blow_response” tag 1162. This embodiment would employ a microphone, one of sensors 104, and respond to the volume of noise presented to that microphone by, for example, having graphics processor 121 simulate an airbrush or air stream blowing across tool 920, which behaves as wet red paint, producing a spatter of red paint in the overlay plane 122.
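By way of illustration only, the mapping from noise volume to a paint spatter might be as simple as the sketch below; the normalized microphone level, the droplet counts, and the coordinate conventions are assumptions, and no particular audio API is implied:

```python
# A minimal sketch: map a normalized microphone level (0.0 = silence, 1.0 = loudest)
# to a set of red-paint droplets to be drawn into the overlay plane near the tool.
import random

def spatter_for_level(level, tool_x, tool_y, max_droplets=40):
    """Return (x, y, radius) droplets; a louder blow produces more and wider spatter."""
    count = int(level * max_droplets)
    spread = 20 + level * 100                 # louder -> droplets travel farther
    droplets = []
    for _ in range(count):
        dx = random.uniform(-spread, spread)
        dy = random.uniform(-spread, spread)
        droplets.append((tool_x + dx, tool_y + dy, random.uniform(1, 4)))
    return droplets

print(len(spatter_for_level(0.6, tool_x=200, tool_y=150)))   # -> 24 droplets
```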
The programming and resources to respond to each overlay interaction element, whether touch_response tags, blow_response tags, or a response associated with other sensors, are stored as interactive overlay 120 and can be accessed and executed by graphics processor 121 as directed by, and using parameters from, application 102 running on CPU 101.
In an alternative embodiment, application 102 could perform the graphics rendering and write directly to overlay plane 122. In still another embodiment, application 102 could produce all or part of a display list to be provided to graphics processor 121 instead of using programs and resources stored as interactive overlay 120. Those familiar with the art will find that many implementations of the present invention are feasible.
Metadata 1100, such as that contained in XML data, may be presented all together, for example at the head of a multimedia file or at the start of a stream, or such metadata might be spread throughout a multimedia container, for example as subtitles and captions often are. In some embodiments, the interactive overlay metadata could appear as a stream that becomes available as the video is being played, rather than all at once, as illustrated in
At 1211, the video display controller 130, video decoder 111, and graphics processor 121, are initialized and configured as appropriate for the video in container 110 and properties of display 131 (e.g., size in pixels, bit depth, etc., in case the media needs scaling). The video decoder is directed to the multimedia file or stream (e.g. container 110) and begins to decode each frame of video into video plane 112.
At 1212, container 110 (whether a file or stream) is monitored for the presence of interactive overlay metadata. If any interactive overlay metadata is found, it is placed in the overlay metadata cache 1250. If all metadata is present at the start of the presentation, then this operation need be performed only once. Otherwise, if the metadata is being streamed (e.g., in embodiments where the overlay metadata is provided like, or as, timed text for subtitles and captions), then it should be collected into the overlay metadata cache as it appears.
At 1213, the current position within the video being played is monitored. Generally, this comes from a current timecode as provided by video decoder 111. At 1214, a test is made to determine whether the current position in the video playout corresponds to any interval specified in overlay metadata cache 1250. If not, then a test is made at 1215 as to whether the video has finished playing. If not, interactive overlay process 1200 continues monitoring at 1212.
If, however, at 1214, the test finds that there is an interval specified in the collected metadata, then at 1216, an appropriate trigger is set for the corresponding sensor signal or touch region. Then, at 1217, while the interval has not expired (i.e., the video has neither ended nor advanced past the end of the interval), a test is made at 1218 as to whether an appropriate sensor signal or touch has tripped the trigger. If not, then processing continues to wait for the interval to expire at 1217 or a trigger to be detected at 1218.
When, at 1218, a trigger is found to have been tripped, then at 1219 the corresponding overlay interaction is executed, whether by CPU 101 or graphics processor 121 (or both). When the interaction concludes, a check is made at 1220 as to whether the interaction is retriggerable (that is, allowed to be triggered again within the same interval); if so, the wait for another trigger or interval expiration resumes at 1217.
Otherwise, at 1220, when the interaction may not be triggered again during the current interval, the trigger is removed at 1221, which is the same action taken after the interval is found to have ended at 1217.
Following 1221, the test 1215 for the video having finished is repeated, with the process terminating at 1222 if the video is finished playing. Otherwise, the process continues for the remainder of the video by looping back to 1212.
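Reduced to a simplified, self-contained sketch, the flow described above can be simulated frame by frame as follows; the data structures, the frame-based positions, and the simulated touch events are assumptions for illustration, not the player application's actual code:

```python
# A minimal sketch of interactive overlay process 1200, simulated frame by frame:
# metadata is cached, the playout position is monitored, a trigger is armed for any
# matching interval, and the corresponding interaction runs when the trigger trips.
from dataclasses import dataclass

@dataclass
class Interval:
    start: int            # in frames from the beginning of the video
    end: int
    interaction: str      # e.g. "touch_response: smoke"
    retriggerable: bool = False

def run_overlay_process(total_frames, intervals, touches):
    """`touches` maps frame -> True when a touch (or other sensor trigger) occurs."""
    cache = []                                    # overlay metadata cache 1250
    frame = 0
    while frame < total_frames:                   # test at 1215: video finished?
        cache = list(intervals)                   # 1212: metadata found -> into cache
        active = next((iv for iv in cache
                       if iv.start <= frame < iv.end), None)   # 1213/1214
        if active is None:
            frame += 1
            continue
        armed = True                              # 1216: trigger set for this interval
        while frame < active.end and frame < total_frames:     # 1217: interval alive?
            if armed and touches.get(frame):      # 1218: trigger tripped?
                print(f"frame {frame}: run {active.interaction}")   # 1219
                armed = active.retriggerable      # 1220: may it trigger again?
            frame += 1
        # 1221: trigger removed when the interval ends or the interaction is spent
    # 1222: video finished, process terminates

run_overlay_process(
    total_frames=300,
    intervals=[Interval(50, 120, "touch_response: smoke"),
               Interval(180, 260, "touch_response: stars", retriggerable=True)],
    touches={70: True, 90: True, 200: True, 240: True},
)
```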
As with all such systems, the particular features of the user interfaces and the performance of the processes will depend on the architecture used to implement a system of the present invention, the operating system selected, whether media is local or remote and streamed, and the software code written. It is not necessary to describe the details of such programming to permit a person of ordinary skill in the art to implement the processes described herein and to provide code and user interfaces suitable for practicing the present invention. The details of the software design and programming necessary to implement the principles of the present invention are readily understood from the description herein. Various additional modifications of the embodiments of the invention specifically illustrated and described herein will be apparent to those skilled in the art, particularly in light of the teachings of this invention. It is intended that the invention cover all modifications and embodiments that fall within the spirit and scope of the invention. Thus, while preferred embodiments of the present invention have been disclosed, it will be appreciated that the invention is not limited thereto but may be otherwise embodied within the scope of the claims.
This application claims priority to U.S. provisional application No. 61/436,494 filed Jan. 26, 2011.