The subject matter of the present disclosure relates to computing systems and, more particularly, to the use of dispersed workforces, such as a crowd source, to provide enhancement information (e.g. closed captions) for audio/visual works.
The invention relates generally to software, apparatus and techniques to enhance the viewer experience with video or audio/video works. One example of a technique to enhance the user experience is the use of closed captioning or subtitling, which allows video works to be enjoyed by a wider audience. Closed captioning is generally a technique for associating text with video so that a user can selectively view the text at appropriate times during the video play. For example, a hearing-disabled person may select closed captioning while viewing a video in order to understand dialog or other audible content that accompanies the video. Subtitles differ somewhat from captions in that they are typically used for translation and are often displayed persistently throughout a video, without a user selection.
In order to enhance video with features such as closed captioning and subtitles, machine or human intervention is required at least to create the enhancement and to align it with the appropriate portion of the video. Often the producer of professional video will supply captions or subtitles for the benefit of disabled persons or for translation. Notwithstanding the benefits of enhanced media, a great deal of professionally produced media lacks useful and desirable enhancements. In addition, even when a particular professional media item has one or more enhancement features, that media may lack other desirable features such as a specific translation or other interesting information related to the content of the media. Of course, outside the area of professional media, the vast majority of existing video and audio material (e.g. YouTube or home video) almost completely lacks enhancement features. Thus, there is a huge amount of video and other media in the world lacking desirable enhancement features, such as subtitles and closed captioning.
In response to this situation, the concept of crowd-sourced captioning/subtitling has evolved in the marketplace. For example, Khan Academy provides software tools that allow volunteers to help create dubbed video and foreign-language subtitles for educational videos (www.khanacademy.org). During the summer of 2012, Netflix also began soliciting volunteers to join its crowd-sourced subtitling community. There are also similar efforts by a variety of well-known companies: BBC; NPR; Google; Facebook; and Microsoft.
Aspects of the inventions discussed herein relate to the use of crowd source techniques for providing video enhancement features such as closed captions, subtitles or dubbing. Some embodiments of the invention contemplate using one or more stages of a five-stage process. In a potential first stage of an embodiment, a large number of input-users (typically volunteers) input enhancement information (e.g. captions or subtitles) that is collected by a central system or system operator. The input-users may align the enhancements with places (e.g. temporal places) in the media by use of placement guides such as cue points, which are described more fully below. The input-users may obtain cue point information from a central system or system operator and then apply that information to an independently obtained version of the media being enhanced. In some embodiments, many input-users will add all types of enhancements to a media item and a central system or operator will collect all of the enhancements.
After a critical mass of enhancement information is collected by the central system, the five-stage process may move to a second stage that includes normalizing the collected data. Since the normalization task lends itself to machine work, many embodiments use server-based applications to perform normalization. However, other embodiments contemplate using crowd source techniques to perform normalization. For example, enhancements collected from input-users might be transferred in portions to another group of users to perform the normalizing task through crowd sourcing.
In some embodiments, after normalization is complete, the five-stage process may enter a third stage wherein the collected and normalized data is distributed to another group of users (e.g. “editor-users”) for validation and editing. The crowd source of editor-users performs the editing and validation tasks and the results are again collected by the central system or a central operator.
After sufficient crowd-source editing takes place, the five-stage process may enter a fourth stage to curate the now normalized and edited set of data. In the fourth stage, yet another group of users (e.g. "curator-users") organizes the enhancement materials into categories or channels that may be functional (e.g. closed captions), entertaining (e.g. fun facts about the actors' lives during the shooting of the video), or otherwise desirable. For example, curator-users may create streams or channels of enhancement features where each stream or channel follows a potentially desirable theme, such as English closed captions, Italian subtitles, information about actors, or any other interesting aspect of the media content. Thus, after curating, a video may have any number of channels, each channel representing a themed collection of enhancement information available for an end-user.
A final potential stage of the five-stage process involves the publication of the enhancement information. Since the enhancement information may be organized (for purposes of temporal placement in the video) with respect to cue points, the enhancement information may be distributed to the end user independent of the video source. The cue point and enhancement information may be merged with the video stream at or near the runtime of the video.
I. Hardware and Software Background
The inventive embodiments described herein may have application and use in all types of single- and multi-processor computing systems. Most of the discussion herein focuses on a common computing configuration having a CPU resource including one or more microprocessors. The discussion is only for illustration and is not intended to confine the application of the invention to the disclosed hardware. Other systems having either other known or common hardware configurations are fully contemplated and expected. With that caveat, a typical hardware and software operating environment is discussed below.
Referring to the accompanying figure showing a representative hardware environment, an illustrative electronic device 100 is depicted, including processor 105, display 110, user interface 115, graphics hardware 120, sensor and camera circuitry 150, video codec(s) 155, memory 160 and storage 165.
Processor 105 may execute instructions necessary to carry out or control the operation of many functions performed by device 100 (e.g., the generation and/or processing of media enhancements). In general, many of the functions performed herein are based upon a microprocessor acting upon software embodying the function. Processor 105 may, for instance, drive display 110 and receive user input from user interface 115. User interface 115 can take a variety of forms, such as a button, keypad, dial, click wheel, keyboard, display screen and/or touch screen, or even a microphone or video camera to capture and interpret input sound/voice or video. The user interface 115 may capture user input for any purpose, including for use as enhancements in accordance with the teachings herein.
Processor 105 may be a system-on-chip, such as those found in mobile devices, and may include a dedicated graphics processing unit (GPU). Processor 105 may be based on reduced instruction-set computer (RISC) or complex instruction-set computer (CISC) architectures or any other suitable architecture and may include one or more processing cores. Graphics hardware 120 may be special-purpose computational hardware for processing graphics and/or assisting processor 105 in processing graphics information. In one embodiment, graphics hardware 120 may include a programmable GPU.
Sensor and camera circuitry 150 may capture still and video images that may be processed to generate images for any purpose, including for use as enhancements in accordance with the teachings herein. Output from camera circuitry 150 may be processed, at least in part, by video codec(s) 155 and/or processor 105 and/or graphics hardware 120, and/or a dedicated image processing unit incorporated within circuitry 150. Images so captured may be stored in memory 160 and/or storage 165. Memory 160 may include one or more different types of media used by processor 105, graphics hardware 120, and image capture circuitry 150 to perform device functions. For example, memory 160 may include memory cache, read-only memory (ROM), and/or random access memory (RAM). Storage 165 may store media (e.g., audio, image and video files), computer program instructions or software including database applications, preference information, device profile information, and any other suitable data. Storage 165 may include one or more non-transitory storage media including, for example, magnetic disks (fixed, floppy, and removable) and tape, optical media such as CD-ROMs and digital video disks (DVDs), and semiconductor memory devices such as Electrically Programmable Read-Only Memory (EPROM) and Electrically Erasable Programmable Read-Only Memory (EEPROM). Memory 160 and storage 165 may be used to retain computer program instructions or code organized into one or more modules and written in any desired computer programming language. When executed by, for example, processor 105, such computer program code may implement one or more of the method steps or functions described herein.
Referring now to the accompanying network figure, an illustrative network architecture is shown, within which the disclosed techniques may be implemented; the architecture includes one or more networks 205 served by data server computers 210.
Also coupled to networks 205, and/or data server computers 210, are client computers 215 (i.e., 215a, 215b and 215c), which may take the form of any computer, set top box, entertainment device, communications device or intelligent machine, including embedded systems. In some embodiments, users such as input-users, curator-users, editor-users and end-users will employ client computers. Also, in some embodiments, the network architecture may include network printers such as printer 220 and storage systems such as 225, which may be used to store enhancements (including multi-media items) that are referenced in the databases discussed herein. To facilitate communication between different network devices (e.g., data servers 210, end-user computers 215, network printer 220 and storage system 225), at least one gateway or router 230 may optionally be coupled therebetween. Furthermore, in order to facilitate such communication, each device employing the network may comprise a network adapter. For example, if an Ethernet network is desired for communication, each participating device must have an Ethernet adapter or embedded Ethernet-capable ICs. Further, the devices must carry network adapters for any network in which they will participate.
As noted above, embodiments of the inventions disclosed herein include software. As such, a general description of common computing software architecture is provided, as expressed in the layer diagrams of the accompanying figures.
With those caveats regarding software, the layer diagrams of the accompanying figures illustrate a typical software stack, from base operating system and hardware services up through the application programs that implement the functions described herein.
No limitation is intended by these hardware and software descriptions and the varying embodiments of the inventions herein may include any manner of computing device such as Macs, PCs, PDAs, phones, servers or even embedded systems.
II. A Multi-Stage Crowd Source System
Some embodiments discussed herein refer to a multi-stage system and methodology to employ crowd-sourcing techniques for the purpose of creating, refining and ultimately using video enhancement features. For example, a system may collect video captions through crowd sourcing and then distribute the collected captions to volunteer users for further refining and categorization. In this manner, one or more channels of enhancement information may be created by crowd sourcing and applied to media, resulting in products like enhanced video.
Referring again to the accompanying figures, the five stages summarized above (user input, normalization, validation and editing, curating, and publication) proceed in sequence, with the results of each stage collected by the central system before the next stage begins.
Having this overview, each stage will now be explained in further detail.
III. User Input
During the user input stage, enhancement information is collected from multiple users ("input-users"), each of whom presumably views/experiences at least portions of the subject media and "enters" information. An input-user at the input stage may use any conventional device to enter enhancement information. Of course, since many conventional devices provide few or no mechanisms for entry of enhancement information, a conventional device may require the addition of supplementary technology. For example, an input-user may employ a traditional software video player that is supplemented through a simple software update, software plugin or accessory software. Some common traditional video players that may be supplemented include Apple's QuickTime Player, Microsoft's Windows Media Player, or browser-based video players. Some embodiments also contemplate the use of legacy hardware video viewing devices, which may be supplemented for use with embodiments of the invention. For example, many modern televisions and set top boxes may receive plugin or accessory software to add functionality. Furthermore, any legacy video device might be supplemented by use of accessory hardware that connects in series with the legacy device and provides a user interface and infrastructure for collecting, editing or curating video enhancement information. In the case of such an accessory hardware device, one embodiment envisions the use of a set top box serially in-line between the video source and the display, such that the accessory may impose a user interface over and/or adjacent to the video.
Of course, an input-user (or other user) may enter enhancement information using a device or software that is made or designed with enhancement entry as a feature. In that event, supplementation may not be necessary.
One example embodiment of a media player for use in collecting enhancement information is shown as item 500 in the accompanying figure; the player includes a section 520 providing input mechanisms for enhancement information.
Referring again to section 520, the diagram illustrates potential types of input fields or icons or widgets that might be used by an input-user to enter enhancement information. For example, in one embodiment, item 504 is a text entry box wherein an input-user may directly type enhancement information such as a caption, subtitle or other information. The input-user might then use one or more of items 505, 506 or 507 to indicate (either textually, by menu, button or any other known mechanism) the nature of the information entered in box 504. For example, the user may indicate that the entered information is a caption, entered in English, or alternatively, a URL that is linked to contextual information about something in the media content. As disclosed later, any context (e.g. metadata) provided by the input-user regarding a submitted media enhancement may be stored in a database and used in later stages of the process. In other embodiments, one or more of the widgets or icons (e.g. items 505, 506 or 507) may be drop zones for multimedia items or text items that were produced using a different software tool or different computer. As with text entry, in connection with using a drop zone, the user may use widgets to indicate meta-information such as the nature and/or relevance of the item being dropped. Varying embodiments of the invention contemplate entry of enhancement information by any mechanism possible now or in the future. A further and non-exhaustive list of examples is as follows: a user may employ a pointer to select an arbitrary spot on the display for data entry, either during video play or otherwise; a user may enter information through voice recognition, either by simply speaking or by using a widget/icon to indicate the insertion of speech; or a user may enter information through use of the sensors in the client device, for example audio or video enhancement through a microphone or camera, or any information that a device sensor may obtain. Of course, any combination of the foregoing is also contemplated.
In some embodiments, as enhancement information is input by a user, the information is saved in a memory such as any of the memories discussed in connection with the hardware environment above, and may be transmitted to a central system for collection.
Furthermore, as will become evident, the enhancement information receivable by the system may be of multiple types or formats and may relate to multiple categories of information. Also, each item of enhancement information may be tied to a place (e.g. a temporal point) in the media (e.g. video). Therefore, with respect to any particular item of enhancement information, there may be a few or many related data items (e.g. metadata), such as: temporal entry point; type of data; category of data; user identification; user location; type of device or software employed by user; time of information entry; comments from the user; and, any other information inferred from the user's action or expressly entered by the user. Given the breadth of information that may relate to every item of enhanced information, some embodiments employ one or more databases to centrally organize the metadata and relate it to the entered enhancement information.
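To make the metadata relationships concrete, the following is a minimal illustrative sketch (in Python, which the disclosure does not prescribe) of how a single enhancement item and its related metadata might be modeled; the `Enhancement` class and all field names are expository assumptions, not part of the disclosure.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Enhancement:
    """One item of enhancement information plus related metadata."""
    media_title: str            # e.g. "The Sound Of Music"
    cue_point_id: int           # temporal entry point, expressed as a cue point
    kind: str                   # type of data: "caption", "subtitle", "url", ...
    category: str               # e.g. "en-captions", "es-subtitles", "actor-info"
    payload: str                # the caption text, URL, or a reference to media
    user_id: str                # identification of the contributing input-user
    user_location: Optional[str] = None
    device: Optional[str] = None          # type of device/software employed
    entered_at: datetime = field(default_factory=datetime.utcnow)
    comments: str = ""                    # free-form comments from the user
```

A record of this general shape corresponds to one row of the collected-data tables discussed in the normalization stage below, with the metadata held in one or more relational databases as described.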
According to some embodiments of the invention, different users may seek to provide enhancement for the same video title (e.g. "The Sound Of Music"); however, each user may obtain a version of the video title from a different source. For example, if four input-users in a crowd source group are attempting to provide English caption information for "The Sound Of Music," the first user may obtain the movie from Hulu, the second user from Netflix, the third user from iTunes and the fourth user from broadcast TV. Using any of the input techniques discussed above, each user might choose a different place in the video (e.g. a temporal point) to place the same caption information. Similarly, for any given span of video, one user may choose to put a large amount of caption information in each of a few places, while another user may choose to put a small amount of caption information in each of several places, the total amount of information potentially being roughly the same for both users. As a result of either of the foregoing situations, any later effort to organize or reconcile the inputs from multiple users will be complicated by the users' randomly selected and variably numerous insertion points. Therefore, some embodiments of the invention contemplate the use of cue points.
Cue points are relatively specific places in media (e.g. a video) that are intended to be generally consistent across varying versions of the same video title. The cue points may be placed by any mechanism that provides for consistency among video versions. Some embodiments use cue points that are specific points in a timeline of the video. Other embodiments align cue points with other content-addressable features of a video or with meta-information included with the video. In order to achieve consistent cue points across multiple video versions (of the same video title), some embodiments provide cue points that are evenly temporally spaced between identifiable portions of the video, like the beginning, end or chapter markers. For example, some embodiments may use cue points every 60 seconds from beginning to end of the movie or from beginning to end of each chapter. Other embodiments place cue points relative to scene changes or camera angle changes in the video, which may be automatically detected or identified by human users. For example, some embodiments may place a cue point at every camera angle change. Still other embodiments may evenly distribute, in time, a fixed number of cue points, where the fixed number depends upon the video title's length and/or genre and/or other editorial or meta-information about the video. Finally, cue points may be placed by any combination of the foregoing techniques. For example, there may be a cue point placed at each scene change and, in addition, if there is no subsequent scene change within a fixed amount of time (e.g. 60 seconds), another cue point will be inserted; a sketch of this combined technique appears below.
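The combined placement technique just described lends itself to a short illustrative sketch. The function below is an assumption for exposition: scene-change times are supplied as a list (by whatever automatic or human detection an implementation uses), and the 60-second figure is the example interval mentioned above.

```python
def place_cue_points(duration, scene_changes, max_gap=60.0):
    """Return cue point times (in seconds): one per scene change, plus
    evenly spaced fill-ins wherever consecutive points would otherwise be
    more than max_gap seconds apart, per the combined technique above."""
    points = sorted({0.0, *scene_changes, duration})
    cues = []
    for start, end in zip(points, points[1:]):
        cues.append(start)
        gap = end - start
        if gap > max_gap:
            # backfill evenly spaced cue points to close the oversized gap
            n = int(gap // max_gap)
            step = gap / (n + 1)
            cues.extend(start + step * (i + 1) for i in range(n))
    cues.append(duration)
    return cues
```

For instance, `place_cue_points(300.0, [50.0, 250.0])` yields cue points at 0, 50, 100, 150, 200, 250 and 300 seconds: the scene changes themselves plus fill-ins across the long middle span.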
In some embodiments, the cue point related information for a particular media title is independent from any particular version of the media. In other words, for any particular video title (e.g. "The Sound Of Music"), the cue point information (e.g. identity, and/or nature, and/or spacing of cue points) is independent and separable from the video versions (e.g. obtained from Hulu, or obtained from iTunes, or obtained from Netflix, etc.). By this feature, the cue point information may be applied to any version of the media title. For example, the cue point information for a particular video title may be applied to video versions sourced from Netflix, Hulu and iTunes (all theoretically slightly different versions in form, but not in substance). In addition, enhancement information may be aligned with cue points rather than directly with markers embedded in the video media. In this manner, enhancement features may be maintained and distributed independent of the video media or version it represents. The independence provided by the cue point embodiments allows a central system (e.g. server resources) to accumulate, process and maintain cue point and enhancement information in a logical space separate from video media and crowd source user activity.
For the benefit of a more complete illustration, the following section describes exemplary embodiments for inputting enhancement information. While the discussion may recite a sequence and at times semantically enforce that sequence, the inventors intend no sequential limitation on the invention other than those that are strictly functionally necessary or expressly stated as essential.
In an initial step of the input stage, an input-user may select a suitably equipped video player, or alternatively select any video player and apply an appropriate supplement to make the player suitable. The input-user may also select a video and a source for the video, for example, "The Sound Of Music" from iTunes. In some embodiments, the input-user may first select a video or source and, during the process, receive a notification regarding the opportunity to contribute to a crowd source enhancement of the video. If the user accepts the opportunity, an appropriate video player may be provided or supplemented after the selection of the video or source. The player or supplement (e.g. software modules) may be downloaded from a server over a LAN or WAN such as the Internet. Once an input-user is equipped with a suitable video player and video media, the input-user may use normal media controls (depending upon the viewing device: play, FF, RW, pause, etc.) to view the video. At any point where the input-user is inspired to enter enhancement information, there are several possibilities for doing so: the input-user may simply act to enter an enhancement using a pointer, touch or other interface device on the video; or the user may pause the video using the normal controls and insert the enhancement using a provided interface such as the player interface discussed above.
In some embodiments, during video play, the video player may prompt the user regarding the opportunity to enter enhancement information. The prompts may be based upon any of the following or any combination thereof: the placement of cue points in the video; the relative amount of video played (in time, frames or otherwise) since the last enhancement was entered; scene changes; camera angle changes; or content-addressable features or meta-information.
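A minimal sketch of how a player might combine two of the listed triggers (proximity to a cue point, and elapsed play since the user's last entry) follows; the function name and both thresholds are illustrative assumptions.

```python
def should_prompt(now, cue_points, last_entry_time,
                  cue_window=2.0, idle_limit=90.0):
    """Prompt when playback is within cue_window seconds of a cue point,
    or when more than idle_limit seconds of video have played since the
    user's last enhancement entry."""
    near_cue = any(abs(now - c) <= cue_window for c in cue_points)
    idle_too_long = (now - last_entry_time) > idle_limit
    return near_cue or idle_too_long
```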
Regardless of the mechanism for indicating an insertion, after an insertion has been indicated, some embodiments provide a visual indication of the nearest cue point. For example, upon the user's indication that an insertion is desired, the video may pause and the user may be shown the nearest cue point. The cue point may be shown by any of the following: upon indication of a desired insertion, the video may automatically move to the nearest cue point and display a temporally accurate still image of the playing video at that point; a relatively small windowed still frame of the video at the cue point may be shown on the display in addition to the relatively larger still frame of the playing video at the arbitrary point where the insertion was indicated; a brief video sequence may be shown in a relatively small framed window similar to the foregoing; an indication may be shown on a timeline exposing the location of the cue point relative to the paused place in the video where insertion was indicated; or any combination of the foregoing techniques may be used, wherein, for example, a relatively small windowed still frame is shown above the timeline indication and the paused video is shown simultaneously in the main display. Furthermore, using some of the techniques discussed here (e.g. relatively small windowed frames and/or timeline indicators), the interface may visually expose multiple cue points, either simultaneously or serially, when the playhead is in proximity to a cue point. Moreover, whether or not multiple cue points are simultaneously displayed, the user may select between cue points by use of one or more interface controls (e.g. pointer, icons or widgets). For example, the user may examine the video for appropriate cue points by moving forward or backward through sequential cue points. In the case of multiple cue points simultaneously displayed, the user may directly select a desired cue point. In some embodiments, the user may insert the enhancement information either before or after selection of the cue point and the appropriately programmed software will align the two.
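Finding the cue points nearest an arbitrary insertion point, so the interface can expose them as described above, might look like the following sketch; the function name and the choice to surface three candidates are assumptions.

```python
import bisect

def nearest_cue_points(cue_points, pause_time, count=3):
    """Given sorted cue point times and the arbitrary point at which the
    user indicated an insertion, return the nearest cue points so the
    interface can display them (e.g. as small windowed still frames or
    timeline indicators)."""
    i = bisect.bisect_left(cue_points, pause_time)
    candidates = cue_points[max(0, i - count):i + count]
    return sorted(candidates, key=lambda c: abs(c - pause_time))[:count]
```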
The insertion of enhancement information may take any form discussed above or otherwise known. Varying embodiments of the invention provide visual feedback of the inserted information. Thus, when a user types in a caption, the text may remain visible for a period of time, either in the insertion widget or otherwise on the screen (e.g. aligned with a timeline indicator). As discussed above, some embodiments of the invention contemplate non-text enhancements, and for such items a special preview window may be useful. When non-text enhancement information is used, some embodiments provide preview information in a window either side-by-side or overlapping (e.g. picture-in-picture style) with the playing video.
Given the nature of media enhancements such as captioning and subtitles, cue points may be numerous and somewhat close together. This situation suggests that users may not provide content for every cue point. Furthermore, when multiple users provide an enhancement like a caption for the same video sequence, the varying users may not select the same cue point. Therefore, if a networked central system collects enhancement information regarding "The Sound of Music" from several different input-users, the collection of information may be sparse and intermittently inaccurate, as illustrated in the collected-data tables discussed below.
Referring to those tables, each row may associate a cue point with the caption entries received from individual input-users, making visible both the gaps (cue points for which some users entered nothing) and the misalignments (equivalent captions attached to neighboring cue points) that later stages must address.
IV. Normalization
As discussed above, varying embodiments contemplate a normalization stage. In computer science, normalization generally refers to the elimination of redundancies and dependencies in data. During the normalization stage, the system may employ various techniques to eliminate redundancies and dependencies in the collected information. Referring again to the collected data, the system may, for example, compare the entries of multiple input-users at and around each cue point in order to identify redundant or equivalent enhancements.
In some embodiments, the system eliminates redundancies on a cue point basis and leaves alignment (e.g. sequential dependency) issues for resolution at a later stage. For these embodiments, the result of normalization will yield table 660, a per-cue-point set of enhancements with redundant entries removed.
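A per-cue-point redundancy elimination of the kind just described might be sketched as follows, assuming textual captions and a crude string-similarity test; a production system could substitute better matching.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def normalize(entries, threshold=0.85):
    """Per-cue-point redundancy elimination: group the collected entries
    by cue point and keep one representative from each cluster of
    near-duplicate captions, leaving alignment issues to later stages.
    `entries` is an iterable of (cue_point_id, caption_text) pairs."""
    by_cue = defaultdict(list)
    for cue_id, text in entries:
        by_cue[cue_id].append(text)
    normalized = {}
    for cue_id, texts in by_cue.items():
        kept = []
        for t in texts:
            # keep t only if it is not a near-duplicate of a kept caption
            if not any(SequenceMatcher(None, t, k).ratio() >= threshold
                       for k in kept):
                kept.append(t)
        normalized[cue_id] = kept
    return normalized
```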
The foregoing normalization examples are relatively simple because they deal with only one type of enhancement information, namely caption data. As discussed earlier, embodiments of the invention contemplate the use of multiple, many or even infinite categories of enhancement data. The following are some examples:
1. Closed Captions (where each language translation may form another category);
2. Subtitles (where each language translation may form another category);
3. Dubbing information (where each language translation may form another category);
4. Historical context information (links, text, image, video and/or audio, each of which may form a different category);
5. Character context information (links, text, image, video and/or audio, each of which may form a different category);
6. Actor context information (links, text, image, video and/or audio, each of which may form a different category);
7. Context information regarding items in the video (links, text, image, video and/or audio, each of which may form a different category);
8. Context information regarding geography and/or locations related to the video (links, text, image, video and/or audio, each of which may form a different category);
9. Context information regarding salable products in the video (links, text, image, video and/or audio, each of which may form a different category);
10. Advertising information related to aspects of the video, where each aspect may be a different category (links, text, image, video and/or audio, each of which may form a different category);
11. Identification of product placements and/or supplementary information regarding placed products (links, text, image, video and/or audio, each of which may form a different category);
12. Product or item information, such as user manuals, technical tutorials etc. (links, text, image, video and/or audio, each of which may form a different category);
13. Educational information related to aspects of the video, where each aspect may be a different category (links, text, image, video and/or audio, each of which may form a different category);
14. Editorial comment information related to aspects of the video, where each aspect may be a different category (links, text, image, video and/or audio, each of which may form a different category); and
15. The replication of DVD bonus features.
Referring now to table 700, collected enhancement information spanning several of the above categories may be associated with the cue points of a single media title, with different input-users contributing entries of different types.
In some embodiments, by applying a normalization process to table 700, the system will eliminate redundancies and produce table 710, a normalized set of multi-category cue point and enhancement information.
V. Validating and Editing Cue Point and Enhancement Information
Certain embodiments employ a validating and editing stage to perform a content editing and policing function well known in the art of crowd sourcing (e.g. Wikipedia). Generally, the editor-users performing validation and editing will correct errors or alter enhancements to improve the published product and police its integrity against sloppy, malicious or abusive participation by others.
Validation and editing users (e.g. editor-users) may be the same as or different from the input-users that provide enhancement entries. Notably, in some embodiments, user identities (or pseudo-identities) are persistently related to enhancement data so that the same user is not assigned to both enter and edit/validate the same data.
As discussed above with reference to the databases, the system may retain profile information (e.g. identity, language, device capabilities, history and preferences) for each user, and that information may guide the distribution of editing work.
When employing the validation and editing stage, normalized cue point and enhancement data is distributed to identified and selected editor-users. The normalized data may be distributed to editor-users using several different methodologies. For example, in various embodiments, one or more of the following techniques may be employed in the distribution of enhancement data to editor-users: attempt to prevent a particular editor-user from reviewing enhancement information that was entered by that same person or machine; attempt to provide an editor-user with enhancement information in a language spoken by the editor; attempt to provide each editor-user only with enhancements for which editing and validation do not require any device features that are unavailable to the editor-user; provide the editor-user enhancement information according to the preferences of the editor-user; provide the editor-user enhancement information according to the profile of the editor-user; provide the editor-user enhancement information according to the known abilities and/or disabilities of the editor-user; an editor-user is sent all available cue point and enhancement information; an editor-user is sent cue point and enhancement information that is most desired for editing and/or completion by the system operator (e.g. the server owner/operator); an editor-user is sent cue point and enhancement information based upon ratings or comments from the end-user base that employs the enhancements; an editor-user is sent cue point and enhancement information based upon ratings or comments from other volunteers in the crowd source community; the editor-user is sent cue point and enhancement information based upon an assessment, by the system operator (e.g. server owner/operator) or system software, of which portions of the subject video are least prepared for publication; an editor-user is sent cue point and enhancement information based upon the number of available editor-users and/or the length of time before scheduled or desired publication; an editor-user is sent cue point and enhancement information based upon the nature of the particular implementation of the overall captioning system; an editor-user is sent cue point and enhancement information based upon the size of the audience interested in a particular video; or, an editor-user is sent cue point and enhancement information based upon the size of the community of editor-users with appropriate expertise or ability to properly edit/validate the material. A sketch of a few of these criteria appears below.
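The following sketch illustrates three of the listed distribution criteria (never send an editor her own entries, match language, and respect device capabilities); the `EditorProfile` and `EditTask` types and their fields are assumptions for exposition.

```python
from dataclasses import dataclass, field

@dataclass
class EditorProfile:
    user_id: str
    languages: set                              # languages the editor speaks
    features: set = field(default_factory=set)  # device capabilities available

@dataclass
class EditTask:
    author_id: str                              # input-user who entered it
    language: str
    required_features: set = field(default_factory=set)

def eligible_tasks(editor, tasks):
    """Filter normalized enhancement entries down to those a given
    editor-user may edit/validate under the criteria named above."""
    return [t for t in tasks
            if t.author_id != editor.user_id          # not her own entries
            and t.language in editor.languages        # a language she speaks
            and t.required_features <= editor.features]  # device can render it
```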
In one embodiment, when a potential editor-user is experiencing the media (e.g. watching a video) and wishes to perform editing/validation, the user indicates her desire, and a subset of the available cue point and enhancement information is selected randomly or quasi-randomly for distribution to the user for editing/validation. Any supplementary software may also be sent to the user to facilitate the contemplated editing. Since the user in this case may have already watched a portion of the video, one embodiment allows for supplementing the video with cue point and enhancement information forward from the point currently displayed to the user. Notably, a purpose of this embodiment is not to force the user to watch the video from the beginning or cause the video to shift its play location. This purpose, of course, may be served even if the entire video is supplemented with cue point and enhancement information.
Furthermore, in distributing cue point and enhancement information to the editor-users, some embodiments are careful not to cause a collision at any cue point. Strictly interpreted, a collision occurs when two or more items of enhancement information are aligned with the same cue point. A reason some embodiments avoid strict collisions is that the editor-user may not be able to decipher multiple enhancements simultaneously. Other embodiments prevent multiple enhancements per cue point only when the multiple enhancements are not sufficiently complementary (i.e. when they cannot be simultaneously and critically viewed or experienced).
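Under the strict interpretation just given, collision detection reduces to counting enhancements per cue point; a minimal sketch follows.

```python
from collections import Counter

def find_collisions(assignments):
    """Detect strict collisions: cue points to which two or more
    enhancement items have been aligned. `assignments` is an iterable
    of (cue_point_id, enhancement_id) pairs."""
    counts = Counter(cue for cue, _ in assignments)
    return {cue for cue, n in counts.items() if n > 1}
```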
In some embodiments, during the editing/validation stage, certain designated editor-users, or all editor-users, may be permitted to perform one or more of the following editing tasks: make minor edits in textual content, such as fixing typos; edit or delete clearly inappropriate content apparently entered by a malicious or intentionally harmful community member; crop or otherwise edit images, video clips, or audio; edit any enhancement information in any manner known for editing that content; flag content to indicate issues such as profanity or political, religious, commercial, or other content that viewers may want to filter out; flag content that requires the attention of the system operator (e.g. captioning system operator) due, for example, to corruption or error; or, move cue points to better align enhancements with the video.
In some embodiments, edited cue point and enhancement information is collected by a server for further use toward publication of an enhanced video. One embodiment provides for incorporating the edited cue point and enhancement information in the same database, or a related database, as the information exemplified by the tables discussed above.
VI. Curating the Content
In some embodiments, a curating stage is employed prior to publication of the cue point and enhancement information. This may be performed after the material has been validated, edited and flagged as discussed above.
One benefit of curating the cue point and enhancement information is the opportunity to make enhancement channels that may be based upon the categories of enhancement. For example, if there were enough English and Spanish speaking users entering and editing enhancements, the curating process may be able to form an English closed caption channel and a Spanish subtitle channel. Given the breadth of enhancement information contemplated by varying embodiments of the invention, any number of useful and interesting channels may be derived during the curating process.
In some embodiments, curator-users are a designated group of users that may or may not overlap with input-users and/or editor-users. Some embodiments, however, call for curator-users to be professionals, to meet heightened trust criteria, or to be the most trusted volunteer users in the community. Once designated and/or properly authorized, a curator-user obtains cue point and enhancement data, for example the data represented by table 800 (i.e. edited data). The curator-user may obtain all available cue point and enhancement data or a subset selected by the curator-user or the system operator (e.g. server or service owner/operator). For example, if the curator-user intends only to curate a channel of Spanish subtitles, she may be sent, or request, only enhancement data comprising Spanish subtitles. If she wishes to be more creative, she may request all Spanish-language enhancement data. In terms of the ability to supply or request certain enhancement data, the system may be limited by the descriptive information in its possession regarding the enhancement content. This type of information (i.e. metadata about enhancement content) may be obtained from the input-user, the editor-user or by application of technology to the enhancement information (e.g. text, speech, song, image, face or other object recognition technologies). For example, the more information collected from an input-user through the input interface, the easier the curator's job may be.
The curator-user employs the data along with a suitable user interface to assemble logical or aesthetically interesting data channels. While the system operator may provide some rules or guidelines to the curator-user, the role is largely editorial. One curator-user may produce a channel comprised entirely of foreign-language subtitles. Another curator-user may select what she considers to be the best commentary on the symbolism in a movie and populate a channel therewith. Yet another curator-user may populate a channel with selected interesting biographical facts and multimedia artifacts relating to one or more actors in the video. In some embodiments, the curator-user may have even more editorial flexibility, such as the full capabilities of an editor-user and/or an input-user. In short, depending upon the embodiment, the curator-user may be given all possible editorial control over the cue point and enhancement information.
Referring again to the edited data of table 800, the curator-user may assign each enhancement to one or more channels, yielding a channel-organized set of cue point and enhancement information ready for publication.
In addition, while not shown in the example, there is no strict or technical prohibition against aligning two enhancements with a single cue point in a single channel. This situation could create an aesthetically unpleasant result if assembled accidentally or without care. However, it could also be aesthetically beneficial in certain circumstances, such as placing text over video or placing sound with still images. A sketch of channel assembly with a simple collision guard follows.
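The sketch below assembles a themed channel and, by default, avoids intra-channel collisions; the attribute names (`category`, `cue_point_id`) follow the illustrative record sketched earlier and are assumptions.

```python
def build_channel(theme, enhancements, allow_collisions=False):
    """Assemble a themed channel: select the enhancements matching the
    curator's theme (e.g. category == "es-subtitles") and, unless
    collisions are explicitly allowed, keep at most one enhancement per
    cue point within the channel."""
    channel = {}
    for e in enhancements:
        if e.category != theme:
            continue
        if e.cue_point_id in channel and not allow_collisions:
            continue        # skip to avoid a collision in this channel
        channel.setdefault(e.cue_point_id, []).append(e)
    return channel          # cue_point_id -> list of enhancements
```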
After the curator-user completes any portion of the curating task, the resulting information may be transferred back to a server/database that may be the same as the servers and databases discussed above or a related server or database.
VII. Publishing
For many embodiments, after a set of cue point and enhancement information is curated, the next stage may be publishing. The publication of one or more channels from a set of cue point and enhancement information (for a particular media title) does not necessarily end the potentially ongoing activity of accepting input from input-users, and/or accepting edits from editor-users, and/or accepting curated channels from curator-users. For any particular media title, one or more of the five stages described herein may continue indefinitely to the extent desired.
Many embodiments of the invention publish channels of information by making the curated cue point and enhancement information for those channels available over a network (e.g. the Internet) to media players operated by end-users. In one or more embodiment examples, the source of video to a video player is independent of the source of cue point and enhancement information. The player may be preconfigured to obtain available cue point and enhancement information, or the end-user may indicate a desire for enhancement channels. In either event, a server in possession of information regarding the available cue point and enhancement information may identify the video to be played by the end user and make any corresponding enhancement channels available to the end-user through the video player interface or otherwise. The available channels may be selectable by the user from any type of known interface, including the following: a text list of available channels; icons representing each available channel; or interface elements representing each available channel and embodying a preview of the channel's contents, such as an image. Furthermore, given the crowd-sourced nature of the channels, the interface may include scoring or rating information to advise the user regarding the quality or desirability of an enhancement channel. For example, a channel may be scored for accuracy, completeness, entertainment value, quality or any objective or subjective criteria. Furthermore, the source of scoring or rating information may be the crowd source contributors or the end users or both. If scoring and rating information is obtained from multiple groups of users (e.g. end-users, input-users, editor-users, and curator-users), the invention contemplates that ratings or scores may be displayed independently for each group or any combination of groups. For example, a curator's ratings might be particularly useful regarding the completeness or accuracy of a channel.
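The runtime merge described above (an enhancement channel joined with an independently sourced video via cue points) might be sketched as follows; the mapping of cue point ids to times in a particular video version is assumed to be available from the cue point information discussed earlier.

```python
def schedule_enhancements(channel, cue_times):
    """Merge an enhancement channel with an independently sourced video
    at or near runtime: map each cue point id to its time in this version
    of the video and emit (time, enhancement) pairs, sorted for display
    by the player. `cue_times` maps cue_point_id -> seconds."""
    timeline = [(cue_times[cid], e)
                for cid, items in channel.items()
                for e in items
                if cid in cue_times]        # ignore cues absent from this version
    return sorted(timeline, key=lambda pair: pair[0])
```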
In some embodiments, depending upon the capabilities of the end-user's player device, the end user may select one or more channels for use at any one time. For example, in some embodiments, the interface will only present channels for selection if the end user's machine/software has the capability to use the channel. While users may commonly select only one channel from among closed captioning, foreign-language dubbing or foreign-language subtitles, the invention contemplates that several may be selected simultaneously according to the ability of the player. For example, an end user may select Spanish dubbing and Spanish subtitles. Further, with the use of a proper multi-window interface, the same end user may also simultaneously select several image-based channels such as product information, actor information, maps of related geographies, etc. For example, with or without a multi-window interface, enhancements may complement a video in any number of known ways, including the following: dividing the video display into two or more segments (e.g. ⅓ and ⅔, horizontally or vertically); opaquely overlaying a portion of the video; translucently or transparently overlaying a portion of the video; appearing in software windows or on hardware screens adjacent to the video; playing through the same speakers as the video; or playing through separate speakers from the video.
In one embodiment, the interface for user selection of available channels may suggest to the end user combinations of channels that are appropriate for simultaneous use. In addition, given the advertising abilities of the system disclosed herein, a user may receive compensation for employing an advertising-related channel during the play of the video. For example, the user may receive free or discounted access to the video, or the user may acquire points/value in a loyalty program that can later be exchanged for tangible valuables.
While many embodiments provide for enhancement information to be independent of the video, other embodiments allow for embedding enhancement information with videos by any known mechanism. Therefore, for example, DVD or online video downloads may have enhancement information embedded.
VIII. Interactive Walk-Through of the Five Stages
Having described a variety of embodiments and features of the instant inventions, a practical review of the five described stages will now be provided, with reference to a system comprising a server 950 and groups of input-users 901, editor-users 902, curator-users 903 and end-users 904.
Item 950 is a server intended to represent a server infrastructure including storage that may comprise multiple servers and databases networked together over LANs, WANs or using other connection technology. The server 950 is managed and/or its operation relating to embodiments of the invention is controlled by a system operator or service provider who may or may not be the owner of the server(s) and other equipment.
The disclosed processes of creating enhanced video or facilitating the creation of enhanced media may entail several interactions between server 950 and persons performing work toward the creation of the enhanced media. The server 950 and its included databases may be employed to retain information about the interactions and the devices, software and persons involved in the interactions. Essentially, any information about the process or a person, software, device, or the actions of a person (e.g. edits) may be stored in the server 950 and related to other associated information.
Referring now to step 960, using server 950 or another computer, cue point information is developed for one or more videos and stored. In an exemplary process, digitized video information is loaded into the computer memory where it is evaluated or operated upon by the application of software with a CPU; the result of the evaluation and operations being the creation of cue point information for the video.
Referring now to transition element 961, upon request or indication from an input-user or her device, some or all of the cue point information is transferred to the input-user, who may be selected by the system operator from the group of input-users 901. The input-user provides enhancement information as discussed above, and the results are returned to server 950 at transition step 962. Steps 961 through 962 may be repeated numerous times to produce a critical mass of enhancement information related to the media, which is received and stored by server 950. As discussed above, server 950 may employ one or more relational databases and drive arrays to organize and retain information about the ongoing process, such as cue point and enhancement information for a media title.
Once the system operator or system software determines there is sufficient enhancement information for a given media title, the server 950 may normalize the data at step 963 and as explained above. In some embodiments, the practical operation of normalization involves loading cue point and/or enhancement information into memory and applying software to a CPU in order to determine the similarity between different input-user entries and to evaluate relationships between the multiple entries or between the entries and the cue points.
Having a normalized set of cue point and enhancement information for a media title, the server 950 may receive a request or notification from one or more editor-users or their devices. In response, or on its own programmed initiative, server 950 may forward portions (including the entirety) of the information set to editor-users selected from the group of editor-users 902. The editor-users edit cue point and enhancement information and return the results 965 to the server 950, where the results are received and the database or other storage is updated 966.
Upon request or notification from any curator-users or their devices, server 950 may forward portions of edited cue point and enhancement information to one or more curator-users 903. The curator-users curate the information, essentially preparing it for publication, and return the results 968 to server 950, where the results are received and the database or other storage is updated 969. Upon the interaction of software with a CPU, server 950 may further process the curated information in final preparation for publication.
One or more end users 904 may obtain media from any source, whether related to the central system or completely independent thereof. For example, a user may obtain video from YouTube or Netflix while Apple Inc. may act as the system operator and create enhancement information through its iTunes community. The end user 904 or her video player may notify server 950 regarding the identity of a media title, and server 950 may respond by providing cue point and enhancement information that the end user's device and software may associate with the independently acquired video. In this manner, the end user may receive the benefit of an enhanced video.
The discussions herein are intended for illustration and not limitation regarding the concepts disclosed. Unless expressly stated as such, none of the foregoing comments is intended as an unequivocal statement limiting the meaning of any known term or the application of any concept.