Use of digital media is becoming increasingly common. In home networks, for example, devices are increasingly able to handle digital content. As a result, usage models available to home network users are becoming more sophisticated and these users are demanding more powerful capabilities to share digital content throughout the house. Ease of use is still, however, imperative to users in this home network environment.
One critical feature for any usage model in a home network environment is the ability to manipulate media content. One type of media manipulation, typically known as “trick mode”, includes the ability to manipulate content with actions such as fast forward, fast reverse, time seek, jumping to a scene in a movie, etc., in addition to normal playback. VHS and DVD users who have become used to these features expect to have some, if not all, of this functionality available to them in other usage models.
Although it is currently possible for users to seek through digital content and perform basic trick play such as fast forward and/or fast rewind, these features are far from advanced and not very user friendly. Thus, for example, a user may have difficulty seeking a particular location in a movie without having a time reference. In other words, although the user may be able to rewind back to “Hour 1, Min 4” of a movie to watch a particular scene, the user has to know that the scene of interest is at “Hour 1, Min 4” of the content. If the user merely knows that he or she would like to go back to “the exciting car chase scene”, however, there is no existing means by which the user can do so without doing a “blind seek” (i.e., blindly rewinding through the content). Most existing digital media schemes do not provide any audio/video reference points.
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements, and in which:
Embodiments of the present invention provide a method, apparatus and system for generating and distributing rich digital bookmarks for digital media content navigation. Reference in the specification to “one embodiment” or “an embodiment” of the present invention means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment,” “according to one embodiment” or the like appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
DVD technology currently includes the concept of “chapter navigation” or “scene selection” which provides users with a visual time reference to search from and/or to jump to at any time during a movie. In the scenario described in the background, for example, a DVD user looking for “the exciting car chase scene” may view the scene selection menu to determine which point of the movie to rewind to in order to view the scene again. This user friendly scheme for allowing users to navigate through DVD content is a key aspect of manipulating content on DVDs. Having been exposed to, and having become familiar with, such a scheme, users today typically expect a user friendly viewing experience, in addition to the ability to perform trick mode functions.
Unfortunately, in contrast to the DVD scheme, other digital consumer content is not currently encoded with any chapter and/or scene navigation schemes. As a result, although users may use the limited trick mode capabilities available to digital content today to blindly seek a desired scene, these primitive capabilities may be frustrating to novice and sophisticated users alike. This frustration may be compounded by other factors such as streaming digital content. “Movies on demand” are a typical example of streaming digital content. Although seeming to provide viewers with an experience similar to a DVD experience, movies on demand in fact currently subject viewers to a sub-par viewing experience. As described above, since the digital content is not encoded with any navigation scheme, users are forced to do a “blind seek” of the content they are interested in. The scenario is additionally complicated by the fact that the digital content may reside remotely, as a stand-alone item on a server on a network, and may be streamed to a consumer upon demand. As a result, the “blind seek” operations described above may have significantly slower responses than DVD responses because the media stream may have to be re-transported from the source (i.e., server) on every seek operation.
Many working groups and standards committees have been established to address these ease of use and interoperability issues. Standards such as “UPnP” (Universal Plug and Play), Intel Corporation's “NMPR” (“Networked Media Product Requirements”, most recently Version 2.1, 2005), and more recently, the “DLNA” (“Digital Living Network Alliance”, most recently Version 1.0, 2005) are each attempting to anticipate common usage models in the digital home and define protocols and guidelines to enable interoperability and ease of use within these models. Each standard addresses a different aspect of these issues.
UPnP, for example, deals with the communication aspects of the devices by defining standard services and associated actions that a certain device needs to implement in order to be “seen” and “talk” to other devices. As illustrated in
DLNA, on the other hand, takes interoperability one step further and defines baseline capabilities that the devices need to support to be conformant. DLNA Version 1.0 deals with only two types of devices: the DMS and the Digital Media Player (DMP). In UPnP terms, a DMP comprises CP 110 coupled to DMR 105. The communication between DMR 105 and CP 110 is therefore not defined as they can live in the same box or as a single software process or piece of hardware. Future versions of DLNA may separate DMR 105 from CP 110, similar to the current UPnP scheme, or identify new types of devices.
An embodiment of the present invention provides a method, apparatus and system for generating and distributing rich digital bookmarks to enable users to easily manipulate digital content. The following description assumes the use of a UPnP scheme but embodiments of the present invention are not so limited. Thus, for example, alternate embodiments of the present invention may be implemented wherein CP 110 and DMR 105 are one process (e.g. a DMP in DLNA terms) and/or using non-UPnP protocols. Additionally, although the following description assumes audio/video content only, embodiments of the present invention are not so limited and may be applicable to any form and/or combination of digital content.
The term “digital bookmark” is well known to those of ordinary skill in the art and typically refers to any metadata associated with media content that may be used to randomly access a certain position within the content. According to embodiments of the present invention, RDB 225 comprises digital bookmarks that include additional information and/or data. Thus, for example, in one embodiment, RDB 225 includes (i) metadata to efficiently index to a position in the video content and (ii) items associated with the seek index in (i) that will serve as a “natural” easy to understand audio-visual reference to a human interacting with the device. Examples of metadata include a byte offset from the beginning of the movie, a time-stamp associated with RDB 225, frames into the movie, and/or any combination of these. Examples of items associated with the seek index include a text caption for RDB 225, a thumbnail or image frame associated with RDB 225, an audio fragment associated with RDB 225, and/or any combination of these.
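The two-part structure described above can be sketched as a small data model. This is only an illustrative sketch: the class names, field names and sample values below are assumptions introduced for the example and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SeekIndex:
    """Part (i): metadata used to index to a position in the content."""
    byte_offset: int   # bytes from the beginning of the movie
    time_stamp: str    # "HH:MM:SS"
    frame: int         # frames into the movie


@dataclass(frozen=True)
class RichDigitalBookmark:
    """Part (ii) attached to part (i): human-friendly reference items."""
    index: SeekIndex
    caption: Optional[str] = None        # text caption
    thumbnail_uri: Optional[str] = None  # thumbnail or image frame
    audio_uri: Optional[str] = None      # audio fragment

    def has_reference(self) -> bool:
        """True if at least one audio-visual reference item is present."""
        return any((self.caption, self.thumbnail_uri, self.audio_uri))


# Illustrative values only.
rdb = RichDigitalBookmark(
    index=SeekIndex(byte_offset=52_428_800, time_stamp="01:04:00", frame=92_160),
    caption="Car chase",
)
```

Keeping the seek index separate from the reference items mirrors the distinction drawn in the text: a device can seek using the index alone, while the caption, thumbnail and audio fragment exist only to help the user choose.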
RDB 225 may be generated in a variety of ways without departing from the spirit of embodiments of the present invention. Thus, for example, in one embodiment, RDB 225 may be generated in real-time while DMR 205 is processing (decoding) a video stream. Alternatively, RDB 225 may be generated “off-line” (i.e., in advance) upon user demand and/or upon demand from CP 210 during quiet or inactivity periods.
Regardless of how RDB 225 is generated, it may be accessed in a variety of ways without departing from the spirit of embodiments of the present invention. In one embodiment, RDB 225 may be retrieved (“pulled”) from DMR 205 at any time. The process of retrieving data from DMR 205 is well known to those of ordinary skill in the art and may include various standard actions and protocols currently known and/or hereafter determined. Alternatively, RDB 225 may be dynamically distributed (“pushed”) by DMR 205 (or any other device that generates RDB 225) to other devices on Network 250. Once accessed, RDB 225 may be displayed on User Interface 240, as illustrated (“VISUAL DISPLAY OF RDB 225”).
After RDB 225 is generated, it may be distributed and/or stored in various ways. In one embodiment, for example, RDB 225 may be multicast on Network 250 to any devices interested in the RDB. Alternatively, RDB 225 may be unicast to CP 210 and/or uploaded from CP 210 to DMS 200 as part of Content 230. DMS 200 may then provide RDB 225 to CP 210 for User Interface 240 and/or to DMR 205 for easy time-based accessing.
In 3, a user may (via a user interface on CP 210) elect to play Content 230 and when a connection is established to DMS 200 that contains the content, CP 210 may inquire whether DMR 205 is capable of generating RDBs for that specific content. In 4, if DMR 205 is capable of generating RDBs for Content 230 (information obtained in 2 above), CP 210 may enable a menu on the user interface (i.e., CP 210 may allow the user to navigate to a “Bookmarks” or “Scene Selection” type menu). DMR 205 may continuously retrieve Content 230 from DMS 200. As new RDBs are generated for the streaming content, DMR 205 may store locally some metadata to enable mapping RDB 225 time-stamps to indices in the movie. In one embodiment, DMR 205 may then send an event to CP 210, describing the following RDB 225 metadata: RDB Time-Stamp, RDB Caption Text, RDB Thumbnail URI location for retrieval and RDB Audio Fragment URI location for retrieval. Additional description of the metadata is provided further below.
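As a rough illustration of the event payload listed above, the four metadata items could be serialized as a small XML fragment. The element names (`rdb`, `timeStamp`, `caption`, `thumbnailURI`, `audioFragmentURI`), the function name and the example addresses are hypothetical; they are not taken from UPnP, DIDL-Lite or the specification.

```python
import xml.etree.ElementTree as ET


def build_rdb_fragment(time_stamp, caption, thumbnail_uri, audio_uri):
    """Serialize one bookmark's event metadata as an XML string.

    Element names are hypothetical, not taken from UPnP or DIDL-Lite.
    """
    rdb = ET.Element("rdb")
    ET.SubElement(rdb, "timeStamp").text = time_stamp
    ET.SubElement(rdb, "caption").text = caption
    ET.SubElement(rdb, "thumbnailURI").text = thumbnail_uri
    ET.SubElement(rdb, "audioFragmentURI").text = audio_uri
    return ET.tostring(rdb, encoding="unicode")


# Illustrative values; 192.0.2.10 is a documentation-only address.
fragment = build_rdb_fragment(
    "00:05:00",
    "Scene 2",
    "http://192.0.2.10/rdb/2.jpg",
    "http://192.0.2.10/rdb/2.mp3",
)
parsed = ET.fromstring(fragment)
```

Note that the payload carries only URIs for the thumbnail and audio fragment, so a control point can fetch the larger items on demand rather than receiving them in every event.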
In 5, as new portions of Content 230 are retrieved from DMS 200, the RDBs associated with that portion of the content stream may be generated and these RDB changes may be updated on the previously enabled menu on the user interface. In 6, if the user (via the user interface on CP 210) selects an RDB, CP 210 may then perform a time-based seek transport action on DMR 205 using the time-stamp for the selected bookmark. DMR 205 may then proceed to map the time-stamp to the index data it has stored locally and seek to that position in the movie.
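The renderer-side bookkeeping in the steps above (store index data as bookmarks are generated, then map a selected time-stamp back to a stream position) might look like the following minimal sketch. The `BookmarkIndex` class, its method names and the byte offsets are assumptions made for illustration only.

```python
class BookmarkIndex:
    """Hypothetical DMR-local store mapping RDB time-stamps to byte offsets."""

    def __init__(self):
        self._offsets = {}

    def record(self, time_stamp, byte_offset):
        """Remember where a newly generated bookmark sits in the stream."""
        self._offsets[time_stamp] = byte_offset

    def seek_target(self, time_stamp):
        """Return the byte offset for a bookmark time-stamp.

        Raises KeyError if no bookmark was recorded for that time-stamp.
        """
        return self._offsets[time_stamp]


# Bookmarks recorded while the content streams (illustrative offsets).
index = BookmarkIndex()
index.record("00:05:00", 18_874_368)
index.record("00:10:00", 37_748_736)

# The control point later requests a time-based seek for a selected bookmark.
offset = index.seek_target("00:10:00")
```

In this sketch the control point never sees byte offsets; it passes back only the time-stamp it received in the event, and the renderer resolves it locally.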
RDB 225 may be implemented in a variety of ways without departing from the spirit of embodiments of the present invention. In one embodiment, RDB 225 may be implemented as a new UPnP variable. Thus, for example, the UPnP variable may be in the form of a Digital Item Declaration Language (“DIDL”) Lite Extensible Markup Language (“XML”) fragment associated with the resource. More specifically, a new state variable may be added to the UPnP audio visual Transport Service (called “CurrentTrackRDB” in this example). In one embodiment, this new state variable may be an evented variable and may also be accessed using a recommended new action (called “GetCurrentTrackRDB” in this example). In the context of the sequence diagram in
In various embodiments, the time intervals for RDB 225 may be configured by the device vendor and/or by the user through the user interface on CP 210. Thus, for example, one potential configuration is an RDB every 5 minutes (300 seconds). In alternate embodiments, more sophisticated schemes for selecting bookmark positions may be used, such as video pattern recognition primitives that automatically identify interesting scene breakpoints.
In one embodiment, the stream splitters and/or decoders in DMR 205 may be responsible for identifying a picture frame in the encoded bit-stream that approximates the configured time interval. Thus, for example, in one embodiment, the source content may be in an MPEG format and/or another compression format that enables index frames. According to this scheme, reference frames such as MPEG “I-frames” may be used for random access. I-frames are typically encoded every 0.5 seconds, thus offering a ½ second granularity in the selected RDB. DMR 205 may identify the I-frame that is closest to the specified time interval and store the file byte offset as an integer, the actual time as a string “HH:MM:SS” and the frame position as an integer. In one embodiment, DMR 205 may then use the time-stamp as metadata to be sent to CP 210 as part of RDB 225's XML fragment.
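The selection step described above can be sketched as follows, assuming the renderer already holds a sorted list of I-frame presentation times in seconds. The function names and the 12-minute sample clip are illustrative assumptions; a real splitter would obtain the I-frame times from the bit-stream.

```python
from bisect import bisect_left


def format_hms(seconds):
    """Format a time in seconds as "HH:MM:SS" (fractions truncated)."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def closest_iframe(iframe_times, target):
    """Return the position in sorted iframe_times of the entry closest to target."""
    if not iframe_times:
        raise ValueError("no I-frames available")
    pos = bisect_left(iframe_times, target)
    if pos == 0:
        return 0
    if pos == len(iframe_times):
        return pos - 1
    before, after = iframe_times[pos - 1], iframe_times[pos]
    # Ties go to the earlier I-frame.
    return pos if after - target < target - before else pos - 1


def bookmark_times(iframe_times, interval=300.0):
    """Pick one I-frame per interval and return its time as "HH:MM:SS"."""
    picks = []
    target = interval
    while iframe_times and target <= iframe_times[-1]:
        picks.append(format_hms(iframe_times[closest_iframe(iframe_times, target)]))
        target += interval
    return picks


# I-frames every 0.5 s across a 12-minute clip (illustrative data).
iframes = [i * 0.5 for i in range(12 * 60 * 2 + 1)]
```

With I-frames every 0.5 seconds, the chosen frame is never more than a quarter of a second away from the configured interval boundary, which is consistent with the half-second granularity noted in the text.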
In one embodiment, thumbnails may be generated by taking the closest I-frame identified during the time indexing step and encoding it as a JPEG image of small resolution, e.g., conformant to DLNA's “JPEG_TN” profile. The HTTP location of the image may also be added to the RDB metadata XML fragment. Additionally, in one embodiment, DMR 205 may retrieve the first n seconds (e.g., 5 seconds) of audio after the first sample exceeding a certain magnitude to avoid silent periods. DMR 205 may then decode the audio excerpt and encode it as an MP3 file conformant to DLNA's MP3 profile. In alternate embodiments, more sophisticated DMRs or devices may perform audio processing to identify the most interesting audio fragment within the bookmark interval.
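The silence-skipping rule for the audio excerpt can be illustrated with a short sketch that operates on already decoded PCM samples. The function name, the threshold value and the toy signal are assumptions for the example; the JPEG and MP3 encoding steps are outside its scope.

```python
def audio_excerpt_bounds(samples, sample_rate, threshold, seconds=5.0):
    """Return (start, end) sample indices for the excerpt, or None if all quiet.

    start is the first sample whose absolute value exceeds threshold;
    end is at most `seconds` of audio later, clipped to the input length.
    """
    for start, value in enumerate(samples):
        if abs(value) > threshold:
            end = min(len(samples), start + int(seconds * sample_rate))
            return start, end
    return None


# Illustrative signal: 3 silent samples, then 10 loud ones, at 2 samples/s.
signal = [0, 0, 0] + [900] * 10
bounds = audio_excerpt_bounds(signal, sample_rate=2, threshold=100, seconds=2.0)
```

The slice `samples[start:end]` would then be handed to whatever encoder the device uses. Returning `None` for an entirely quiet interval lets the caller omit the audio fragment from the bookmark rather than attach a silent clip.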
Embodiments of the present invention may be implemented on a variety of computing devices. According to an embodiment of the present invention, computing devices may include various components capable of executing instructions to accomplish an embodiment of the present invention. For example, the computing devices may include and/or be coupled to at least one machine-accessible medium. As used in this specification, a “machine” includes, but is not limited to, any computing device with one or more processors. As used in this specification, a machine-accessible medium includes any mechanism that stores and/or transmits information in any form accessible by a computing device, the machine-accessible medium including but not limited to, recordable/non-recordable media (such as read-only memory (ROM), random-access memory (RAM), magnetic disk storage media, optical storage media and flash memory devices), as well as electrical, optical, acoustical or other form of propagated signals (such as carrier waves, infrared signals and digital signals).
According to an embodiment, a computing device may include various other well-known components such as one or more processors. The processor(s) and machine-accessible media may be communicatively coupled using a bridge/memory controller, and the processor may be capable of executing instructions stored in the machine-accessible media. The bridge/memory controller may be coupled to a graphics controller, and the graphics controller may control the output of display data on a display device. The bridge/memory controller may be coupled to one or more buses. One or more of these elements may be integrated together with the processor on a single package or using multiple packages or dies. A host bus controller such as a Universal Serial Bus (“USB”) host controller may be coupled to the bus(es) and a plurality of devices may be coupled to the USB. For example, user input devices such as a keyboard and mouse may be included in the computing device for providing input data. In alternate embodiments, the host bus controller may be compatible with various other interconnect standards including PCI, PCI Express, FireWire and other such existing and future standards.
In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be appreciated that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.