Digital video signals are commonly characterized by parameters including i) resolution (e.g. luma and chroma resolution or horizontal and vertical pixel dimensions), ii) frame rate, and iii) dynamic range or bit depth (e.g. bits per pixel). The resolution of digital video signals has increased from Standard Definition (SD) through 8K Ultra High Definition (UHD). The other digital video signal parameters have also improved, with frame rate increasing from 30 frames per second (fps) up to 240 fps and bit depth increasing from 8 bits to 12 bits. To transmit a digital video signal over a network, MPEG/ITU standardized video compression has undergone several generations of successive improvements in compression efficiency, including MPEG-2, MPEG-4 Part 2, MPEG-4 Part 10/H.264, and HEVC/H.265. The technology to display digital video signals on a consumer device, such as a television or mobile phone, has also advanced correspondingly.
Consumers requesting higher quality digital video on network-connected devices face bandwidth constraints from video content delivery networks. Several solutions have emerged to mitigate the effects of these bandwidth constraints. Video content is initially captured at a higher resolution, frame rate, and dynamic range than will be used for distribution. For example, 4:2:2, 10-bit HD video content is often down-resolved to a 4:2:0, 8-bit format for distribution. The digital video is encoded and stored at multiple resolutions at a server, and these versions at varying resolutions are made available for retrieval, decoding, and rendering by clients with possibly varying capabilities. Adaptive bit rate (ABR) coding further addresses network congestion. In ABR, a digital video is encoded at multiple bit rates (e.g. at the same or lower resolutions, lower frame rates, etc.), and these alternate versions at different bit rates are made available at a server. The client device may request a different bit rate version of the video content for consumption, at periodic intervals, based on the client's calculated available network bandwidth or local computing resources.
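By way of illustration only, the following sketch shows one way an ABR client might select a representation from the encoding ladder based on its calculated available bandwidth; the representation list, bit rates, and safety margin are assumptions chosen for the example rather than values taken from any particular system.

```python
# Minimal sketch of ABR representation selection; the representation list,
# bit rates, and safety margin are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Representation:
    name: str
    width: int
    height: int
    bitrate_bps: int  # encoded bit rate of this alternate version

# Hypothetical encoding ladder stored at the server, highest bit rate first.
REPRESENTATIONS = [
    Representation("1080p", 1920, 1080, 6_000_000),
    Representation("720p", 1280, 720, 3_000_000),
    Representation("480p", 854, 480, 1_200_000),
    Representation("360p", 640, 360, 600_000),
]

def select_representation(measured_bandwidth_bps: float,
                          safety_margin: float = 0.8) -> Representation:
    """Return the highest-bit-rate representation that fits the measured
    bandwidth, leaving some headroom; fall back to the lowest otherwise."""
    budget = measured_bandwidth_bps * safety_margin
    for rep in REPRESENTATIONS:
        if rep.bitrate_bps <= budget:
            return rep
    return REPRESENTATIONS[-1]

if __name__ == "__main__":
    # e.g. a client that measured roughly 4 Mbps of available bandwidth
    print(select_representation(4_000_000))  # -> the 720p representation
```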
Zoom coding provides an ability to track objects of interest in a video, giving the user the opportunity to track and view those objects at the highest available resolution (e.g., at the original capture resolution). Zoom coding provides this ability through alternative stream delivery on a user's request. In general, in addition to creating the adaptive bit rate streams in a standard ABR delivery system, zoom coding allows creation of streams that track specific objects of interest at a high resolution (e.g. at a resolution higher than a normal viewing resolution of the video content).
Described embodiments relate to systems and methods for displaying information regarding what objects are available to be tracked (e.g. in the form of a zoom coded stream) and for receiving user input selecting the object or objects to be tracked.
A headend encoder creates zoom coded streams based on a determination of what objects a viewer should be able to track. The determination may be made automatically or may be based on human selection. In some embodiments, the availability of trackable objects is signaled to a client using out-of-band mechanisms. Systems and methods disclosed herein enable a client that has received such information on trackable objects to inform the end user as to what objects may be tracked, for example by visually displaying the available choices of objects. Users may select an available trackable object (e.g. using a cursor or other selection mechanism), which leads the client to retrieve the appropriate zoom coded stream from the server.
One embodiment takes the form of a method, the method including: receiving, from a content server, a first representation of a video stream and an object-of-interest identifier, the object-of-interest identifier indicating availability of a second representation of a portion of the video stream that depicts an object of interest; causing the display of both the first representation of the video stream and the object-of-interest identifier; responsive to a user selection of the second representation of the portion of the video stream, transmitting, to the content server, a request for the second representation of the portion of the video stream; receiving the second representation of the portion of the video stream; and causing display of the second representation of the portion of the video stream.
A more detailed understanding may be had from the following description, presented by way of example in conjunction with the accompanying figures. The accompanying figures, where like reference numerals refer to identical or functionally similar elements throughout the separate views, together with the detailed description below, are incorporated in and form part of the specification and serve to further illustrate embodiments of concepts that include the claimed invention, and explain various principles and advantages of those embodiments.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of embodiments of the present invention.
The system and method components have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the embodiments of the present invention so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.
One embodiment takes the form of a method that includes receiving, from a content server, a first representation of a video stream and an object-of-interest identifier, the object-of-interest identifier indicating availability of a second representation of a portion of the video stream that depicts an object of interest (e.g. an enhanced view of an object of interest); causing the display of both the first representation of the video stream and the object-of-interest identifier; responsive to a selection of the second representation of the portion of the video stream using the object-of-interest identifier, transmitting, to the content server, a request for the second representation of the portion of the video stream; receiving the second representation of the portion of the video stream; and causing display of the second representation of the portion of the video stream.
Another embodiment takes the form of a system that includes a communication interface, a processor, and data storage containing instructions executable by the processor for carrying out at least the functions described in the preceding paragraph.
In at least one embodiment, the portion of the video stream that depicts an object of interest is an enlarged portion of the video stream.
In at least one embodiment, the object of interest is a tracked object in the video stream.
In at least one embodiment, causing the display of the object-of-interest identifier comprises displaying a rectangle bounding the portion of the video stream overlaid on the first representation of the video stream.
In at least one embodiment, causing the display of the object-of-interest identifier comprises displaying text descriptive of the object of interest. In one such embodiment, the object of interest is a person and the descriptive text is a name of the person.
In at least one embodiment, causing the display of the object-of-interest identifier comprises displaying a still image of the object of interest.
In at least one embodiment, the method further includes displaying a digit in proximity to the object-of-interest identifier, wherein the user selection comprises detecting selection of the digit in a user interface.
In at least one embodiment, causing the display of the object-of-interest identifier comprises displaying a timeline that indicates times during the video stream that the second representation of the portion of the video stream is available.
In at least one embodiment, causing the display of the object-of-interest identifier comprises displaying the object-of-interest identifier in a sidebar menu.
In at least one embodiment, the object-of-interest identifier is received in a manifest file.
In at least one embodiment, the first representation of the video stream is at a first bit-rate and the second representation of the portion of the video stream is at a second bit-rate different from the first bit-rate.
In at least one embodiment, the video stream is a pre-recorded video stream.
In at least one embodiment, the representations of the video streams are displayed on a device selected from the group consisting of: a television, a smart phone screen, a computer monitor, a wearable device screen, and a tablet screen.
In at least one embodiment, the timeline displays an indication of the availability of second representations of portions of the video stream for at least two different objects of interest, wherein the availability for each different object of interest is indicated by a different color.
In at least one embodiment, the timeline comprises a stacked timeline having multiple rows, wherein each row corresponds to a different tracked object for which a second representation is available.
In at least one embodiment, the selection comprises a desired playback time along the timeline, and causing display of the second representation of the portion of the video stream comprises displaying the second representation at the desired playback time.
In at least one embodiment, the selection is a user selection of the second representation.
In at least one embodiment, the selection is an automatic selection by the client device based on previously obtained user preferences.
A detailed description of illustrative embodiments will now be provided with reference to the various figures. Although this description provides detailed examples of possible implementations, it should be noted that the provided details are intended to be by way of example and in no way limit the scope of the application. The systems and methods relating to video compression may be used with the wired and wireless communication systems described with respect to
As shown in
The communications systems 100 may also include a base station 114a and a base station 114b. Each of the base stations 114a, 114b may be any type of device configured to wirelessly interface with at least one of the client devices 102a, 102b, 102c, 102d to facilitate access to one or more communication networks, such as the core network 106/107/109, the Internet 110, and/or the networks 112. The client devices may be different wireless transmit/receive units (WTRUs). By way of example, the base stations 114a, 114b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a site controller, an access point (AP), a wireless router, and the like. While the base stations 114a, 114b are each depicted as a single element, it will be appreciated that the base stations 114a, 114b may include any number of interconnected base stations and/or network elements.
The base station 114a may be part of the RAN 103/104/105, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, and the like. The base station 114a and/or the base station 114b may be configured to transmit and/or receive wireless signals within a particular geographic region, which may be referred to as a cell (not shown). The cell may further be divided into sectors. For example, the cell associated with the base station 114a may be divided into three sectors. Thus, in one embodiment, the base station 114a may include three transceivers, i.e., one for each sector of the cell. In another embodiment, the base station 114a may employ multiple-input multiple output (MIMO) technology and, therefore, may utilize multiple transceivers for each sector of the cell.
The base stations 114a, 114b may communicate with one or more of the client devices 102a, 102b, 102c, and 102d over an air interface 115/116/117, or communication link 119, which may be any suitable wired or wireless communication link (e.g., radio frequency (RF), microwave, infrared (IR), ultraviolet (UV), visible light, and the like). The air interface 115/116/117 may be established using any suitable radio access technology (RAT).
More specifically, as noted above, the communications system 100 may be a multiple access system and may employ one or more channel-access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like. For example, the base station 114a in the RAN 103/104/105 and the client devices 102a, 102b, 102c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 115/116/117 using wideband CDMA (WCDMA). WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+). HSPA may include High-Speed Downlink Packet Access (HSDPA) and/or High-Speed Uplink Packet Access (HSUPA).
In another embodiment, the base station 114a and the client devices 102a, 102b, 102c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 115/116/117 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A).
In other embodiments, the base station 114a and the client devices 102a, 102b, 102c may implement radio technologies such as IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
The base station 114b in
The RAN 103/104/105 may be in communication with the core network 106/107/109, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the client devices 102a, 102b, 102c, 102d. As examples, the core network 106/107/109 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, and the like, and/or perform high-level security functions, such as user authentication. Although not shown in
The core network 106/107/109 may also serve as a gateway for the client devices 102a, 102b, 102c, 102d to access the PSTN 108, the Internet 110, and/or other networks 112. The PSTN 108 may include circuit-switched telephone networks that provide plain old telephone service (POTS). The Internet 110 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and IP in the TCP/IP Internet protocol suite. The networks 112 may include wired and/or wireless communications networks owned and/or operated by other service providers. For example, the networks 112 may include another core network connected to one or more RANs, which may employ the same RAT as the RAN 103/104/105 or a different RAT.
Some or all of the client devices 102a, 102b, 102c, 102d in the communications system 100 may include multi-mode capabilities, i.e., the client devices 102a, 102b, 102c, 102d may include multiple transceivers for communicating with different wired or wireless networks over different communication links. For example, the WTRU 102c shown in
The processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuit (ASIC) circuits, Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), a state machine, and the like. The processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the client device 102 to operate in a wired or wireless environment. The processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While
The transmit/receive element 122 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 114a) over the air interface 115/116/117 or communication link 119. For example, in one embodiment, the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals. In another embodiment, the transmit/receive element 122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, as examples. In yet another embodiment, the transmit/receive element 122 may be configured to transmit and receive both RF and light signals. In yet another embodiment, the transmit/receive element may be a wired communication port, such as an Ethernet port. It will be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wired or wireless signals.
In addition, although the transmit/receive element 122 is depicted in
The transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122. As noted above, the client device 102 may have multi-mode capabilities. Thus, the transceiver 120 may include multiple transceivers for enabling the client device 102 to communicate via multiple RATs, such as UTRA and IEEE 802.11, as examples.
The processor 118 of the client device 102 may be coupled to, and may receive user input data from, the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit). The processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128. In addition, the processor 118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 130 and/or the removable memory 132. The non-removable memory 130 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the processor 118 may access information from, and store data in, memory that is not physically located on the client device 102, such as on a server or a home computer (not shown).
The processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the client device 102. The power source 134 may be any suitable device for powering the WTRU 102. As examples, the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), and the like), solar cells, fuel cells, a wall outlet and the like.
The processor 118 may also be coupled to the GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the client device 102. In addition to, or in lieu of, the information from the GPS chipset 136, the WTRU 102 may receive location information over the air interface 115/116/117 from a base station (e.g., base stations 114a, 114b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the client device 102 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment. In accordance with an embodiment, the client device 102 does not comprise a GPS chipset and does not acquire location information.
The processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity. For example, the peripherals 138 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.
The zoom coding encoder 208 receives the source video stream in either an uncompressed or a previously compressed format and encodes or transcodes the source video stream into a plurality of zoom coded streams 210, wherein each of the zoom coded streams represents a portion (e.g. a slice, a segment, or a quadrant) of the overall source video. The zoom coded streams may be encoded at a higher resolution than traditional reduced resolution ABR streams. In some embodiments, the zoom coded streams are encoded at the full capture resolution. Consider an embodiment in which the source video stream has a resolution of 4K. The corresponding ABR representations may be at HD and lower resolutions. A corresponding zoom-coded stream may also be at HD resolution, but this may correspond to the capture resolution for the zoomed section. Here, the zoom coded streams are represented by stream 210-A of a first object at a first representation and stream 210-B of the first object at a second representation; any other number of objects and representations is depicted by stream 210-N.
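To make the resolution relationship in this example concrete, the following sketch (assuming nominal pixel dimensions of 3840x2160 for 4K and 1920x1080 for HD) compares the detail retained by a full-frame HD ABR representation with that of an HD zoom-coded stream covering one quadrant of the 4K capture.

```python
# Illustrative comparison of full-frame ABR downscaling vs. a zoom coded
# quadrant; pixel dimensions are nominal assumptions (3840x2160 4K, 1920x1080 HD).

CAPTURE_W, CAPTURE_H = 3840, 2160   # 4K source capture resolution
OUTPUT_W, OUTPUT_H = 1920, 1080     # HD distribution resolution

# Full-frame ABR stream: the whole 4K frame is downscaled to HD,
# so each output pixel covers roughly a 2x2 block of capture pixels.
full_frame_scale = CAPTURE_W / OUTPUT_W          # 2.0

# Zoom coded stream for one quadrant: the cropped region is already
# 1920x1080 in the capture, so it is delivered at capture resolution.
quadrant_w, quadrant_h = CAPTURE_W // 2, CAPTURE_H // 2
zoom_scale = quadrant_w / OUTPUT_W               # 1.0 -> no loss of detail

print(f"full-frame ABR downscale factor: {full_frame_scale}")
print(f"zoom coded quadrant downscale factor: {zoom_scale}")
```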
In embodiments that use transcoding to convert a video stream from one compressed format to another, a decoding process first brings the video back to the uncompressed domain at its full resolution, followed by a re-encoding process that creates new compressed video streams which may, for example, represent different resolutions, bit rates, or frame rates. The zoom coded streams 210 may be encoded at the original resolution of the source video and/or at one or more lower resolutions. In some embodiments, the resolutions of the zoom coded streams are higher than the resolutions of the un-zoomed ABR streams. The zoom coded streams are transmitted to or placed onto the streaming server for further transmission to the client devices. In some embodiments, the ABR encoder 204 and the zoom coding encoder 208 are the same encoder, configured to encode the source video into the ABR streams and the zoom coded streams.
In accordance with an embodiment, the adaptive bitrate encoder 204 or transcoder receives an uncompressed or compressed input video stream and encodes or transcodes the video stream into a plurality of representations 206. The plurality of representations may vary the resolution, frame rate, bit rate, and/or the like and are represented by the streams 206-A, 206-B, and 206-N. The encoded video streams according to the plurality of representations may be transferred to the streaming server 216. The streaming server 216 transmits encoded video streams via the network (212 and/or 214) to the client devices 218A-C. The transmission may take place over any of the available communication interfaces, such as the communication link 115/116/117 or 119.
In general, for a given video sequence, it is possible to create any number of zoom coded streams, with at least some of the zoom coded streams being associated with one or more tracked objects. A tracked object may be, e.g., a ball, a player, a person, a car, a building, a soccer goal, or any object which may be tracked and for which a zoom coded stream may be available.
Various techniques for object tracking are described in, for example, A. Yilmaz, O. Javed, M. Shah, “Object Tracking—A Survey”, ACM Computing Surveys, Vol. 38, No. 4, Article 13, December 2006. Based on the type of content, an encoder may choose from the available techniques to track moving objects of interest and hence may generate one or more object-centric regions of interest.
An example scenario is the following. The encoder creates two additional zoom coded streams in addition to the original stream. The availability of the encoded streams is communicated to the client by the streaming server in the form of an out-of-band "manifest" file. This is done periodically depending on how often the encoder changes objects of interest to be tracked. The stream information may be efficiently communicated to the client in the form of (x, y) coordinates and information regarding the size of a window for each zoom coded stream option. This stream information may be sent in the manifest information as supplemental data. A legacy client would ignore this stream information since it is unable to interpret this supplemental data field. However, a client capable of processing zoom coded streams is able to interpret the stream information and store it for rendering (e.g. if an end user requests to use a zoom coding feature). In some embodiments, the end user in the normal course of watching a program may request to see if there are any zoom coded streams available. In some embodiments, this could be done in the form of a simple IR command from a remote control (e.g. a special one-touch button that sends a request back to the set-top box (STB) or other client device to highlight on a still image the other zoom coded objects that are being tracked and could hence be requested for viewing). In embodiments including a two-way interactive device (such as a tablet or phone), the interface can be even richer. For example, a user may tap the touch screen of a two-way interactive device to bring up an interface which may identify the available zoom-coded objects, and selection and/or interaction with the zoom-coded objects may be realized via the touch screen interface of the device. The requests may be implemented with a button on the client device (or remote control thereof) that, when pressed, leads to interpretation and/or display of the manifest information and shows to the user what zoom coded objects may be viewed.
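A minimal sketch of how a zoom-capable client might store such stream information is given below; the field names and the dictionary layout are illustrative assumptions rather than a format defined by any manifest standard.

```python
# Hypothetical supplemental data describing zoom coded stream options,
# keyed by an assumed stream identifier; field names are illustrative only.

from dataclasses import dataclass
from typing import Dict

@dataclass
class ZoomStreamOption:
    stream_id: str
    x: int           # top-left x coordinate of the tracked window in the source frame
    y: int           # top-left y coordinate of the tracked window
    window_w: int    # width of the tracked window
    window_h: int    # height of the tracked window
    label: str       # human-readable description of the tracked object

def parse_supplemental_data(entries: list) -> Dict[str, ZoomStreamOption]:
    """Store zoom coded stream options for later rendering.  A legacy client
    would simply skip this supplemental field."""
    options = {}
    for e in entries:
        opt = ZoomStreamOption(e["id"], e["x"], e["y"],
                               e["w"], e["h"], e.get("label", ""))
        options[opt.stream_id] = opt
    return options

# Example: two zoom coded streams announced alongside the main program.
supplemental = [
    {"id": "zoom-1", "x": 640, "y": 220, "w": 1920, "h": 1080, "label": "Player 10"},
    {"id": "zoom-2", "x": 2100, "y": 500, "w": 1280, "h": 720, "label": "Ball"},
]
available = parse_supplemental_data(supplemental)
```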
In some embodiments, a rendering reference point or "render point" may be provided for a tracked object to indicate a rendering position associated with one or more positions of the tracked object (or region) of interest. The rendering reference point may, for example, indicate a position (e.g. a corner or an origin point) of a renderable region which contains the object of interest at some point in time. The rendering reference point may indicate a size or extent of the renderable region. The rendering reference point may define a bounding box which defines the location and extent of the object/area of interest or of the renderable region containing the object/area of interest. The client may use the rendering reference point information to extract the renderable region from one or multiple zoom-coded streams or segments, and may render the region as a zoomed region of interest on the client display. The rendering reference points may be communicated to the client device. For example, rendering reference points may be transmitted in-band as part of the video streams or video segments, or as side information sent along with the video streams or video segments. Alternatively, the rendering reference points may be specified in an out-of-band communication (e.g. as metadata in a file such as a DASH MPD). The rendering reference point as communicated to the client may be updated on a frame-by-frame basis, which may allow the client to continuously vary the location of the extracted renderable region, so that the object of interest may be smoothly tracked on the client display. Alternatively, the rendering reference point may be updated more coarsely in time, in which case the client may interpolate the rendering position between updates in order to smoothly track the object of interest when displaying the renderable region on the client display. In some embodiments, the rendering reference point comprises two parameters, a horizontal distance and a vertical distance, represented as (x, y). The rendering reference points may, for example, be communicated as supplemental enhancement information (SEI) messages to the client device.
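The interpolation of coarsely updated rendering reference points may be sketched as follows; linear interpolation between two signaled (x, y) positions is used purely as an assumed example of one suitable interpolation.

```python
# Sketch of interpolating a rendering reference point between two coarse
# updates so the extracted renderable region tracks the object smoothly.
# Linear interpolation is an assumption; other interpolations are possible.

def interpolate_render_point(p0, t0, p1, t1, t):
    """p0 = (x, y) signaled at time t0, p1 = (x, y) signaled at time t1.
    Returns the interpolated reference point for a frame at time t."""
    if t1 <= t0:
        return p1
    alpha = min(max((t - t0) / (t1 - t0), 0.0), 1.0)
    x = p0[0] + alpha * (p1[0] - p0[0])
    y = p0[1] + alpha * (p1[1] - p0[1])
    return (x, y)

# Example: reference points signaled once per second, frames rendered at 30 fps.
print(interpolate_render_point((100, 200), 0.0, (160, 230), 1.0, 0.5))  # (130.0, 215.0)
```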
In
There are many alternative user interface presentations of the metadata information from the server. In
The view 400 includes the same video content image as
In further embodiments, a representation of all available zoom coded segments of the object(s) of interest may be shown. A single timeline row with color-coded or pattern-coded regions may be used, as depicted by
At the client device, the metadata described above may be interpreted and presented in a variety of ways. For example, in one embodiment, the aggregate information may be presented at the bottom of the screen with the timeline and the objects/characters/ROIs displayed in icons on the side panel as illustrated in
In exemplary embodiments, the end user is visually cued (e.g. with an icon or color selection with bands beneath the timeline axis) for the availability of zoom coded streams within the time window of observation. The end user may then fast forward, rewind, or seek to the vicinity of the zoom coded stream or stream of interest (e.g. using an IR remote control, a touch screen interface, or other suitable input device). For example, the user may use such an input device to select or touch a portion of an object-annotated timeline in order to select an object of interest at an available time, and in response the client device may request, retrieve, decode and display segments of the associated zoom coded stream beginning at the selected time. This may be done using a single timeline view as depicted in
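One way a client might translate a selection on an object-annotated timeline into a zoom coded segment request is sketched below; the availability intervals, segment duration, and URL template are hypothetical and serve only to illustrate the mapping from a selected time to the segment to retrieve.

```python
# Sketch: map a selected playback time on an annotated timeline to the first
# zoom coded segment to request.  Intervals, segment duration, and the URL
# template are illustrative assumptions.

SEGMENT_DURATION_S = 2.0

# Times (in seconds) at which the zoom coded stream for "Player 10" is available.
availability = {"Player 10": [(30.0, 75.0), (120.0, 180.0)]}

def segment_for_selection(object_label: str, selected_time_s: float):
    """Return (segment_index, url) for the zoom coded stream of the selected
    object at the selected time, or None if no stream is available there."""
    for start, end in availability.get(object_label, []):
        if start <= selected_time_s < end:
            index = int(selected_time_s // SEGMENT_DURATION_S)
            url = f"https://server.example/zoom/{object_label}/seg-{index}.m4s"
            return index, url
    return None

print(segment_for_selection("Player 10", 42.0))
```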
In a further embodiment, the client device (based on end-user selection of one or more tracked objects of interest to the user) concatenates together only the scenes or regions (e.g. the timeline intervals) which contain the tracked objects of interest to the user. The client device may then present to the end user a collage of the action with automated editing which stitches together the zoom coded streams of the object, player or scene of interest, for example. In some embodiments, based on past viewing experiences of users, the client device is cued to automatically select certain objects/characters/ROIs based on the incoming data. For example, if a user has in the past tended to select a particular soccer player when watching video content of a particular soccer team, the client device may identify that the same soccer player is available as a zoom coded stream in the current video presentation, and so the client may automatically select the same soccer player in order to present the zoom coded stream content of that soccer player to the user. Other well-known attributes, such as a player's jersey number in a game or their name, may be pre-selected by a user in a user profile or at the start of a game or during the watching session. With this information, it will be possible to create a personalized collage of scenes involving that player specifically for the end user.
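Such a collage can be sketched as a simple merge of the timeline intervals in which the preferred tracked objects appear; the preference list and interval data below are illustrative assumptions.

```python
# Sketch of building a personalized collage play list from the timeline
# intervals that contain the user's preferred tracked objects.  The
# preference list and interval data are illustrative assumptions.

def build_collage(intervals_by_object, preferred_objects):
    """Collect, sort, and merge the intervals for the preferred objects so the
    client can play them back-to-back as an automatically edited collage."""
    selected = []
    for obj in preferred_objects:
        selected.extend(intervals_by_object.get(obj, []))
    selected.sort()
    merged = []
    for start, end in selected:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

intervals = {
    "Player 10": [(30.0, 75.0), (120.0, 180.0)],
    "Ball": [(60.0, 90.0)],
}
print(build_collage(intervals, ["Player 10", "Ball"]))
# -> [(30.0, 90.0), (120.0, 180.0)]
```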
MPEG-DASH (ISO/IEC 23009-1:2014) is an ISO standard that defines an adaptive streaming protocol for media delivery over IP networks. DASH is expected to become widely used (as a replacement for current proprietary schemes such as Apple HLS, Adobe Flash, and Microsoft Silverlight). The following embodiments outline the delivery of zoom coding using MPEG-DASH.
In some embodiments, the client device in a zoom coding system performs the following process:
In an exemplary embodiment, Slice User Data for object render points includes the following information:
Object_ID: Range 0-255. This syntax element provides a unique identifier for each object.
Object_x_position[n]: For each object ID n, the x position of the object bounding box.
Object_y_position[n]: For each object ID n, the y position of the object bounding box.
Object_x_size_in_slice[n]: For each object ID n, the x_dimension of the object bounding box.
Object_y_size_in_slice[n]: For each object ID n, the y_dimension of the object bounding box.
The object bounding box represents a rectangular region that encloses the object. In an exemplary embodiment, the (x, y) position is the upper left corner position of the object bounding box. Some objects may be split across more than one slice during certain frames. In this case, the object position and size may pertain to the portion of the object contained in the slice that contains the user data.
The position and size data described above may be slice-centric and may not describe the position and size of the entire object. The object bounding box may be the union of all the slice-centric rectangular bounding boxes for a given object.
In some embodiments, it is possible that the overall object bounding box is not rectangular. However, for purposes of display on a standard rectangular screen, these unions of the object bounding boxes are illustrated herein as being rectangular.
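The union of slice-centric bounding boxes into a single rectangle for display can be sketched as follows; the field names mirror the Slice User Data elements above, while the helper structure itself is illustrative.

```python
# Sketch: combine slice-centric bounding boxes for one Object_ID into a single
# rectangle for on-screen display.  Field names mirror the Slice User Data
# elements above; the helper structure is illustrative.

from dataclasses import dataclass

@dataclass
class SliceBox:
    object_id: int
    x: int        # Object_x_position: upper-left x of the per-slice box
    y: int        # Object_y_position: upper-left y of the per-slice box
    w: int        # Object_x_size_in_slice
    h: int        # Object_y_size_in_slice

def union_bounding_box(slice_boxes, object_id):
    """Return (x, y, w, h) enclosing all per-slice boxes for the given object."""
    boxes = [b for b in slice_boxes if b.object_id == object_id]
    if not boxes:
        return None
    x0 = min(b.x for b in boxes)
    y0 = min(b.y for b in boxes)
    x1 = max(b.x + b.w for b in boxes)
    y1 = max(b.y + b.h for b in boxes)
    return (x0, y0, x1 - x0, y1 - y0)

# Example: an object split across two slices.
boxes = [SliceBox(7, 100, 400, 200, 80), SliceBox(7, 100, 480, 180, 120)]
print(union_bounding_box(boxes, 7))  # -> (100, 400, 200, 200)
```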
Using the Object_ID and the position and size information, the client device may render regions on screen. This information may be updated (e.g. periodically or constantly) through the SEI messages. As shown in
In an exemplary embodiment, when a user makes a selection on the client device (e.g. by pressing a button) to get information on the zoom coded streams that may be downloaded/tracked, the client device responds by displaying the bounding boxes on a static image. In some embodiments, the static image is a frame of video that was stored on the server. The static image may be a single image decoded by the client from a video segment received from the server. The static image may be, for example, the frame most recently decoded by the client, a recently received IDR frame, or a frame selected by the client to contain all of the available tracked objects or a maximum number of the available tracked objects. Other alternatives include the use of manually annotated sequences using templates of specific characters. For example, the static image may be the image of a player who is being tracked in the sequence. The user could, for example, recognize the player and request all zoom coded streams of that character that are available.
The user provides input through, for example, a mouse or a simple numbering or color coded mechanism to select one or more of the zoom coded objects. Based on the user input, the server starts to stream the appropriate zoom coded stream to the user's client device. In
When the end user 708 makes a program request (at 710), the client device sends a request message to the web server 704 (at 712) and the web server 704 redirects (at 712-716) the request to the appropriate streaming server 702. The streaming server 702 sends down the appropriate manifest or media presentation description file (at 718) (including the zoom coded stream options) to the user's client device 706. The normal program is then decoded and displayed (at 720). (The normal program may correspond to one or more of the traditional ABR streams as depicted in
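The exchange described above can be sketched, purely for illustration, as the following sequence of client-side steps; the message contents, URLs, and helper functions are assumptions standing in for the actual web server and streaming server behavior.

```python
# Minimal sketch of the program-request exchange described above; URLs,
# message contents, and helper functions are illustrative assumptions.

def request_program(program_id: str) -> str:
    """Client -> web server: program request; the web server redirects the
    request to the appropriate streaming server and returns its address."""
    return f"https://streaming.example/{program_id}"

def fetch_manifest(streaming_server_url: str) -> dict:
    """Stand-in for the streaming server response: a manifest / media
    presentation description including zoom coded stream options."""
    return {
        "abr_representations": ["1080p", "720p", "480p"],
        "zoom_options": [{"id": "zoom-1", "label": "Player 10"}],
    }

def play_program(program_id: str):
    server = request_program(program_id)   # program request and redirect
    manifest = fetch_manifest(server)      # manifest with zoom coded options
    # Decode and display the normal program; the stored zoom options are shown
    # only if the end user later asks which objects can be tracked.
    return manifest["abr_representations"][0], manifest["zoom_options"]

print(play_program("program-12345"))
```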
Client Exchange with Network Based on User Input.
In at least one embodiment, the client device requesting the zoom coded information performs the following steps:
In some embodiments, multiple views of the zoom coded information may be provided in full, original resolution, in a picture-in-picture type of display. In some embodiments, the various zoom coded views are presented in a tiled format.
Some embodiments enable smooth switching between the overall unzoomed view and the zoom coded view with a one-touch mechanism (e.g. at a remote control, keyboard, or tablet).
In some embodiments, the client device allows automatic switching to a zoom coded view (even without the user being cued). Such an embodiment may be appealing to users who merely want to track their own objects of interest. In such an embodiment, users are able to track an object of interest without going through the highlighting mechanism. As an example, a user could set a preference in their client device that they would like to see a zoom coded view of their favorite player whenever the player is in the camera's field of view. Some such embodiments incorporate a training mode for users to specify such preferences ahead of the presentation.
At 902, the first representation of the video stream and the object-of-interest identifier is received from a content server. The object-of-interest identifier indicates an availability of a second representation of a portion of the video stream that depicts the object of interest. At 904, both the first representation of the video stream and the object-of-interest identifier are caused to be displayed at a client device. At 906, in response to a user selection of the second representation of the portion of the video stream (e.g. selection of the displayed object-of-interest identifier by the user), a request for the second representation of the portion of the video stream is transmitted to the content server. At 910, the second representation of the portion of the video stream is received, and at 912, the second representation of the portion of the video stream is displayed.
Although features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element may be used alone or in any combination with the other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable medium for execution by a computer or processor. Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.
The present application claims priority to U.S. Provisional Application Ser. No. 62/236,023 filed Oct. 1, 2015, titled “METHOD AND SYSTEMS FOR CLIENT INTERPRETATION AND PRESENTATION OF ZOOM-CODED CONTENT.”
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US16/53512 | 9/23/2016 | WO | 00 |
Number | Date | Country
---|---|---
62236023 | Oct 2015 | US