The disclosed embodiments relate generally to providing and displaying video, and more particularly, to methods and systems for providing and displaying video at multiple distinct video resolution or quality levels.
Many modern devices for displaying video, such as high-definition televisions, computer monitors, and cellular telephone display screens, allow users to manipulate the displayed video by zooming. In traditional systems for zooming video, the displayed resolution of the video does not increase as the zoom factor increases, causing the zoomed video to appear blurry and resulting in an unpleasant viewing experience. Furthermore, users also may desire to zoom in on only a portion of the displayed video and to view the remainder of the displayed video at a lower resolution.
In addition, bandwidth limitations may constrain the ability to provide high resolution and high quality video. A user frustrated by low-quality video may desire to view at least a portion of the video at higher quality.
In some embodiments a method is performed to provide video from a video data source. The video data source includes a sequence of multi-level frames. Each multi-level frame comprises a plurality of copies of a respective frame. In one aspect, each copy has an associated video resolution level that is a member of a predefined range of video resolution levels that range from a highest video resolution level to a lowest video resolution level. In another aspect, each copy has an associated video quality level that is a member of a predefined range of video quality levels that range from a highest video quality level to a lowest video quality level. In the method, first video data corresponding to a first portion of a first copy of a respective frame is extracted from the video data source. In addition, second video data corresponding to a second portion of a second copy of the respective frame is extracted from the video data source. The video resolution level or video quality level of the second copy is distinct from the video resolution level or video quality level of the first copy. The first and second video data are transmitted to a client device for display. The extracting and transmitting are repeated with respect to a plurality of successive multi-level frames of the video data source.
In some embodiments a system provides video from a video data source. The video data source includes a sequence of multi-level frames. Each multi-level frame includes a plurality of copies of a respective frame. In one aspect, each copy has an associated video resolution level that is a member of a predefined range of video resolution levels that range from a highest video resolution level to a lowest video resolution level. In another aspect, each copy has an associated video quality level that is a member of a predefined range of video quality levels that range from a highest video quality level to a lowest video quality level. The system includes memory, one or more processors, and one or more programs stored in the memory and configured for execution by the one or more processors. The one or more programs include instructions to extract, from the video data source, first video data corresponding to a first portion of a first copy of a respective frame and instructions to extract, from the video data source, second video data corresponding to a second portion of a second copy of the respective frame. The video resolution level or video quality level of the second copy is distinct from the video resolution level or video quality level of the first copy. The one or more programs further include instructions to transmit the first and second video data to a client device for display and instructions to repeat the extracting and transmitting with respect to a plurality of successive multi-level frames of the video data source.
In some embodiments a computer readable storage medium stores one or more programs for use in providing video from a video data source. The video data source includes a sequence of multi-level frames. Each multi-level frame includes a plurality of copies of a respective frame. In one aspect, each copy has an associated video resolution level that is a member of a predefined range of video resolution levels that range from a highest video resolution level to a lowest video resolution level. In another aspect, each copy has an associated video quality level that is a member of a predefined range of video quality levels that range from a highest video quality level to a lowest video quality level. The one or more programs are configured to be executed by a computer system and include instructions to extract, from the video data source, first video data corresponding to a first portion of a first copy of a respective frame and instructions to extract, from the video data source, second video data corresponding to a second portion of a second copy of the respective frame. The video resolution level or video quality level of the second copy is distinct from the video resolution level or video quality level of the first copy. The one or more programs also include instructions to transmit the first and second video data to a client device for display and instructions to repeat the extracting and transmitting with respect to a plurality of successive multi-level frames of the video data source.
In some embodiments a system provides video from a video data source. The video data source includes a sequence of multi-level frames. Each multi-level frame includes a plurality of copies of a respective frame. In one aspect, each copy has an associated video resolution level that is a member of a predefined range of video resolution levels that range from a highest video resolution level to a lowest video resolution level. In another aspect, each copy has an associated video quality level that is a member of a predefined range of video quality levels that range from a highest video quality level to a lowest video quality level. The system includes means for extracting, from the video data source, first video data corresponding to a first portion of a first copy of a respective frame and means for extracting, from the video data source, second video data corresponding to a second portion of a second copy of the respective frame. The video resolution level or video quality level of the second copy is distinct from the video resolution level or video quality level of the first copy. The system also includes means for transmitting the first and second video data to a client device for display. The means for extracting and means for transmitting are configured to repeat the extracting and transmitting with respect to a plurality of successive multi-level frames of the video data source.
In some embodiments a method of displaying video at a client device separate from a server includes transmitting to the server a request specifying a window region to display over a background region in a video. First and second video data are received from the server. The first video data corresponds to a first portion of a first copy of a first frame in a sequence of frames. The second video data corresponds to a second portion of a second copy of the first frame. In one aspect the first copy and the second copy have distinct video resolution levels; in another aspect the first copy and the second copy have distinct video quality levels. The first and second video data are decoded. The decoded first video data are displayed in the background region and the decoded second video data are displayed in the window region. The receiving, decoding, and displaying are repeated with respect to a plurality of successive frames in the sequence.
In some embodiments a client device separate from a server displays video. The client device includes memory, one or more processors, and one or more programs stored in the memory and configured for execution by the one or more processors. The one or more programs include instructions to transmit to the server a request specifying a window region to display over a background region in a video and instructions to receive first and second video data from the server. The first video data corresponds to a first portion of a first copy of a first frame in a sequence of frames and the second video data corresponds to a second portion of a second copy of the first frame, wherein the first copy and the second copy have distinct video resolution levels or video quality levels. The one or more programs also include instructions to decode the first and second video data; instructions to display the decoded first video data in the background region and the decoded second video data in the window region; and instructions to repeat the receiving, decoding, and displaying with respect to a plurality of successive frames in the sequence.
In some embodiments a computer readable storage medium stores one or more programs for use in displaying video at a client device separate from a server. The one or more programs are configured to be executed by a computer system and include instructions to transmit to the server a request specifying a window region to display over a background region in a video and instructions to receive first and second video data from the server. The first video data corresponds to a first portion of a first copy of a first frame in a sequence of frames and the second video data corresponds to a second portion of a second copy of the first frame. The first copy and the second copy have distinct video resolution levels or video quality levels. The one or more programs also include instructions to decode the first and second video data; instructions to display the decoded first video data in the background region and the decoded second video data in the window region; and instructions to repeat the receiving, decoding, and displaying with respect to a plurality of successive frames in the sequence.
In some embodiments a client device separate from a server is used for displaying video. The client device includes means for transmitting to the server a request specifying a window region to display over a background region in a video and means for receiving first and second video data from the server. The first video data corresponds to a first portion of a first copy of a first frame in a sequence of frames and the second video data corresponds to a second portion of a second copy of the first frame. The first copy and the second copy have distinct video resolution levels or video quality levels. The client device also includes means for decoding the first and second video data and means for displaying the decoded first video data in the background region and the decoded second video data in the window region. The means for receiving, decoding, and displaying are configured to repeat the receiving, decoding, and displaying with respect to a plurality of successive frames in the sequence.
Like reference numerals refer to corresponding parts throughout the drawings.
Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
The server system 104 includes a server 108, a video database or file system 110 and a video encoder/re-encoder 112. Server 108 serves as a front-end for the server system 104. Server 108, sometimes called a front end server, retrieves video from the video database or file system 110, and also provides an interface between the server system 104 and the client devices 102. In some embodiments, server 108 includes a bitstream repacker 117 and a video enhancer 115. In some embodiments, the bitstream repacker 117 repacks at least a portion of one or more bitstreams comprising video data with multiple levels of resolution or multiple quality levels to a standard bitstream. In some embodiments, the video enhancer 115 eliminates artifacts associated with encoding and otherwise improves video quality. The bitstream repacker 117 and video enhancer 115 may each be implemented in hardware or in software.
In some embodiments, the video encoder/re-encoder 112 re-encodes video data received from the video database or file system 110. In some embodiments, the video data provided to the encoder/re-encoder 112 is stored in the video database or file system 110 in one or more standard video formats, such as motion JPEG (M-JPEG), MPEG-2, MPEG-4, H.263, H.264/Advanced Video Coding (AVC), or any other official or de facto standard video format. The re-encoded video data produced by the encoder/re-encoder 112 may be stored in the video database or file system 110 as well. In some embodiments, the re-encoded video data include a sequence of multi-level frames; in some embodiments the multi-level frames are partitioned into tiles. In some embodiments, a respective multi-level frame in the sequence includes a plurality of copies of a frame, each having a distinct video resolution level. Generation of multi-level frames that have multiple distinct video resolution levels and partitioning of multi-level frames into tiles is described in the “Encoding Video at Multiple Resolution Levels” application (see Related Applications, above). In some embodiments, respective multi-level frames in the sequence comprise a plurality of copies of a frame, wherein each copy has the same video resolution level but a distinct video quality level, such as a distinct level of quantization or truncation of the corresponding video bitstream.
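As a rough illustration of the data layout described above, the following Python sketch models a multi-level frame as a set of copies keyed by level, with each copy partitioned into tiles. The names FrameCopy and MultiLevelFrame, the convention that level 0 is the lowest level, and the 64-pixel tile size are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class FrameCopy:
    """One copy of a frame at a single resolution (or quality) level."""
    level: int             # hypothetical convention: 0 is the lowest level
    width: int             # copy width in pixels
    height: int            # copy height in pixels
    tile_size: int = 64    # assumed tile edge length in pixels
    # Encoded bitstream per tile, keyed by (tile_column, tile_row).
    tiles: Dict[Tuple[int, int], bytes] = field(default_factory=dict)

    def tile_grid(self) -> Tuple[int, int]:
        """Number of tile columns and rows needed to cover the copy."""
        cols = -(-self.width // self.tile_size)    # ceiling division
        rows = -(-self.height // self.tile_size)
        return cols, rows


@dataclass
class MultiLevelFrame:
    """All copies of one frame, keyed by level."""
    copies: Dict[int, FrameCopy] = field(default_factory=dict)

    def add_copy(self, copy: FrameCopy) -> None:
        self.copies[copy.level] = copy

    def highest_level(self) -> int:
        return max(self.copies)

    def lowest_level(self) -> int:
        return min(self.copies)
```

A video data source would then be a sequence of such MultiLevelFrame objects, one per frame of the video.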
In some embodiments, the video encoder/re-encoder 112 encodes video data received from a video camera such as a camcorder (not shown). In some embodiments, the video data received from the video camera is raw video data, such as pixel data. In some embodiments, the video encoder/re-encoder 112 is separate from the server system 104 and transmits encoded or re-encoded video data to the server system 104 via a network connection (not shown) for storage in the video database or file system 110.
In some embodiments, the functions of server 108 may be divided or allocated among two or more servers. In some embodiments, the server system 104, including the server 108, the video database or file system 110, and the video encoder/re-encoder 112 may be implemented as a distributed system of multiple computers and/or video processors. However, for convenience of explanation, the server system 104 is described below as being implemented on a single computer, which can be considered a single logical system.
A user interfaces with the server system 104 and views video at a client system or device 102 (called the client device herein for ease of reference). The client device 102 includes a computer 114 or computer-controlled device, such as a set-top box (STB), cellular telephone, smart phone, personal digital assistant (PDA), or the like. The computer 114 typically includes one or more processors (not shown); memory, which may include volatile memory (not shown) and non-volatile memory such as a hard disk drive (not shown); one or more video decoders 118; and a display 116. The video decoders 118 may be implemented in hardware or in software. In some embodiments, the computer-controlled device 114 and display 116 are separate devices (e.g., a set-top box or computer connected to a separate monitor or television or the like), while in other embodiments they are integrated into a single device. For example, the computer-controlled device 114 may be a portable electronic device that includes a display screen, such as a cellular telephone, personal digital assistant (PDA), or portable music and video player. In another example, the computer-controlled device 114 is integrated into a television. The computer-controlled device 114 includes one or more input devices or interfaces 120. Examples of input devices 120 include a keypad, touchpad, touch screen, remote control, keyboard, or mouse. In some embodiments, a user may interact with the client device 102 via an input device or interface 120 to display a first region of video at a first video resolution level or quality level and a second region of video at a second video resolution level or quality level on the display 116.
Each of the above identified elements 216-224 in
Each of the above identified elements in
Although
A video is displayed on a display screen 522 of a device 520 at a resolution corresponding to the video resolution level of the frame 502. In response to a user request to magnify a region within the displayed video, a portion 504 of the frame 500 is identified. The frame 500 itself is selected based on its video resolution level; examples of criteria for selecting a video resolution level are described below with regard to the process 600 (
In some embodiments, the frames 500 and 502 are stored at a server system (e.g., in the video database 318 of the server system 300). The server system extracts bitstreams from the frames 500, 502 and transmits the extracted bitstreams to the client device 520, which decodes the received bitstreams. In some embodiments, the client device 520 includes multiple decoders: a first decoder decodes the bitstream corresponding to the portion 504 of the frame 500 and a second decoder decodes the bitstream corresponding to the frame 502. Alternatively, in some embodiments a single multi-level decoder decodes both bitstreams.
In some embodiments, a bitstream repacker 512 receives the bitstreams extracted from the frames 500 and 502 and repackages the extracted bitstreams into a single bitstream for transmission to the client device 520, as illustrated in
In some embodiments, the frames 500 and 502 are stored in a memory in or coupled to the device 520, and the device 520 performs the extraction as well as the decoding and display operations.
A video is displayed on a display screen 522 of a device 520 at a quality corresponding to the video quality level of the frame 532. In response to a user request to view a region within the displayed video at an increased quality level, a portion 534 of the frame 530 is identified. The frame 530 itself is selected based on its video quality level; examples of criteria for selecting a video quality level are described below with regard to the process 600 (
In some embodiments, the frames 530 and 532 are stored at a server system that extracts the bitstreams and transmits the extracted bitstreams to the client device 520, as described above with regard to
A scale factor and a video resolution or quality level are identified (604) for the window region. In some embodiments, the scale factor specifies the degree to which video to be displayed in the window region is zoomed in or out with respect to the video displayed in the background region. In some embodiments, the video resolution level or video quality level is the highest resolution or quality level at which video may be displayed in the window region. In some embodiments, the video resolution level or video quality level is determined by applying the scale factor to the video resolution level or video quality level of the background region. In some embodiments, the video resolution level or video quality level is the highest resolution or quality level that may be accommodated by available bandwidth (e.g., transmission bandwidth from a server to a client device, or processing bandwidth at a display device).
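One possible way to combine these criteria is sketched below in Python. The sketch assumes that each level is described by a frame width and an estimated bit rate for streaming the window at that level; the Level record, the select_window_level function, and the bit-rate figures are hypothetical. It picks the lowest level that is sharp enough for the requested scale factor and falls back to the best level the available bandwidth allows.

```python
from typing import List, NamedTuple, Optional


class Level(NamedTuple):
    index: int     # position in the predefined range of levels
    width: int     # frame width at this level, in pixels
    kbps: float    # assumed cost of streaming the window at this level


def select_window_level(levels: List[Level], background: Level,
                        scale_factor: float,
                        available_kbps: float) -> Optional[Level]:
    """Pick a level for the window region, capped by available bandwidth."""
    affordable = [lv for lv in levels if lv.kbps <= available_kbps]
    if not affordable:
        return None
    # Applying the scale factor to the background level gives a target width.
    target_width = background.width * scale_factor
    sufficient = [lv for lv in affordable if lv.width >= target_width]
    if sufficient:
        # Lowest level that still matches the requested zoom.
        return min(sufficient, key=lambda lv: lv.width)
    # Otherwise use the highest level the bandwidth can accommodate.
    return max(affordable, key=lambda lv: lv.width)
```

Other policies, such as always choosing the highest displayable level, fit the same interface.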
For successive frames in a sequence of frames at the identified video resolution or quality levels, a portion of the frame corresponding to the background region is identified (606) and the frame is cropped accordingly. In some embodiments, cropping the frame includes selecting the tiles and/or macro-blocks that at least partially cover the background region. In some embodiments, the background region is constrained to have borders that coincide with the borders of tiles or macro-blocks, and cropping the frame includes selecting the tiles and/or macro-blocks that correspond to the background region.
If the scale factor is not equal to one (608-No), an inverse scale factor is applied (610) to scale the cropped frame. For example, if the scale factor is 2×, such that both horizontal and vertical dimensions within the window region are to be expanded by a factor of two with respect to horizontal and vertical dimensions within the background region, then an inverse scale factor of 0.5 is applied to the cropped frame to define an area having a width and height equal to half the width and height, respectively, of the cropped frame. If the scale factor is equal to one (608-Yes), operation 610 is omitted.
An offset is applied (612) to identify a portion of the frame corresponding to the window region. In some embodiments, the offset specifies a location within the frame of the portion of the frame corresponding to the window region, where the size of the portion corresponding to the window region is defined by the inverse scale factor.
For successive frames, each frame is cropped (614) according to the boundaries of the portion corresponding to the window region as identified in operation 612. In some embodiments, cropping the frame includes selecting the tiles and/or macro-blocks that at least partially cover the portion corresponding to the window region. In some embodiments, the portion corresponding to the window region is constrained to have borders that coincide with the borders of tiles or macro-blocks, and cropping the frame includes selecting the tiles and/or macro-blocks that correspond to the portion corresponding to the window region. The bitstream of the cropped frame then may be extracted and provided for decoding by the display device.
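The geometry of operations 610 through 614 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: rectangles are (x, y, width, height) tuples in pixel coordinates of the selected copy, the scale factor is at least one (a factor of one leaves the size unchanged), the offset is clamped so the portion stays inside the cropped frame, and tiles are 64 pixels square. The function names are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def window_portion(cropped: Rect, scale_factor: float,
                   offset: Tuple[int, int]) -> Rect:
    """Apply the inverse scale factor and an offset to locate the portion."""
    if scale_factor < 1:
        raise ValueError("this sketch assumes a scale factor of at least 1")
    x, y, w, h = cropped
    inverse = 1.0 / scale_factor           # e.g. a 2x zoom gives 0.5
    pw, ph = round(w * inverse), round(h * inverse)
    dx, dy = offset
    # Clamp so the portion stays inside the cropped frame (an assumption).
    px = min(max(x + dx, x), x + w - pw)
    py = min(max(y + dy, y), y + h - ph)
    return px, py, pw, ph


def covering_tiles(portion: Rect, tile_size: int = 64) -> List[Tuple[int, int]]:
    """List (column, row) of each tile that at least partially covers the portion."""
    x, y, w, h = portion
    first_col, first_row = x // tile_size, y // tile_size
    last_col = (x + w - 1) // tile_size
    last_row = (y + h - 1) // tile_size
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]
```

When the portion is constrained to tile borders, covering_tiles returns exactly the tiles that make up the portion, so no partially covered tiles are extracted.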
In some embodiments, a method analogous to the method 600 is used to determine a portion of a frame for display in a background region of a display screen, wherein the background region is scaled with respect to a previously displayed background region.
If the frame is an I-frame (804-Yes), tiles and macro-blocks in the current frame are identified (808) that at least partially cover the identified portion of the frame. If the frame is not an I-frame (804-No) (e.g., the frame uses predictive encoding), tiles and macro-blocks in the current frame and the relevant reference frame or frames are identified (806) that at least partially cover the identified portion of the frame.
The bitstreams for the identified tiles and/or MBs are extracted (810). The extracted bitstreams are provided to a decoder, which decodes the bitstreams for display in a corresponding region on a display screen.
In some embodiments, macro-blocks may be dual-encoded with and without predictive encoding. For example, if predictive encoding of a respective macro-block requires data outside of the macro-block's tile, then two versions of the macro-block are encoded: one using predictive encoding (i.e., “inter-MB coding”) and one not using predictive encoding (i.e., “intra-MB coding”). In some embodiments of the method 800, if a macro-block identified in operation 806 requires reference frame data from outside of the tiles identified in operation 806 as at least partially covering the portion, then the intra-MB-coded version of the macro-block is extracted. If the macro-block does not require reference frame data from outside of the identified tiles, then the inter-MB-coded version of the macro-block is extracted.
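The selection between the two encoded versions might look like the Python sketch below. The DualMacroBlock record, its reference_tiles field (the set of tiles from which the inter-coded version predicts), and the extract_macro_blocks function are hypothetical names introduced for illustration; the disclosure does not specify how a macro-block's reference dependencies are recorded.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple

Tile = Tuple[int, int]


@dataclass(frozen=True)
class DualMacroBlock:
    tile: Tile                         # tile containing this macro-block
    reference_tiles: FrozenSet[Tile]   # tiles the inter-coded version predicts from
    inter_bits: bytes                  # version using predictive encoding
    intra_bits: bytes                  # version not using predictive encoding


def extract_macro_blocks(blocks: Iterable[DualMacroBlock],
                         identified_tiles: Set[Tile]) -> List[bytes]:
    """Choose the inter- or intra-coded bitstream for each macro-block."""
    out: List[bytes] = []
    for mb in blocks:
        if mb.tile not in identified_tiles:
            continue  # macro-block lies outside the identified tiles
        if mb.reference_tiles <= identified_tiles:
            out.append(mb.inter_bits)   # all reference data will be available
        else:
            out.append(mb.intra_bits)   # reference data would be missing
    return out
```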
In some embodiments, a region on a display screen may be translated in response to user input.
User input 902 (
In response to the user input 902, the location of the portion 504 to be displayed in the window region 524 is shifted in a subsequent frame 500-2 (
In some embodiments, the window region 524 is automatically translated, as illustrated in
In some embodiments, the location of the portion 504 in a frame 502 specifies a portion of the frame 502 to be excluded when extracting a bitstream to be decoded and displayed in the background region 526. For example, tiles or bitstreams that fall entirely within the portion 504 of a frame 502 are not extracted. In some embodiments in which the display location of the window region 524 on the screen 522 is translated in response to the user input 902, the location of the portion 504 is shifted in the frame 502-2 with respect to the frame 502-1, as illustrated in
In some embodiments, a window region having a different (e.g., higher) video quality level than a background region may be translated, by analogy to
In the method 950, a tracking window 924 is identified (952) within a window region 922 in the frame 920-1. In some embodiments, the tracking window 924 is offset (954) from a first edge of the window region 922 by a first number of pixels 926 and from a second edge of the window region 922 by a second number of pixels 928. In some embodiments, the offsets 926 and 928 are chosen substantially to center the tracking window 924 within the window region 922. In some embodiments the offsets 926 and 928 are adjustable to allow the location of the tracking window 924 to correspond to the location of a potential object of interest identified within the window region 922.
For each macro-block MBi in the tracking window 924, a normalized motion vector mvi is computed (956) by averaging motion vectors for all sub-blocks of MBi, where i is an integer that indexes respective macro-blocks. In some embodiments, each motion vector is weighted equally (958) when averaging the motion vectors (e.g., for MPEG-2 and baseline MPEG-4). Alternatively, in some embodiments a weighted average of the motion vectors for all sub-blocks of MBi is calculated. For example, each motion vector is weighted by the area of its sub-block (960) (e.g., for H.264). In yet another example, the motion vectors of any non-moving sub-blocks are either excluded or given reduced weight (e.g., by a predefined multiplicative factor, such as 0.5) when computing the normalized motion vector for a respective macro-block.
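The averaging options for operation 956 can be illustrated with a short Python sketch. Motion vectors are represented as (x, y) tuples, and the function names normalized_motion_vector and down_weight_static are hypothetical. Passing no weights gives the equal-weight average; passing sub-block areas as weights gives the area-weighted average.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]


def normalized_motion_vector(vectors: Sequence[Vector],
                             weights: Sequence[float] = ()) -> Vector:
    """Weighted average of sub-block motion vectors; equal weights by default."""
    if not vectors:
        return (0.0, 0.0)
    if not weights:
        weights = [1.0] * len(vectors)
    total = sum(weights)
    if total == 0:
        return (0.0, 0.0)
    x = sum(w * v[0] for w, v in zip(weights, vectors)) / total
    y = sum(w * v[1] for w, v in zip(weights, vectors)) / total
    return (x, y)


def down_weight_static(vectors: Sequence[Vector], weights: Sequence[float],
                       factor: float = 0.5) -> List[float]:
    """Reduce the weight of non-moving sub-blocks by a multiplicative factor."""
    return [w * factor if v == (0, 0) else w
            for v, w in zip(vectors, weights)]
```

Setting factor to zero in down_weight_static excludes non-moving sub-blocks entirely.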
An average motion vector mvavg is computed (962) by averaging the mvi over all MBi in the tracking window 924. The standard deviation (σ) of the mvi over all MBi in the tracking window is computed. The average motion vector is then recalculated (966), ignoring (i.e., excluding from the calculation) all motion vectors mvi for which ∥mvi-mvavg∥>cσ. In some embodiments, c is an adjustable parameter. In some embodiments, c equals 1, or 3, or is in a range between 0.5 and 10. Alternatively, from a conceptual point of view, the recomputed average motion vector is an average of motion vectors mvi that excludes (from the computed average) non-moving macro-blocks and macro-blocks whose movement magnitude and/or direction is significantly divergent from the dominant movement (if any) within the tracking window.
The location of the window region is translated (968) in a subsequent frame by a distance specified by the recalculated average motion vector of operation 966. For example, the location of window region 922 in the frame 920-2 has been translated with respect to its location in the frame 920-1 by a horizontal distance 930 and a vertical distance 932, where the distances 930 and 932 are specified by the recalculated average motion vector of operation 966.
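A minimal Python sketch of the outlier-rejecting average and the resulting translation follows. The text does not say how a scalar standard deviation is derived from two-dimensional vectors; the sketch assumes σ is the root-mean-square distance of the mvi from the first-pass average. The function names and the rounding of the translation to whole pixels are also assumptions.

```python
import math
from typing import List, Tuple

Vector = Tuple[float, float]


def mean(vs: List[Vector]) -> Vector:
    n = len(vs)
    return (sum(v[0] for v in vs) / n, sum(v[1] for v in vs) / n)


def robust_average(mvs: List[Vector], c: float = 1.0) -> Vector:
    """Average the vectors, then re-average without those farther than c*sigma."""
    if not mvs:
        return (0.0, 0.0)
    avg = mean(mvs)
    dists = [math.hypot(v[0] - avg[0], v[1] - avg[1]) for v in mvs]
    # Assumption: sigma is the RMS distance from the first-pass average.
    sigma = math.sqrt(sum(d * d for d in dists) / len(dists))
    kept = [v for v, d in zip(mvs, dists) if d <= c * sigma]
    return mean(kept) if kept else avg


def translate_window(origin: Tuple[int, int], mv: Vector) -> Tuple[int, int]:
    """Shift the window location by the recalculated average motion vector."""
    return (origin[0] + round(mv[0]), origin[1] + round(mv[1]))
```

With four macro-blocks moving by (4, 0) and one moving by (-16, 0), the first-pass average is (0, 0), the divergent macro-block is rejected, and the window follows the dominant motion.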
While the method 950 includes a number of operations that appear to occur in a specific order, it should be apparent that the method 950 can include more or fewer operations, which can be executed serially or in parallel (e.g., using parallel processors or a multi-threading environment), that the order of two or more operations may be changed, and/or that two or more operations may be combined into a single operation. For example, operation 952 may be omitted and the remaining operations may be performed for the entire window region 922 instead of for the tracking window 924. However, use of a tracking window 924 saves computational cost and avoids unnecessary latency associated with the method 950.
In some embodiments, a request is received (1004) from a client device (e.g., 520,
First video data are extracted (1006) from the video data source. The first video data corresponds to a first portion of a first copy of a respective frame. Examples of a first portion of the first copy include the portion of frame 502 (
In some embodiments the first portion is determined (1008) based on the background region specified in the request. In some embodiments, determining the first portion includes applying an inverse scale factor (e.g., the inverse of the scale factor specified for the background region in the request) and determining an offset within the frame when extracting the first video data from the first copy of the respective frame.
Second video data are extracted (1010) from the video data source. The second video data corresponds to a second portion of a second copy of a respective frame (e.g., portions 504 or 534 of frames 500 or 530,
In some embodiments the second portion is determined (1012) based on the window region specified in the request. In some embodiments, determining the second portion includes applying an inverse scale factor (e.g., the inverse of the scale factor specified for the window region in the request) and determining an offset within the frame when extracting the second video data from the second copy of the respective frame, as described for the method 600 (
In some embodiments, extracting the first and second video data includes identifying a first set of tiles covering the first portion of the first copy and a second set of tiles covering the second portion of the second copy. In some embodiments, a respective tile includes a plurality of macro-blocks, including a first macro-block that is dual-encoded as both an intra-coded bitstream, without predictive coding, and an inter-coded bitstream, with predictive coding. Extracting the first (or second) video data includes extracting the intra-coded bitstream when the first macro-block requires data from outside of the first (or second) portion and extracting the inter-coded bitstream when the first macro-block does not require data from outside the first (or second) portion.
The first and second video data are transmitted (1016) to the client device for display.
In some embodiments, the first and second video data are repacked (1014) into a single video bitstream, which is transmitted (1018) to the client device for display. Repacking is illustrated in
The extracting and transmitting are repeated (1020) with respect to a plurality of successive multi-level frames of the video data source.
In some embodiments, the second portion and/or the first portion are translated (1022) for the successive respective multi-level frames. In some embodiments the second portion and/or the first portion are translated in response to a request received from the client device (e.g., as illustrated in
The method 1000 thus provides an efficient method of providing video data for display at separate video resolutions or quality levels in window and background regions. For example, by enabling the provided high resolution or high quality video data to correspond to a particular display region, the method 1000 efficiently uses available transmission bandwidth.
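The overall extract-and-transmit loop of the method 1000 might be sketched in Python as follows. The sketch is deliberately simplified: each copy is a dictionary from tile coordinates to encoded bytes, the tile sets for the background and window portions are assumed to have been determined already, and send stands in for whatever transport carries data to the client device. All names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Tile = Tuple[int, int]
Copy = Dict[Tile, bytes]             # tile -> encoded bitstream for one copy
MultiLevelFrame = Dict[int, Copy]    # level -> copy of the frame


def extract(copy: Copy, tiles: Set[Tile]) -> List[bytes]:
    """Collect the bitstreams of the requested tiles, in a stable order."""
    return [copy[t] for t in sorted(tiles) if t in copy]


def serve(frames: Iterable[MultiLevelFrame],
          bg_level: int, bg_tiles: Set[Tile],
          win_level: int, win_tiles: Set[Tile],
          send: Callable[[List[bytes], List[bytes]], None]) -> int:
    """Extract and transmit first and second video data for each frame."""
    count = 0
    for frame in frames:
        first = extract(frame[bg_level], bg_tiles)     # background region data
        second = extract(frame[win_level], win_tiles)  # window region data
        send(first, second)
        count += 1
    return count
```

Only the window tiles are taken from the higher-level copy, which is where the bandwidth saving described above comes from.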
First and second video data are received (1104) from the server. The first video data correspond to a first portion of a first copy of a first frame in a sequence of frames. The second video data correspond to a second portion of a second copy of the first frame. The first copy and the second copy have distinct video resolution levels or video quality levels. Examples of a first portion of the first copy include the portion of frame 502 or 532 that excludes the portion 504 or 534 (
In some embodiments, the first and second video data are received (1106) in a single video bitstream, as illustrated in
In some embodiments, the first and second video data are received (1108) from a single video source at the server (e.g., from a single MLVF 402,
The first and second video data are decoded (1112). In some embodiments, a single decoder decodes (1114) the first and second video data. In some embodiments, a first decoder decodes (1116) the first video data and a second decoder decodes the second video data.
In some embodiments, the first video data and/or the second video data include data extracted from an inter-coded bitstream of a first macro-block in the first frame and an intra-coded bitstream of a second macro-block in the first frame. In some embodiments, the first and second video data comprise a plurality of tiles in the first frame, wherein at least one of the tiles comprises a plurality of intra-coded macro-blocks and at least one of the tiles comprises a plurality of inter-coded macro-blocks.
The decoded first video data are displayed (1118) in the background region and the decoded second video data are displayed in the window region.
The receiving, decoding, and displaying are repeated (1120) with respect to a plurality of successive frames in the sequence.
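On the client side, displaying the decoded second video data in the window region over the decoded first video data can be pictured as a simple compositing step. The Python sketch below uses nested lists of pixel values as a stand-in for decoded video and assumes the window has already been scaled to its on-screen size; composite, display_loop, and the show callback are hypothetical names.

```python
from typing import Callable, Iterable, List, Tuple

Image = List[List[int]]   # rows of pixel values; a stand-in for decoded video


def composite(background: Image, window: Image,
              origin: Tuple[int, int]) -> Image:
    """Return a copy of the background with the window drawn over it at origin."""
    ox, oy = origin
    out = [row[:] for row in background]
    for r, row in enumerate(window):
        y = oy + r
        if not 0 <= y < len(out):
            continue  # clip rows that fall outside the screen
        for c, pixel in enumerate(row):
            x = ox + c
            if 0 <= x < len(out[y]):
                out[y][x] = pixel
    return out


def display_loop(decoded_pairs: Iterable[Tuple[Image, Image]],
                 origin: Tuple[int, int],
                 show: Callable[[Image], None]) -> None:
    """Repeat the compositing and display for successive decoded frames."""
    for background, window in decoded_pairs:
        show(composite(background, window, origin))
```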
In some embodiments, a request to pan the window region is transmitted (1130,
In some embodiments, a request to pan the background region is transmitted (1140,
The method 1100 thus provides a bandwidth-efficient method for displaying video at separate video resolutions or quality levels in window and background regions, by enabling the higher resolution or higher quality video data to correspond to a particular display region.
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.
This application claims priority to U.S. Provisional Patent Application No. 61/075,305, titled “Providing and Displaying Video at Multiple Resolution and Quality Levels,” filed Jun. 24, 2008, which is hereby incorporated by reference in its entirety. This application is related to U.S. patent application Ser. No. 11/639,780, titled “Encoding Video at Multiple Resolution Levels,” filed Dec. 15, 2006, and to U.S. patent application Ser. No. 12/145,453, titled “Displaying Video at Multiple Resolution Levels,” filed Jun. 24, 2008, both of which are hereby incorporated by reference in their entireties.
Number | Date | Country
---|---|---
61075305 | Jun 2008 | US