The present invention relates to adaptively streaming media encoded at variable bit rates and/or optimizing the presentation thereof according to perceivable capabilities of a client or other device interfacing the media with a user, such as but not necessarily limited to optimizing Dynamic Adaptive Streaming over HTTP (DASH).
Dynamic Adaptive Streaming over HTTP (DASH), such as that described in Part 1: Media presentation description and segment formats (ISO/IEC 23009-1, Second edition, 2014 May 15), the disclosure of which is hereby incorporated by reference in its entirety herein, relates to employing Hypertext Transfer Protocol (HTTP) to facilitate transferring media content from a server to a client. DASH specifies Extensible Markup Language (XML) and binary formats that enable delivery of media content from HTTP servers to HTTP clients and enable caching of content by HTTP caches, such as in accordance with messaging and other processes described in Internet Engineering Task Force (IETF) Request for Comments (RFC) 2616, the disclosure of which is hereby incorporated by reference in its entirety herein. DASH, as noted in the above identified specification, is intended to support a media-streaming model for delivery of media content whereby clients may request data using the HTTP protocol from web servers, including those lacking DASH-specific capabilities. While the present invention is not necessarily limited to DASH, DASH is representative of one distribution model having processes for selecting, encoding and transmitting media content lacking the optimization contemplated by the present invention.
As required, detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention that may be embodied in various and alternative forms. The figures are not necessarily to scale; some features may be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present invention.
The shared medium 20 may be any network sufficient to facilitate exchanging Internet protocol (IP) layer messaging or other suitable signaling between the media server 12 and the media clients 14, 16, 18, such as to facilitate the contemplated media streaming and/or additional services available from a media service provider 36. The shared medium 20 may be configured as an IP-based network having capabilities sufficient to facilitate wired and/or wireless IP-layer message exchange according to HTTP whereby bandwidth is commonly shared between each media client 14, 16, 18 resulting in bandwidth consumed by one media client 14, 16, 18 diminishing bandwidth available to the other media clients 14, 16, 18. The available bandwidth or bit rate may vary statically or dynamically depending on network congestion, quality of service (QOS), subscription rights, entitlements or any number of other variables, including the bit rate varying for upstream and downstream communications. The bandwidth sharing or co-dependency of the media clients 14, 16, 18 may dynamically affect the network resources, network congestion levels and otherwise influence bandwidth or bit rates available to facilitate streaming media. The present invention fully contemplates facilitating media streaming using non-shared resources and describes the shared medium 20 merely for describing a scenario where media selection may be influenced depending on dynamically changing resources.
Block 62 relates to the media server 12 or other device associated therewith encoding the received video. The encoding may generally correspond with an encoder or other application compressing or processing the received video frames for transport, such as with the use of mechanisms and capabilities understood by one having ordinary skill in the art operating to facilitate the optimizations contemplated herein. The encoding may correspond with the encoding described in DASH whereby a particular video may be encoded to create a number of representations, with the set of frames comprising each representation being variably compressed in order to maintain a constant bit rate throughout an entirety of the corresponding representation. The encoding may alternatively correspond with the similar encoding described in DASH whereby the set of frames comprising each representation is variably compressed in order to maintain a constant bit rate for the majority of the duration of the representation, while allowing the bit rate to decrease for a certain minority of the duration of the representation, which may be referred to as a constrained variable bit rate approach. In either alternative, each such representation may thereby have some frames encoded at differing or varying resolution or quality in order for an entirety of the corresponding representation to be streamed at essentially a constant bit rate from start to finish. With this type of constant bit rate encoding, the representations having a greater average bit rate generally provide higher quality video than the representations encoded at a lower average bit rate. The greater bit rates enable more data (bits) to be used in representing the original video so as to enable the video to be reproduced following encoding at a greater resolution or with other quality characteristics better than the lower bit rate encodings.
The use of such constant bit rate encodings may be useful when available bandwidth or other network restrictions or capabilities are unchanging during the duration of the video playback, and are a predominant factor in deciding which one of the representations is desired for access as the essentially unwavering bit rate enables media clients to simply select the representation having the maximum supportable bit rate. The use of such constant bit rate encodings may also be beneficial when generating metadata or other information used to facilitate the selection thereof as a single bit rate attribute can be assigned for an entirety of each representation. DASH, for example, utilizes a media presentation descriptor (MPD) to provide information associated with available representations within the MPD where a single bit rate or bandwidth attribute is assigned to each available representation, i.e., the number of bit rate or bandwidth attributes equals the number of representations. One non-limiting aspect of the present invention contemplates optimizing video streaming by similarly encoding the video into multiple representations but with each or some representations having a constant quality and variable bit rate. The constant quality and variable bit rate may generally correspond with each frame or underlying portion of the media being encoded at bit rates necessary to maintain a desired spatial and/or temporal resolution and/or a desired distortion level throughout an entirety of the corresponding representation.
The constant quality encoding may result in the bit rates for a particular representation varying throughout the corresponding representation depending on the complexity of the corresponding frame or portion of video. While constant bit rate encodings may have some bit rate variations due to encoding tolerances or other inherent variables, those bit rate variations may be centered at a mean or average bit rate whereby the quality of the attendant portion of video is adjusted to maintain the constant bit rate. The constant quality encodings, in contrast, may be centered at a mean or average quality with the bit rate being unconstrained to any mean or average value whereby the bit rate of the attendant portion of video is adjusted as necessary to maintain the constant quality. The metric or measure of the constant quality encoding process may be based on spatial and/or temporal resolution or other quality metrics or levels such as the quantization parameter or quantizer coefficients. The maintenance of a constant quality may result in more complex video frames requiring a greater bit rate than less complex video frames as more bits may be required in order to represent the entirety of the underlying video at the same spatial and/or temporal resolution. The constant quality encoding process may be characterized with the bit rate continuously varying to maintain a constant quality whereas the constant bit rate encoding process may be characterized with the quality continuously varying to maintain a constant bit rate.
Block 64 relates to generating metadata sufficient to facilitate representing the encoding performed for any number of videos, particularly when undertaken according to the described constant bit rate and/or constant quality processes. The metadata may match or partially correspond with the DASH MPD described above or virtually any file, document or other suitable construct having data or other syntax suitable for conveying information to the media clients 14, 16, 18 necessary for parsing and accessing media encodings made available for transport from the media server 12. One non-limiting aspect of the present invention contemplates use of the DASH MPD when representing video encoded according to the constant bit rate process and deviating from the DASH MPD when representing video encoded according to the constant quality process. The constant quality MPD or other metadata construct for the constant quality encodings may deviate insofar as including additional attributes, values, etc. sufficient to represent characteristics associated with the corresponding constant quality encoding process. Additional or different metadata may be generated to specify quality metrics for each representation, such as but not necessarily limited to the attendant spatial and/or temporal resolution and/or a subjective quality index, and/or to specify bit rate variations for each representation, such as by including a number of attributes sufficient to at least indicate each significant bit rate variation (e.g., each bit rate change above a selectable threshold).
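By way of a non-limiting illustration, the thresholding of bit rate variations described above may be sketched as follows. The function name, the kbps units and the choice to compare each segment against the most recently recorded value are assumptions made for this sketch and are not taken from the DASH specification or from the disclosure.

```python
from typing import List, Tuple


def significant_variations(
    segment_kbps: List[float], threshold_kbps: float
) -> List[Tuple[int, float]]:
    """Return (segment_index, bit_rate) pairs worth recording in the metadata.

    The first segment is always recorded. A later segment is recorded only
    when its bit rate differs from the last recorded bit rate by more than
    the selectable threshold (hypothetical rule chosen for illustration).
    """
    if not segment_kbps:
        return []
    recorded = [(0, segment_kbps[0])]
    for index, rate in enumerate(segment_kbps[1:], start=1):
        if abs(rate - recorded[-1][1]) > threshold_kbps:
            recorded.append((index, rate))
    return recorded
```

A larger threshold yields fewer recorded attributes at the cost of a coarser description of the representation.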
The first, second and third charts 70, 72, 74 or charts similarly prepared for other media representations available from the server 12 on-demand may be provided to the media clients 14, 16, 18 in advance of access, optionally with additional information regarding the available media, messaging, protocols, etc., so as to enable the media clients 14, 16, 18 to select a suitable representation for streaming, including dynamically and/or continuously changing the selection as network resources vary due to additional media clients 14, 16, 18 requesting and/or ceasing streaming or other operations diminishing or increasing available bandwidth. Similar information to that provided in charts 72, 74 may be generated for live or real-time media using estimates or other forecasts, including but not necessarily limited to statistical characterizations, of expected segment-based bit rate variations when the associated media is contemplated for constant quality encoding (estimates would be unnecessary for constant bit rate encodings as the media clients 14, 16, 18 would know the intended bit rate throughout its entirety). Live or real-time media related charts may optionally span less than an entirety of the associated media and instead include an initial segment-level forecast or estimate with subsequent charts or updates being provided as the live media progresses.
There may be one adaptation set for the main video component and a separate one or more for a main audio component or other material available like captions or audio descriptions. The illustrated MPD 80 omits the additional, non-video components for exemplary purposes in order to illustrate the contemplated optimization of the MPD 80 to support communicating information associated with constant quality encodings. Each adaptation set contains a set of representations describing a deliverable encoded version of one or several media content components, which is illustrated for exemplary purposes to correspond with the above-described first and second representations respectively encoded at a constant quality commensurate with Q1 and Q2. A representation may include one or more media streams (one for each media content component in the multiplex) and be sufficient to render the contained media content components. By collecting different representations in one adaptation set, the media server may express the corresponding representations as being equivalent content.
The media clients 14, 16, 18 may dynamically switch from representation to representation within an adaptation set in order to adapt to network conditions or other factors. Switching refers to the presentation of decoded data up to a certain time t, and presentation of decoded data of another representation from time t onwards. If representations are included in one adaptation set, and the media client 14, 16, 18 switches properly, the media presentation may be expected to be perceived seamless across the switch. Media clients 14, 16, 18 may ignore representations that rely on codecs or other rendering technologies they do not support or that are otherwise unsuitable. Within a representation, the media may be divided in time into the segments illustrated in
One non-limiting aspect of the present invention contemplates segments in each representation representing the same duration or portion of the media content such that each segment matches with one segment in another representation. This matching is shown for exemplary purposes, as segment duration may differ from representation to representation. The segments may generally relate to intervals or other identifiable portions of the corresponding representation amenable to conveying the corresponding bit rate variations necessary to maintain a constant quality throughout. The bit rate variations are shown with respect to exemplary numerical values demarcating an average or other summation of the bit rate utilized for encoding the corresponding segment. This segment-level granularity may be preferred over identifying a bit rate value for each encoded frame in order to limit the number of bit rate values included within the metadata to represent the available encodings, particularly since the client is typically only able to switch representations on a segment boundary.
The MPD 80 may include a quality index attribute within an attribute table or other construct sufficient to convey a constant quality encoding level for the corresponding representation. The client may analyze the constant quality index attributes as part of its decision making process when determining a suitable representation for streaming. The quality index attributes may be included in the MPD 80 to differentiate the representations being encoded at a constant quality from those being encoded at a constant bit rate when the MPD 80 also includes information for available representations encoded at a constant bit rate (not shown). The number of quality index attributes included within the MPD 80 may equal the number of representations encoded at a constant quality and may be communicated along with the first chart 70 or other information sufficient to enable the client to differentiate parameters associated with the corresponding quality index attribute, e.g., 4k, 30 fps, etc. The MPD 80 may also include bandwidth or bit rate attributes within the attribute table or other construct to identify the bit rate of an associated segment. The number of bandwidth or bit rate attributes included with the MPD 80 may equal the number of segments so that the media client 14, 16, 18 can assess whether network resources are likely to support the bit rates needed or estimated for an entirety of the corresponding representation.
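As a minimal sketch of the attribute counts described above, one quality index attribute per representation and one bandwidth attribute per segment, the metadata may be modeled in memory as shown below. The class names, field names and numerical values are hypothetical and do not reflect the DASH MPD schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    url: str             # hypothetical address of the segment
    bandwidth_kbps: int  # per-segment bit rate attribute


@dataclass
class Representation:
    rep_id: str
    quality_index: int   # single constant quality attribute per representation
    segments: List[Segment] = field(default_factory=list)

    def peak_kbps(self) -> int:
        """Highest per-segment bit rate within the representation."""
        return max(s.bandwidth_kbps for s in self.segments)

    def mean_kbps(self) -> float:
        """Average per-segment bit rate within the representation."""
        return sum(s.bandwidth_kbps for s in self.segments) / len(self.segments)


# Illustrative values only; they are not taken from the charts 70, 72, 74.
example = Representation(
    rep_id="q1",
    quality_index=1080,
    segments=[
        Segment("seg1.m4s", 3000),
        Segment("seg2.m4s", 4500),
        Segment("seg3.m4s", 3800),
    ],
)
```

A client holding such a structure could compare the peak or mean value against its estimate of available bandwidth before requesting any segment.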
Returning to
The perceivable quality assessment may also include relating spatial and/or temporal resolution or other encoding specifics to additional characteristics or capabilities of the user. One example may be associating a lower optimal resolution for users lacking sufficient eyesight or indicating a preference for lower resolution streaming, e.g., some users may desire or prefer lower resolution streaming in order to minimize consumption of network resources. As shown in the exemplary graph, various examples are available for assessing optimal viewing distance as a function of television size and resolution. With respect to the distance/perception type of assessment, one approach may be to utilize a relationship like that depicted in the graph to represent a resolvable spatial resolution as a function of perception for the human eye at various distances. Based on this, a user with a 60″ 4k display could be downgraded to 1080p content (without them perceiving a difference) if the system detects that there are no viewers closer than 8′ to the display.
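One possible way to approximate the distance/perception relationship is sketched below. The sketch assumes a visual acuity of one arcminute, a 16:9 display and a flat viewing geometry; these are assumptions of the illustration, and the relationship depicted in the graph may differ.

```python
import math


def display_height_in(diagonal_in: float, aspect_w: float = 16, aspect_h: float = 9) -> float:
    """Height of the display in inches, derived from its diagonal and aspect ratio."""
    return diagonal_in * aspect_h / math.hypot(aspect_w, aspect_h)


def max_perceivable_lines(
    diagonal_in: float, distance_ft: float, acuity_arcmin: float = 1.0
) -> float:
    """Approximate number of vertical lines a viewer can resolve.

    Assumes the eye resolves detail subtending `acuity_arcmin` arcminutes
    (one arcminute is a commonly used approximation, not a value from the
    disclosure).
    """
    distance_in = distance_ft * 12.0
    smallest_detail_in = distance_in * math.tan(math.radians(acuity_arcmin / 60.0))
    return display_height_in(diagonal_in) / smallest_detail_in
```

Under these assumptions a 60″ display viewed from 8′ yields roughly 1,050 resolvable lines, which is in the neighborhood of the 1080p example given above.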
As discussed previously, the perceivable quality assessment may also account for compression effects beyond just encoded spatial and/or temporal resolution. For example, an UltraHD content source could be minimally compressed (e.g., with a low quantizer parameter) at 50 Mbps, in which case the graph may be an accurate representation of perceivable capabilities. However, the UltraHD content represented in the graph 88 may instead be heavily compressed down to 10 Mbps, in which case it might be more like 1080p quality. A user with the 60″ 4k display might stream a 4k encoding but it might be delivered at 50 Mbps if the closest viewer is less than 8′ away, 30 Mbps if they are 8′-10′ away, 20 Mbps if they are 10′-15′ away, and 10 Mbps if they are farther than 15′ away. The exact relationship between bit rate and effective or optimal resolution may depend on the encoder and on the content, optionally with each encoding being analyzed by the encoder to produce a suitable quality index for representing this effective resolution that is then used by a quality decision function or perceivable assessment process to determine a suitable representation for streaming. The quality index could be directly represented as an effective vertical resolution (720, 1080, 2160, or any value in between) so that the decision function could just use the simple relationship depicted by the graph 88.
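The exemplary distance tiers for the 60″ 4k display may be expressed as a simple lookup, sketched below. The function name is hypothetical, and the treatment of a viewer at exactly 10′ (assigned to the 20 Mbps tier) is an assumption, since the stated ranges overlap at that boundary.

```python
def target_bitrate_mbps(closest_viewer_ft: float) -> int:
    """Map the closest viewer distance to the exemplary delivery bit rate."""
    if closest_viewer_ft < 8:
        return 50   # closer than 8 feet
    if closest_viewer_ft < 10:
        return 30   # 8 to 10 feet (10 feet itself assumed to fall in the next tier)
    if closest_viewer_ft <= 15:
        return 20   # 10 to 15 feet
    return 10       # farther than 15 feet
```

In practice the tier boundaries would depend on the encoder and on the content, as noted above.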
The perceivable quality assessment of Block 84 may include any number of assessments based on user presence, distance, capabilities, characteristics as well as quality levels, variables and other information associated with certain encodings and capabilities for transmitting the encodings. Block 102 relates to processing the related information and selecting a representation for presentation. The selection process may include initially eliminating the representations associated with quality levels exceeding those likely to be perceivable for the viewer, i.e., eliminating the quality levels exceeding those included in the graph 88 for a current viewing distance and display size associated with the viewer. The selection process may then include assessing bit rate variances for each remaining representation to determine whether network resources, bandwidth restrictions or other limitations are likely to influence an ability of the media client to support streaming an entirety of the corresponding representations. The representations likely to be perceivable but having one or more segment bit rates exceeding that likely to be supportable may be eliminated from the selection process such that the highest-quality or best representation remaining thereafter may be selected for presentation and/or a switch to another representation may be scheduled for the segments having unsupportable bit rates.
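The two-stage elimination described above may be sketched as follows, assuming each representation carries a scalar quality index and a list of per-segment bit rates. The class and function names are hypothetical, and the optional scheduling of a switch for unsupportable segments is omitted from this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Representation:
    rep_id: str
    quality_index: float        # e.g., an effective vertical resolution
    segment_kbps: List[float]   # per-segment bit rates, assumed non-empty


def select_representation(
    representations: List[Representation],
    max_perceivable_quality: float,
    supportable_kbps: float,
) -> Optional[Representation]:
    """Pick the highest-quality representation that is perceivable and supportable."""
    # Stage 1: drop representations whose quality exceeds what the viewer can perceive.
    perceivable = [r for r in representations if r.quality_index <= max_perceivable_quality]
    # Stage 2: drop representations having any segment above the supportable bit rate.
    supportable = [r for r in perceivable if max(r.segment_kbps) <= supportable_kbps]
    if not supportable:
        return None  # a real client would need a fallback policy here
    return max(supportable, key=lambda r: r.quality_index)
```

Returning `None` when nothing qualifies is a simplification; a client might instead fall back to the lowest available representation.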
Block 104 relates to the media client transmitting an HTTP GET request or other suitable inquiry to a URL or other address associated with the selected representation and/or the attendant segments to initiate streaming. While the streaming is predominately described with respect to HTTP protocols and communications over the Internet, the streaming or other signaling may be undertaken using non-HTTP processes without deviating from the scope and contemplation of the present invention. The transmitting process and/or the selection process may be continuously assessed to adjust for network congestion or other transmission variables such that representations may be switched as a function thereof. One non-limiting aspect of the present invention contemplates continuously monitoring a distance of the viewer to a television display when streaming video, such as using the above identified presence detection capabilities, and/or through other mechanisms so as to facilitate adjusting access representations depending on changes in user distance. Optionally, a distance sensor in the form of a scanning device, optical or signal sensor or other device may be included on or associated with the media clients 14, 16, 18 to sense viewer distance.
The distance sensor may optionally be integrated with the presence detection capabilities to differentiate between viewers when multiple viewers are within a room and/or a viewing distance of the display for purposes of controlling the viewing distance measurement. The presence detection system may also be beneficial in assessing whether viewers are transient or otherwise not likely to be viewing the representation, e.g., the distance measurement may be based on a static viewer (e.g., stationary for a predetermined period of time) as opposed to another occupant traveling through the corresponding room or otherwise engaging in activities, such as with a tablet or second screen device, indicating a lack of awareness or interest in the streamed media. Once the media is transmitted in Block 104, and depending on whether the associated media client 14, 16, 18 is in possession of a full set of bit rate values or is periodically receiving bit rate updates thereafter, the media client 14, 16, 18 may continually evaluate during playback the recent history of actual received bit rates (i.e., the segment size divided by the segment download time), the amount of buffered video, the list of bit rates for future segments (weighing the near future segments more heavily) across the different representations, and the current perceivable quality index of the user to determine whether it should continue with the representation that it is currently downloading or switch to a different representation more appropriate for given network conditions and/or movement of the user.
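A minimal sketch of the playback-time evaluation is given below. The geometric weighting of future segments, the averaging of recent throughput and the buffer threshold are assumptions chosen for illustration; the disclosure does not prescribe a particular weighting or threshold, and the perceivable quality input is omitted for brevity.

```python
from typing import List


def measured_throughput_kbps(segment_bytes: int, download_seconds: float) -> float:
    """Actual received bit rate: segment size divided by segment download time."""
    return segment_bytes * 8 / 1000.0 / download_seconds


def weighted_future_kbps(future_kbps: List[float], decay: float = 0.5) -> float:
    """Weighted mean of upcoming segment bit rates, nearer segments weighing more."""
    if not future_kbps:
        return 0.0  # nothing left to download
    weights = [decay ** i for i in range(len(future_kbps))]
    return sum(w * r for w, r in zip(weights, future_kbps)) / sum(weights)


def should_switch_down(
    recent_kbps: List[float],
    future_kbps: List[float],
    buffered_seconds: float,
    min_buffer_seconds: float = 10.0,
    decay: float = 0.5,
) -> bool:
    """Suggest a lower representation when demand exceeds supply and the buffer is low."""
    estimate = sum(recent_kbps) / len(recent_kbps)  # recent_kbps assumed non-empty
    demand = weighted_future_kbps(future_kbps, decay)
    return demand > estimate and buffered_seconds < min_buffer_seconds
```

A comparable check against a higher representation could be used to decide when to switch up.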
As supported above, one non-limiting aspect of the present invention contemplates a display fitted with a viewer presence and viewer distance estimation sensor (e.g., Kinect or Primesense) that provides an input to an IP-STB or smart TV in order to affect the bit rate or quality selection algorithm in an adaptive bit rate system. Viewers' distances to the display, as well as information on the display itself (size, native resolution, etc.) may be used to calculate the maximum video quality that can be perceived by the viewers. This information may be used in the adaptive bit rate selection algorithm to select an appropriate stream that provides the maximum perceivable quality at the minimum bit rate, thereby jointly maximizing perceptual video quality across a set of competing viewers that share a bottleneck network link and freeing capacity for other services. The integration of presence detection and other viewer characteristics with the contemplated constant quality media encoding and optimal perception characteristics allows the present invention to facilitate a streaming experience where media may be delivered at the lowest quality necessary to meet selectable perception levels. Such a capability may be particularly beneficial over constant bit rate encoding processes where the media client simply selects the highest quality level supportable given associated bandwidth or bit rate capabilities regardless of whether the viewer can actually perceive the corresponding quality level. The present invention eliminates unnecessary inefficiencies and consumption of network resources without negatively or unduly influencing the viewer experience when network resources support a higher quality video than the user is able to perceive.
As a coarse example, when it is detected that viewers are not close enough to the display to perceive the difference in quality between 4k resolution and 1080p, the system would limit its choices of streams to those with 1080p resolution and below. In general, the algorithm would not be constrained to resolution selection, but would more optimally use a video quality metric (such as PSNR or another perceptual evaluation of video quality), and a model of display-mediated human visual acuity. In implementation, this system could be as simple as an effective spatial resolution score (scalar value) that accompanies each stream choice in the ABR manifest. The IP-STB player then simply uses the display size (and resolution) along with the distance to the closest viewer to calculate the maximum perceivable spatial resolution for that viewer, and compares that to the scores accompanying the stream choices.
Access network capacity cost is an important factor that reduces the attractiveness of IPTV solutions. One aspect of the present invention contemplates an adaptive bit rate (ABR) video system that maximizes joint video quality across a set of users that share a bottleneck link. The variant streams may be efficiently encoded using constant quality encoding (VBR) or near-constant quality encoding (constrained VBR) and are described to the player in terms of the statistical properties of the bit rate. Clients may be presented with various encodings of a stream, and each client selects the “best” encoding that it can reliably receive. This selection criterion may be preferred over a set of constant bit rate encodings (variable quality) at pre-configured bit rates where rate selection is performed by the client via use of historical estimates of available channel capacity. The various encodings contemplated by the present invention represent a set of constant quality encodings (variable bit rate) at pre-configured quality levels so as to enable the client to additionally use statistical information about the encodings of the stream (segment size distributions and autocorrelation) to select the encoding that provides maximum video quality while keeping the calculated probability of buffer under-run below an established threshold. As a result, individual clients that share a bottleneck link (e.g., a cable serving group) achieve better QoE (higher and more constant video quality). This technique would be useful for any networked video distribution system. It becomes much more feasible when HTML5 Media Source Extensions are available in the player.
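One way to approximate a buffer under-run probability is a simple Monte Carlo simulation, sketched below. This substitutes simulation for the statistical calculation referenced above and assumes that channel capacity is drawn independently and uniformly per segment; it does not model autocorrelation. All names, parameters and the capacity model are hypothetical.

```python
import random
from typing import List


def underrun_probability(
    segment_kbps: List[float],
    segment_seconds: float,
    capacity_low_kbps: float,    # assumed to be greater than zero
    capacity_high_kbps: float,
    startup_buffer_seconds: float,
    trials: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate the fraction of simulated sessions in which the buffer empties."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    underruns = 0
    for _ in range(trials):
        buffer_s = startup_buffer_seconds
        for rate in segment_kbps:
            capacity = rng.uniform(capacity_low_kbps, capacity_high_kbps)
            # Playback drains the buffer while the segment downloads.
            buffer_s -= rate * segment_seconds / capacity
            if buffer_s < 0:
                underruns += 1
                break
            # The downloaded segment adds its duration to the buffer.
            buffer_s += segment_seconds
    return underruns / trials
```

A client could evaluate this estimate for each constant quality encoding and choose the highest quality whose estimate stays below the established threshold.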
While exemplary embodiments are described above, it is not intended that these embodiments describe all possible forms of the invention. Rather, the words used in the specification are words of description rather than limitation, and it is understood that various changes may be made without departing from the spirit and scope of the invention. Additionally, the features of various implementing embodiments may be combined to form further embodiments of the invention.
This application claims the benefit of U.S. provisional Application No. 62/094,479, filed Dec. 19, 2014, the disclosure of which is incorporated in its entirety by reference herein.
Number | Date | Country
---|---|---
62094479 | Dec 2014 | US