An embodiment of the invention is related to digital video content delivery and display techniques, and particularly to techniques for improving video quality by providing feedback to upstream processes in the video chain. Other embodiments are also described.
Digital media formats were popularized using only a small subset of the formatting parameters that were available. For example, the audio compact disc (CD) was popularized as a two-channel 16-bit pulse-code modulation (PCM) encoding at a 44.1 kHz sampling rate per channel. However, the Red Book standard also defined a four-channel variant that few people know of because it was never widely implemented or used. Additionally, the standard offered an optional high-frequency pre-emphasis on recording, with de-emphasis on playback, to reduce noise; this option was also never widely used. In television, high-definition television (HDTV) standardized a table of 19 different formats. Only about three of the 19 formats ever saw widespread use, likely owing to the difficulty of reaching agreement among the various stakeholders and the complexity of dealing with a large number of different formats.
Currently, 4K resolution (4K) is being marketed to consumers as the next big step for display devices and content in the fields of digital television and digital cinematography. The television industry has adopted ultra-high definition television (UHDTV) as its “4K” standard. However, as sessions at the International Broadcasting Convention (IBC) conference have indicated, 4K resolution by itself is not sufficient to create the sense of an entirely new viewing experience that is needed to make a major impact on the market.
Delivery of video content has advanced by making picture quality adaptable to the bit rate available to any given subscriber. For example, for some online video streaming services, picture quality does not depend on the number of pixels in the image, which typically remains at 1920×1080, progressively scanned, across a variety of picture qualities. What typically varies with the available bit rate is the amount of compression, so that subscribers with higher bit rate connections enjoy a better picture. The streaming video provided by these services may be entirely high-definition (HD) in terms of resolution or pixel count, but it is HD compressed to suit the capacity of the delivery channel, or is even delivered at a lower resolution to maintain the connection and enhance the user experience by producing pictures continuously, albeit of potentially varying quality. While streaming video at variable bit rates is well known in the art, picture quality parameters such as bit depth, color gamut, and frame rate have always been standardized to a single value across the potential range of compression ratios.
Further, streaming video at higher bit rates has become more widely available over time. For example, cable companies are currently deploying Data Over Cable Service Interface Specification (DOCSIS) 3.0, with 18 Mbps chunks deliverable to a wide base of homes and multiple chunks possible for each subscriber. Video compression has also improved. The ITU-T H.265 standard is a recent arrival, offering a lower bit rate at the same quality, or greater quality at the same bit rate, compared with the ten-year-old H.264 format, which itself improved picture quality at lower bit rates over the twenty-year-old MPEG-2 standard used for over-the-air Advanced Television Systems Committee (ATSC) broadcasting.
A method for delivering desired video content over a packet-switching data communications network is described. Several device parameters are received from a user device. The device parameters define a capability of the user device in handling different video quality parameters. A video format is selected from several available video formats, based on the received device parameters and based on contextual information of the video content. In one embodiment, in order to select the video format, a weight is assigned to each video quality parameter based on the contextual information of the video content. The video format is then selected based on the weights assigned to the different video quality parameters. A video containing the desired video content in the selected video format is retrieved from a media library and then delivered to the user device.
A method for streaming video content is described. In one embodiment, the video content being streamed is live video content. The method receives multiple parameters defining a connection quality of users subscribed to receive the video content. The method selects a video format for a video containing the video content from several available video formats based on the parameters and based on contextual information of the video content. In one embodiment, in order to select the video format, the method assigns a weight to each video quality parameter based on the contextual information of the video content, and selects the video format based on the weights assigned to the different video quality parameters. The method captures the video containing the video content in the selected video format and then delivers the captured video to a content server.
A system for delivering desired video content is described. The system includes a processor and video format selection logic. The video format selection logic is to receive several device parameters from a user device. The device parameters define a capability of the user device in handling different video quality parameters. The video format selection logic is to select a video format for a video containing the desired video content from several video formats based on the device parameters and based on contextual information of the desired video content. In one embodiment, in order to select the video format, the video format selection logic is to assign a weight to each video quality parameter based on the contextual information of the desired video content, and select the video format based on the weights assigned to different video quality parameters. The video format selection logic is further to retrieve the video containing the desired video content in the selected video format from a media library. The video format selection logic is to stream the video containing the desired video content in the selected video format to the user device over a packet-switching data communication network.
A system for streaming video content is described. In one embodiment, the video content being streamed is live video content. The system includes a processor, a parameter determination logic, and a video capture unit. The parameter determination logic is to select a video format for a video containing video content from several video formats based on several parameters defining a connection quality of users subscribed to receive the video content and based on contextual information of the video content. In one embodiment, the parameters include a maximum bit rate that can be sustained to any one subscriber to the video content. In one embodiment, in order to select the video format, the parameter determination logic is to assign a weight to each video quality parameter based on the contextual information of the video content, and select the video format based on the weights assigned to different video quality parameters. The video capture unit is to capture the video containing the video content in the selected video format. The video capture unit is further to deliver the video containing the video content to a content server. In one embodiment, the video capture unit is to convert the video containing the video content from the selected video format into a set of video formats.
In one embodiment, the video quality parameters include one or more of codec, frame rate, contrast, brightness, resolution, bit depth, Electronic-to-Optical Transfer Function (EOTF), and color gamut. In one embodiment, the contextual information specifies content type of the desired video content. The content types may include feature film, sports, documentary, talk show, television comedy, and reality show.
The above summary does not include an exhaustive list of all aspects of the invention. It is contemplated that the invention includes all systems and methods that can be practiced from all suitable combinations of the various aspects summarized above, as well as those disclosed in the Detailed Description below and particularly pointed out in the claims filed with the application. Such combinations have particular advantages not specifically recited in the above summary.
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements.
A system that adjusts video quality intelligently by having feedback to upstream processes in the video distribution chain, and with the content in mind, is described. In the following description, numerous specific details are set forth to provide thorough explanation of embodiments of the invention. It will be apparent, however, to one skilled in the art, that embodiments of the invention may be practiced without these specific details. In other instances, well-known components, structures, and techniques have not been shown in detail in order not to obscure the understanding of this description.
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification do not necessarily all refer to the same embodiment.
In the following description and claims, the terms “coupled” and “connected,” along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. “Coupled” is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other. “Connected” is used to indicate the establishment of communication between two or more elements that are coupled with each other.
The processes depicted in the figures that follow are performed by processing logic that comprises hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general-purpose device or a dedicated machine), or a combination of both. Although the processes are described below in terms of some sequential operations, it should be appreciated that some of the operations described may be performed in different order. Moreover, some operations may be performed in parallel rather than sequentially.
The terms “server,” “client,” and “device” are intended to refer generally to data processing systems rather than specifically to a particular form factor for the server, client, and/or device.
Rather than making one decision about video quality at the beginning of the video distribution chain, one embodiment of the invention provides deliberate feedback as far back as possible in the video distribution chain, for example as far back as the source capture (e.g., in a live context), to affect the picture quality intelligently and with the content in mind. In one embodiment, one stream is picked from a plurality of streams: either the stream coming from the camera that represents its highest-quality domain, or a stream from standard down-converters (such as, e.g., in a broadcast truck at the Super Bowl) that down-convert (“downrez”) all of the camera feeds to a variety of outputs. For example, sports content may use a higher frame rate than a movie to achieve a smoother representation of motion, but probably does not need as much dynamic range. A movie, on the other hand, is unlikely to make use of anything over 24 frames per second, whereas dynamic range is highly valuable, especially in movies lit with low-key lighting. In one embodiment, parameters describing a user device's capability to display video are provided to the content server, along with the program contextual information, to determine an optimal video format for streaming the video to the user device. In one embodiment, the maximum bit rate that can be sustained to any one subscriber and the program contextual information are used to determine the original video format for source capture. In one embodiment, a back channel transmitting parameters about the user device or the connection quality can be used to control the forward channel, which streams the video. The back channel controls the forward channel by modifying the video format at the content server or at the time of source capture.
A traditional media streaming communications protocol (e.g., HTTP Live Streaming) starts playback at a bit rate that is considered sustainable. During playback, the traditional protocol ratchets the bit rate up or down according to the channel capacity. For example, if a higher bit rate can be sustained, then the higher bit rate is used. If the current bit rate results in packet loss, then a different selection of content encoded at a lower peak bit rate is chosen. Usually the same codec is used at the different bit rates, and the selection among streaming formats is not affected by the capability parameters of the video playing device. One embodiment of the present disclosure extends the traditional media streaming communications protocol, using additional input parameters from the video playing device when selecting among alternate streaming formats. For example, a user may receive a streaming format that has better video quality because he/she has a better video playing device, which does not happen when a traditional media streaming communications protocol (e.g., HTTP Live Streaming) is used. This provides an incentive for the whole market to migrate to better picture quality.
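The difference between bit-rate-only ratcheting and device-aware selection can be sketched as follows. This is a minimal, simplified illustration under stated assumptions, not the HTTP Live Streaming algorithm itself; the `Variant` fields and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    """One alternate stream (hypothetical fields)."""
    bit_rate_kbps: int
    frame_rate: int
    bit_depth: int


def traditional_step(variants: List[Variant], current: Variant,
                     measured_kbps: float, packet_loss: bool) -> Variant:
    """Ratchet one rung up or down a ladder ordered by bit rate only."""
    ladder = sorted(variants, key=lambda v: v.bit_rate_kbps)
    i = ladder.index(current)
    if packet_loss and i > 0:
        return ladder[i - 1]  # lower peak bit rate after packet loss
    if i + 1 < len(ladder) and ladder[i + 1].bit_rate_kbps <= measured_kbps:
        return ladder[i + 1]  # higher bit rate is sustainable
    return current


def device_aware_pick(variants: List[Variant], measured_kbps: float,
                      max_frame_rate: int, max_bit_depth: int) -> Variant:
    """Also honor the capabilities reported by the playing device."""
    fits = [v for v in variants
            if v.bit_rate_kbps <= measured_kbps
            and v.frame_rate <= max_frame_rate
            and v.bit_depth <= max_bit_depth]
    if not fits:
        # Nothing fits: fall back to the lowest bit rate variant.
        return min(variants, key=lambda v: v.bit_rate_kbps)
    # Among formats the device can reproduce, prefer richer picture quality.
    return max(fits, key=lambda v: (v.bit_depth, v.frame_rate, v.bit_rate_kbps))
```

With two 6000 kbps variants that differ only in frame rate and bit depth, `traditional_step` treats them as interchangeable rungs, whereas `device_aware_pick` gives the 60 fps, 10-bit variant only to a device that reports it can display it.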
The content producer 105 produces video content by capturing video and stores the video in a specific video format. Once a video is captured, it is put on the content server 110 for distribution. The content server 110 is a content transmission device that stores media content and receives requests from other devices for the media content. Once a request is received and processed, the content server 110 converts the requested media content into a bit stream and sends it to the requesting party, e.g., the receiving device 120. The content server 110 transmits data to another device through a connection between the two devices over the network 115. In one embodiment, the network 115 is a packet switched network such as the Internet. Some examples of a content server 110 include an electronic device such as a server, a desktop, a laptop, a netbook, a tablet computer, a smartphone, etc. that is capable of sending and receiving data to and from another device. In some cases, one or more devices provide the requested media content. The content server 110 could also be part of a content delivery network (CDN). A CDN is a system of computers containing copies of data placed at various nodes of a network.
The receiving devices 120 are user devices that receive video streams from the content server 110 and play back the received video to the user. The content server 110 and a receiving device 120 establish a connection over the network 115 and start a communication session by exchanging data. On its own initiative (i.e., based on user input), the receiving device 120 requests a piece of media content from the content server 110. After receiving the request, the content server 110 generates the bit stream enclosing the requested piece of media content and sends the bit stream to the receiving device 120 over the network 115.
When the video is sports programming, the whole shape of the diagram is completely different.
Therefore, a content subscriber would prefer a brighter picture for a sports video, and would benefit from a higher frame rate that is not needed for movies. However, every user device has limits on the highest quality of video it can display. Accordingly, in one embodiment of the invention, the user device tells the content server what frame rate it can handle, how much dynamic range it can reproduce, how large a color space it supports, and so on. The content server adjusts the video format based on the parameters it receives from the user device and on the type of video content. In one embodiment, there is a back channel for transmitting the video quality parameters from the user device to the content server. The back channel is a low bit rate connection that sends metadata about the video quality that the user device can handle to the content server, so that the content server can send the user device the best quality video that the user device can accept.
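The back-channel metadata described above might be carried as a compact message such as the one sketched below. The source does not specify a wire format; the JSON encoding, the field names, and the use of peak luminance as a stand-in for reproducible dynamic range are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DeviceCapabilities:
    """Hypothetical back-channel payload describing what a device can display."""
    max_frame_rate: int        # frames per second the display can render
    peak_luminance_nits: int   # assumed proxy for reproducible dynamic range
    max_bit_depth: int         # bits per color component
    color_gamuts: List[str]    # e.g. ["Rec. 709", "DCI P3"]
    codecs: List[str]          # e.g. ["H.264", "H.265"]


def encode_back_channel(caps: DeviceCapabilities) -> bytes:
    """Serialize as compact JSON so the back channel stays low bit rate."""
    return json.dumps(asdict(caps), separators=(",", ":")).encode("utf-8")


def decode_back_channel(payload: bytes) -> DeviceCapabilities:
    """Rebuild the capabilities on the content-server side."""
    return DeviceCapabilities(**json.loads(payload.decode("utf-8")))
```

A message of this kind is on the order of a hundred bytes, negligible next to the forward video stream, which is consistent with a low bit rate back channel.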
The content contextual information generator 310 receives content identification information 305 for the desired content from the receiving device 120 and generates contextual information 315 for the desired content accordingly. In one embodiment, the contextual information 315 is the content type of a video containing the desired content, e.g., feature film, sports, documentary, talk show, television comedy, reality show, etc. In one embodiment, the content contextual information generator 310 may generate the contextual information 315 without receiving the content identification information 305 from the receiving device 120.
The video format selection logic 320 receives device parameters 322 from the receiving device 120. The device parameters 322 define a capability of the receiving device 120 in handling different video quality parameters. In one embodiment, the device parameters 322 may include the codec, the frame rate, contrast, brightness, resolution, bit depth, Electronic-to-Optical Transfer Function (EOTF), and color gamut that the receiving device 120 can display. In one embodiment, the video format selection logic 320 also receives contextual information 315 from the content contextual information generator 310. The video format selection logic 320 selects an optimal video format for a video containing the desired content from multiple available video formats in the video database 330 based on the device parameters 322 and the contextual information 315. The video format selection logic 320 assigns a different weight to each video quality parameter based on the contextual information 315. The video format selection logic 320 then selects the optimal video format based on the different weights, e.g., by selecting a video format that best matches the weights assigned to different video quality parameters. The video format selection logic 320 retrieves a video 325 containing the desired content in the selected optimal video format from the video database 330 and sends the video 325 to the receiving device 120.
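One way the video format selection logic 320 could weight the video quality parameters by content type and pick among the formats the receiving device 120 can handle is sketched below. The numeric weights, the parameter keys, the use of stops for dynamic range and an integer rank for color gamut, and the normalized weighted-sum score are all hypothetical choices; the description does not prescribe a particular scoring function.

```python
from typing import Dict, List

# Hypothetical per-content-type weights (each row sums to 1.0).
CONTENT_WEIGHTS: Dict[str, Dict[str, float]] = {
    "sports":       {"frame_rate": 0.5, "dynamic_range": 0.2, "bit_depth": 0.1, "gamut": 0.2},
    "feature film": {"frame_rate": 0.05, "dynamic_range": 0.45, "bit_depth": 0.3, "gamut": 0.2},
}


def supported(fmt: Dict, device: Dict[str, float]) -> bool:
    """True if no quality parameter of fmt exceeds the device's capability."""
    return all(fmt[p] <= limit for p, limit in device.items())


def select_format(formats: List[Dict], device: Dict[str, float],
                  content_type: str) -> Dict:
    weights = CONTENT_WEIGHTS[content_type]
    candidates = [f for f in formats if supported(f, device)]
    if not candidates:
        raise ValueError("no available format fits the device")
    # Normalize by the best candidate value (values assumed positive) so
    # that the weights, not the raw units, decide the trade-off.
    peak = {p: max(f[p] for f in candidates) for p in weights}

    def score(f: Dict) -> float:
        return sum(w * f[p] / peak[p] for p, w in weights.items())

    return max(candidates, key=score)
```

For a device reporting 60 fps, 14 stops, 10 bits and gamut rank 2, a 60 fps, 8-stop, 8-bit, rank-1 format wins for sports, while a 24 fps, 14-stop, 10-bit, rank-2 format wins for a feature film.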
In one embodiment, the video format selection logic 320 also considers the bandwidth of the connection between the receiving device 120 and the content server 110 in determining the optimal video format. When the bandwidth is limited, the video format selection logic 320 uses a scale function for the importance of each video quality parameter to determine how much each video quality parameter can be degraded. In one embodiment, the scale function is derived partially based on the contextual information 315 of the desired video content. For example, different weights are assigned to each video quality parameter based on the contextual information 315, and the greater the weight assigned to a video quality parameter, the less that video quality parameter is to be degraded.
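The scale function is not specified further, so the following is only one possible reading. It assumes, hypothetically, that bit rate scales with the product of per-parameter scale factors, and it splits the required reduction in the log domain in inverse proportion to each weight, so that a more heavily weighted parameter keeps a factor closer to 1 and is therefore degraded less.

```python
from typing import Dict


def degradation_scales(weights: Dict[str, float], available_kbps: float,
                       required_kbps: float) -> Dict[str, float]:
    """Scale factor in (0, 1] for each video quality parameter.

    Hypothetical model: bit rate is proportional to the product of the
    factors, so they must multiply to r = available / required. Each factor
    is r ** ((1 / w_p) / sum_q(1 / w_q)); weights must be positive.
    """
    if available_kbps >= required_kbps:
        return {p: 1.0 for p in weights}  # no degradation needed
    ratio = available_kbps / required_kbps
    inverse = {p: 1.0 / w for p, w in weights.items()}
    total = sum(inverse.values())
    return {p: ratio ** (inv / total) for p, inv in inverse.items()}
```

With sports-style weights of 0.6 (frame rate), 0.3 (dynamic range) and 0.1 (bit depth) and half the required bandwidth, frame rate keeps roughly 93% of its value, dynamic range about 86%, and bit depth about 63%.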
In one embodiment, the video database 330 stores multiple versions of each video. For example, and as illustrated in
In one embodiment, the receiving device 120 lets the content server 110 know which color gamuts its display can handle, such as Rec. 709, DCI P3, Rec. 2020, etc. The content server 110 then chooses a corresponding video format. Movies, for instance, are in the DCI P3 color space, which is larger than the Rec. 709 color space of HDTVs. So an ordinary HDTV needs a video format with the limited color gamut of ITU-R Rec. BT.709, while a DCI P3 or ITU-R Rec. BT.2020 device can use the DCI P3 video format directly.
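The choice among color-gamut versions can be sketched by ordering gamuts by the area of their primaries' triangle in the CIE 1931 xy chromaticity diagram and picking the widest version that does not exceed the display's gamut. The primaries below are the published Rec. 709, DCI P3 and Rec. 2020 values; ordering gamuts by area is a simplification, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# CIE 1931 xy chromaticities of the red, green and blue primaries.
PRIMARIES: Dict[str, Tuple[Point, Point, Point]] = {
    "Rec. 709":  ((0.640, 0.330), (0.300, 0.600), (0.150, 0.060)),
    "DCI P3":    ((0.680, 0.320), (0.265, 0.690), (0.150, 0.060)),
    "Rec. 2020": ((0.708, 0.292), (0.170, 0.797), (0.131, 0.046)),
}


def gamut_area(name: str) -> float:
    """Area of the primaries' triangle (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = PRIMARIES[name]
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2


def choose_gamut_version(available: List[str], display_gamut: str) -> str:
    """Widest available version whose gamut fits within the display's."""
    limit = gamut_area(display_gamut)
    usable = [g for g in available if gamut_area(g) <= limit]
    if not usable:
        raise ValueError("no version fits the display gamut")
    return max(usable, key=gamut_area)
```

This mirrors the example in the text: an ordinary HDTV receives the Rec. 709 version, while a DCI P3 or Rec. 2020 display receives the DCI P3 version.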
The receiving device 120 and the content server 110 are described above for one embodiment of the invention. One of ordinary skill in the art will realize that in other embodiments, this system can be implemented differently. For instance, in one embodiment described above, certain modules are implemented as software modules. However, in another embodiment, some or all of the modules might be implemented by hardware, which can be dedicated application specific hardware (e.g., an application specific integrated circuit, ASIC, chip or component) or a general purpose chip (e.g., a microprocessor or field programmable gate array, FPGA).
At block 405, process 400 receives several device parameters from a user device (e.g., the receiving device 120 of
At block 410, process 400 selects a video format from several available video formats based on the received device parameters of the user device and based on contextual information of the desired video content. In one embodiment, the contextual information is the content type of the video, e.g., feature film, sports, documentary, talk show, television comedy, reality show, etc. Process 400 assigns different weights to each video quality parameter based on the contextual information of the video content. Process 400 then selects the optimal video format based on the different weights, e.g., by selecting a video format that best matches the weights assigned to different video quality parameters. In one embodiment, process 400 also considers the bandwidth of the connection between the user device and the content server in determining the optimal video format. When the bandwidth is limited, process 400 uses a scale function for the importance of each video quality parameter to determine how much each video quality parameter can be degraded. In one embodiment, the scale function is derived partially based on the contextual information of the video content. For example, different weights are assigned to each video quality parameter based on the contextual information, and the greater the weight assigned to a video quality parameter, the less that video quality parameter is to be degraded.
At block 415, process 400 retrieves the video containing the desired content in the selected video format from a media library (e.g., the video database 330 of
One of ordinary skill in the art will recognize that process 400 is a conceptual representation of the operations executed by the content server to stream video to the user device. The specific operations of process 400 may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, process 400 could be implemented using several sub-processes, or as part of a larger macro process.
The connection quality determination logic 510 of the content server 110 determines a set of connection quality parameters 515 for all the receiving devices requesting video content. In one embodiment, the set of connection quality parameters 515 includes the maximum bit rate that can be sustained to any one subscriber to the video content. The connection quality determination logic 510 sends the set of connection quality parameters 515 to the video capture parameter determination logic 530 of the content producer 105.
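The connection quality determination logic 510 might derive the maximum sustainable bit rate from per-subscriber throughput measurements as sketched below. How throughput is measured is not described; treating the worst recent sample, less a safety margin, as a subscriber's sustainable rate, and the 0.9 margin itself, are assumptions.

```python
from typing import Dict, List


def sustainable_rate_kbps(samples: List[float], margin: float = 0.9) -> float:
    """Conservative per-subscriber estimate: worst recent sample times a margin."""
    return min(samples) * margin


def connection_quality_parameters(throughput: Dict[str, List[float]]) -> Dict[str, float]:
    """Maximum bit rate that can be sustained to any one subscriber.

    Subscribers without measurements are skipped; at least one must remain.
    """
    per_subscriber = [sustainable_rate_kbps(s) for s in throughput.values() if s]
    return {"max_sustainable_kbps": max(per_subscriber)}
```

The resulting dictionary stands in for the set of connection quality parameters 515 passed to the video capture parameter determination logic 530.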
The content contextual information generator 520 of the content producer 105 generates contextual information 525. In one embodiment, the contextual information 525 is the content type of the desired video content, e.g., sports, talk show, reality show, etc.
The video capture parameter determination logic 530 receives the set of connection quality parameters 515 from the connection quality determination logic 510 of the content server 110 and the contextual information 525 from the content contextual information generator 520. The video capture parameter determination logic 530 selects an optimal video format for the video containing the desired content from multiple available video formats based on the set of connection quality parameters 515 and the contextual information 525. In one embodiment, the video capture parameter determination logic 530 assigns different weights to each video quality parameter based on the contextual information 525. The video capture parameter determination logic 530 then selects the optimal video format based on the different weights, e.g., by selecting a video format that best matches the weights assigned to different video quality parameters.
In one embodiment, when the set of connection quality parameters 515 indicates that the connection quality between the receiving device 120 and the content server 110 is poor, the video capture parameter determination logic 530 uses a scale function for the importance of each video quality parameter to determine how much each video quality parameter can be degraded. In one embodiment, the scale function is derived partially based on the contextual information 525. For example, different weights are assigned to each video quality parameter based on the contextual information 525, and the greater the weight assigned to a video quality parameter, the less that video quality parameter is to be degraded.
The video capture parameter determination logic 530 sends video quality parameters 535 that represent the optimal video format to the video capture unit 540. In one embodiment, the video quality parameters 535 include values for codec, frame rate, contrast, brightness, resolution, bit depth, EOTF, and color gamut. The codec is a coding or compression format used for the video. The bit depth specifies the number of bits used for each color component. More bits within a particular dynamic range result in a lower likelihood of visible “color banding” (quantization error), which is a form of inaccurate color presentation in video display. More bits could also be expended on a wider dynamic range. If a wider dynamic range is used, then correspondingly more bits are needed to keep the quantization error constant and, ideally, invisible. The color gamut is the complete set of colors found within the video. In one embodiment, the color gamut is the range of colors that may be portrayed by a given display device, which is related to the chromaticity coordinates of the color primaries of the display.
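The trade-off between bit depth and dynamic range can be made concrete with a simplified model in which the dynamic range, measured in stops, is spread evenly over all code values (a pure logarithmic encoding, unlike real EOTFs such as gamma or PQ). Under that assumption, the quantization step in stops is the range divided by the number of code-value intervals, and widening the range requires more bits to keep the step unchanged. The function names are hypothetical.

```python
def stops_per_code_value(bit_depth: int, dynamic_range_stops: float) -> float:
    """Quantization step, in stops, under an idealized log encoding."""
    return dynamic_range_stops / (2 ** bit_depth - 1)


def bits_for_same_step(reference_bits: int, reference_stops: float,
                       new_stops: float) -> int:
    """Bit depth (at least reference_bits) keeping the step no coarser over new_stops."""
    target = stops_per_code_value(reference_bits, reference_stops)
    bits = reference_bits
    while stops_per_code_value(bits, new_stops) > target:
        bits += 1  # each extra bit roughly halves the step
    return bits
```

In this model, stretching a 10-bit signal from 10 to 14 stops would need 11 bits to keep the same step size.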
The video capture unit 540 captures the video containing the video content using the video quality parameters 535 and sends the captured video 545 to the receiving device 120 through the content server 110. In one embodiment, the video captured using the video quality parameters 535 is converted (e.g., transcoded) into various video streams. The content server 110 or the content producer 105 then selects one of the transcoded video streams suitable for the receiving device 120 and sends the selected transcoded video stream to the receiving device 120.
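The conversion of the captured video into several streams, and the selection of one of them for a given receiving device, can be modeled as below. The code only describes renditions rather than transcoding any pixels, and the rendition heights, the assumption that bit rate scales with pixel count, and the fallback rule are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Rendition:
    """Hypothetical description of one transcoded stream."""
    height: int
    frame_rate: int
    bit_rate_kbps: int


def rendition_ladder(master: Rendition,
                     heights: Sequence[int] = (2160, 1080, 720)) -> List[Rendition]:
    """Renditions at or below the master's height; bit rate scales with pixel count."""
    return [Rendition(h, master.frame_rate,
                      round(master.bit_rate_kbps * (h / master.height) ** 2))
            for h in heights if h <= master.height]


def pick_rendition(ladder: List[Rendition], device_max_height: int,
                   available_kbps: float) -> Rendition:
    """Highest bit rate rendition the device can display and the link can carry."""
    fits = [r for r in ladder
            if r.height <= device_max_height and r.bit_rate_kbps <= available_kbps]
    if not fits:
        return min(ladder, key=lambda r: r.bit_rate_kbps)  # lowest-rate fallback
    return max(fits, key=lambda r: r.bit_rate_kbps)
```

For a 2160-line, 60 fps master at 40000 kbps, a 1080-line display on a 15000 kbps link would receive the 1080-line rendition at 10000 kbps in this model.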
One of ordinary skill in the art will realize that in other embodiments, this system can be implemented differently. For instance, in one embodiment, instead of determining a single set of connection quality parameters, the connection quality determination logic 510 of the content server 110 can determine multiple sets of connection quality parameters for the receiving devices. In one embodiment, instead of selecting an optimal video format for the video containing the desired content, the video capture parameter determination logic 530 can select multiple video formats for the video containing the desired content from multiple available video formats based on the multiple sets of connection quality parameters and the contextual information 525. In one embodiment, the video capture parameter determination logic 530 assigns different weights to each video quality parameter based on the contextual information 525. The video capture parameter determination logic 530 then selects the multiple video formats based on the different weights, e.g., by selecting video formats that match the weights assigned to different video quality parameters.
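Selecting several capture formats, one per group of receiving devices, might look like the sketch below. Grouping subscribers into bit-rate tiers, the tier boundaries, and the normalized weighted-sum score are assumptions made for illustration; subscribers below the lowest boundary are simply left out of this sketch.

```python
from typing import Dict, List


def group_into_tiers(rates_kbps: List[float],
                     boundaries: List[float]) -> Dict[float, List[float]]:
    """Bucket each subscriber under the highest boundary its rate reaches."""
    tiers: Dict[float, List[float]] = {}
    for rate in rates_kbps:
        floor = max((b for b in boundaries if b <= rate), default=None)
        if floor is not None:  # below the lowest boundary: not served here
            tiers.setdefault(floor, []).append(rate)
    return tiers


def formats_for_tiers(tiers: Dict[float, List[float]], formats: List[Dict],
                      weights: Dict[str, float]) -> Dict[float, str]:
    """For each tier, the best-scoring format whose bit rate fits the tier floor."""
    # Normalize each parameter by its largest value across all formats.
    peak = {p: max(f[p] for f in formats) for p in weights}
    chosen: Dict[float, str] = {}
    for floor in tiers:
        fits = [f for f in formats if f["bit_rate_kbps"] <= floor]
        if fits:
            best = max(fits, key=lambda f: sum(w * f[p] / peak[p]
                                               for p, w in weights.items()))
            chosen[floor] = best["name"]
    return chosen
```

With frame-rate-heavy (sports-like) weights, the lower tier gets a 720p60 format and the upper tier a 1080p60 10-bit format, so each group of receiving devices is served by its own capture format.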
In one embodiment, when a set of connection quality parameters indicates that the connection quality between a set of receiving devices and the content server 110 is poor, the video capture parameter determination logic 530 uses a scale function for the importance of each video quality parameter to determine how much each video quality parameter of a video format for the set of receiving devices can be degraded. In one embodiment, the scale function is derived partially based on the contextual information 525. For example, different weights are assigned to each video quality parameter based on the contextual information 525, and the greater the weight assigned to a video quality parameter, the less that video quality parameter is to be degraded.
The receiving device 120, the content producer 105, and the content server 110 are described above for one embodiment of the invention. One of ordinary skill in the art will realize that in other embodiments, this system can be implemented differently. For instance, in one embodiment described above, certain modules are implemented as software modules. However, in another embodiment, some or all of the modules might be implemented by hardware, which can be dedicated application specific hardware (e.g., an application specific integrated circuit, ASIC, chip or component) or a general purpose chip (e.g., a microprocessor or field programmable gate array, FPGA).
At block 610, process 600 selects a video format for the desired video content from several available video formats based on the connection quality parameters and/or the device parameters, and based on contextual information of the desired video content. In one embodiment, the contextual information is the content type of the video, e.g., sports, talk show, reality show, etc. In one embodiment, process 600 assigns different weights to each video quality parameter based on the contextual information of the desired video content. Process 600 then selects the optimal video format based on those different weights, e.g., by selecting a video format that best matches the weights assigned to different video quality parameters. In one embodiment, when the connection quality parameters indicate that the connection quality is poor, process 600 uses a scale function for the importance of each video quality parameter to determine how much each video quality parameter can be degraded. In one embodiment, the scale function is derived partially based on the contextual information of the desired video content. For example, different weights are assigned to each video quality parameter based on the contextual information, and the greater the weight assigned to a video quality parameter, the less that video quality parameter is to be degraded.
At block 615, process 600 captures the desired video content in the selected video format. At block 620, process 600 delivers the captured video containing the desired video content in the selected video format to the user devices through a content server (e.g., the content server 110 of
One of ordinary skill in the art will realize that in other embodiments, process 600 can be implemented differently. For instance, in one embodiment, instead of receiving a single set of connection quality parameters, process 600 can receive multiple sets of connection quality parameters for the user devices. In one embodiment, instead of selecting a single video format for the desired content, process 600 can select multiple video formats for the desired content from multiple available video formats based on the multiple sets of connection quality parameters and based on the contextual information of the desired content. In one embodiment, process 600 assigns different weights to each video quality parameter based on the contextual information of the desired content. Process 600 then selects the multiple video formats based on the different weights, e.g., by selecting video formats that match the weights assigned to different video quality parameters.
In one embodiment, when a set of connection quality parameters indicates that the connection quality for a set of user devices is poor, process 600 uses a scale function for the importance of each video quality parameter to determine how much each video quality parameter for a video format for the set of user devices can be degraded. In one embodiment, the scale function is derived partially based on the contextual information. For example, different weights are assigned to each video quality parameter based on the contextual information, and the greater the weight assigned to a video quality parameter, the less that video quality parameter is to be degraded.
One of ordinary skill in the art will recognize that process 600 is a conceptual representation of the operations executed by the content server to stream video to the user device. The specific operations of process 600 may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, process 600 could be implemented using several sub-processes, or as part of a larger macro process.
A display controller and display device 709 provide a digital visual user interface for the user; this digital interface may include a graphical user interface similar to that shown on a Macintosh computer when running the OS X operating system software, or an Apple iPhone when running the iOS operating system, etc. The system 700 also includes one or more wireless communications interfaces 703 to communicate with another data processing system, such as the system 700 of
The data processing system 700 also includes one or more user input devices 713, which allow a user to provide input to the system. These input devices may be a keypad or keyboard, or a touch panel or multi touch panel. The data processing system 700 also includes an optional input/output device 715 which may be a connector for a dock. It will be appreciated that one or more buses, not shown, may be used to interconnect the various components as is well known in the art. The data processing system shown in
The digital signal processing operations described above, such as encoding and decoding of cellular packets, flow/rate control, packet jitter control, setup or pickup of telephone calls, and audio signal processing including, for example, filtering, noise estimation, and noise suppression, can all be done either entirely by a programmed processor, or portions of them can be separated out and performed by dedicated hardwired logic circuits.
The foregoing discussion merely describes some exemplary embodiments of the invention. One skilled in the art will readily recognize from such discussion, from the accompanying drawings, and from the claims that various modifications can be made without departing from the spirit and scope of the invention.
This application claims the benefit of the earlier filing date of U.S. Provisional Application No. 61/946,490, filed Feb. 28, 2014, entitled “Intelligent Video Quality Adjustment”.
Number | Date | Country
---|---|---
61946490 | Feb 2014 | US