Field of the Invention
Embodiments of the present invention generally relate to encoded video data processing and distribution systems and, more particularly, to a method, apparatus and supporting system for transcoding video content from one resolution and/or rate to content with a different resolution and/or rate.
Description of the Related Art
As is well known in the art, video content is encoded into a digital representation for storage, transmission and ultimately playback. Well-known encoding methods include MPEG-2, H.264 and HEVC. Broadly speaking, these encoding methods remove redundancies from the original content in order to produce a smaller representation that can be handled more efficiently.
Video encoding methods produce data that can be generally divided into two categories: (1) “Encoding Decisions” and (2) “Residual Data”. In order to minimize the size of the resulting content, video encoders: (i) find similarities between spatial or temporal subsets of data (e.g. motion vectors, common properties of neighboring blocks of pixels); (ii) select an appropriate coding structure and methods from pre-determined options; and (iii) construct the information required for content reconstruction (decoding). The results of these steps can be called “Encoding Decisions” (e.g. the content of SPS, PPS, SEI, Slice Headers and parts of Slice Data for the H.264 codec), and their derivation is computationally expensive. In the absence of “Encoding Decisions”, the “Residual Data” contains absolute values of multimedia samples. When “Encoding Decisions” include prediction of current samples or data elements based on previously decoded subsets of data, “Residual Data” contains a representation of the difference between said prediction and the current samples under consideration.
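The relationship between a prediction and its “Residual Data” can be illustrated with a minimal sketch. The predictor below (each sample predicted from its left neighbor) and the names `predict_from_left`, `compute_residual` and `reconstruct` are hypothetical and chosen only for illustration; the sketch is lossless, whereas practical codecs such as H.264 additionally transform and quantize the residual.

```python
from typing import List

SEED = 128  # assumed prediction for the very first sample (illustrative)


def predict_from_left(samples: List[int], seed: int = SEED) -> List[int]:
    """Toy predictor: every sample is predicted by the sample to its left."""
    return [seed] + samples[:-1]


def compute_residual(samples: List[int], prediction: List[int]) -> List[int]:
    """Residual = current samples minus their prediction."""
    return [s - p for s, p in zip(samples, prediction)]


def reconstruct(residual: List[int], seed: int = SEED) -> List[int]:
    """Invert the process: add each residual to the running prediction."""
    out, prev = [], seed
    for r in residual:
        prev = prev + r
        out.append(prev)
    return out


row = [130, 131, 131, 129, 140]
residual = compute_residual(row, predict_from_left(row))
print(residual)               # [2, 1, 0, -2, 11]
print(reconstruct(residual))  # [130, 131, 131, 129, 140]
```

The residual values are small wherever the prediction is good, which is what makes them cheaper to represent than the absolute sample values.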
Prior to the rise of the Internet, video delivery was based on the broadcast principle: deliver content at one resolution and rate to all users. The Internet and wireless networks (cell, local, wide-area), as well as the proliferation of playback devices of various sizes and capabilities (from cell phones and pads to video screens), brought forth requirements for delivery of content at various resolutions and rates. Furthermore, in order to compensate for the dynamic nature of network and computational resources (e.g. available bandwidth, CPU or memory allocated for processing), content providers need systems that can dynamically change the resolution and/or rate of delivered content while it is being consumed by users.
Typical solutions for the above mentioned requirements fall into three broad categories: (1) encode and ready for delivery multiple versions (resolutions, rates) of digital content; (2) encode content as set of segments or hierarchy of resolutions and rates (layers), each of which can be extracted from the totality of content (scalable video); and (3) encode content at fixed (preferably highest) resolution and rate then dynamically transcode (decode then re-encode) to required resolution and rate before delivery to playback destination.
Approach (1) provides the highest ratio of encoding quality to content size but requires a large amount of storage and network bandwidth to keep and transfer multiple versions of the same content, resulting in high cost and in delivery that is sensitive to network delays that can undermine the user experience.
Approach (2), known as “Scalable Video”, was designed to address the need for multiple resolutions and rates. These systems never achieved significant adoption because the resulting content size is significantly larger than the non-scalable maximum resolution option, even for a minimum number of multi-resolution layers. Moreover, the playback quality of each layer (resolution, rate) that can be extracted from scalable content is lower than the quality that can be achieved by a non-scalable representation for the same requirements.
Approach (3) requires large quantities of expensive equipment, since multimedia encoding/transcoding is a highly computationally intensive operation (1-2 orders of magnitude more intensive than multimedia decoding), and also contributes to lower quality of displayed content due to the lossy nature of encoding/transcoding. The prior art and common practices in the transcoding domain were mostly focused on improving transcoding speed through a better guess for the initial search point (limiting the search area) based on results from data at different resolutions or rates.
While the above mentioned art and practices do address requirements for delivery at multiple resolutions and data rates, these approaches incur unnecessarily high cost at the storage, the core network or the network edge (for distributed delivery systems), or sacrifice quality in order to control said costs. As such, there is a need in the art for a method and apparatus (system) that addresses requirements for delivery at multiple resolutions and data rates and that improves the cost structure and quality of delivery without sacrificing multimedia playback quality.
Various embodiments of the present invention generally include a method and apparatus for efficient video transcoding based on extraction of encoder decisions. In one embodiment, the method comprises: (i) separation of “Encoding Decisions” En and “Residual Data” Rn (e.g. “residual( )” in the H.264 specification) from content Cn encoded at resolution Sn (including but not limited to spatial dimensions, pixel bit-length, chroma option) and rate Bn, where said content was computed by a known and pre-selected video scaling method Mn from content C0 encoded at resolution S0 and rate B0; (ii) optional processing and delivery of content C0 and “Encoding Decisions” En of content Cn to a transcoding apparatus; and (iii) re-coding by re-computation of “Residual Data” Rn from content C0 scaled by Mn and “Encoding Decisions” En, resulting in (optionally) perfect re-construction of content Cn. Said transcoding method operates on either the whole content or selected content parts.
In one embodiment, the apparatus comprises: (1) a system for separation of “Encoding Decisions” En and “Residual Data” Rn from content Cn encoded at resolution Sn and rate Bn; (2) a system for optional processing and delivery of content C0 and “Encoding Decisions” En; and (3) a system for (optionally) perfect re-construction of content Cn from “Encoding Decisions” En and content C0. Said transcoding apparatus/system operates on either the whole content or selected content parts.
In one embodiment, the system (1) of said apparatus decodes content C0, scales it to Pn with scaling method Mn, then encodes Pn to content Cn. Said system then removes all “residual( )” portions from the H.264 video content Cn, where the remaining data constitutes “Encoding Decisions” En. The re-construction system (3) of said apparatus decodes content C0, scales it to Pn with the same method Mn used to construct Cn from C0, applies the decoded En to Pn and re-constructs content Cn. The complexity of the re-construction system is comparable to decoding complexity, which is orders of magnitude smaller than encoding complexity, so the transcoding system of this embodiment is far more efficient than the full decode/re-encode systems of the known art.
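The separation and re-construction steps described above can be sketched with a toy, lossless one-dimensional "codec". Everything in this sketch is an assumption made for illustration and is not the H.264 bitstream syntax: the block size, the three prediction modes, the pair-averaging scaler standing in for Mn, and the function names. The point it shows is that the sending side performs the mode search, while the receiving side only re-computes residuals from Pn and the received decisions En.

```python
from typing import List, Tuple

BLOCK = 4                       # hypothetical block size
MODES = ("flat", "up", "down")  # hypothetical prediction modes
START = 128                     # assumed predictor seed for the first block


def scale_half(frame: List[int]) -> List[int]:
    """Stand-in for scaling method Mn: average neighboring sample pairs."""
    return [(frame[i] + frame[i + 1]) // 2 for i in range(0, len(frame) - 1, 2)]


def predict(mode: str, prev: int, size: int) -> List[int]:
    step = {"flat": 0, "up": 1, "down": -1}[mode]
    return [prev + step * (i + 1) for i in range(size)]


def blocks(frame: List[int]) -> List[List[int]]:
    return [frame[i:i + BLOCK] for i in range(0, len(frame), BLOCK)]


def residual_for(block: List[int], mode: str, prev: int) -> List[int]:
    return [s - p for s, p in zip(block, predict(mode, prev, len(block)))]


def encode(frame: List[int]) -> Tuple[List[str], List[List[int]]]:
    """Sending side: search all modes per block (the costly part)."""
    decisions, residuals, prev = [], [], START
    for block in blocks(frame):
        cost = lambda m: sum(abs(r) for r in residual_for(block, m, prev))
        best = min(MODES, key=cost)
        decisions.append(best)
        residuals.append(residual_for(block, best, prev))
        prev = block[-1]
    return decisions, residuals


def recode(frame: List[int], decisions: List[str]) -> List[List[int]]:
    """Receiving side: no search, only re-compute residuals from En."""
    residuals, prev = [], START
    for block, mode in zip(blocks(frame), decisions):
        residuals.append(residual_for(block, mode, prev))
        prev = block[-1]
    return residuals


def decode(decisions: List[str], residuals: List[List[int]]) -> List[int]:
    frame, prev = [], START
    for mode, res in zip(decisions, residuals):
        block = [p + r for p, r in zip(predict(mode, prev, len(res)), res)]
        frame.extend(block)
        prev = block[-1]
    return frame


# Sending side: P0 -> Pn -> (En, Rn); only En is kept for delivery.
p0 = [100, 102, 104, 106, 108, 110, 112, 114,
      120, 120, 120, 120, 120, 120, 120, 120]
pn = scale_half(p0)
en, rn = encode(pn)

# Receiving side: same scaler, then residuals re-computed from En.
pn_star = scale_half(p0)
rn_star = recode(pn_star, en)
print(rn_star == rn, decode(en, rn_star) == pn)  # True True
```

In this sketch `encode` evaluates every mode for every block, whereas `recode` evaluates exactly one, which mirrors the complexity gap between full re-encoding and re-coding from received decisions.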
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and specific video coding examples selected for ease of understanding and are therefore not to be considered limiting of its scope, for the invention admits other video coding methods and may admit to other equally effective embodiments.
Content Cn is further processed by separator 140, details of which are illustrated in
Content C0 and “Encoding Decisions” En are inputs to the receiving side 300 of the system illustrated herein. Encoded video content C0 is decoded by decoder 310, which is identical to decoder 110 by decoding specification and/or by design, and which produces a sequence of raw video frames P0*. Scaler 320, which processes P0*, is identical to scaler 120, thus ensuring that the resulting raw sequence Pn* on the receiving side 300 is identical to Pn from the sending side 100.
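The requirement that the receiving-side scaler match the sending-side scaler can be illustrated with a small sketch. The two scalers below are hypothetical and differ only in how they round the average of a sample pair; even this small difference makes Pn* diverge from Pn, so residuals re-computed from Pn* would no longer match those of Cn.

```python
from typing import List


def scale_floor(frame: List[int]) -> List[int]:
    """Hypothetical sending-side scaler: pair average, rounded down."""
    return [(frame[i] + frame[i + 1]) // 2 for i in range(0, len(frame) - 1, 2)]


def scale_round(frame: List[int]) -> List[int]:
    """Hypothetical mismatched scaler: pair average, rounded half up."""
    return [(frame[i] + frame[i + 1] + 1) // 2 for i in range(0, len(frame) - 1, 2)]


p0 = [10, 11, 20, 22, 31, 34]

pn = scale_floor(p0)             # sending side
pn_star_same = scale_floor(p0)   # receiving side, identical scaler
pn_star_other = scale_round(p0)  # receiving side, different rounding

print(pn)             # [10, 21, 32]
print(pn_star_same)   # [10, 21, 32]
print(pn_star_other)  # [11, 21, 33]
```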
The resulting sequence Pn* from system 300 and received content En are inputs to re-coder 330, details of which are illustrated in
System 130 on
System 330 on
The foregoing description of embodiments of the invention comprises a number of elements, systems, devices, circuits and/or assemblies that perform various functions as described. These elements, systems, devices, circuits and/or assemblies are exemplary interpretations of means for performing their respectively described functions.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
This application claims benefit of U.S. provisional patent application No. 61/996,008, filed Apr. 28, 2014, which is herein incorporated by reference.
| Number | Date | Country |
|---|---|---|
| 61996008 | Apr 2014 | US |