There is a growing demand for video streaming services and content. Video streaming providers have difficulty meeting this demand because resource requirements keep increasing and client environments are increasingly heterogeneous. For example, in HTTP Adaptive Streaming (HAS) the server maintains multiple versions (i.e., representations in MPEG DASH) of the same content, split into segments of a given duration (e.g., 1-10 s). Clients can request these segments individually using a manifest (i.e., MPD in MPEG DASH), based on their context conditions (e.g., network capabilities/conditions and client characteristics). Consequently, a content delivery network (CDN) is responsible for distributing all segments (or subsets thereof) within the network towards the clients. Typically, this results in a large amount of data being distributed within the network (i.e., from the source towards the clients).
Conventional approaches to mitigating the problem focus on caching efficiency, on-the-fly transcoding, and other solutions that typically require trade-offs among various cost parameters, such as storage, computation, and bandwidth. On-the-fly transcoding approaches are computationally intensive and time-consuming, imposing significant operational costs on service providers. On the other hand, pre-transcoding approaches typically store all bitrates to serve all types of user requests, which incurs high storage overhead, even for videos and video segments that are rarely requested.
Thus, a solution for lightweight transcoding of video at edge nodes is desirable.
The present disclosure provides for techniques relating to lightweight transcoding of video at edge nodes. A distributed computing system for lightweight transcoding may include: an origin server having a first memory, and a first processor configured to execute instructions stored in the first memory to: receive an input video comprising a bitstream, encode the bitstream into n representations, and generate encoding metadata for n−1 representations; and an edge node having a second memory, and a second processor configured to execute instructions stored in the second memory to: fetch a representation of the n representations and the encoding metadata from the origin server, transcode the bitstream, and serve one of the n representations to a client. In some examples, the n representations correspond to a full bitrate ladder. In some examples, the first processor is further configured to execute instructions stored in the first memory to compress the encoding metadata. In some examples, the encoding metadata comprises a partitioning structure of a coding tree unit. In some examples, the encoding metadata results from an encoding of the bitstream. In some examples, the representation corresponds to a highest bitrate, and the encoding metadata corresponds to other bitrates. In some examples, the second processor is configured to transcode the bitstream using a transcoding system. In some examples, the transcoding system comprises a decoding module and an encoding module.
A method for lightweight transcoding may include: receiving, by a server, an input video comprising a bitstream; encoding, by the server, the bitstream into n representations; generating metadata for n−1 representations; and providing to an edge node a representation of the n representations and the metadata, wherein the edge node is configured to transcode the bitstream into the n−1 representations using the metadata. In some examples, the n representations correspond to a full bitrate ladder. In some examples, the representation comprises a highest quality representation corresponding to a highest bitrate. In some examples, the representation comprises an intermediate quality representation corresponding to an intermediate bitrate. In some examples, generating the metadata comprises storing an optimal search result from the encoding as part of the metadata. In some examples, generating the metadata comprises storing an optimal decision from the encoding as part of the metadata. In some examples, the method also may include compressing the metadata. In some examples, the representation comprises a subset of the n representations.
A method for lightweight transcoding may include: fetching, by an edge node from an origin server, a representation of a video segment and metadata associated with a plurality of representations of the video segment, the origin server configured to encode a bitstream into the plurality of representations and to generate the metadata; transcoding the bitstream into the plurality of representations using the representation and the metadata; and serving one or more of the plurality of representations to a client in response to a client request. In some examples, the method also may include determining, according to an optimization model, whether the representation of the video segment should comprise one of the plurality of representations or all of the plurality of representations. In some examples, the optimization model comprises an optimal boundary point between a first set of segments for which one of the plurality of representations should be fetched and a second set of segments for which all of the plurality of representations should be fetched, the determining based on whether the video segment is in the first set of segments or the second set of segments. In some examples, the method also may include determining the optimal boundary point using a heuristic algorithm.
Various non-limiting and non-exhaustive aspects and features of the present disclosure are described hereinbelow with references to the drawings, wherein:
Like reference numbers and designations in the various drawings indicate like elements. Skilled artisans will appreciate that elements in the Figures are illustrated for simplicity and clarity, and have not necessarily been drawn to scale, for example, with the dimensions of some of the elements in the figures exaggerated relative to other elements to help to improve understanding of various embodiments. Common, well-understood elements that are useful or necessary in a commercially feasible embodiment are often not depicted in order to facilitate a less obstructed view of these various embodiments.
The Figures and the following description describe certain embodiments by way of illustration only. One of ordinary skill in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein. Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures.
The above and other needs are met by the disclosed methods, a non-transitory computer-readable storage medium storing executable code, and systems for lightweight transcoding on edge nodes.
The invention is directed to a lightweight transcoding system and methods of lightweight transcoding at edge nodes. In order to serve the demands of heterogeneous environments and mitigate network bandwidth fluctuations, it is important to provide streaming services (e.g., video-on-demand (VoD)) with different quality levels. In video delivery (e.g., using HTTP Adaptive Streaming (HAS)), a video source may be divided into parts or intervals known as video segments. Each segment may be encoded at various bitrates resulting in a set of representations (i.e., a representation for each bitrate). Storing optimal search results and decisions of an encoding performed by an origin server, and saving such optimal results and decisions as metadata to be used in on-the-fly transcoding, allow for edge nodes (e.g., servers, interfaces, or any other resource between an origin server and a client) to be leveraged in order to reduce the amount of data to be distributed within the network (i.e., from the source towards the clients). There is no additional computation cost to extracting the metadata because the metadata is extracted during the encoding process in an origin server (i.e., part of a multi-bitrate video preparation that the origin server would perform in any encoding process). Edge nodes as used herein may refer to any edge device with sufficient compute capacity (e.g., multi-access edge computing (MEC)).
During encoding of video segments at origin servers, computationally intensive search processes are employed. Optimal results of said search processes may be stored as metadata for each video bitrate. In some examples, only the highest bitrate representation is kept, and all other bitrates in a set of representations are replaced with corresponding metadata (e.g., for unpopular videos). The generated metadata is much smaller than its corresponding encoded video segment. This results in a significant reduction in bandwidth and storage consumption, and in decreased time for on-the-fly transcoding of requested video segments at an edge node, because the edge node reuses said corresponding metadata instead of repeating the search processes.
Example Systems
Each of servers 102 and 112 and edge nodes 104 and 114a-n may comprise at least a memory or other storage (not shown) configured to store video data, encoded data, metadata, and other data and instructions (e.g., in a database, an application, data store, or other format) for performing any of the features and steps described herein. Each of servers 102 and 112 and edge nodes 104 and 114a-n also may comprise a processor configured to execute instructions stored in a memory to carry out steps described herein. A memory may include any non-transitory computer-readable storage medium for storing data and/or software that is executable by a processor, and/or any other medium which may be used to store information that may be accessed by a processor to control the operation of a computing device (e.g., servers 102 and 112, edge nodes 104 and 114a-n, clients 106 and 116a-n). In other examples, servers 102 and 112 and edge nodes 104 and 114a-n may comprise, or be configured to access, data and instructions stored in other storage devices (e.g., storage 108 and 118). In some examples, storage 108 and 118 may comprise cloud storage, or otherwise be accessible through a network, configured to deliver media content (e.g., one or more of the n representations) to clients 106 and 116a-n, respectively. In other examples, edge node 104 and/or edge nodes 114a-n may be configured to deliver said media content to clients 106 and/or clients 116a-n directly or through other networks.
In some examples, one or more of servers 102 and 112 and edge nodes 104 and 114a-n may comprise an encoding-transcoding system, including hardware and software. The encoding-transcoding system may comprise a decoding module and an encoding module, the decoding module configured to decode an input video (i.e., video segment) from a format into a set of video data frames, the encoding module configured to encode video data frames into a video based on a video format. The encoding-transcoding system also may analyze an output video to extract encoding statistics, determine optimized encoding parameters for encoding a set of video data frames into an output video based on extracted encoding statistics, decode intermediate video into another set of video data frames, and encode the other set of video data frames into an output video based on the desired format and optimized encoding parameters. In some examples, the encoding-transcoding system may be a cloud-based encoding system available via computer networks, such as the Internet, a virtual private network, or the like. The encoding-transcoding system and any of its components may be hosted by a third party or kept within the premises of an encoding enterprise, such as a publisher, video streaming service (e.g., video-on-demand (VoD)), or the like. The system may be a distributed system, and it may also be implemented in a single server system, multi-core server system, virtual server system, multi-blade system, data center, or the like.
In some examples, outputs (e.g., representations, metadata, other video content data) from edge nodes 104 and 114a-n may be stored in storage 108 and 118, respectively. Storage 108 and 118 may make encoded content (e.g., the outputs) available via a network, such as the Internet. Delivery may include publication or release for streaming or download. In some examples, multiple unicast connections may be used to stream video (e.g., real-time) to a plurality of clients (e.g., clients 106 and 116a-n). In other examples, multicast-ABR may be used to deliver one or more requested qualities (i.e., per client requests) through multicast trees. In still other examples, only the highest requested quality representation is sent to an edge node, such as a virtual transcoding function (VTF) node (e.g., in context of a software defined network (SDN) and/or network function virtualization (NFV)), via a multicast tree as shown in
In
In an example of the present invention, in network 310 shown in
In another example of the present invention, in network 320 shown in
In some examples, transcoding options for edge nodes 104 and 114a-n may be optimized for clients 106 and 116a-n, respectively, for example by transcoding only to the subset of a bitrate ladder that clients 106 and 116a-n request. Other variations may include, but are not limited to: (i) one or more of edge nodes 104 and 114a-n may transcode to a different bitrate ladder depending on client context (e.g., for one or more of clients 106 and 116a-n); (ii) a scheme may be integrated with caching strategies on one or more of edge nodes 104 and 114a-n; (iii) real-time encoding may be implemented on one or more of edge nodes 104 and 114a-n depending on client context (e.g., for one or more of clients 106 and 116a-n); and combinations of (i)-(iii). Additionally, the encoding metadata (e.g., generated by servers 102 and/or 112) may be compressed to reduce overhead, for example, with the same coding tools as used when encoded as part of the video.
Partitioning structure 200 may be an example of an optimal partitioning structure (e.g., determined through an exhaustive search using a brute-force method as used by a reference software). An origin server (e.g., servers 102 and 112) may calculate a plurality of RD costs to generate optimal partitioning structure 200, which may be encoded and sent as metadata to an edge node (e.g., edge nodes 104 and 114a-n, edge servers X1-X3). An edge node may extract an optimal partitioning structure for a CTU (e.g., structure 200) from the metadata provided by an origin server and use it to avoid requiring a brute force search process (e.g., searching unnecessary partitioning structures). An origin server also may further calculate and extract prediction unit (PU) modes (i.e., an optimal PU partitioning mode may be the PU structure with the minimum cost), motion vectors, selected reference frames, and other data relating to a video input, to be included in the metadata to reduce burden on edge calculations. An origin server may be configured to determine which of n representations may be sent to an edge node (e.g., highest bitrate/resolution, intermediate or lower) for transcoding.
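As an illustration of how a partitioning structure could travel as metadata, the following minimal Python sketch serializes a CTU quadtree into depth-first split flags and rebuilds it on the receiving side. It assumes a 64x64 CTU that may split down to 8x8 blocks, which is an HEVC-like simplification rather than actual bitstream syntax. The names CUNode, serialize_splits, deserialize_splits, and leaf_sizes are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class CUNode:
    """One quadtree node; a node with four children is split, otherwise it is a leaf."""
    children: Optional[List["CUNode"]] = None


def serialize_splits(node: CUNode, depth: int = 0, max_depth: int = 3) -> List[int]:
    """Return depth-first split flags (1 = split, 0 = leaf).

    Nodes at max_depth cannot split further, so no flag is emitted for them.
    """
    if depth == max_depth:
        return []
    if node.children is None:
        return [0]
    flags = [1]
    for child in node.children:
        flags.extend(serialize_splits(child, depth + 1, max_depth))
    return flags


def deserialize_splits(flags: List[int], max_depth: int = 3) -> CUNode:
    """Rebuild the quadtree from its flags, with no search over candidate partitions."""
    stream: Iterator[int] = iter(flags)

    def build(depth: int) -> CUNode:
        if depth == max_depth or next(stream) == 0:
            return CUNode()
        return CUNode([build(depth + 1) for _ in range(4)])

    return build(0)


def leaf_sizes(node: CUNode, size: int = 64) -> List[int]:
    """List the block sizes of the leaves in depth-first order (assumed 64x64 root)."""
    if node.children is None:
        return [size]
    return [s for child in node.children for s in leaf_sizes(child, size // 2)]
```

In this sketch the edge side only reads flags, so the cost of recovering the structure is linear in the number of flags, whereas an exhaustive search would evaluate an RD cost for every candidate partition.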
Example Methods
At step 404, a highest quality representation (e.g., highest bitrate, such as 4K or 8K) of the n representations and the metadata may be provided to (i.e., fetched by) an edge node (e.g., edge nodes 104 and 114a-n, edge servers X1-X3). In some examples, an edge node may employ an optimization model to determine whether a segment should be fetched with only the highest quality representation and metadata generated during encoding (i.e., corresponding to n−1 representations). In other examples, said optimization model may indicate that a segment should be downloaded from an origin server in more than one, or all, bitrate versions (e.g., more than one or all of n representations). For example, the optimization model may consider the popularity of a video or video segment in determining whether to fetch more than one, or all, of the n representations for said video or video segment. Since a small percentage of video content that is available is requested frequently, and often, for any requested video, only a portion of the video is viewed often (e.g., a beginning portion or a popular highlight), the majority of video segments may be fetched with one representation and the metadata, saving bandwidth and storage.
In some examples, the optimization model may consider aspects of a client request received from one or more clients (e.g., clients 106 and 116a-n). At the edge, the bitstream may be transcoded according to the metadata and one or both of a context condition and content delivery network (CDN) distribution policy at step 405. In some examples, transcoding may be performed in real time in response to the client request. In some examples, the CDN distribution policy may include a caching policy for both live and on-demand streaming, and other DVR-based functions. In other examples, no caching is performed. In some examples, the edge node may transcode the bitstream into the n−1 representations using the highest quality representation and the metadata. One or more of the n representations may be served (i.e., delivered) from the edge node to a client in response to a client request at step 406.
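The fetch, transcode, and serve steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not an implementation of the disclosed system: the EdgeNode class, its ingest and serve methods, and the Transcoder callable are hypothetical names, the actual metadata-guided transcoder is left as a caller-supplied function, and no caching of transcoded output is performed (one of the variants mentioned above).

```python
from typing import Callable, Dict, Tuple

# Hypothetical signature: (source segment bytes, metadata bytes) -> transcoded bytes.
Transcoder = Callable[[bytes, bytes], bytes]


class EdgeNode:
    """Holds one fetched representation per segment plus metadata for the others."""

    def __init__(self, transcode: Transcoder) -> None:
        self._transcode = transcode
        self._segments: Dict[Tuple[str, int], bytes] = {}
        self._metadata: Dict[Tuple[str, int], bytes] = {}
        self._source_bitrate: Dict[str, int] = {}

    def ingest(self, segment_id: str, bitrate: int, segment: bytes,
               metadata: Dict[int, bytes]) -> None:
        """Store what was fetched from the origin: one representation and the
        metadata for the remaining target bitrates."""
        self._segments[(segment_id, bitrate)] = segment
        self._source_bitrate[segment_id] = bitrate
        for target_bitrate, blob in metadata.items():
            self._metadata[(segment_id, target_bitrate)] = blob

    def serve(self, segment_id: str, bitrate: int) -> bytes:
        """Serve a stored representation directly, or transcode on request."""
        key = (segment_id, bitrate)
        if key in self._segments:
            return self._segments[key]
        source = self._segments[(segment_id, self._source_bitrate[segment_id])]
        return self._transcode(source, self._metadata[key])
```

Passing the transcoder in as a callable keeps the request-handling logic separate from the codec, so a caching policy or a different bitrate ladder could be layered on without touching the serve path.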
In some examples, an optimization model may indicate an optimal boundary point between a first set of segments that should be stored at a highest quality representation (i.e., highest bitrate) and a second set of segments that should be kept at a plurality of representations (i.e., plurality of bitrates). The optimal boundary point may be selected based on a request rate (R) during a time slot and as a function of a popularity distribution applied over an array (X) of video segments (ρ), such that a total cost of transcoding (i.e., computational overhead, including time) and storage is minimized. For any integer value x (1≤x≤ρ) as the candidate optimal boundary point, a storage cost may be:
$\mathrm{Cost}_{st}(x) = (x \times h + (\rho - x) \times f) \times \delta$ [Eq. 1]
where h denotes a size of the one or more segments stored at a highest bitrate plus the metadata for the one or more segments, f denotes a size of the one or more segments stored in all representations, and δ denotes a cost of storage in each time slot T with duration of θ seconds. Similarly, for any integer value x (1≤x≤ρ), the transcoding cost may be:
$\mathrm{Cost}_{tr}(x) = P(x) \times R \times \beta$ [Eq. 2]
where R denotes the number of requests arriving at the server in each time slot T and β denotes a computation cost for transcoding. Thus, the optimal boundary point (BP) for the given request arrival rate R and cumulative popularity function P(x) can be obtained by:
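The expression for the boundary point does not appear in this text. As a sketch under the definitions of Eq. 1 and Eq. 2, and assuming that x may be treated as continuous and that P(x) is differentiable (neither is stated in the disclosure), minimizing the total cost would read:

```latex
\[
\begin{aligned}
\mathrm{Cost}(x) &= \mathrm{Cost}_{st}(x) + \mathrm{Cost}_{tr}(x)
                  = \bigl(x h + (\rho - x) f\bigr)\,\delta + P(x)\,R\,\beta \\
BP &= \arg\min_{1 \le x \le \rho} \mathrm{Cost}(x) \\
\frac{d\,\mathrm{Cost}}{dx} &= (h - f)\,\delta + P'(x)\,R\,\beta = 0
  \quad\Longrightarrow\quad P'(BP) = \frac{(f - h)\,\delta}{R\,\beta}
\end{aligned}
\]
```

Because one representation plus metadata is expected to be smaller than the full set of representations (h < f), the right-hand side is positive. The stationary point is a minimum only where P(x) is convex, which would hold, for example, if the segments in X were ordered by increasing popularity; that ordering is an assumption of this sketch.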
An optimal boundary point may be determined by differentiating a total cost function ($\mathrm{Cost}_{st}(x) + \mathrm{Cost}_{tr}(x)$) with respect to x and setting the result equal to zero. In some examples, a heuristic algorithm may be used to evaluate candidates (e.g., a last segment) for optimal boundary points (bestX). An example heuristic algorithm may comprise:
In lines 1-5, the heuristic algorithm takes the last segment as the initial candidate (bestX) and calls the CostFunc function to calculate $\mathrm{Cost}_{st} + \mathrm{Cost}_{tr}$ for bestX and its adjacent segments. In the while loop (lines 7-12), the step and direction of the search process for the next iteration are determined. If the cost of bestX is less than that of its adjacent segments (line 13), or the conditions of the if statement in line 16 are satisfied, the search process ends and bestX is returned as the optimal boundary point (lines 13-23).
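The listing that the preceding paragraph walks through is not reproduced in this text. As an illustration only, and not a reconstruction of that listing, the following Python sketch shows one way a step-and-direction search over the total cost of Eq. 1 and Eq. 2 might look. The names make_cost_func and find_boundary are hypothetical, the line numbers cited above do not apply to this code, and the sketch assumes the segments are ordered by increasing popularity so that the total cost is convex in x.

```python
from itertools import accumulate
from typing import Callable, Sequence


def make_cost_func(weights: Sequence[float], h: float, f: float,
                   delta: float, R: float, beta: float) -> Callable[[int], float]:
    """Build Cost_st(x) + Cost_tr(x) for x in 0..rho.

    weights[i] is the (unnormalized) popularity of segment i + 1; P(x) is the
    normalized cumulative popularity of the first x segments (an assumption).
    """
    rho = len(weights)
    total = sum(weights)
    cumulative = [0.0] + [c / total for c in accumulate(weights)]

    def cost(x: int) -> float:
        storage = (x * h + (rho - x) * f) * delta      # Eq. 1
        transcoding = cumulative[x] * R * beta         # Eq. 2
        return storage + transcoding

    return cost


def find_boundary(cost: Callable[[int], float], rho: int) -> int:
    """Search 1..rho for a boundary point, starting from the last segment.

    The candidate moves by `step` toward the cheaper neighbour; when neither
    neighbour is cheaper the step is halved, and the search stops at step 1.
    """
    best_x = rho
    step = max(1, rho // 2)
    while True:
        left = max(1, best_x - step)
        right = min(rho, best_x + step)
        # best_x is listed first so that ties keep the current candidate.
        candidate = min((best_x, left, right), key=cost)
        if candidate != best_x:
            best_x = candidate
        elif step == 1:
            return best_x
        else:
            step //= 2
```

Each move strictly lowers the cost, so the loop terminates; for a convex cost the point returned at step 1 is a global minimum over 1..rho. For a non-convex cost this sketch could stop at a local minimum.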
In an alternative embodiment, an intermediate quality representation (e.g., intermediate bitrate, such as 1080p or 4K) of the n representations may be provided (i.e., fetched) with the metadata, instead of a highest quality representation, at step 404. Upscaling may then be performed at the edge or the client (e.g., with or without usage of super-resolution techniques taking into account encoding metadata). In yet another alternative embodiment, all of the n representations are provided for a subset of segments (e.g., segments of a popular video, most played segments of a video, the beginning segment of each video) along with one representation (e.g., highest quality, intermediate quality, or other) and the metadata for other segments to enable lightweight transcoding at an edge node.
Advantages of the invention described herein include: (1) significant reduction of CDN traffic between the (origin) server and the edge node, as only one representation and the encoding metadata are delivered instead of representations corresponding to the full bitrate ladder; (2) significant reduction of transcoding time and other transcoding costs at the edge due to the available encoding metadata, which offloads some or all complex encoding decisions to the server (i.e., origin server); and (3) storage reduction at the edge due to maintaining metadata, rather than representations for a full bitrate ladder, at the edge (i.e., on-the-fly transcoding at the edge in response to client requests), which may result in better cache utilization and also better Quality of Experience (QoE) for the end user by eliminating quality oscillations.
In other examples, existing, optimized multi-rate/-resolution techniques may be used with this technique to reduce encoding efforts on the server (i.e., origin server). An edge node also may transcode to a different set of representations than the n representations encoded at an origin server (e.g., according to a different bitrate ladder), depending on needs and/or requirements from a client request, or other external requirements and configurations. In still other examples, representations and metadata may be transported from an origin server to an edge node within the CDN using different transport options (e.g., multicast-ABR, WebRTC-based transport), for example, to improve latency.
Although the invention has been described with reference to certain specific embodiments, various modifications thereof will be apparent to those skilled in the art without departing from the spirit and scope of the invention as outlined in the claims appended hereto. The entire disclosures of all references recited above are incorporated herein by reference.
This application claims the benefit of U.S. Provisional Patent Application No. 63/108,244, filed Oct. 30, 2020, and titled “Lightweight Transcoding on Edge Servers,” which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
63108244 | Oct 2020 | US