1. Field
The current disclosure relates to audio-visual (AV) content delivery, and more specifically, but not exclusively, to AV content delivery using a packet-switched network.
2. Description of the Related Art
Audio-visual (AV) content, in the form of digital data, may be delivered to consumers using content delivery networks (CDNs). A typical CDN uses a packet-switched network, such as the Internet, to deliver encoded AV content from an origin server to a large group of user devices via a plurality of edge servers, each of which is typically located in proximity to a subgroup of the user devices. User devices include players configured to decode received digital audio-visual content. Each edge server is connected to one or more different types of end-user devices, such as, for example, set-top boxes (STBs), desktops, laptops, tablets, smart phones, and cellular phones.
AV content items, also known as assets, are created in an original high-resolution format—such as, for example, AVCHD (Advanced Video Coding High Definition) or AVC-Intra—that typically stores information about each of the frames in the asset with only limited, if any, compression, in order to preserve a maximal amount of information. The original-format asset may be further compressed to generate a high-resolution mezzanine lossy intermediate file of a smaller size than the original-format asset. Common standards for additional compression are ProRes 422, JPEG2000, and long GOP MPEG-2 (H.222/H.262 as defined by the ITU).
The mezzanine lossy intermediate file is then commonly compressed again for end user devices. A common final compression format is part 10 of the MPEG-4 suite, known as H.264 or AVC (advanced video coding). H.264 defines an improved compression process that, compared with older standards, allows for (i) a higher-quality AV segment at the same bitrate or (ii) the same quality AV segment at a lower bitrate.
The H.264 standard is computationally more intensive relative to older standards and allows for encoding a relatively more-compressed version of an AV file. The compression uses multiple techniques that are based on (i) the way moving pictures are structured, which typically includes a lot of similarity between neighboring sections of a single frame and between consecutive frames, and (ii) the way moving pictures are perceived by humans, which includes greater sensitivity to changes in brightness than to changes in color (in other words, a greater sensitivity to luminance than to chromaticity). The result of this H.264 encoding can be stored in a defined standard container such as MP4 (MPEG-4 part 14) or MOV (QuickTime File Format; QuickTime is a registered trademark of Apple Inc. of Cupertino, Calif.). End user devices typically download the final compressed H.264 encoded video and compressed audio in chunked segments of the MP4 container. The chunked segments are defined by standards such as Smooth Streaming (using ISM files), HTTP Live Streaming (HLS), and HTTP Dynamic Streaming (HDS), and are referred to in this application as transport-stream files. One common file type for transport-stream files, which are used in the user segment, is the .ts file type. A mezzanine intermediate file may be transcoded into myriad different corresponding transport-stream files of various quality levels. As noted above, a CDN serves different types of user devices. These devices may have different types of media-playing programs on them and the devices may have data connections of different bandwidths to their respective edge servers. Consequently, a user device requesting an asset specifies a particular format, resolution, and/or bitrate (or bitrate range).
In order to be ready to provide the asset to a variety of devices, running a variety of client media-playing programs, at a variety of resolutions and bitrates, multiple versions of the asset are created. Each different version requires a corresponding transcoding of the mezzanine intermediate file into transport-stream files. The origin server stores multiple transport-stream versions of the asset, where the different versions, in the form of different transport-stream files, have different encoding formats, resolutions, and/or bitrates. Multiple versions of the same asset are needed for compatibility with a wide range of user-device types, client programs, and data-transmission rates. These different multiple versions are stored at the origin server.
When a user device requests a particular asset from its corresponding edge server, one or more compatible transport-stream files—corresponding to, for example, different bitrates—are transmitted by the origin server to the edge server for caching at the edge server. The particular versions depend on factors such as (a) the device type of the user device, (b) the user device's client program that requested the asset, and (c) the available bandwidth of the connection between the user device and the edge server. The edge server caches the entirety of the received transmission-ready version of the requested item and streams it to the user device for presentation to a user.
The above-described system requires the generation, transport, and storage of a large number of versions of every asset that is to be readily available to users.
One embodiment of the disclosure can be a computer-implemented method comprising storing a set of preprocessed video chunks at an edge server of a content delivery network (CDN), wherein the preprocessed video chunks result from partial encoding of corresponding segments of a video asset, the partial encoding does not include quantization processing, and the edge server connects to a user device. The method further comprises receiving, by the edge server, a request from the user device for a segment of the video asset at a first target quality level, then retrieving, by the edge server, from the set of preprocessed video chunks, at least one preprocessed video chunk corresponding to the requested segment. The method further comprises then performing, by the edge server, final processing of the at least one preprocessed video chunk, wherein the final processing completes the encoding of the at least one preprocessed video chunk and includes quantization processing, and the final processing generates a corresponding first transport-stream file at the first target quality level. The method further comprises then providing, by the edge server, the first transport-stream file to the user device.
Another embodiment of the disclosure can be an edge server comprising a processor and a memory, the edge server configured to (i) operate as part of a content delivery network (CDN) and (ii) connect to a set of one or more user devices including a first user device, wherein the processor is configured to store a set of preprocessed video chunks into the memory, wherein the preprocessed video chunks result from partial encoding of corresponding segments of a video asset and the partial encoding does not include quantization processing. The processor is further configured to receive a request from the first user device for a segment of the video asset at a target quality level, then retrieve from the memory at least one preprocessed video chunk corresponding to the requested segment from the set of preprocessed video chunks, and then perform final processing of the at least one preprocessed video chunk, wherein the final processing completes the encoding of the at least one preprocessed video chunk and includes quantization processing, and the final processing generates a corresponding first transport-stream file at the target quality level. The processor is further configured to then provide the first transport-stream file to the first user device.
Yet another embodiment of the disclosure can be an origin server configured to (a) operate as part of a content delivery network (CDN), (b) connect to a plurality of edge servers including a first edge server, (c) preprocess a video asset, wherein the preprocessing comprises partial encoding of segments of the video asset that generates corresponding preprocessed video chunks and the partial encoding does not include quantization processing, (d) store the preprocessed video chunks, and (e) provide one or more preprocessed video chunks to the first edge server.
Other aspects, features, and advantages of the disclosure will become more fully apparent from the following detailed description, the appended claims, and the accompanying drawings in which like reference numerals identify similar or identical elements.
A mezzanine intermediate file may be organized as a series of consecutive segments, known as chunks, each of a defined duration. Note that consecutive chunks may have different durations. Chunks are typically between one and ten seconds in duration. Chunks are typically the units used in requests for, and transmission of, the asset. Each chunk typically comprises one or more Groups of Pictures (GoPs), where a GoP may be decoded independently of other GoPs. A typical GoP starts with an initial reference frame, and encodes a plurality of subsequent frames as differences from previous and/or subsequent frames. Note that, since a chunk often consists of a single GoP, the terms chunk and GoP may sometimes be used interchangeably.
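The relationship between chunks and GoPs described above may be sketched as follows. This is an illustrative sketch only: the class names, the greedy packing rule, and the target duration are assumptions made for the example and are not taken from the disclosure. It shows one way in which chunk boundaries that always coincide with GoP boundaries can lead to consecutive chunks of different durations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GroupOfPictures:
    """A GoP: an initial reference frame followed by difference-coded frames."""
    frame_count: int
    frame_rate: float  # frames per second

    @property
    def duration(self) -> float:
        return self.frame_count / self.frame_rate


@dataclass
class Chunk:
    """A chunk of an asset, made of one or more independently decodable GoPs."""
    index: int
    gops: List[GroupOfPictures] = field(default_factory=list)

    @property
    def duration(self) -> float:
        return sum(g.duration for g in self.gops)


def segment_into_chunks(gops, target_seconds):
    """Greedily pack consecutive GoPs into chunks of about target_seconds each.

    Chunk boundaries always fall on GoP boundaries, so the resulting
    chunks may differ in duration (hypothetical packing rule).
    """
    chunks, current = [], []
    for gop in gops:
        current.append(gop)
        if sum(g.duration for g in current) >= target_seconds:
            chunks.append(Chunk(index=len(chunks), gops=current))
            current = []
    if current:  # leftover GoPs form a final, possibly shorter, chunk
        chunks.append(Chunk(index=len(chunks), gops=current))
    return chunks
```

For example, five 2-second GoPs packed with a 4-second target would yield two 4-second chunks followed by one 2-second chunk.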
Novel CDN systems, described below, involve pre-processing chunks of mezzanine files for more-efficient CDN operation. The pre-processed chunks may be considered partially encoded transport-stream files. The pre-processed chunks allow for on-the-fly, just-in-time generation of corresponding transport-stream files by an edge server in response to requests by connected user devices.
As described above, a user device requests a chunk of an asset—at a particular resolution and quality level, where quality level generally corresponds to bitrate—from its corresponding edge server and, in turn, the edge server responds with a corresponding transport-stream file. Generally, transport streams of higher bit rates are used with devices having higher-resolution screens and/or faster data connections. Below are descriptions of exemplary operations of CDN 100 in the delivery of an asset in response to requests by user device 103 for an asset that is stored at origin server 101.
In accordance with one embodiment of the disclosure, a pre-processed mezzanine file is used for improved subsequent conversion into transport-stream packets. Pre-processed mezzanine files, which may be considered partially encoded transport-stream files, are prepared ahead of time and stored at origin server 101. If a user requests a particular asset, then the pre-processed and segmented mezzanine version of the asset is stored at its corresponding edge server. Note that the mezzanine file may already be cached at the edge server or the mezzanine file may be copied from the origin server in response to the user request. Note that the edge server may pipeline an asset, where the edge server downloads subsequent chunks of the mezzanine-file version of an asset from the origin server while simultaneously transmitting earlier chunks of the transport-stream-file version of that asset to the user device.
The user device may make requests for content one chunk at a time. The piecemeal requesting allows the user device to adjust the quality of the requested chunk based on changing bandwidth availability or other environmental variables. In addition, the piecemeal requesting allows the user device to more-easily provide trick play functionality, such as seek play (i.e., moving to any point of the asset for playback).
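The piecemeal requesting described above may be illustrated with a minimal sketch of a user device choosing a quality level for each chunk. The selection rule, the 0.8 safety factor, and the function names are assumptions made for illustration; the list of quality levels is likewise only an example of typically used values.

```python
# Example quality ladder in kbps (illustrative values only).
QUALITY_LEVELS_KBPS = (300, 600, 1200, 2000, 3000)


def select_quality(measured_kbps, levels=QUALITY_LEVELS_KBPS, safety=0.8):
    """Return the highest level that fits within a fraction of the measured
    bandwidth, falling back to the lowest level when none fits."""
    budget = measured_kbps * safety
    candidates = [level for level in levels if level <= budget]
    return max(candidates) if candidates else min(levels)


def plan_requests(bandwidth_samples_kbps):
    """Yield (chunk_index, quality_kbps), one request per chunk, using the
    bandwidth measured just before each request."""
    for index, kbps in enumerate(bandwidth_samples_kbps):
        yield index, select_quality(kbps)
```

Under these assumptions, a device whose measured bandwidth falls from 4000 kbps to 1000 kbps to 200 kbps over three consecutive chunks would request those chunks at 3000, 600, and 300 kbps, respectively.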
When instructed, the edge server performs just-in-time (JIT) media encoding of a pre-processed mezzanine-file chunk to generate a corresponding transport-stream chunk in a requested format at a requested quality level, where the chunk also includes any corresponding audio and auxiliary data (e.g., closed captioning). Typically used quality levels include 300, 600, 1200, 2000, and 3000 kbps. The edge server then generates one or more transport-stream chunks for provision to the user device via a packet-switched network, such as the Internet.
For frames of a GoP, the processing may include dividing each frame into macroblocks and using motion estimation, motion compensation, and discrete cosine transforms (DCT) to create a compressed chunk that can be used to recreate the corresponding original-format multi-frame chunk (if using a lossless compression algorithm) or a close approximation (if using a lossy compression algorithm). Audio and auxiliary data, such as closed captioning, may also be compressed and may be stored in separate corresponding segments or may be integrated with the video data. The preprocessing performs some of the steps of generating a transport-stream file, but does not include the quantization processing that determines the quality level of the resultant transport-stream file. In other words, the preprocessing generates partially encoded transport-stream chunks that include the set of un-quantized coefficients that are the output of the DCT step.
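The DCT step of the preprocessing may be sketched as follows. This is a textbook floating-point DCT-II applied to a single square block, offered only as an illustration; practical H.264 encoders use an integer approximation of the transform, and motion estimation and motion compensation are omitted here. The function names are hypothetical.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a sequence of samples."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, v in enumerate(values))
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row order


def preprocess_block(block):
    """Return the un-quantized coefficients of one block.

    Quantization is deliberately left out; in the scheme described above
    it is deferred to final processing.
    """
    return dct_2d(block)
```

A uniformly flat block illustrates the energy compaction that makes later quantization effective: all of its energy lands in the single top-left (DC) coefficient, and every other coefficient is zero to within rounding error.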
Quantization processing quantizes the set of coefficients from the DCT step in order to increase the number of zero coefficients in the set. The degree of quantization is determined by a quantization parameter (QP), where the higher the QP the higher the compression and the lower the playback quality. As a result, the QP determines the size of the output bit stream and the quality level of the chunk during playback.
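The effect of the quantization parameter may be illustrated with a short sketch. The mapping from QP to step size used here, in which the step size roughly doubles for every increase of 6 in QP, is an approximation assumed for illustration, and the function names are hypothetical.

```python
def qstep_for_qp(qp):
    """Approximate quantization step size for a given QP (assumed mapping:
    0.625 at QP 0, doubling every 6 QP steps)."""
    return 0.625 * 2 ** (qp / 6)


def quantize(coefficients, qp):
    """Divide each coefficient by the step size and round to an integer level."""
    step = qstep_for_qp(qp)
    return [int(round(c / step)) for c in coefficients]


def count_zeros(levels):
    """Count the quantized levels that are exactly zero."""
    return sum(1 for level in levels if level == 0)
```

Applying `quantize` to the same set of coefficients at a low QP and at a high QP shows the trade-off described above: the higher QP produces more zero-valued levels, which compress better but discard more detail.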
The above-described pre-processing (or partial encoding) means that, when a request is received from a user device for a particular chunk of the asset at a particular target bitrate (step 302), the corresponding transport-stream file at the requested bitrate can be generated rapidly, on the fly and just in time (JIT), since much of the computational processing for generating an appropriate transport-stream file has been done ahead of time.
The generation of the corresponding transport-stream file by edge server 102 includes retrieving the requested mezzanine-file chunk, which has been pre-processed as described above, and performing quantization processing on the chunk to generate a corresponding output file at the requested target bitrate (step 303). The corresponding audio (either compressed or passed through) and any auxiliary data (e.g., closed captions) are added to the encoded chunk to generate the corresponding transport-stream file (step 304). Note that the audio and auxiliary data may be stored in segments corresponding to, for example, GoPs or chunks of the video data. The transport-stream file is then transmitted to the requesting user device using one or more transport-stream packets (step 305). The procedure then returns to await another request from the user device (step 302). The procedure may be terminated or suspended (not shown) at any point by either the edge server or the user device. Note that steps 302-305 may be implemented to, for example, transmit (i) an entire asset to one viewer at time-varying levels of quality, (ii) the same entire asset to a second user at a different time and at different time-varying levels of quality, and (iii) only portions of the asset to a third user at yet other time-varying levels of quality.
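Steps 302-305 may be sketched, under simplifying assumptions, as a request loop at the edge server. The data classes, the in-memory store, and the table mapping target bitrates to quantization step sizes are all hypothetical and serve only to illustrate the flow; entropy coding and packetization are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreprocessedChunk:
    coefficients: List[float]  # un-quantized transform coefficients
    audio: bytes               # audio segment for the same time span
    captions: str              # auxiliary data, e.g. closed captions


@dataclass
class TransportStreamFile:
    chunk_id: int
    target_kbps: int
    levels: List[int]
    audio: bytes
    captions: str


# Hypothetical mapping from target bitrate (kbps) to quantization step size.
STEP_FOR_KBPS = {300: 32.0, 600: 16.0, 1200: 8.0, 2000: 4.0, 3000: 2.0}


def handle_request(store, chunk_id, target_kbps):
    """Build one transport-stream file for one requested chunk."""
    chunk = store[chunk_id]                                  # retrieve (step 303)
    step = STEP_FOR_KBPS[target_kbps]
    levels = [round(c / step) for c in chunk.coefficients]   # quantize (step 303)
    return TransportStreamFile(chunk_id, target_kbps, levels,
                               chunk.audio, chunk.captions)  # add audio (step 304)


def serve(store, requests):
    """Take requests one at a time (step 302) and yield each finished file
    for transmission (step 305)."""
    for chunk_id, target_kbps in requests:
        yield handle_request(store, chunk_id, target_kbps)
```

Because the stored coefficients are un-quantized, the same stored chunk can serve requests at any of the listed bitrates; only the inexpensive final step differs between requests.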
Edge server 102 of
In one embodiment of the disclosure, the m cores of processor 201 of edge server 102 of
In one embodiment of the disclosure, the m cores of processor 201 of edge server 102 may be used to simultaneously generate a plurality of transport-stream files at a corresponding plurality of different quality levels for one or more different chunks of a single asset. So, for example, a 100-core processor could generate, in parallel, 10 corresponding transport-stream files at 10 different quality levels for 10 different chunks of a segmented mezzanine file. The processing may include the addition of tag information to the transport-stream files.
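One reading of the 100-core example above is that each core handles one combination of chunk and quality level. The sketch below illustrates that reading only; the `encode` placeholder, the output file names, and the use of a thread pool as a stand-in for processor cores are assumptions. In CPython, CPU-bound work of this kind would more likely use a process pool or native code.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def encode(chunk_id, quality_kbps):
    """Placeholder for final processing of one chunk at one quality level."""
    return (chunk_id, quality_kbps, f"chunk{chunk_id}_{quality_kbps}k.ts")


def encode_all(chunk_ids, quality_levels, workers):
    """Run one encode job per (chunk, quality) pair across a pool of workers.

    Results are returned in the same order as the jobs were submitted.
    """
    jobs = list(product(chunk_ids, quality_levels))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda job: encode(*job), jobs))
```

With 10 chunks and 10 quality levels, `encode_all` dispatches 100 independent jobs, which matches the number of workers in a 100-worker pool.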
In one embodiment of the disclosure, processor 201 of edge server 102 of
In one embodiment, edge server 102 of
At any time during this process, edge server 102 may receive another request for another chunk of an asset at a particular bitrate (step 505)—such as, for example, the next consecutive chunk of the asset—and edge server 102 may respond by immediately locating and starting to process the newly requested chunk at the requested bitrate (step 503)—all while edge server 102 may be transmitting or still processing the previous chunk (steps 503 and 504). If no new requests for chunks are received (step 505), then the process terminates (step 506). Note that a user device may request a next consecutive chunk of an asset at a bitrate lower than, the same as, or higher than, the immediately prior chunk, as described above.
It should be noted that the processing of a chunk may be done faster than real time. In other words, a chunk whose duration is t seconds may be processed in y seconds, where y<t. For example, edge server 102 may generate a 90-second transport-stream file from a 90-second mezzanine-file chunk in 60 seconds. This allows for continuous pipeline processing and streaming of content to the user device in response to requests without interruptions, pauses, stutters, or other similar annoyances to the user.
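The condition y < t may be checked with a simple timing model. The sketch below assumes, for illustration only, that chunks are processed back to back starting at time zero and that playback begins as soon as the first chunk is ready; the function name is hypothetical.

```python
def first_stall(chunk_durations, processing_times):
    """Return the index of the first chunk that is not ready when playback
    needs it, or None if playback never stalls."""
    ready_at = 0.0    # time at which the current chunk finishes processing
    needed_at = None  # time at which the player needs the current chunk
    for index, (duration, processing) in enumerate(
            zip(chunk_durations, processing_times)):
        ready_at += processing
        if needed_at is None:
            needed_at = ready_at  # playback starts with the first chunk
        elif ready_at > needed_at:
            return index          # chunk arrives late, so playback stalls
        needed_at += duration     # next chunk is needed when this one ends
    return None
```

With 90-second chunks processed in 60 seconds each, the function returns None, meaning that no stall occurs. If each chunk instead took 100 seconds to process, the second chunk (index 1) would already arrive late.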
In one embodiment, features of flowcharts 400 of
In one embodiment of the disclosure, a user device may request a chunk at any bitrate, and edge server 102 of
In one embodiment, which may be combined with any of the above-described embodiments, edge server 102 of
It should be noted that, unless otherwise indicated or impossible, features of the above-described embodiments and implementations may be combined with one or more features of one or more other of the above-described embodiments and implementations.
Embodiments of the disclosure have been described referring to corresponding chunks. Note that chunks of an asset may be identified for the purpose of determining correspondence by, for example, a sequential number identifier or a timecode.
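The two identification schemes mentioned above can be related to each other by a simple lookup. The sketch below, whose function names are hypothetical, builds a table of chunk start times and maps a timecode, expressed in seconds, to the sequential number of the chunk that contains it.

```python
import bisect


def build_index(chunk_durations):
    """Return the start time, in seconds, of every chunk in sequence order."""
    starts, elapsed = [], 0.0
    for duration in chunk_durations:
        starts.append(elapsed)
        elapsed += duration
    return starts


def chunk_for_timecode(starts, seconds):
    """Map a playback position to the sequence number of the chunk holding it.

    Positions past the end of the asset map to the last chunk.
    """
    if seconds < 0:
        raise ValueError("timecode must be non-negative")
    return bisect.bisect_right(starts, seconds) - 1
```

Such a lookup may be used, for example, when a seek request arrives as a timecode and the corresponding pre-processed chunk is stored under a sequence number.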
References herein to the verb “to set” and its variations in reference to values of fields do not necessarily require an active step and may include leaving a field value unchanged if its previous value is the desired value. Setting a value may nevertheless include performing an active step even if the previous or default value is the desired value.
Unless indicated otherwise, the term “determine” and its variants as used herein refer to obtaining a value through measurement and, if necessary, transformation. For example, to determine an electrical-current value, one may measure a voltage across a current-sense resistor, and then multiply the measured voltage by an appropriate value to obtain the electrical-current value. If the voltage passes through a voltage divider or other voltage-modifying components, then appropriate transformations can be made to the measured voltage to account for the voltage modifications of such components and to obtain the corresponding electrical-current value.
As used herein in reference to data transfers between entities in the same device, and unless otherwise specified, the term “receive” and its variants can refer to receipt of the actual data, or the receipt of one or more pointers to the actual data, wherein the receiving entity can access the actual data using the one or more pointers.
Exemplary embodiments have been described wherein particular entities (a.k.a. modules) perform particular functions. However, the particular functions may be performed by any suitable entity and are not restricted to being performed by the particular entities named in the exemplary embodiments.
Exemplary embodiments have been described with data flows between entities in particular directions. Such data flows do not preclude data flows in the reverse direction on the same path or on alternative paths that have not been shown or described. Paths that have been drawn as bidirectional do not have to be used to pass data in both directions.
As used herein, the term “cache” and its variants refer to a dynamic computer memory that is preferably (i) high-speed and (ii) configured to have its present contents repeatedly overwritten with new data. To cache particular data, an entity can have a copy of that data stored in a determined location, or the entity can be made aware of the memory location where a copy of that data is already stored. Freeing a section of cached memory allows that section to be overwritten, making that section available for subsequent writing, but does not require erasing or changing the contents of that section.
References herein to the verb “to generate” and its variants in reference to information or data do not necessarily require the creation and/or storage of new instances of that information. The generation of information could be accomplished by identifying an accessible location of that information. The generation of information could also be accomplished by having an algorithm for obtaining that information from accessible other information.
As used herein in reference to an element and a standard, the term “compatible” means that the element communicates with other elements in a manner wholly or partially specified by the standard, and would be recognized by other elements as sufficiently capable of communicating with the other elements in the manner specified by the standard. The compatible element does not need to operate internally in a manner specified by the standard.
The term “nonvolatile memory,” as used herein, refers to any type of memory that substantially retains its stored contents after disconnection from its power supply, i.e., the stored contents can be retrieved after reconnecting the nonvolatile memory to a power supply. Examples of nonvolatile memory include, but are not necessarily limited to (i) fuse/antifuse devices such as OTP memory and PROM, (ii) charge-storing devices such as EPROM and EEPROM and flash ROM, (iii) magnetic media devices such as hard drives and tapes, and (iv) optical, opto-electrical, and opto-magnetic media such as CDs and DVDs.
The present invention may be implemented as circuit-based systems, including possible implementation as a single integrated circuit (such as an ASIC or an FPGA), a multi-chip module, a single card, or a multi-card circuit pack. As would be apparent to one skilled in the art, various functions of circuit elements may also be implemented as processing steps in a software program. Such software may be employed in, for example, a digital signal processor, micro-controller, or general-purpose computer.
The present invention can be embodied in the form of methods and apparatuses for practicing those methods. The present invention can also be embodied in the form of program code embodied in tangible media, such as magnetic recording media, optical recording media, solid state memory, floppy diskettes, CD-ROMs, hard drives, or any other non-transitory machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. The present invention can also be embodied in the form of program code, for example, stored in a non-transitory machine-readable storage medium including being loaded into and/or executed by a machine, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a unique device that operates analogously to specific logic circuits.
The present invention can also be embodied in the form of a bitstream or other sequence of signal values stored in a non-transitory recording medium generated using a method and/or an apparatus of the present invention.
It will be further understood that various changes in the details, materials, and arrangements of the parts which have been described and illustrated in order to explain the nature of this invention may be made by those skilled in the art without departing from the scope of the invention as expressed in the following claims.
Reference herein to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments necessarily mutually exclusive of other embodiments. The same applies to the term “implementation.”
Unless explicitly stated otherwise, each numerical value and range should be interpreted as being approximate as if the word “about” or “approximately” preceded the value or range. As used in this application, unless otherwise explicitly indicated, the term “connected” is intended to cover both direct and indirect connections between elements.
For purposes of this description, the terms “couple,” “coupling,” “coupled,” “connect,” “connecting,” or “connected” refer to any manner known in the art or later developed in which energy is allowed to be transferred between two or more elements, and the interposition of one or more additional elements is contemplated, although not required. The terms “directly coupled,” “directly connected,” etc., imply that the connected elements are either contiguous or connected via a conductor for the transferred energy.
The use of figure numbers and/or figure reference labels in the claims is intended to identify one or more possible embodiments of the claimed subject matter in order to facilitate the interpretation of the claims. Such use is not to be construed as limiting the scope of those claims to the embodiments shown in the corresponding figures.
The embodiments covered by the claims in this application are limited to embodiments that (1) are enabled by this specification and (2) correspond to statutory subject matter. Non-enabled embodiments and embodiments that correspond to non-statutory subject matter are explicitly disclaimed even if they fall within the scope of the claims.
Although the steps in the following method claims are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those steps, those steps are not necessarily intended to be limited to being implemented in that particular sequence.
This application claims the benefit of the filing date of U.S. Provisional Application No. 61/648,828 filed on May 18, 2012, the teachings of which are incorporated herein by reference in their entirety.
Number | Date | Country
---|---|---
61648828 | May 2012 | US