The present disclosure relates generally to communication networks, and more particularly, to compression systems.
Compression is an important component of many digital systems. Compression systems may be used to compress video, audio, or other data. There are a number of coding standards, including, for example, ITU-T H.262, H.263, and H.264. The newer standards compress video more efficiently than previous standards; however, this increased compression efficiency comes at the cost of additional computation requirements.
Corresponding reference characters indicate corresponding parts throughout the several views of the drawings.
In one embodiment, a method generally comprises receiving data for compression at a first network device comprising an initial processing portion of a compression system, performing one or more processes to prepare the data for entropy encoding, compacting said data, and transmitting the compacted data to a second network device comprising an entropy encoding portion of the compression system. The first and second network devices comprise independent processors.
In another embodiment, an apparatus generally comprises a processor for interpreting compacted data received from an initial processing portion of a compression system, entropy encoding the data, and transmitting a compressed bit stream. The apparatus further includes memory for storing encoding information. The processor is independent from the initial processing portion of the compression system.
In yet another embodiment, a compression system generally comprises an initial processing portion for processing received data to prepare the data for entropy encoding and compacting the data utilizing fixed length encoding, and an entropy encoding portion for interpreting the data received from the initial processing portion and performing entropy encoding. Compaction of the data reduces transmission bandwidth between the initial processing portion and the entropy encoding portion.
The following description is presented to enable one of ordinary skill in the art to make and use the embodiments. Descriptions of specific embodiments and applications are provided only as examples, and various modifications will be readily apparent to those skilled in the art. The general principles described herein may be applied to other applications without departing from the scope of the embodiments. Thus, the embodiments are not to be limited to those shown, but are to be accorded the widest scope consistent with the principles and features described herein. For purposes of clarity, details relating to technical material that is known in the technical fields related to the embodiments have not been described in detail.
Processing for video compression systems typically includes pixel domain redundancy removal (motion estimation or intra-prediction) followed by transformation, quantization, and entropy coding of syntax elements. Motion estimation, transformation, and quantization are often amenable to parallel processing implementations. Entropy coding, by contrast, is typically very specific to a particular encoding format and not well suited to parallel processing implementations. Furthermore, entropy coding is often computationally expensive, with operations that are highly ‘irregular’ from a hardware point of view.
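As a brief illustrative sketch (not part of the disclosed embodiments), the transform and quantization stages of such a pipeline can be modeled in Python; an orthonormal DCT-II stands in here for the integer transform approximation used by actual codecs, and the per-block independence is what makes these stages parallel-friendly:

```python
import numpy as np

N = 4
# Orthonormal DCT-II basis for 4x4 blocks (H.264 uses an integer approximation).
C = np.array([[np.cos(np.pi * (2 * j + 1) * i / (2 * N)) for j in range(N)]
              for i in range(N)])
C[0] *= 1 / np.sqrt(2)
C *= np.sqrt(2 / N)

def transform(block):
    """2-D separable transform of a 4x4 pixel block. Each block is
    independent of its neighbours, so blocks can be processed in parallel."""
    return C @ block @ C.T

def quantize(coeffs, qstep):
    """Uniform scalar quantization; a larger qstep gives coarser coefficients."""
    return np.round(coeffs / qstep).astype(int)
```

For a constant 4x4 block, all the energy lands in the DC coefficient, which the quantizer then scales down, illustrating how these stages concentrate and reduce information before entropy coding.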
In conventional systems, the data output from a quantization module and input to an entropy coding module is uncompressed and associated with high bandwidth requirements. This typically necessitates implementation of the complete encoding pipeline on the same physical processor since the transmission of raw pixel or transform data between different modules would be prohibitively expensive in terms of bandwidth requirements.
The embodiments described herein provide an efficient multi-processor implementation for compression systems that allows entropy coding to be implemented separately from other processing. In one embodiment, motion estimation, transformation, and quantization, which are amenable to parallel processing arrangements, are implemented separately from entropy encoding. The embodiments provide for compression (referred to herein as compaction) of data output from an initial processing portion of the encoding pipeline and input to an entropy encoding portion. This architecture allows for remote location of an entropy coding module from the rest of the encoding pipeline and enables realization of new encoding architectures. The embodiments may be used to compress any type of data, including, for example, audio, video, or both audio and video. The embodiments enable efficient communication between the initial processing portion and the entropy encoding portion of the encoding pipeline.
Referring now to the drawings, and first to
The data center 10 may be an Ethernet network, Fibre Channel (FC) network, Fibre Channel over Ethernet (FCoE) network, or any other type of network. The data center 10 may include any number of servers, switches, storage devices, or other network devices or systems (e.g., video content delivery system).
The network 12 may include one or more networks (e.g., local area network, metropolitan area network, wide area network, virtual private network, enterprise network, Internet, intranet, radio access network, public switched network, or any other network). The network 12 may include any number or type of network devices (e.g., routers, switches, gateways, or other network devices), which facilitate passage of data over the network. The network 12 may also be in communication with one or more other networks, hosts, or users. The networks 10, 12 are connected via communication links. The networks 10, 12 may operate in a cloud computing environment.
In the example shown in
As described in detail below, a communication protocol between the initial processing portion 18 and the entropy encoding portion 19 provides an efficient trade-off between the communication bandwidth and the complexity associated with the protocol. The data output from the initial processing portion 18 is compacted so that compressed data is transmitted from the initial processing portion 18 to the entropy encoding portion 19. Since the compacted data results in lower bandwidth requirements, entropy encoding may be performed remote from the rest of the processing performed by the compression system. For example, the entropy encoding portion 19 may be located at a separate network (e.g., different data center 10 as shown in
In one embodiment, the compression system is configured for hybrid GPU (graphics processing unit)/CPU (central processing unit) implementation wherein the entropy encoding is implemented on a CPU and the other processing (e.g., motion estimation, transformation, and quantization) is implemented on ‘parallel-friendly’ GPU hardware. In one example, a data center service provider may house both CPUs and GPUs. The initial processing portion 18 of the encoding pipeline may be implemented on a GPU farm and the compacted output data from the GPU farm transmitted to a CPU farm for entropy encoding. Compaction of the data transmitted from the initial processing portion 18 to the entropy encoding portion 19 allows for each portion of the compression system to operate using independent processors.
It is to be understood that the network shown in
The network device 20 includes a processor 22, memory 24, interface 26, and compression system modules 28 (e.g., motion estimation, transformation, quantization, and compaction for the initial processing portion 18, or interpretation and entropy encoding for the entropy encoding portion 19). Memory 24 may be a volatile memory or non-volatile storage, which stores various applications, modules, and data for execution and use by the processor 22. Memory may also include encoding information (e.g., syntax elements, descriptors, values for syntax elements and information needed to encode them, state of independent syntax elements).
Logic may be encoded in one or more tangible computer readable media for execution by the processor 22. For example, the processor 22 may execute codes stored in a computer-readable medium such as memory 24. The computer-readable medium may be, for example, electronic (e.g., RAM (random access memory), ROM (read-only memory), EPROM (erasable programmable read-only memory)), magnetic, optical (e.g., CD, DVD), electromagnetic, semiconductor technology, or any other suitable medium.
The interface 26 may comprise any number of interfaces (linecards, ports) for receiving signals or data or transmitting signals or data to other devices. The interface 26 may include, for example, an Ethernet interface for connection to a computer or network.
It is to be understood that the network device 20 shown in
The entropy encoding portion 19 includes an interpretation module (layer) 42 and entropy coding module 44. Entropy coding is a process by which discrete-valued source symbols are represented in a manner that takes advantage of the relative probabilities of the various possible values of each source symbol. The entropy encoder 44 may use context adaptive variable length coding (CAVLC) or context adaptive binary arithmetic coding (CABAC), for example.
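While CAVLC and CABAC are too involved to reproduce here, the unsigned Exp-Golomb code (the ue(v) descriptor used for many H.264 syntax elements) illustrates the entropy coding principle in a few lines: more probable (smaller) values receive shorter codewords. The sketch below is illustrative only:

```python
def exp_golomb_ue(v: int) -> str:
    """Unsigned Exp-Golomb code (the ue(v) descriptor of H.264): the codeword
    for v is the binary form of v+1, prefixed by as many zeros as there are
    bits after the leading one. Smaller values yield shorter codewords."""
    assert v >= 0
    info = bin(v + 1)[2:]              # binary representation of v + 1
    return "0" * (len(info) - 1) + info  # zero prefix + info bits
```

For example, the value 0 encodes to the single bit "1", while 3 encodes to the five bits "00100"; a source that emits small values most of the time therefore spends fewer bits on average than fixed length coding would.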
Multiple independent processors are employed so that the entropy coding module 44 can be implemented separately from the rest of the processing (e.g., motion estimation, transformation, and quantization).
It is to be understood that the compression system illustrated in
The following example describes encoding of a video stream into a compressed bit stream using the modules shown in
In one embodiment, rate control feedback is provided between the entropy encoding portion 19 and the initial processing portion 18. This may include, for example, various bit stream statistics such as number of bits generated from the encoding of a NAL unit by the entropy coding module 44, which are provided to the initial processing portion 18 to facilitate target bit-rate control.
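A minimal sketch of such a feedback reaction, with hypothetical parameter names (the embodiments do not prescribe a particular rate-control algorithm), might adjust the quantization parameter based on the bit count reported back by the entropy coding module:

```python
def adjust_qp(qp, bits_reported, bits_target, step=1, qp_min=0, qp_max=51):
    """Toy rate-control reaction (hypothetical helper): raise QP (coarser
    quantization, fewer bits) when the entropy encoder reports more bits than
    targeted for a NAL unit, and lower it when it reports fewer. The 0..51
    range matches the H.264 quantization parameter range."""
    if bits_reported > bits_target:
        qp = min(qp + step, qp_max)
    elif bits_reported < bits_target:
        qp = max(qp - step, qp_min)
    return qp
```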
It is to be understood that the processes shown in
The following describes an example of a communication protocol (interface) between the initial processing portion 18 and entropy encoding portion 19 for a compression system that encodes data to generate bit stream data that conforms to ITU-T H.264 (ITU-T H.264 Series H: Audiovisual and Multimedia Systems: Infrastructure of audiovisual services—Coding of moving video). It is to be understood that this is only an example and that the compression system may also be used to encode data according to another standard, such as H.262 or H.263, or any other coding standard or format.
The H.264 standard defines the syntax of an encoded video bit stream and the method of decoding the bit stream. An H.264 bit stream comprises a sequence of NAL (network abstraction layer) units. The NAL unit is a syntax structure containing an indication of the type of data to follow (in a header byte) and bytes containing payload data of the type indicated by the header. The coded video data is organized into NAL units, each of which is effectively a packet that contains an integer number of bytes. The embodiments provide a NAL unit based interface for communication between the initial processing portion 18 and the entropy encoding portion 19. For each NAL unit, appropriate information (i.e., values for various syntax elements and any information needed to encode them) is provided by the initial processing portion 18 to the entropy encoding portion 19.
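For reference, the header byte of a NAL unit indicated above can be parsed as follows (field names and bit widths per the H.264 NAL unit syntax; the helper function itself is illustrative):

```python
def parse_nal_header(byte0: int) -> dict:
    """Parse the one-byte H.264 NAL unit header:
    1 bit forbidden_zero_bit, 2 bits nal_ref_idc, 5 bits nal_unit_type."""
    return {
        "forbidden_zero_bit": (byte0 >> 7) & 0x1,
        "nal_ref_idc":        (byte0 >> 5) & 0x3,
        "nal_unit_type":      byte0 & 0x1F,
    }
```

For example, the byte 0x67 yields nal_unit_type 7 (a sequence parameter set) with nal_ref_idc 3, indicating a NAL unit that subsequent units depend on.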
The following example applies to the syntax description for various NAL unit payloads as they occur in an H.264 SVC (scalable video coding) bit stream. SVC is described in Annex G of the H.264 standard and enables the transmission and decoding of partial bit streams to provide video services with lower temporal or spatial resolutions or reduced fidelity while retaining a reconstruction quality that is high relative to the rate of the partial bit stream.
Within the bit stream for a typical NAL unit payload, syntax elements that occur earlier can, depending upon their value, result in the conditional presence of syntax elements that occur later. The former are referred to herein as independent syntax elements and the latter as dependent syntax elements. This property may be referred to as intra-NAL unit syntax element dependency.
Across various NAL unit payloads, syntax elements that are indicated in some NAL units such as seq_parameter_set_rbsp( ) (see, for example, section G.7.3.2.1.2 of H.264) and pic_parameter_set_rbsp( ) (see, for example, section G.7.3.2.2 of H.264) can result in conditional presence of syntax elements in other NAL units such as slice_layer_without_partitioning_rbsp( ) (see, for example, section G.7.3.2.8 of H.264) depending upon their value. The former are referred to herein as independent syntax elements and the latter as dependent syntax elements. This property is referred to as inter-NAL unit syntax element dependency.
Derived variables associated with independent syntax elements from either of the above scenarios may result in conditional presence of other syntax elements, depending upon their value.
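This dependency property can be sketched as follows; the element names used here are hypothetical and stand in for actual H.264 syntax elements whose values condition the presence of later elements:

```python
def parse_payload(stream):
    """Sketch of syntax element dependency: the value of an independent
    syntax element (read first) determines whether a dependent element is
    present at all. Element names ('mode_flag', 'extra_param') are
    hypothetical stand-ins for real H.264 syntax elements."""
    elems = {}
    elems["mode_flag"] = stream.pop(0)        # independent syntax element
    if elems["mode_flag"] == 1:               # value conditions presence...
        elems["extra_param"] = stream.pop(0)  # ...of a dependent element
    return elems
```

Because the parser must already know the independent values to decide whether a dependent element follows, the entropy encoding portion must track that state across NAL units, as described below.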
The size of an encoded NAL unit in conventional systems is variable for two reasons. First, the number of syntax elements indicated in a NAL unit payload can vary for the reasons discussed above; converting this variable-size representation to a fixed-size representation is referred to herein as compaction. Second, the number of bits associated with the encoding of a syntax element value varies depending upon the value of the syntax element; this is referred to as entropy encoding.
Based on the above, the embodiments use the following general framework for a NAL unit payload data input to the entropy encoding portion 19.
For every NAL unit payload to be processed by the entropy encoding portion 19, the input data can be thought of as a stream of bytes (or a packet). This packet represents the values of various syntax elements in the same order and with the same set of dependencies as the corresponding encoded version of the NAL unit payload depicted in section G.7.3 of H.264. In conventional systems, packets would be variable sized and represent bit encodings using various syntax element descriptors as set forth in H.264 (e.g., ae(v) (context adaptive arithmetic entropy coded syntax element) and ce(v) (context adaptive variable length entropy coded syntax element)). In the embodiments described herein, packets transmitted from the initial processing portion 18 to the entropy encoding portion 19 contain unencoded syntax elements (i.e., syntax element values that have not been entropy encoded).
For example, using the CAVLC mode of H.264, in conventional systems the syntax element coeff_token is encoded using VLC (variable length coding) table lookups. However, this syntax element takes at most 68 values and can be represented in 7 bits with fixed length encoding using the compaction described herein. The packet is decodable by the interpretation layer 42 at the entropy encoding portion 19.
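Such fixed length compaction can be sketched as follows (illustrative only; pack_fixed and unpack_fixed are hypothetical helpers, not part of the H.264 syntax). Each value is written with just enough bits to cover its range, so a syntax element with at most 68 possible values needs 7 bits, since 2**7 = 128 >= 68:

```python
import math

def pack_fixed(values, max_value):
    """Fixed length compaction: write each syntax element value with the
    minimum bit width covering its range [0, max_value]."""
    width = max(1, math.ceil(math.log2(max_value + 1)))
    bits = "".join(format(v, f"0{width}b") for v in values)
    return bits, width

def unpack_fixed(bits, width):
    """Interpretation side: recover the values from the fixed-width fields."""
    return [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]
```

Because every field has the same known width, the interpretation layer can parse the packet without any variable length table lookups, deferring the actual entropy coding to the entropy encoding portion.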
Due to the property of inter-NAL unit syntax element dependency, the parsing of dependent syntax elements in the packet may necessitate maintenance of some state in the entropy encoding portion 19 corresponding to the independent syntax elements. Upon parsing of the packet, the values of all syntax elements will be known to the entropy encoding portion 19 and can be used in entropy encoding.
In one embodiment, the compaction at the end of the initial processing portion 18 and the interpretation at the beginning of the entropy encoding portion 19 provide a transmission bandwidth reduction benefit without adding significant implementation complexity to the compression system. The fixed length encoding based compaction and interpretation described above provide significant bandwidth savings with little increase in total computation complexity.
The communication interface described herein provides bandwidth savings for communication between the initial processing portion 18 and the entropy encoding portion 19 due to the compaction gain while transferring the actual task of entropy encoding to the entropy encoding portion. In experimental results using the reference implementation of H.264 to measure gains that arise out of compaction and entropy coding for video sequences, it was observed that compaction at the initial processing portion 18 accounts for a significant portion of the overall compression gain from the compression system.
Although the method, apparatus, and system have been described in accordance with the embodiments shown, one of ordinary skill in the art will readily recognize that there could be variations made without departing from the scope of the embodiments. Accordingly, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.