This disclosure relates to predictive video encoding. This disclosure also relates to memory and bandwidth usage during video coding.
Rapid advances in electronics and communication technologies, driven by immense customer demand, have resulted in the worldwide adoption of devices that display a wide variety of video content. Examples of such devices include smartphones, flat screen televisions, and tablet computers. Improvements in video processing techniques will continue to enhance the capabilities of these devices.
The disclosure below discusses techniques and architectures for selection among coding modes and coding strategies to support high efficiency coding. For example, a coding mode, such as a block size, prediction mode, codec selection, and/or other coding mode may be selected. Different coding modes may be associated with different complexity levels for various coding operations and calculations. For example, the complexity may affect the efficiency and/or resource usage for cost metric calculations, such as a calculation of the rate distortion optimization (RDO) cost metric. The architecture discussed below may select a coding strategy based on the coding mode. For example, a selected bit depth or use of the frequency domain for transform operations may be implemented. Selection of a coding mode and coding strategy may allow a balance between resource usage and coding quality.
The encoder 104 may determine the bit rate, for example, by maintaining a cumulative count of the number of bits that are used for encoding minus the number of bits that are output. While the encoder 104 may use a virtual buffer 114 to model the buffering of data prior to transmission of the encoded data 116 to the memory 108, the predetermined capacity of the virtual buffer and the output bit rate do not necessarily have to be equal to the actual capacity of any buffer in the encoder or the actual output bit rate. Further, the encoder 104 may adjust a quantization step for encoding responsive to the fullness or emptiness of the virtual buffer. An exemplary encoder 104 and operation of the encoder 104 are described below.
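As a minimal sketch of the virtual buffer model described above, the following Python fragment tracks fullness as the cumulative bits coded minus the bits output and nudges the quantization step accordingly. The class and function names, the per-block drain rate, and the 25%/75% thresholds are illustrative assumptions and are not taken from this disclosure.

```python
from dataclasses import dataclass


@dataclass
class VirtualBuffer:
    """Hypothetical rate-control model; all names here are illustrative."""
    capacity_bits: int          # assumed model capacity, not a real buffer size
    drain_bits_per_block: int   # assumed model output rate per coded block
    fullness: int = 0           # cumulative bits coded minus bits output

    def update(self, bits_coded: int) -> int:
        """Add the bits just produced and remove the bits the model outputs."""
        self.fullness += bits_coded - self.drain_bits_per_block
        # Clamp so the model stays within [0, capacity].
        self.fullness = max(0, min(self.fullness, self.capacity_bits))
        return self.fullness


def adjust_qstep(qstep: int, buf: VirtualBuffer,
                 high: float = 0.75, low: float = 0.25,
                 q_min: int = 1, q_max: int = 64) -> int:
    """Coarsen quantization as the buffer fills; refine it as the buffer empties."""
    ratio = buf.fullness / buf.capacity_bits
    if ratio > high:
        qstep += 1
    elif ratio < low:
        qstep -= 1
    return max(q_min, min(qstep, q_max))
```

Because the buffer is only a model, its capacity and drain rate can be tuned for the desired rate behavior without matching any physical buffer in the encoder.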
The memory 108 may be implemented as Static Random Access Memory (SRAM), Dynamic RAM (DRAM), a solid state drive (SSD), hard disk, or other type of memory. The communication link 154 may be a wireless or wired connection, or combinations of wired and wireless connections. The encoder 104, decoder 106, memory 108, and display 110 may all be present in a single device (e.g. a smartphone). Alternatively, any subset of the encoder 104, decoder 106, memory 108, and display 110 may be present in a given device. For example, a streaming video playback device may include the decoder 106 and memory 108, and the display 110 may be a separate display in communication with the streaming video playback device.
In various implementations, different codecs may be used to perform coding operations, such as encoding, decoding, transcoding, and/or other coding operations. For example, codecs may include high efficiency video coding (HEVC), VP9 available from Google, Daala, audio video standard 2 (AVS2), and/or other codecs. Codecs may employ multiple modes which may be selected for differing coding conditions and resources.
In various implementations, a coding mode may use a particular block coding structure.
In some cases, increased block or CU sizes may increase operational complexity. For example, the resources, e.g., CPU, memory, bandwidth, cycles, used performing a transform on a large block may be greater than that used for a small block when other factors are held constant. The system may implement any number of different coding strategies for blocks of any particular size.
In some implementations, the coding logic 300 and/or coding selection logic 600, discussed below with respect to
Various cost metrics may be computed based on a weighted combination of factors. The factors used may vary greatly among implementations. Two example factors are distortion and rate (e.g., the number of bits used to code the output).
In some cases, factors may be correlated. For example, increasing one factor may lead to a corresponding decrease or increase in another factor (e.g., a tradeoff, complement, or other relationship). For the example factors above, coded output with less distortion may use more bits to code (e.g., a higher bit rate). In some cases, the relative importance of the rate in relation to distortion may be a function of the desired video quality. In high bit-rate and high video quality situations, the number of bits consumed may be less important than in a low-bit-rate, low-video-quality situation. In various implementations, the assigned cost for a bit may be scaled by a weight (lambda) as shown below.
RD Cost=Distortion+λ·Rate
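The weighted cost above may be illustrated with a short Python sketch that evaluates the RD cost for each candidate mode and keeps the cheapest one. The mode names and the distortion and rate figures are made-up values used only for illustration.

```python
def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """RD Cost = Distortion + lambda * Rate."""
    return distortion + lam * rate_bits


def best_mode(candidates: dict, lam: float) -> str:
    """Return the name of the candidate with the lowest RD cost.

    `candidates` maps a (hypothetical) mode name to a
    (distortion, rate_bits) pair.
    """
    return min(candidates, key=lambda name: rd_cost(*candidates[name], lam))


# Illustrative numbers only: the small-block mode has less distortion
# but spends more bits than the large-block mode.
CANDIDATES = {
    "block_4x4": (100.0, 60.0),
    "block_16x16": (220.0, 20.0),
}
```

With these numbers, a small weight (λ = 1) favors the lower-distortion candidate, while a larger weight (λ = 5) makes each bit more expensive and favors the lower-rate candidate.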
In various implementations, other factors may be assigned a weight. For example, a weight may be assigned to distortion or to a measure of motion within a group of frames. Factors used and weights assigned may vary greatly among differing implementations.
In some RDO-based implementations, the RDO calculation may be applied at individual coding stages. In some cases, the RDO calculation need not be applied for every individual coding stage. Further, based on the coding strategy selection, RDO calculations may be performed at various complexity levels. For example, for large blocks RDO may be calculated at a lower bit depth than for small blocks.
In various implementations, the coding logic 300 may determine a coding mode or available coding modes for the operations (301). In some implementations, the coding logic 300 may forward the coding mode selection and/or available coding modes to the coding selection logic 600 as discussed below (302). The coding logic 300 may receive a response indicating a coding mode and/or coding strategy from the coding selection logic 600 (303).
In some implementations, the determined coding strategy may indicate a location along a pre-calculated cost metric curve to simplify cost metric calculation. For example, the coding selection logic 600 may store a pre-calculated RDO cost curve. Additionally or alternatively, the coding selection logic 600 may calculate the RDO cost curve. The coding selection logic 600 may use mode parameters, e.g. block size and/or other parameters, as inputs to determine a position along the curve.
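One possible way to use a stored curve is sketched below in Python: the curve is held as a short table keyed by block size, and a position along it is found by linear interpolation. The table values, the choice of block width as the only input, and the function name are assumptions made for this sketch.

```python
from bisect import bisect_left

# Hypothetical pre-calculated curve: (block width in pixels, cost estimate).
# The numbers are placeholders, not measured values.
COST_CURVE = [(4, 1.0), (8, 3.5), (16, 12.0), (32, 40.0), (64, 130.0)]


def cost_from_curve(block_size: float, curve=COST_CURVE) -> float:
    """Look up a cost estimate by interpolating along the stored curve."""
    xs = [x for x, _ in curve]
    # Clamp inputs that fall outside the stored range.
    if block_size <= xs[0]:
        return curve[0][1]
    if block_size >= xs[-1]:
        return curve[-1][1]
    i = bisect_left(xs, block_size)
    x0, y0 = curve[i - 1]
    x1, y1 = curve[i]
    t = (block_size - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)
```

A lookup of this kind replaces a per-block cost computation with a table access, which is the simplification the stored curve is meant to provide.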
The coding selection logic 600 may determine whether there are multiple modes to select among (604). For example, the coding selection logic 600 may determine among available block or CU sizes, color modes, prediction modes, and/or other parameters. When there are multiple available modes present, the coding selection logic 600 may select a mode (606). The mode decision may be determined based on available resources, stored parameters, comparative mode complexity (e.g., RDO cost and/or other metrics), input from external applications, and/or other inputs. When multiple modes are not available, the coding selection logic 600 may implement the one available mode (608).
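The decision flow of steps 604 through 608 might be sketched in Python as follows, assuming each mode carries a complexity estimate and a cost estimate and that available resources are expressed as a single complexity budget. The `Mode` fields, the budget, and the fallback rule are assumptions for this sketch, not details from the disclosure.

```python
from typing import NamedTuple, Sequence


class Mode(NamedTuple):
    name: str
    complexity: int   # hypothetical resource estimate for coding with this mode
    cost: float       # hypothetical cost metric, e.g. an RDO cost


def select_mode(modes: Sequence[Mode], budget: int) -> Mode:
    """Pick a coding mode, mirroring steps 604, 606 and 608."""
    if not modes:
        raise ValueError("no coding mode available")
    if len(modes) == 1:
        # Multiple modes are not available: implement the one mode (608).
        return modes[0]
    # Multiple modes are available (604): select among them (606).
    affordable = [m for m in modes if m.complexity <= budget]
    if affordable:
        return min(affordable, key=lambda m: m.cost)
    # Assumed fallback: nothing fits the budget, so take the least complex mode.
    return min(modes, key=lambda m: m.complexity)
```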
The coding selection logic 600 may then determine a coding strategy based on the selected and/or singular coding mode (610). For example, the logic may select a bit depth for one or more operations or calculations. For example, a block may be assigned a bit depth based on its size. RDO calculations and/or other operations may be performed using the assigned bit depth. For example, 8-bit RDO calculations may be used for 4×4 blocks, 7-bit for 8×8 blocks, 6-bit for 16×16 blocks, 5-bit for 32×32 blocks, and/or 4-bit for 64×64 blocks. However, the assigned bit depths may vary widely among and within implementations.
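The example assignment above may be sketched in Python as a lookup table plus a sum of squared errors (SSE) computed on samples truncated to the assigned depth. The 8-bit source depth, the truncation by right shift, and the rescaling of the result back to the source scale are assumptions of this sketch.

```python
# Example mapping of block width to RDO bit depth, taken from the text above;
# actual assignments may differ.
RDO_BIT_DEPTH = {4: 8, 8: 7, 16: 6, 32: 5, 64: 4}


def sse_at_assigned_depth(orig, recon, block_size, source_bits=8):
    """SSE over flat sample sequences at the bit depth assigned to the block.

    Assumes the assigned depth does not exceed `source_bits`. The sample
    count is not checked against the block size.
    """
    depth = RDO_BIT_DEPTH[block_size]
    shift = source_bits - depth
    total = 0
    for o, r in zip(orig, recon):
        # Drop the least significant bits before taking the difference.
        diff = (o >> shift) - (r >> shift)
        total += diff * diff
    # Rescale so results from different block sizes are comparable (assumed).
    return total << (2 * shift)
```

For the samples `[100, 120]` against `[96, 128]`, the 8-bit and 6-bit calculations both return 80, while the 4-bit calculation returns 256, which illustrates the precision given up in exchange for narrower arithmetic on large blocks.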
Additionally or alternatively, blocks may be assigned bit depths based on an estimated difficulty of transforming the block. For example, the transform size of the block may indicate transform complexity. In some implementations, blocks associated with larger transform complexity may be assigned lower bit depths, e.g., for cost estimation and/or other calculations, than blocks with smaller transform complexity. In some implementations, the coding strategy may include a selection of the calculation domain. For example, the sum of squared errors (SSE), which may be used in cost metric calculations, may be calculated in the frequency domain instead of the spatial domain. In an example, a higher degree of accuracy may be obtained using a spatial domain calculation instead of a frequency domain calculation. However, in the same example, the frequency domain calculation may be more efficient. Thus, in some cases, a higher accuracy spatial domain calculation may be desirable for less complex operations, and a lower accuracy, higher efficiency frequency domain calculation may be desirable for more complex operations.
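The relationship between the two calculation domains can be illustrated with a small Python sketch. For an orthonormal transform, such as the floating-point DCT-II below, the SSE is the same in both domains, so a frequency domain SSE can be taken directly from transform coefficients without an inverse transform. The one-dimensional transform and the sample values are simplifications chosen for this sketch; practical codecs typically use integer approximations of the transform, for which the equality holds only approximately.

```python
import math


def dct_ortho(x):
    """Orthonormal 1-D DCT-II (floating point, for illustration only)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def sse(a, b):
    """Sum of squared errors between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


source = [10, 20, 30, 40]
recon = [12, 18, 33, 37]

spatial_sse = sse(source, recon)
frequency_sse = sse(dct_ortho(source), dct_ortho(recon))
# The two values agree up to floating-point rounding.
```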
In various implementations, a coding strategy, e.g. bit depth selection and/or calculation domain selection, may be used to manage a number of parameters. For example, coding strategy selection may depend on operational complexity, resource availability, codec selection, network performance, and/or other factors.
In some implementations, once a coding strategy has been selected, the coding selection logic 600 may code an indicator of the mode and strategy selections (612). For example, the coding selection logic 600 may cause placement, e.g. by the coding logic 300, of an indicator of the coding mode within metadata for the coded stream. Additionally or alternatively, an indication of the coding strategy may be coded into the bitstream. The coding selection logic 600 may send the coding mode and/or coding strategy selections to the coding logic 300 for execution (614).
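As one hypothetical way to code such an indicator, the Python sketch below packs a block size, an assigned bit depth, and a calculation domain flag into a single byte. The field layout is invented for illustration and does not correspond to the syntax of any particular codec or to a format defined in this disclosure.

```python
VALID_BLOCK_SIZES = (4, 8, 16, 32, 64)


def pack_indicator(block_size: int, bit_depth: int, freq_domain: bool) -> int:
    """Pack mode and strategy selections into one byte (assumed layout).

    bits 6..4: block size code (4 -> 0, 8 -> 1, ..., 64 -> 4)
    bits 3..1: bit depth minus one (depths 1 through 8)
    bit  0   : 1 if the frequency domain is used for cost calculations
    """
    if block_size not in VALID_BLOCK_SIZES:
        raise ValueError("unsupported block size")
    if not 1 <= bit_depth <= 8:
        raise ValueError("bit depth out of range")
    size_code = block_size.bit_length() - 3
    return (size_code << 4) | ((bit_depth - 1) << 1) | int(freq_domain)


def unpack_indicator(value: int):
    """Recover (block_size, bit_depth, freq_domain) from a packed byte."""
    block_size = 4 << ((value >> 4) & 0x7)
    bit_depth = ((value >> 1) & 0x7) + 1
    freq_domain = bool(value & 0x1)
    return block_size, bit_depth, freq_domain
```

A receiving device could read the byte back with `unpack_indicator` to learn which mode and strategy were applied.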
The memory 720 may be used to store the data and/or media for coding operations. For example, the memory may store mode profiles 761, prediction data 762, buffers 763, source media 764, codecs 765, coding strategy profiles 766, cost metric parameters 767 and weights 768, and/or other data to support the coding logic 300 and/or coding selection logic 600, described above.
The execution device 700 may also include communication interfaces 712, which may support wireless, e.g. Bluetooth, Wi-Fi, WLAN, cellular (4G, LTE/A), and/or wired, e.g. Ethernet, Gigabit Ethernet, optical networking, protocols. The communication interfaces may support communication with external coded data sources 790. For example, the coded data sources may include streaming video servers, headends, and/or other network coded data sources. The execution device 700 may include power functions 734 and various input interfaces 728. The execution device may also include a user interface 718 that may include human interface devices and/or graphical user interfaces (GUI). The user interface may include a display 740 to present video, images, and/or other visual information to the operator. In various implementations, the GUI may support portable access, such as via a web-based GUI. The coded data, e.g. bitstream, from the coding logic 300 may be passed to the display for viewing by the operator. In various implementations, the system circuitry 714 may be distributed over multiple physical servers and/or be implemented as one or more virtual machines.
The methods, devices, processing, and logic described above may be implemented in many different ways and in many different combinations of hardware and software. For example, all or parts of the implementations may be circuitry that includes an instruction processor, such as a Central Processing Unit (CPU), microcontroller, or a microprocessor; an Application Specific Integrated Circuit (ASIC), Programmable Logic Device (PLD), or Field Programmable Gate Array (FPGA); or circuitry that includes discrete logic or other circuit components, including analog circuit components, digital circuit components or both; or any combination thereof. The circuitry may include discrete interconnected hardware components and/or may be combined on a single integrated circuit die, distributed among multiple integrated circuit dies, or implemented in a Multiple Chip Module (MCM) of multiple integrated circuit dies in a common package, as examples.
The circuitry may further include or access instructions for execution by the circuitry. The instructions may be stored in a tangible storage medium that is other than a transitory signal, such as a flash memory, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM); or on a magnetic or optical disc, such as a Compact Disc Read Only Memory (CDROM), Hard Disk Drive (HDD), or other magnetic or optical disk; or in or on another machine-readable medium. A product, such as a computer program product, may include a storage medium and instructions stored in or on the medium, and the instructions when executed by the circuitry in a device may cause the device to implement any of the processing described above or illustrated in the drawings.
The implementations may be distributed as circuitry among multiple system components, such as among multiple processors and memories, optionally including multiple distributed processing systems. Parameters, databases, and other data structures may be separately stored and managed, may be incorporated into a single memory or database, may be logically and physically organized in many different ways, and may be implemented in many different ways, including as data structures such as linked lists, hash tables, arrays, records, objects, or implicit storage mechanisms. Programs may be parts (e.g., subroutines) of a single program, separate programs, distributed across several memories and processors, or implemented in many different ways, such as in a library, such as a shared library (e.g., a Dynamic Link Library (DLL)). The DLL, for example, may store instructions that perform any of the processing described above or illustrated in the drawings, when executed by the circuitry.
Various implementations have been specifically described. However, many other implementations are also possible.
This application claims priority to provisional application Ser. No. 62/057,693, filed Sep. 30, 2014, which is entirely incorporated by reference.
Number | Date | Country
---|---|---
62057693 | Sep 2014 | US