The present disclosure relates to video coding and, in particular, to management of reference frames employed in predictive video coding applications where low latency, low memory bandwidth, and low power consumption are requirements.
There are many video coding applications in which low latency, low memory bandwidth, and low power consumption are important. In one exemplary application, a user may extend display content of a first device (for example, a tablet computer) to a display device of a second device (e.g., a personal computer). In another example, displayed video of a first device (again, perhaps a tablet computer) is mirrored to another display (a smart television). Modern video conferencing applications support exchange of captured video and presentation content on a real-time basis. And, as a further example, displayable content generated by a first user device (for example, a smartphone) may be transferred to a display of a second device (a display in a car) where an active user interface might be presented. In these examples, video coding may be applied to compress the video data that is shared between the devices and applications.
In circumstances where displayable content from one device is sent to another, it can occur that only a very small portion of displayable content changes on a frame-to-frame basis. For example, in an application where a user interface is generated by one device but displayed on another, an operator may move a cursor across otherwise static content. Thus, the cursor may be the only content that changes across a plurality of frames. Conventional video compression techniques compress frames in their entirety, which is inefficient for these circumstances, presenting challenges in terms of latency, complexity, and power, especially for frame content with high resolution, e.g., 4K or higher.
Embodiments of the present disclosure provide techniques for coding video in applications where regions of video are inactive on a frame-to-frame basis. According to the techniques, coding processes update only a subset of pixel blocks within a frame, while other pixel blocks are retained from a previously coded frame stored in a coder's or decoder's reference frame buffer. The technique is called Backward Reference Updating (or “BRU”) for convenience. At a desired pixel block granularity, based on the activity between a current frame to be coded and its reference frame(s), BRU may perform prediction, transform, quantization, and reconstruction only on selected region(s) that are determined to be active. The reconstructed pixels in these active regions are placed directly onto a specified reference frame in memory instead of creating a new frame in memory. Therefore, fewer memory transfers are performed. BRU is a universal technique that can be used in future image sequence or video coding specifications and their implementations, such as extensions of HEVC (H.265) and VVC (H.266) from MPEG/ITU-T, or of AV1, AV2, and the AVM (AOM Video Model) by the Alliance for Open Media (AOM). The proposed techniques provide benefits in low latency, low memory bandwidth, and low power consumption applications.
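The in-place update at the heart of BRU may be sketched as follows. This is an illustrative Python sketch using nested lists as frames; the function name and data layout are assumptions for exposition, not part of any coding specification.

```python
def bru_update_in_place(reference, active_blocks, block_size):
    """Overwrite only the active pixel blocks of `reference` with their
    reconstructed pixels. Inactive blocks are left untouched, so no new
    frame allocation or full-frame memory transfer is needed."""
    for (by, bx), recon in active_blocks.items():
        for y in range(block_size):
            row = reference[by * block_size + y]
            for x in range(block_size):
                row[bx * block_size + x] = recon[y][x]
    return reference
```

Because the reference frame is modified in place, memory traffic is proportional to the area of the active regions rather than to the full frame.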
In the example of
Moreover, the principles of the present disclosure may find applications with a wide variety of networks 130. Such networks 130 may include packet-switched and circuit-switched networks, wired and wireless networks, and computer and communications networks. The architecture and topology of the network 130 is immaterial to the present discussion unless noted otherwise herein.
The video coder 230 may be a predictive video coder, which achieves bandwidth compression by exploiting temporal and/or spatial redundancy in video from the video source 210. The video coder 230 may include a forward coder 232, a decoder 234, a reference frame buffer 236, and a predictor 238. The forward coder 232 may code input data of the pixel blocks differentially with respect to prediction data supplied by the predictor 238. The coded pixel block data may be output from the video coder 230 and also input to the decoder 234.
The decoder 234 may generate decoded video from the coded pixel block data of frames that are designated reference frames. These reference frames may serve as candidates for prediction of pixel blocks from other frames that are processed later by the video coder 230. The decoder 234 may decode the coded pixel blocks by inverting coding operations that were applied by the forward coder 232. Coding operations of the forward coder 232 typically incur coding losses so the decoded video obtained by the decoder 234 often will resemble the image data input to the forward coder 232 but it will exhibit some loss of information. When a reference frame is completely decoded, the frame data may be stored in the reference frame buffer 236.
The predictor 238 may perform prediction searches between newly received pixel block data and reference frame data stored in the reference frame buffer 236. When the predictor 238 identifies a match, the predictor 238 may supply pixel block data from a matching frame to the forward coder 232 and the decoder 234 for use in predictive coding and decoding.
The video decoder 320 may invert coding processes applied by a coder (
The frame assembler 330 may reassemble a composite frame from pixel blocks output by the video decoder 320. The frame assembler 330 may arrange frame data according to location information contained in coded metadata.
Many coding applications involve video content where a small number of pixels changes from frame to frame. Consider a screen sharing application where computer generated content contains user interface elements that are static across multiple frames of video, while other content (for example, a cursor or application content in a sub-window of the frame) changes. In such coding applications, many pixel blocks will contain content that does not change between successive frames; such pixel blocks are deemed to be “inactive” pixel blocks for purposes of the present discussion. Other pixel blocks, those that contain content that changes between frames, may be considered to be “active” pixel blocks. In an encoding system 200 (
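One way to classify pixel blocks as active or inactive, sketched here in Python under the assumption that frames are nested lists of pixel values, is an exact per-block comparison between the current frame and its reference:

```python
def classify_blocks(current, reference, block_size):
    """Partition a frame's pixel blocks into active and inactive sets.
    A block is inactive when its pixels are identical in the current
    frame and in the reference frame."""
    height, width = len(current), len(current[0])
    active, inactive = set(), set()
    for by in range(0, height, block_size):
        for bx in range(0, width, block_size):
            same = all(current[y][bx:bx + block_size] ==
                       reference[y][bx:bx + block_size]
                       for y in range(by, by + block_size))
            target = inactive if same else active
            target.add((by // block_size, bx // block_size))
    return active, inactive
```

A practical encoder might relax the exact-match test to a thresholded difference, but the partition into active and inactive sets is the same.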
As discussed, it is common to employ coding protocols that are defined by video coding specifications to encourage interoperability between encoders and decoders from different vendors. The techniques proposed herein may be employed cooperatively with these video coding standards by defining a protocol to indicate active and inactive regions. First, the techniques may be employed at coding granularities defined by the coding specifications, at, for example, the largest coding unit level, a coding unit level, a super block level, or a macroblock level, depending on the specification that is employed; these coding granularities represent pixel blocks as discussed herein. Moreover, syntax elements may be added to the video coding specification(s) to identify the pixel blocks that are classified as active and those classified as inactive.
Backward reference updating can promote conservation of processing resources in applications, such as screen sharing and video conferencing, where frame active region(s) are relatively small in comparison to inactive regions, such that relatively few pixels need to be coded and/or copied in each frame, particularly when spatial and temporal motion in the frame content is continuously smooth. In such applications, the data that overwrites prior content of reference frames should be sparse, i.e., the area of the active region is smaller than that of the inactive region. As a result, a video coding system can save bandwidth and power on frame processing because the majority of the frame's spatial area will be inactive.
When BRU is performed on a reference frame, some portion of the data already stored in the reference frame is overwritten. This is equivalent to evicting a reference frame from the reference frame buffer 236, 324, though no memory operations are performed to ‘drop’ the frame. Video encoders and decoders 230, 320, however, perform synchronous operations to maintain pools of reference frames that are available for use as prediction references when coding new input frames; when an overwrite operation is performed on a previously decoded reference frame, the previously decoded reference frame should be disqualified from serving thereafter as a prediction reference because it will no longer be intact in the buffer pool. A new reference frame effectively is created when content of active pixel blocks overwrites co-located content of the previously decoded reference frame; the new reference frame may be added to the buffer pool for use thereafter in coding new input frames. Moreover, processes should be employed to display the ‘dropped’ reference frame before it is overwritten.
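The bookkeeping described above, disqualifying an overwritten reference and requiring that it be displayed first, might be modeled as follows; the class and its fields are hypothetical illustrations, not any codec's actual buffer management API.

```python
class ReferencePool:
    """Hypothetical bookkeeping for BRU reference eviction; illustrative only."""

    def __init__(self):
        self.frames = {}        # slot index -> frame data
        self.displayed = set()  # slots whose current frame has been output

    def bru_overwrite(self, slot, updated_frame):
        # The 'dropped' frame must be displayed before its pixels are
        # overwritten, because no separate copy of it is kept in memory.
        if slot not in self.displayed:
            raise RuntimeError("display the reference frame before overwriting it")
        # Overwriting disqualifies the old frame as a prediction reference;
        # the updated frame takes its slot and awaits display in turn.
        self.frames[slot] = updated_frame
        self.displayed.discard(slot)
```

The key property is that no copy is ever made: the overwrite and the "eviction" are the same memory operation.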
After coding of the frame's active pixel blocks completes, the method 400 may decode the coded pixel block(s) and cause corresponding portion(s) of a previously stored reference frame in the reference frame buffer 236 (
After decoding of the frame's active pixel blocks completes, the method 500 may cause a previously stored reference frame in the reference frame buffer 324 (
The methods of
The methods of
Identification of active and inactive regions may be performed in a variety of ways. In one embodiment, an encoder may signal a bit map that identifies active and inactive pixel blocks before providing coded content of the frame. In another embodiment, coded pixel blocks may contain a flag that identifies whether the pixel block is active or inactive.
For example, at the beginning of each frame/tile, a bru_enabled flag may be signaled to indicate whether the frame/tile has the BRU scheme enabled. In an AV2 application, a BRU frame may be identified in an Open Bitstream Unit element (OBU) before providing coded payload of the frame. In such an application, a decoder may prepare frame buffers before doing actual decoding of the BRU enabled frame. If the decoder receives a bru_enabled=0, the decoder may perform normal prediction and reconstruction processes for the entire coded frame/tile. A bru_enabled=1 may indicate that BRU processes have been applied to the frame. Thus, right after a coding block is received with the block segmentation information, the decoder will be able to fully determine the status of the current coding block. If the pixel block is designated as an inactive block, the pixel block may contain no coded payload, and the decoder need not perform any decoding operation. Otherwise, if the pixel block is an active block, the decoder may perform prediction and reconstruction processes as defined in the coded pixel block payload. The decoded information of the active region(s) will be copied to the corresponding reference frame in the reference frame buffer 324.
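The decoder-side flow for a BRU-enabled frame or tile might look like the following sketch; the function names are assumptions, and `reconstruct` stands in for the codec's full prediction and inverse-transform pipeline.

```python
def decode_bru_frame(coded_blocks, ref, block_size, reconstruct):
    """Decode flow for a bru_enabled frame: blocks with no coded payload
    are inactive and skipped entirely; active blocks are reconstructed
    and written into the reference frame in place."""
    for (by, bx), payload in coded_blocks:
        if payload is None:          # inactive: no payload, no decode work
            continue
        recon = reconstruct(payload)
        for y in range(block_size):
            for x in range(block_size):
                ref[by * block_size + y][bx * block_size + x] = recon[y][x]
    return ref
```

Note that the inactive path performs no prediction, transform, or memory write at all, which is where the latency and power savings arise.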
As discussed, when a coding block is determined to be inactive, decode processes will not be applied at an encoder 230 or a decoder 320 and the reconstruction buffer pixel values of the inactive region are undefined. Moreover, in a pipelined hardware implementation, it may occur that, at the time an encoder or decoder processes a given pixel block, information of other previously coded active pixel blocks may not be available because, for example, they have not been committed to memory (the reference frame buffers 236 or 324). In such cases, the encoder 230 or decoder 320 has certain regions of a frame available to it and other regions of the frame unavailable to it.
At the time the current pixel block 810 is being processed, previous locations of the frame 800 in the raster scan order may have been considered for coding but content for those locations may not be available to an encoder or a decoder. Some reconstructed regions of the frame 820.1, 820.2, 820.3 may be available to the encoder and decoder. Other locations, shown as 830, may be unavailable because, for example, pixel blocks at those locations were designated as inactive by one of the methods of
When BRU processing is performed on a frame, many coding tools that rely on pixels in a current reconstruction buffer can be affected. Take intra prediction as an example: when a coding block is adjacent to any inactive coding blocks, intra prediction may not have the valid boundary pixels that it would have in the non-BRU case. As illustrated in
As an alternative approach, the encoder 230 and/or decoder 320 may infer pixels in unavailable regions from other pixels of the available regions 920.1, 920.2, 920.3 surrounding the unavailable pixels, if possible, using techniques such as extension or interpolation. A similar condition could also apply when an encoder evaluates Intra Block Copy (IBC) techniques for the current pixel block 910. If the region to be copied contains any unavailable pixels, the encoder 230 may set them to a deterministic value or simply avoid using those regions. A decoder 320 may follow these same techniques.
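Extension of available boundary pixels into unavailable positions can be sketched as below; the mid-gray fallback value of 128 assumes 8-bit samples and is one illustrative choice of "deterministic value."

```python
def fill_boundary(pixels, default=128):
    """Fill unavailable (None) boundary samples by extending the nearest
    preceding available sample; fall back to a fixed default when no
    available sample precedes the position."""
    out, last = list(pixels), None
    for i, p in enumerate(out):
        if p is None:
            out[i] = last if last is not None else default
        else:
            last = p
    return out
```

A real codec's substitution process also scans in a defined direction and may back-fill leading positions from the first available sample; the forward extension above is the simplest variant.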
An alternative approach may be more implementation-friendly, i.e., the decoder infers that ibc_allowed=0 for any frame with bru_enabled=1.
BRU techniques may alter application of in-loop filters in an embodiment. Notably, filters are applied not to the reconstructed frame buffer but directly to the corresponding updated reference frame. In general, in-loop filters may be applied to border pixels between adjacent available coding blocks or between available and unavailable coding blocks. In the example of
In some coding applications where pixel blocks along an edge of a frame are coded, it can be common for video coders and decoders to develop padding content 1080 along the edge boundaries of pixel blocks for filtering purposes. In another example, filtering may be applied also on boundaries between available pixel blocks such as block 1030 and the padding content 1080. Filtering would not be performed at boundaries between unavailable pixel blocks 1010, 1060 and the padding region.
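The boundary-filtering rules described above reduce to a small predicate over the status of the two sides of each boundary. The status codes ('A' available block, 'U' unavailable block, 'P' padding) are illustrative conventions, not signaled syntax:

```python
def should_filter(side_a, side_b):
    """Decide whether to run an in-loop filter across a boundary.
    A boundary is filtered only when at least one side is an available
    block ('A'); unavailable-to-unavailable and unavailable-to-padding
    boundaries are skipped."""
    return 'A' in (side_a, side_b)
```

A filtering loop would evaluate this predicate for every vertical and horizontal block boundary before applying the deblocking operation.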
As discussed (
Many typical inter prediction tools, as well as the filtering processes, use padded pixels for better performance and universal implementation. One example is shown in
Filters on frame boundar(ies) may be managed by the BRU process. As shown in
In an embodiment, communication of filtering operations between an encoder and a decoder (
In some embodiments, the encoder 230 and decoder 320 perform pipelined implementations for the BRU updates, especially in the
The restriction criteria may differ among decoder designs. In one embodiment, an encoder may signal a group of restricted region offsets at the sequence level in a coding syntax, which causes motion vector coding for all inter coding blocks to be modified based on the offsets, also reducing bitrate. The pixels in the current collocated pixel block of the reference frame that will be overwritten are always available.
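A motion vector restriction check of the kind described above might be implemented as an overlap test between the referenced prediction block and each signaled restricted region; the region format (x offset, y offset, width, height) is an assumption modeled on the offsets-and-sizes signaling discussed elsewhere in this disclosure.

```python
def mv_allowed(block_x, block_y, mv, block_size, restricted_regions):
    """Return False when the prediction block addressed by motion vector
    `mv` (in whole pixels) overlaps any restricted region, given as
    (x_offset, y_offset, width, height) tuples."""
    px, py = block_x + mv[0], block_y + mv[1]
    for rx, ry, rw, rh in restricted_regions:
        if px < rx + rw and px + block_size > rx and \
           py < ry + rh and py + block_size > ry:
            return False
    return True
```

An encoder would apply this test during motion search, discarding candidate vectors that land in a restricted region.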
One restriction criterion, which may decrease the complexity introduced by motion vector restriction in the case of
In this example, a prediction restriction may be applied that causes the frame identified by bru_ref_idx (frame n+k in the example of
In this example, a prediction restriction may be applied that causes the frame identified by bru_ref_idx (frame n+k−1 in the example of
In coding applications that favor use of the BRU techniques, updating may occur continuously from frame to frame without interruption. Assume that BRU techniques are used in every frame. In this case, encoders 230 and decoders 320 will develop only one reference frame in their reference frame buffers 236, 324. Overall coding quality may be harmed due to a lack of variety of reference frames. In an embodiment, an encoder 230 could intentionally turn off BRU processing for some frames that otherwise would be coded efficiently using those techniques. For example, the encoder could turn off BRU for N frames when it is determined that the number of reference frames in the reference frame pool falls below a threshold number.
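A toy scheduling policy matching this behavior is sketched below: non-BRU frames add fresh references to the pool, and BRU stays disabled while the pool is below a threshold. The model is deliberately simplified (it ignores eviction and frame types).

```python
def plan_bru(num_frames, start_refs, threshold):
    """Toy BRU schedule: a non-BRU frame adds a new reference to the
    pool, while a BRU frame overwrites in place and adds none. BRU stays
    disabled until the pool reaches `threshold` distinct references."""
    refs, schedule = start_refs, []
    for _ in range(num_frames):
        use_bru = refs >= threshold
        if not use_bru:
            refs += 1          # coding without BRU creates a new reference
        schedule.append(use_bru)
    return schedule
```

Starting from a single reference with a threshold of three, the policy codes two frames conventionally to rebuild variety, then re-enables BRU.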
In one implementation, an encoder may disable BRU techniques on a predetermined basis. One such example is illustrated in
At some point, another frame 1560 may be selected for high quality coding. In this event, the process of disabling BRU coding of select frames may be repeated. As illustrated, BRU coding may be disabled for frame 1570, which prevents content of frame 1560 from being overwritten in a reference frame buffer. But the BRU techniques may be reengaged for subsequent frames, such as frame 1580, which may cause an overwrite to frame 1570 in the encoders and decoders' reference frame buffers.
Selection of frames to preserve may be done in a variety of ways. Many video coders organize frames into Group of Frames (GOP) constructs in which the first coded frame of each GOP is coded with the highest possible quality as compared to other frames of the GOP. Other protocols employ intra refresh coding techniques in which intra coded frames are assigned relatively high coding quality. Still other rate control techniques may assign high coding quality to select frames. In each case, the frames that are coded with high coding quality may be selected for preservation by disabling BRU coding techniques for other frames that otherwise would overwrite content of the high quality coded frame.
As a corollary, when the coding quality of different frames varies, such as among frames in a GOP with different temporal layers, or when rate control is enabled, BRU coding techniques may select, as the reference frame to be overwritten, a frame whose coding quality is similar to that of the frame being coded. In this manner, low quality reconstructions will not propagate into high quality reconstructions that will be used in future coding.
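Selecting the overwrite target by quality similarity might be done as below, using the quantization parameter (QP) as a stand-in measure of coding quality; that proxy and the function name are assumptions for illustration.

```python
def pick_bru_target(current_qp, reference_qps):
    """Choose the reference slot whose quantization parameter (used here
    as a proxy for coding quality) is closest to the current frame's QP,
    so that a low quality frame does not overwrite a high quality one."""
    return min(reference_qps,
               key=lambda slot: abs(reference_qps[slot] - current_qp))
```

Here `reference_qps` maps each reference buffer slot to the QP at which its frame was coded; the slot returned would be signaled as the BRU reference index.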
In another embodiment, illustrated in
In the example of
In the example of
In the example of
Signaling of BRU information may occur in a variety of ways. A sequence level seq_bru_enabled flag may be provided in a sequence header OBU to signal whether the BRU scheme is used in decoding the sequence.
BRU information also may be provided at the frame level. For example, a bru_enabled flag may be provided at the frame level, along with other fields such as a reference index and motion vector restriction information. The BRU syntax elements can be provided in the frame header OBU or by defining a BRU OBU before a frame header syntax element. If the encoder and decoder have a scheme for encoding skipped sub blocks such as Segmentation tools as provided in AV2, BRU can be signaled together with such tools.
Continuing with the AV2 example, BRU frame level information can be signaled through segmentation frame level information as shown in
For example, in one embodiment, signaling information may identify a number of restriction areas and their corresponding location offsets and sizes as shown in Table 1. The method 1700 may read a restrict_mv flag to determine whether motion vector restrictions have been applied (boxes 1760, 1770). If so, the method 1700 may read identifiers of restricted regions from coded frame data (box 1780) and apply them as prediction restrictions, effectively disqualifying the identified region(s) from serving as prediction references for the frame. Although providing identifiers of restricted regions is optional, their use can promote resource conservation at a decoder.
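A reader for the restriction syntax of Table 1 might look like the following sketch; the exact field order and the minimal bit-reader stand-in are assumptions for illustration.

```python
class Reader:
    """Minimal stand-in for a bitstream reader, for illustration only."""
    def __init__(self, values):
        self._it = iter(values)
    def read_flag(self):
        return next(self._it) != 0
    def read_uint(self):
        return next(self._it)

def parse_bru_restrictions(reader):
    """Read a restrict_mv flag; when set, read a region count followed by
    (x offset, y offset, width, height) for each restricted region."""
    if not reader.read_flag():          # restrict_mv == 0: no restrictions
        return []
    count = reader.read_uint()
    return [(reader.read_uint(), reader.read_uint(),
             reader.read_uint(), reader.read_uint()) for _ in range(count)]
```

The returned region list would then be applied as prediction restrictions, disqualifying those regions from serving as prediction references for the frame.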
In the embodiment of AV2, there is no extra signaling at the sub-block level for BRU. As shown in
In a pipelined hardware system, processing conflicts may arise among different parallel units that perform different processes. For example, the decoding process performed by a first processing unit may access an area in the reference buffer while this area is being updated through BRU by the decoding process of other unit(s). Therefore, it may be useful to restrict the values of the motion vectors to avoid such conflicts. In one example, using references that may cross a slice/tile boundary may be disallowed.
When a slice/tile does not contain any active regions, it would be desirable to skip decoding this parallel unit immediately. Therefore, a flag (parallel_unit_skip_flag) at the parallel unit level may be provided to indicate whether the slice/tile is a skip unit. When the flag is 1, the unit is an inactive unit and the decoding process of this slice/tile is skipped. It is noted that no further information is signaled for a skipped slice/tile.
When parallel_unit_skip_flag is 0, it may indicate that the unit has active regions. In this case, a segmentation map flag may be signaled for this unit if enable_segment is 1 at the frame level. If the segmentation is enabled for a unit, a segmentation map also may be transmitted to indicate which Segment ID will be used in which spatial regions of the unit. As introduced in
When a parallel unit is not a skip unit, a flag (parallel_unit_bru_enabled) may be signaled to indicate whether BRU is enabled for this unit or not. If parallel_unit_bru_enabled=1, BRU is enabled for this unit and a syntax element (bru_ref_idx) may be added to indicate which reference buffer will be updated by this unit. It is noted that different BRU units in a frame may have different bru_ref_idx values.
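Parsing the per-parallel-unit flags described above can be sketched as follows; the field order and the minimal reader stand-in are illustrative assumptions, not AV2 syntax.

```python
class Reader:
    """Minimal stand-in for a bitstream reader, for illustration only."""
    def __init__(self, values):
        self._it = iter(values)
    def read_flag(self):
        return next(self._it) != 0
    def read_uint(self):
        return next(self._it)

def parse_parallel_unit_header(reader):
    """Sketch of per-unit BRU flags: a skipped unit carries nothing
    further; otherwise parallel_unit_bru_enabled and, when set,
    bru_ref_idx identify the reference buffer the unit updates."""
    if reader.read_flag():                      # parallel_unit_skip_flag
        return {"skip": True}
    header = {"skip": False, "bru_enabled": reader.read_flag()}
    if header["bru_enabled"]:
        header["bru_ref_idx"] = reader.read_uint()
    return header
```

Because different BRU units in a frame may carry different bru_ref_idx values, the parsed header is kept per unit rather than per frame.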
In one embodiment, frames may not be output at the earliest time, but delayed by a few frames. If any parallel unit is used as a BRU reference before the frame that contains it is output for display, that parallel unit may be updated after it has been decoded but before being output. This delay may be beneficial in that the update may improve the quality of that parallel unit.
In another embodiment, block bru_skip flags may be signaled in a coding syntax to indicate when BRU processes are disabled. This embodiment avoids use of segmentation (or its implementation in other codecs). In such embodiments, to save the coding bit cost of signaling such flags, a predefined minimum updating size (min_bru_size) may be defined in a coding protocol, establishing a pixel block granularity at which the flags are provided. Pixel blocks at smaller granularities may inherit the flag from the larger pixel blocks to which they belong.
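The min_bru_size inheritance rule can be expressed as a lookup of the flag signaled for the enclosing min_bru_size-aligned block; the coordinate conventions here are assumptions for illustration.

```python
def bru_skip_for_block(x, y, min_bru_size, signaled_flags):
    """A block smaller than min_bru_size inherits the bru_skip flag
    signaled for the min_bru_size-aligned parent block containing it;
    `signaled_flags` maps parent-block (x, y) origins to their flags."""
    parent = ((x // min_bru_size) * min_bru_size,
              (y // min_bru_size) * min_bru_size)
    return signaled_flags[parent]
```

Only one flag per min_bru_size region is carried in the bitstream; every sub-block resolves to its parent's flag at decode time.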
One such example is illustrated in
If the bru_skip flag is not set, then the method 2100 may decode partitions (box 2130) and reconstruct sub-blocks within the pixel block (box 2140). The method 2100 also may decode and apply in-loop parameters for the pixel block (box 2150). In this regard, operations of boxes 2130-2150 may proceed as discussed in the foregoing embodiments.
The frame level and Super block level syntaxes are demonstrated in
Applying min_bru_size to Super Block level could also help filter signaling and latency. As shown in
The pixel block coder 2210 and the predictor 2260 may receive data of an input pixel block. The predictor 2260 may generate prediction data for the input pixel block and input it to the pixel block coder 2210. The pixel block coder 2210 may code the input pixel block differentially with respect to the predicted pixel block and output coded pixel block data to the syntax unit 2280. The pixel block decoder 2220 may decode the coded pixel block data, also using the predicted pixel block data from the predictor 2260, and may generate decoded pixel block data therefrom.
The frame buffer 2230 may generate reconstructed frame data from decoded pixel block data. The in-loop filter 2240 may perform one or more filtering operations on the reconstructed frame. For example, the in-loop filter 2240 may perform deblocking filtering, sample adaptive offset (SAO) filtering, adaptive loop filtering (ALF), maximum likelihood (ML) based filtering schemes, deringing, debanding, sharpening, resolution scaling, and the like. The reference frame buffer 2250 may store the filtered frame, where it may be used as a source of prediction of later-received pixel blocks. The syntax unit 2280 may assemble a data stream from the coded pixel block data, which conforms to a governing coding protocol.
The pixel block coder 2210 may include a subtractor 2212, a transform unit 2214, a quantizer 2216, and an entropy coder 2218. The pixel block coder 2210 may accept pixel blocks of input data at the subtractor 2212. The subtractor 2212 may receive predicted pixel blocks from the predictor 2260 and generate an array of pixel residuals therefrom representing a difference between the input pixel block and the predicted pixel block. The transform unit 2214 may apply a transform to the sample data output from the subtractor 2212, to convert data from the pixel domain to a domain of transform coefficients. The quantizer 2216 may perform quantization of transform coefficients output by the transform unit 2214. The quantizer 2216 may be a uniform or a non-uniform quantizer. The entropy coder 2218 may reduce bandwidth of the output of the coefficient quantizer by coding the output, for example, by variable length code words or using a context adaptive binary arithmetic coder.
The transform unit 2214 may operate in a variety of transform modes as determined by the controller 2270. For example, the transform unit 2214 may apply a discrete cosine transform (DCT), a discrete sine transform (DST), a Walsh-Hadamard transform, a Haar transform, a Daubechies wavelet transform, or the like. In an aspect, the controller 2270 may select a coding mode M to be applied by the transform unit 2214, may configure the transform unit 2214 accordingly and may signal the coding mode M in the coded video data, either expressly or impliedly.
The quantizer 2216 may operate according to a quantization parameter QP that is supplied by the controller 2270. In an aspect, the quantization parameter QP may be applied to the transform coefficients as a multi-value quantization parameter, which may vary, for example, across different coefficient locations within a transform-domain pixel block. Thus, the quantization parameter QP may be provided as a quantization parameter array.
The entropy coder 2218, as its name implies, may perform entropy coding of data output from the quantizer 2216. For example, the entropy coder 2218 may perform run length coding, Huffman coding, Golomb coding, Context Adaptive Binary Arithmetic Coding, and the like.
The pixel block decoder 2220 may invert coding operations of the pixel block coder 2210. For example, the pixel block decoder 2220 may include a dequantizer 2222, an inverse transform unit 2224, and an adder 2226. The pixel block decoder 2220 may take its input data from an output of the quantizer 2216. Although permissible, the pixel block decoder 2220 need not perform entropy decoding of entropy-coded data since entropy coding is a lossless event. The dequantizer 2222 may invert operations of the quantizer 2216 of the pixel block coder 2210. The dequantizer 2222 may perform uniform or non-uniform de-quantization as specified by the decoded signal QP. Similarly, the inverse transform unit 2224 may invert operations of the transform unit 2214. The dequantizer 2222 and the inverse transform unit 2224 may use the same quantization parameters QP and transform mode M as their counterparts in the pixel block coder 2210. Quantization operations likely will truncate data in various respects and, therefore, data recovered by the dequantizer 2222 likely will possess coding errors when compared to the data presented to the quantizer 2216 in the pixel block coder 2210.
The adder 2226 may invert operations performed by the subtractor 2212. It may receive the same prediction pixel block from the predictor 2260 that the subtractor 2212 used in generating residual signals. The adder 2226 may add the prediction pixel block to reconstructed residual values output by the inverse transform unit 2224 and may output reconstructed pixel block data.
As described, the frame buffer 2230 may assemble a reconstructed frame from the output of the pixel block decoders 2220. The in-loop filter 2240 may perform various filtering operations on recovered pixel block data. For example, the in-loop filter 2240 may include a deblocking filter, a sample adaptive offset (“SAO”) filter, and/or other types of in-loop filters (not shown).
The reference frame buffer 2250 may store filtered frame data for use in later predictions of other pixel blocks. Different types of prediction data are made available to the predictor 2260 for different prediction modes. For example, for an input pixel block, intra prediction takes a prediction reference from decoded data of the same frame in which the input pixel block is located. Thus, the reference frame buffer 2250 may store decoded pixel block data of each frame as it is coded. For the same input pixel block, inter prediction may take a prediction reference from previously coded and decoded frame(s) that are designated as reference frames. Thus, the reference frame buffer 2250 may store these decoded reference frames.
The controller 2270 may control overall operation of the coding system 2200. The controller 2270 may select operational parameters for the pixel block coder 2210 and the predictor 2260 based on analyses of input pixel blocks and also external constraints, such as coding bitrate targets and other operational parameters. As is relevant to the present discussion, when it selects quantization parameters QP, the use of uniform or non-uniform quantizers, and/or the transform mode M, it may provide those parameters to the syntax unit 2280, which may include data representing those parameters in the data stream of coded video data output by the system 2200. The controller 2270 also may select between different modes of operation by which the system may generate reference images and may include metadata identifying the modes selected for each portion of coded data.
During operation, the controller 2270 may revise operational parameters of the quantizer 2216 and the transform unit 2214 at different granularities of image data, either on a per pixel block basis or on a larger granularity (for example, per tile, per slice, per largest coding unit (“LCU”) or Coding Tree Unit (CTU), or another region). In an aspect, the quantization parameters may be revised on a per-pixel basis within a coded frame.
Additionally, as discussed, the controller 2270 may control operation of the in-loop filter 2240 and the prediction unit 2260. Such control may include, for the prediction unit 2260, mode selection (lambda, modes to be tested, search windows, distortion strategies, etc.), and, for the in-loop filter 2240, selection of filter parameters, reordering parameters, weighted prediction, etc.
The pixel block decoder 2310 may include an entropy decoder 2312, an inverse quantizer 2314, an inverse transformer 2316, and an adder 2318. The entropy decoder 2312 may perform entropy decoding to invert processes performed by the entropy coder 2218 (
The adder 2318 may invert operations performed by the subtractor 2212 (
As described, the reference frame buffer 2330 may assemble a reconstructed frame from the output of the pixel block decoder 2310. The in-loop filter 2320 may perform various filtering operations on recovered pixel block data as identified by the coded video data. For example, the in-loop filter 2320 may include a deblocking filter, a sample adaptive offset (“SAO”) filter, and/or other types of in-loop filters. In this manner, operation of the reference frame buffer 2330 and the in-loop filter 2320 mimics operation of the counterpart frame buffer 2230 and in-loop filter 2240 of the encoder 2200 (
The reference frame buffer 2330 may store filtered frame data for use in later prediction of other pixel blocks. The reference frame buffer 2330 may store decoded pixel block data of each frame as it is decoded for use in intra prediction. The reference frame buffer 2330 also may store decoded reference frames.
The controller 2360 may control overall operation of the coding system 2300. The controller 2360 may set operational parameters for the pixel block decoder 2310 and the predictor 2350 based on parameters received in the coded video data stream. As is relevant to the present discussion, these operational parameters may include quantization parameters QP for the inverse quantizer 2314 and transform modes M for the inverse transformer 2316. As discussed, the received parameters may be set at various granularities of image data, for example, on a per pixel block basis, a per tile basis, a per slice basis, a per LCU/CTU basis, or based on other types of regions defined for the input image.
In a further embodiment, BRU replacement operations may be performed at granularities greater than an active pixel block.
The EAR may be defined to extend the active region by one super block along each edge of the identified active blocks AB1, AB2.
The encoder may code the extended blocks EB1-EB12 in a special copy mode referring to the BRU reference frame, which instructs the decoder to use motion vector (0,0) for motion compensation and to skip transform and residual coding. As a result, the extended active region is simply a duplication of collocated pixels from the BRU reference frame. Coding of the extended blocks, therefore, involves low coding overhead.
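The derivation of an EAR from identified active blocks can be sketched as follows. This is an illustrative sketch only; the `derive_ear` name and the (row, column) grid representation are assumptions, not part of the disclosure. The sketch grows the set of active super blocks by one super block along each edge, yielding the extended blocks that would be coded in the copy mode:

```python
def derive_ear(active_blocks, grid_rows, grid_cols):
    """Return (active, extended) sets of super block grid coordinates.

    Extended blocks form the one-super-block border around the active
    blocks; per the scheme above, they would be coded in a copy mode
    (motion vector (0,0), no transform or residual coding).
    """
    active = set(active_blocks)
    extended = set()
    for (r, c) in active:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                # Keep neighbors that lie on the grid and are not active.
                if (0 <= nr < grid_rows and 0 <= nc < grid_cols
                        and (nr, nc) not in active):
                    extended.add((nr, nc))
    return active, extended

# Two horizontally adjacent active super blocks on an 8x8 grid produce
# a ring of ten extended blocks around them:
active, extended = derive_ear({(3, 3), (3, 4)}, 8, 8)
```

A decoder applying implicit EAR signaling could run the same derivation from the signaled active block locations alone.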
Coding of active blocks AB1, AB2 may be performed by intra coding using content from the extended blocks EB1-EB12 as sources of prediction. In this embodiment, intra prediction and filtering restrictions may be applied across a boundary of the EAR and other inactive blocks of the frame 2500. Pixels of the extended blocks EB1-EB12 along an edge of the active blocks may be filtered as desired.
Generation of the extended blocks EB1-EB12 may consume some power in a processing system, but at a reasonable level. The foregoing embodiment, however, can eliminate much of the complicated logic otherwise required in processing hardware to achieve similar efficiency. After the filtering process, the EAR may be stored to a reference picture buffer in both an encoder and a decoder.
Coding operations in this embodiment may cause read and write operations to be made on a common reference frame in a reference picture buffer, which should be scheduled on a pixel block basis to avoid conflicts.
The following discussion provides an exemplary syntax that may be exchanged between encoders and decoders when communicating information regarding EARs:
A sequence-level seq_bru_enabled flag may be signaled in the sequence header open bitstream unit (“OBU”) to indicate whether the BRU scheme is used in decoding the sequence.
At a frame level, a decoder may check the sequence-level seq_bru_enabled flag first before parsing a bru_frame_enabled flag. If bru_frame_enabled=0, no further BRU parsing will be performed, and the current frame is not a BRU frame. If bru_frame_enabled=1, the decoder also parses bru_ref_idx, which indicates which reference frame is used for updating.
The signaling of active regions may vary depending on the application. One example is to signal a bru_active flag at the super block level at the beginning of the super block syntax.
An alternative solution is to signal the active region at the frame level. A codec could signal the total number of active super blocks first, followed by offsets (x, y) of each such block. In this case, an extended active super block can be viewed as an active super block. Extra mode information should be signaled to indicate which active super blocks are extended active super blocks, which are decoded by the direct copy mode.
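A hypothetical parsing routine for the frame-level syntax described above might look as follows. The `BitReader` helper, the field bit-widths, and the 8-bit count and offset fields are illustrative assumptions; actual OBU syntax would be defined by the governing codec specification:

```python
class BitReader:
    """Minimal big-endian bit reader over a string of '0'/'1' characters."""
    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def read_bits(self, n):
        value = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return value

def parse_bru_frame_header(reader, seq_bru_enabled):
    """Parse frame-level BRU syntax; field widths are assumptions."""
    if not seq_bru_enabled:
        return {"bru_frame_enabled": 0}
    if not reader.read_bits(1):
        # Not a BRU frame; no further BRU parsing is performed.
        return {"bru_frame_enabled": 0}
    # Which reference frame will be updated in place.
    bru_ref_idx = reader.read_bits(3)
    # Frame-level active-region signaling: count, then (x, y) offsets.
    num_active = reader.read_bits(8)
    offsets = [(reader.read_bits(8), reader.read_bits(8))
               for _ in range(num_active)]
    return {"bru_frame_enabled": 1,
            "bru_ref_idx": bru_ref_idx,
            "active_offsets": offsets}
```

Per-super-block signaling of bru_active would instead place a one-bit flag at the start of each super block's syntax.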
The inference of the extended active region also could be affected by the in-loop filter. If the in-loop filter is turned off at the sequence level or frame level, the extended active super blocks could be removed automatically, such that only the active super blocks are necessary. Of course, in this case, all the intra prediction restrictions addressed in the previous section would be applied.
There could be many ways to signal offsets of active super blocks. A raster scan order super block index or absolute locations of super blocks are two options. An alternative method is to cluster the active regions into spatially close groups or sub-frames, and to signal the group location and each active super block's offset within the group. The codec also could use temporal information to predict a current active super block's location from a previously-decoded super block's location in a previous frame.
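The first two options above are interchangeable given the super block grid width, which both encoder and decoder know from the frame size. A minimal sketch of the conversion (function names are illustrative, not from the disclosure):

```python
def sb_index_to_xy(index, sb_cols):
    """Raster-scan super block index -> (x, y) grid position."""
    return index % sb_cols, index // sb_cols

def sb_xy_to_index(x, y, sb_cols):
    """(x, y) grid position -> raster-scan super block index."""
    return y * sb_cols + x

# On a grid eight super blocks wide, raster index 10 is column 2, row 1:
x, y = sb_index_to_xy(10, 8)
```

The choice affects only signaling cost: a raster index needs one code word per block, while (x, y) pairs may compress better when clustered into groups with small within-group offsets.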
To reduce signaling overhead, a codec also could employ implicit signaling for extended block(s), meaning that information of the extended block(s) may be derived from data of an active super block. For example, in an embodiment where an EAR is defined by providing extended blocks along edges of active blocks, a decoder may derive the extended blocks' locations from the signaled active blocks without express signaling.
In another variation, the BRU technique can be implemented as a special case of a crop mode on the coding frame. In this embodiment, isolated active regions may be defined as different crops in the frame. The codec will code only the active crops. Thus, in this embodiment, a coded bitstream should consist of a group of crops, which could be decoded sequentially or in parallel. All crops in a current coding frame may have the same set of reference frame buffers. After the pixels of the reconstructed crops are ready, the decoded crops may be updated by pasting or overlaying them onto an existing BRU reference frame.
In an embodiment, a simplified BRU algorithm can be implemented by defining an anchor frame, which is not a BRU frame.
Defining an anchor reference frame and manipulating such combinations can provide advantages in coder/decoder systems. In the encoder and decoder, there is no ‘backward update’ scheme that requires updating and, thereby, consuming a reference frame. In backward update processes, a ‘refresh’ of a reference frame represents an operation that logically drops the BRU reference frame from the reference frame buffer pool and assigns its updated variant as a new reference frame. The anchor frame technique can improve coding efficiency because the number of available reference frames is greater than in the ordinary BRU process. Another advantage arises because the ‘updating’ operation is removed, in which case maintenance of the reference frame buffer is no different than in codecs that do not support BRU techniques.
Another simplification of BRU involves copying the entire BRU reference frame as the current frame before coding the current frame (i.e., treating all inactive regions as extended active regions) and superimposing coded active blocks upon it.
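This simplification might be sketched as follows. The function name and the frame representation (a 2D pixel list, with reconstructed active blocks keyed by grid position) are illustrative assumptions, not the disclosure's implementation:

```python
def simplified_bru(reference, active_blocks, block_size):
    """Duplicate the BRU reference frame, then overlay active blocks.

    reference:     2D list of pixels (the BRU reference frame).
    active_blocks: dict mapping (block_y, block_x) grid positions to
                   reconstructed block_size x block_size pixel blocks.
    """
    # Treat every inactive region as an extended active region: the
    # whole reference frame is duplicated as the current frame.
    current = [row[:] for row in reference]
    # Superimpose the reconstructed pixels of each coded active block.
    for (by, bx), block in active_blocks.items():
        for r in range(block_size):
            for c in range(block_size):
                current[by * block_size + r][bx * block_size + c] = block[r][c]
    return current
```

Note that this variant trades extra memory transfers (the full-frame copy) for simpler control logic, since no per-block read/write scheduling on a shared reference frame is needed.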
The foregoing discussion has described operation of the aspects of the present disclosure in the context of video coders and decoders. Commonly, these components are provided as electronic devices. Video decoders and/or controllers can be embodied in integrated circuits, such as application specific integrated circuits, field programmable gate arrays, and/or digital signal processors. Alternatively, they can be embodied in computer programs that execute on camera devices, personal computers, notebook computers, tablet computers, smartphones, or computer servers. Such computer programs typically are stored in physical storage media such as electronic-, magnetic-, and/or optically-based storage devices, where they are read to a processor and executed. Decoders commonly are packaged in consumer electronics devices, such as smartphones, tablet computers, gaming systems, DVD players, portable media players and the like; and they also can be packaged in consumer software applications such as video games, media players, media editors, and the like. And, of course, these components may be provided as hybrid systems that distribute functionality across dedicated hardware components and programmed general-purpose processors, as desired.
Video coders and decoders may exchange video through channels in a variety of ways. They may communicate with each other via communication and/or computer networks.
Several embodiments of the invention are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.
The present application claims the benefit of priority of U.S. patent application Ser. No. 63/632,870, filed Apr. 11, 2024 and entitled “Backward Reference Updating for Video Coding,” and U.S. patent application Ser. No. 63/580,471, filed Sep. 5, 2023 and entitled “Backward Reference Updating for Video Coding,” the disclosures of which are incorporated herein in their entireties.
Number | Date | Country
---|---|---
63632870 | Apr 2024 | US
63580471 | Sep 2023 | US