In the following, a method and a device for encoding a picture are disclosed. Corresponding decoding method and decoding device are further disclosed.
It is known to encode a picture divided into blocks by processing the blocks according to a predefined scan order. The scan order is usually specified in a coding standard (e.g. H.264, HEVC). The same scan order is used in the encoder and in the decoder. Exemplarily, in H.264 coding standard, macroblocks (i.e. blocks of 16 by 16 pixels) of a picture Y are processed line by line in a raster scan order as depicted on
A method for decoding a picture divided into blocks is disclosed. The method comprises at least one iteration of:
a) determining for each of at least two blocks adjacent to a reconstructed part of the picture a priority level responsive at least to directional gradients computed in a causal neighborhood of the block; and
b) decoding a part of the picture comprising the block whose priority level is the highest.
Adapting the scan order on the basis of the content of the picture increases the coding efficiency, e.g. decreases coding rate for a given quality or improves quality for a given coding rate. Specifically, taking into account directional gradients in a causal neighborhood favors the blocks having a causal neighborhood well adapted to intra prediction tools.
In an exemplary embodiment, determining for each of at least two blocks adjacent to a reconstructed part of the picture a priority level comprises:
a1) computing, for a spatial direction, directional gradients along the block edge;
a2) propagating the directional gradients along the spatial direction; and
a3) determining an energy from the propagated directional gradients.
Advantageously, the spatial direction belongs to a plurality of spatial directions and the method further comprises:
a4) repeating steps a1) to a3) for each spatial direction of the plurality of spatial directions; and
a6) determining the highest energy, the highest energy being the priority for the current block.
Advantageously, the causal neighborhood belongs to a plurality of causal neighborhoods and the method further comprises before step a6):
a5) repeating steps a1) to a4) for each causal neighborhood of the plurality of causal neighborhoods.
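The iteration of steps a1) to a6) can be sketched as follows. This is a minimal illustration only: the function names, the template identifiers and the `energy` callable (standing in for steps a1) to a3)) are hypothetical and do not come from the disclosure.

```python
# Sketch of steps a4) to a6): the priority of a block is the highest
# directional-gradient energy over all (template, direction) pairs.
# energy(block, template, direction) stands in for steps a1) to a3).

def block_priority(block, templates, compatible_dirs, energy):
    best = 0.0
    for tpl in templates:                 # a5) every causal neighborhood
        for d in compatible_dirs[tpl]:    # a4) every compatible direction
            best = max(best, energy(block, tpl, d))  # a6) keep the maximum
    return best
```

The maximum over all (Tj, d) pairs is then used as the priority when comparing candidate blocks.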
In a specific embodiment, the reconstructed part belongs to the group comprising:
the blocks located on the borders of the picture;
an epitome of the picture;
a block located at a specific position in the picture.
In a specific embodiment, in step b), the part of the picture comprising the block whose priority level is the highest is a macroblock and decoding the macroblock comprises at least one iteration of:
a) determining for at least two blocks in the macroblock adjacent to the reconstructed part of the picture a priority level; and
b) decoding first the block of the macroblock whose priority level is the highest.
In a variant, in step b), the part of the picture comprising the block whose priority level is the highest is a macroblock and decoding the macroblock comprises:
determining a zig-zag scan order of blocks within the macroblock on the basis of at least the spatial position of a causal neighborhood with respect to the macroblock;
decoding the blocks within the macroblock according to the zig-zag scan order.
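As an illustration of this variant, the zig-zag scan inside a macroblock can be mirrored so that it starts in the corner touching the causal neighborhood. The sketch below operates on the 2x2 grid of blocks of a macroblock; the side names and the mirroring rule are illustrative assumptions, not the disclosed method.

```python
# Hypothetical sketch: mirror the classic top-left zig-zag so that the
# scan starts next to the causal neighborhood.

def zigzag_order(neighborhood_side):
    base = [(0, 0), (0, 1), (1, 0), (1, 1)]  # classic zig-zag, 2x2 blocks
    flip_cols = neighborhood_side in ("top-right", "bottom-right")
    flip_rows = neighborhood_side in ("bottom-left", "bottom-right")
    return [(1 - r if flip_rows else r, 1 - c if flip_cols else c)
            for r, c in base]
```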
Advantageously, the part of the picture comprising the block whose priority level is the highest is a macroblock encompassing the block.
In a variant, the at least two blocks are macroblocks and the part of the picture comprising the block whose priority level is the highest is the macroblock whose priority level is the highest.
A method for encoding a picture divided into blocks is also disclosed that comprises at least one iteration of:
a) determining for each of at least two blocks adjacent to a reconstructed part of the picture a priority level responsive at least to directional gradients computed in a causal neighborhood of the block; and
b) encoding a part of the picture comprising the block whose priority level is the highest.
In a specific embodiment, determining for each of at least two blocks adjacent to a reconstructed part of the picture a priority level comprises:
a1) computing for a spatial direction directional gradients along the block edge;
a2) propagating the directional gradients along the spatial direction; and
a3) determining an energy from the propagated directional gradients.
Advantageously, the spatial direction belongs to a plurality of spatial directions and the method further comprises:
a4) repeating steps a1) to a3) for each spatial direction of the plurality of spatial directions; and
a6) determining the highest energy, the highest energy being the priority for the current block.
Advantageously, the causal neighborhood belongs to a plurality of causal neighborhoods and the method further comprises before step a6):
a5) repeating steps a1) to a4) for each causal neighborhood of the plurality of causal neighborhoods.
A device for decoding a picture divided into blocks is disclosed that comprises at least one processor configured to:
determine for at least two blocks adjacent to a reconstructed part of the picture a priority level responsive at least to directional gradients computed in a causal neighborhood of the block; and
decode a part of the picture comprising the block whose priority level is the highest.
A device for decoding a picture divided into blocks is disclosed that comprises:
means for determining for at least two blocks adjacent to a reconstructed part of the picture a priority level responsive at least to directional gradients computed in a causal neighborhood of the block; and
means for decoding a part of the picture comprising the block whose priority level is the highest.
The devices for decoding are configured to execute the steps of the decoding method according to any of the embodiments and variants disclosed.
A device for encoding a picture divided into blocks is disclosed that comprises at least one processor configured to:
determine for at least two blocks adjacent to a reconstructed part of the picture a priority level responsive at least to directional gradients computed in a causal neighborhood of the block; and
encode a part of the picture comprising the block whose priority level is the highest.
A device for encoding a picture divided into blocks is also disclosed that comprises:
means for determining for at least two blocks adjacent to a reconstructed part of the picture a priority level responsive at least to directional gradients computed in a causal neighborhood of the block; and
means for encoding a part of the picture comprising the block whose priority level is the highest.
The devices for encoding are configured to execute the steps of the encoding method according to any of the embodiments and variants disclosed.
A computer program product is disclosed that comprises program code instructions to execute the steps of the decoding method according to any of the embodiments and variants disclosed when this program is executed on a computer.
A processor readable medium is disclosed that has stored therein instructions for causing a processor to perform at least the steps of the decoding method according to any of the embodiments and variants disclosed.
A computer program product is disclosed that comprises program code instructions to execute the steps of the encoding method according to any of the embodiments and variants disclosed when this program is executed on a computer.
A processor readable medium is disclosed that has stored therein instructions for causing a processor to perform at least the steps of the encoding method according to any of the embodiments and variants disclosed.
In the drawings, an embodiment of the present invention is illustrated. It shows:
The words “decoded” and “reconstructed” are often used as synonyms. Usually but not necessarily, the word “reconstructed” is used on the encoder side and the word “decoded” is used on the decoder side. A causal neighborhood is a neighborhood of a block comprising pixels of a reconstructed part of a picture.
In a step S10, a priority level is determined, e.g. by the module 12, for at least two blocks adjacent to a reconstructed part of the picture. The priority level is responsive at least to directional gradients computed within a causal neighborhood of the block. A block can be a macroblock.
calculating for each template Tj, where j is an index identifying the template that can be used for the block Bi, and for each spatial direction d compatible with Tj, energies of directional gradients E(Bi, Tj, d); and
determining the highest energy of directional gradient and setting the priority value for the block equal to the highest energy of directional gradient, i.e. P(Bi) is equal to max over j and d of E(Bi, Tj, d).
d is a spatial direction such as the ones used for intra prediction in the H.264 video coding standard. It will be appreciated, however, that the invention is not restricted to these specific spatial directions. Other standards may define other spatial directions for intra prediction. With reference to
The pixels in the template are pixels belonging to the reconstructed part, i.e. they are reconstructed pixels.
According to an exemplary and non-limitative embodiment depicted on
a1) Computing (S100), for a causal neighborhood Tj, i.e. a template, in a set of causal neighborhoods and for a spatial direction d compatible with Tj, directional gradients along the block edge;
a2) Propagating the directional gradients along the spatial direction d in the current block;
a3) Determining (S104) an energy from the propagated directional gradients;
a4) Repeating (S106) steps a1) to a3) for each spatial direction d compatible with Tj;
a5) Repeating (S106) steps a1) to a4) for each causal neighborhood Tj in the set of causal neighborhoods;
a6) Determining (S108) the highest energy, said highest energy being the priority for said current block.
Exemplarily, with the templates T1, T2, T3 and T4 all the spatial directions d are compatible. However, the spatial direction d=1 is not compatible with the template T5 because the pixels in the causal neighborhood are not available for the propagation.
The directional gradients are calculated on the causal neighborhood by means of convolution masks moving over this causal neighborhood. Dd with d ∈ [0;8]\{2} below are examples of such convolution masks:
The index d is representative of the spatial direction.
A directional gradient is calculated from a convolution mask Dd of dimension (2N+1)×(2N+1).
where y and x are the indices of the rows and columns of the pixels in the picture and i and j are the indices of the coefficients of the convolution mask Dd.
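The convolution can be sketched as follows. The 3x3 mask shown is only an illustrative vertical-gradient mask; it is not claimed to reproduce the exact coefficients of the disclosed masks Dd.

```python
# G(y, x) = sum over i, j of Dd[i+N][j+N] * I[y+i][x+j]
# for a (2N+1) x (2N+1) convolution mask (here N = 1).

def directional_gradient(img, mask, y, x):
    n = len(mask) // 2
    return sum(mask[i + n][j + n] * img[y + i][x + j]
               for i in range(-n, n + 1)
               for j in range(-n, n + 1))

D0_example = [[-1, -2, -1],  # illustrative vertical mask (d = 0)
              [ 0,  0,  0],
              [ 1,  2,  1]]
```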
When a block is located on a border of the reconstructed part, the missing pixels are padded. Exemplarily, on
Thus with respect to the causal neighborhood represented on
For the convolution masks Dd, with d=3 to 8, the formulas (6) to (8) are applied. For the vertical and horizontal directions d=0 and d=1, the gradients may be computed slightly differently. Indeed, the convolution masks D0 and D1 only have a single line, respectively a single column, of non-null coefficients. Consequently, the convolution can be made with the line of pixels just above the current block or with the column of pixels just to the left of the current block, respectively.
There is no need to compute a gradient value for the pixel M for the directions d=0 and d=1, since the pixel M is not used during the propagation along these directions.
In a variant, the absolute values of the gradients can be propagated instead of the signed values. In the horizontal direction, the gradients are propagated from the left to the right, e.g. the gradients for the pixels located on the first line of the block have the value |GQ|. For the vertical right direction, the propagated directional gradients for the pixels (2,3) and (4,4) are (|GA|+2|GB|+|GC|+2)/4.
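For the vertical direction d=0, this variant amounts to copying the absolute gradient of each pixel of the line just above the block down the whole column. A minimal sketch, with hypothetical names:

```python
# Propagate |G| down each column of an n_rows-high block (direction d=0):
# every pixel of a column receives the absolute gradient computed for
# the pixel of the line just above the block.

def propagate_vertical(edge_gradients, n_rows):
    return [[abs(g) for g in edge_gradients] for _ in range(n_rows)]
```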
The directional intra predictions as defined in H.264 coding standard require a classical raster scan order of macroblock and zig-zag scan within the macroblock. In this case, the causal neighborhood used for the directional intra prediction is always located on the left and/or on the top of the block. With an adaptive scanning order, the causal neighborhood can be located anywhere around the block. The directional intra predictions as defined in H.264 and depicted on
The energy representative of the impact of a contour of direction d is calculated by summing the absolute values of the gradients in the gradient prediction block. For a gradient prediction block Grd (of dimension L×M), the energy Ed is computed as follows:
In a variant:
with d = 0, …, 8 and d ≠ 2.
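The summation described above can be sketched as follows; this is a minimal illustration of the energy of an L x M gradient prediction block, with a hypothetical function name.

```python
# Ed: sum of the absolute values of the propagated gradients of the
# gradient prediction block.

def block_energy(pred_block):
    return sum(abs(g) for row in pred_block for g in row)
```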
The method favors (i.e. gives higher priority in the encoding order) the blocks having sharp contours on their frontiers compared to blocks whose neighborhoods exhibit weaker gradients. Even if the current block is finally coded in inter or spatial block matching mode, the block probably contains structures which help in the motion estimation and block matching processing.
Once the priority P(Bi) is determined for at least two blocks adjacent to the frontier δΩ, the block Bnext with the highest priority level Pmax is identified. If two blocks have the same priority that is equal to Pmax, the first block encountered when scanning the picture blocks from top to bottom and left to right is identified.
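The selection of Bnext with its raster-scan tie-break can be sketched as follows. Representing candidates as (row, column, priority) tuples is an assumption made purely for the illustration.

```python
# Pick the candidate with the highest priority; a strict '>' comparison
# over candidates visited in raster order (top to bottom, left to right)
# keeps the first block encountered when several share Pmax.

def select_next_block(candidates):
    ordered = sorted(candidates, key=lambda b: (b[0], b[1]))  # raster order
    best = ordered[0]
    for b in ordered[1:]:
        if b[2] > best[2]:
            best = b
    return best
```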
In a step S12, a part of the picture comprising the block Bnext whose priority level is the highest is encoded, e.g. by the module 14. According to a first embodiment, the block Bnext is a macroblock MBnext. According to a variant, the block Bnext is a block smaller than a macroblock. In the latter case, a macroblock MBnext encompassing the block Bnext is identified. The macroblock MBnext is thus encoded. To this aim, the blocks inside the macroblock MBnext are scanned according to a classical zig-zag scan order as depicted on
Determining a predictor comprises determining a prediction mode which is also encoded in the bitstream. Indeed, a block can be predicted in various ways. Well-known prediction techniques include directional intra prediction as defined in the H.264 and HEVC coding standards, template based prediction (e.g. template matching), and multi-patch based prediction (e.g. non-local means (NLM), locally linear embedding (LLE)). According to a specific embodiment, the highest priority level determined in step S10 is associated with one of the templates defined on
The selection of one prediction mode among the various prediction modes can be made according to a well-known rate-distortion technique, i.e. the prediction mode that provides the best compromise in terms of reconstruction error and bit-rate is selected.
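Such a rate-distortion decision is commonly expressed as minimizing the Lagrangian cost J = D + λR. The sketch below assumes this formulation; the mode names and the value of λ are illustrative, not taken from the disclosure.

```python
# Pick the prediction mode minimizing J = D + lambda * R,
# i.e. the best compromise between reconstruction error and bit-rate.

def best_mode(modes, lam):
    # modes: iterable of (name, distortion, rate) tuples
    return min(modes, key=lambda m: m[1] + lam * m[2])[0]
```

A larger λ penalizes rate more heavily, so the selected mode shifts toward cheaper (higher-distortion) predictions.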
Once a block or a macroblock is encoded and reconstructed, the steps S10 and S12 can be iterated until the whole picture is encoded. The method can also be applied on each picture of a sequence of pictures to encode the whole sequence.
The bitstream F is for example transmitted to a destination by the output 16 of the encoding device 1.
The decoding device 2 comprises an input 20 configured to receive a bitstream from a source. The input 20 is linked to a module 22 configured to determine for each of at least two blocks adjacent to a reconstructed part of the picture a priority level responsive at least to directional gradients computed in a causal neighborhood of the block. The reconstructed part is a portion of the picture already decoded. On the decoder side, the reconstructed part can also be named decoded part. As an example, the reconstructed part is the first line of macroblocks in the picture Y which is decoded in a raster scan order. According to a variant, the reconstructed part is a block/macroblock located at a specific position in the picture, e.g. in the center of the picture. According to yet another variant, the reconstructed part is an epitome of the picture Y. An epitome is a condensed representation of a picture. As an example, the epitome is made of patches of texture belonging to the picture Y. On the decoder side, the reconstructed part can be used for prediction of other parts of the picture not yet decoded. A block is adjacent to a reconstructed part of the picture if one of its borders is along the reconstructed part. The module 22 is linked to a module 24 adapted to decode a part of the picture comprising the block whose priority level is the highest. The module 24 is linked to an output 26. When a picture is decoded, it can be stored in a memory internal to the decoding device 2 or external to it. According to a variant the decoded picture can be sent to a destination.
In a step S20, a priority level is determined, e.g. by the module 22, for at least two blocks adjacent to a reconstructed part of the picture. The priority level is responsive at least to directional gradients computed in a causal neighborhood of the block. A block can be a macroblock. The step S20 is identical to the step S10 on the encoding side. Consequently, the step S20 is not further disclosed. All the variants disclosed with respect to the encoding method for step S10 apply to S20, in particular the non-limitative embodiment disclosed with respect to
In a step S22, the module 24 decodes a part of the picture comprising the block whose priority level is the highest. According to a first embodiment, the block Bnext is a macroblock MBnext. According to a variant, the block Bnext is a block smaller than a macroblock. In the latter case, a macroblock MBnext encompassing the block Bnext is identified. The macroblock MBnext is thus decoded. To this aim, the blocks inside the macroblock are scanned according to a classical zig-zag scan order as depicted on
Decoding a block usually comprises determining a predictor and residues. Determining the residues comprises entropy decoding of a part of the bitstream F representative of the block to obtain coefficients, dequantizing and transforming the coefficients to obtain residues. The residues are added to the predictor to obtain a decoded block. The transforming on the decoding side is the inverse of the transforming on the encoder side.
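The residual reconstruction path can be sketched as follows. The inverse transform is reduced to an identity placeholder here (a real codec applies an inverse DCT-like transform), the uniform dequantization step is an assumption, and all names are illustrative.

```python
# Dequantize the decoded coefficients, add the resulting residues to the
# predictor and clip to the 8-bit sample range. The inverse transform is
# omitted (identity placeholder) to keep the sketch minimal.

def decode_block(coeffs, predictor, qstep):
    residues = [c * qstep for c in coeffs]   # dequantization (assumed uniform)
    return [max(0, min(255, p + r)) for p, r in zip(predictor, residues)]
```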
Determining a predictor comprises determining a prediction mode which is usually decoded from the bitstream. According to a specific embodiment, the highest priority level determined in step S20 is associated with one of the templates defined on
Once a block or a macroblock is decoded, the steps S20 and S22 can be iterated until the whole picture is decoded. The method can also be applied on each picture of a sequence of pictures to decode the whole sequence.
The decoded picture is for example sent to a destination by the output 26 of the decoding device 2.
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of features discussed may also be implemented in other forms (for example a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.
Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette (“CD”), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory (“RAM”), or a read-only memory (“ROM”). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.
The invention finds its interest in all domains concerned with image epitome reduction. Applications related to video compression and to the representation of videos are concerned.
Number | Date | Country | Kind
---|---|---|---
14305605.9 | Apr 2014 | EP | regional