This disclosure relates to digital video signals and more specifically to systems and methods for improving the quality of compressed digital video signals by separating the video signals into Deblock and Detail regions and by smoothing the Deblock region.
It is well-known that video signals are represented by large amounts of digital data, relative to the amount of digital data required to represent text information or audio signals. Digital video signals consequently occupy relatively large bandwidths when transmitted at high bit rates and especially when these bit rates must correspond to the real-time digital video signals demanded by video display devices.
In particular, the simultaneous transmission and reception of a large number of distinct video signals, over such communications channels as cable or fiber, is often achieved by frequency-multiplexing or time-multiplexing these video signals in ways that share the available bandwidths in the various communication channels.
Digitized video data are typically embedded with the audio and other data in formatted media files according to internationally agreed formatting standards (e.g. MPEG2, MPEG4, H264). Such files are typically distributed and multiplexed over the Internet and stored separately in the digital memories of computers, cell phones, digital video recorders and on compact discs (CDs) and digital video discs (DVDs). Many of these devices are physically and indistinguishably merging into single devices.
In the process of creating formatted media files, the file data is subjected to various levels and types of digital compression in order to reduce the amount of digital data required for their representation, thereby reducing the memory storage requirement as well as the bandwidth required for their faithful simultaneous transmission when multiplexed with multiple other video files.
The Internet provides an especially complex example of the delivery of video data in which video files are multiplexed in many different ways and over many different channels (i.e. paths) during their downloaded transmission from the centralized server to the end user. However, in virtually all cases, it is desirable that, for a given original digital video source and a given quality of the end user's received and displayed video, the resultant video file be compressed to the smallest possible size.
Formatted video files might represent a complete digitized movie. Movie files may be downloaded ‘on demand’ for immediate display and viewing in real-time or for storage in end-user recording devices, such as digital video recorders, for later viewing in real-time.
Compression of the video component of these video files therefore not only conserves bandwidth, for the purposes of transmission, but it also reduces the overall memory required to store such movie files.
At the receiver end of the abovementioned communication channels, single-user computing and storage devices are typically employed. Currently-distinct examples of such single-user devices are the personal computer and the digital set top box, either or both of which are typically output-connected to the end-user's video display device (e.g. TV) and input-connected, either directly or indirectly, to a wired copper distribution cable line (i.e. Cable TV). Typically, this cable simultaneously carries hundreds of real-time multiplexed digital video signals and is often input-connected to an optical fiber cable that carries the terrestrial video signals from a local distributor of video programming. End-user satellite dishes are also used to receive broadcast video signals. Whether the end-user employs video signals that are delivered via terrestrial cable or satellite, end-user digital set top boxes, or their equivalents, are typically used to receive digital video signals and to select the particular video signal that is to be viewed (i.e. the so-called TV Channel or TV Program). These transmitted digital video signals are often in compressed digital formats and therefore must be uncompressed in real-time after reception by the end-user.
Most methods of video compression reduce the amount of digital video data by retaining only a digital approximation of the original uncompressed video signal. Consequently, there exists a measurable difference between the original video signal prior to compression and the uncompressed video signal. This difference is defined as the video distortion. For a given method of video compression, the level of video distortion almost always becomes larger as the amount of data in the compressed video data is reduced by choosing different parameters for those methods. That is, video distortion tends to increase with increasing levels of compression.
As the level of video compression is increased, the video distortion eventually becomes visible to the human vision system (HVS) and eventually this distortion becomes visibly-objectionable to the typical viewer of the real-time video on the chosen display device. The video distortion is observed as a so-called artifact. An artifact is observed video content that is interpreted by the HVS as not belonging to the original uncompressed video scene.
Methods exist for significantly attenuating visibly-objectionable artifacts from compressed video, either during or after compression. Most of these methods apply only to compression methods that employ the block-based Two-dimensional (2D) Discrete Cosine Transform (DCT) or approximations thereof. In the following, we refer to these methods as DCT-based. In such cases, by far the most visibly-objectionable artifact is the appearance of artifact blocks in the displayed video scene.
Methods exist for attenuating the artifact blocks typically either by searching for the blocks or by requiring a priori knowledge of where they are located in each frame of the video.
The problem of attenuating the appearance of visibly-objectionable artifacts is especially difficult for the widely-occurring case where the video data has been previously compressed and decompressed, perhaps more than once, or where it has been previously re-sized, re-formatted or color re-mixed. For example, video data may have been re-formatted from the NTSC to PAL format or converted from the RGB to the YCrCb format. In such cases, a priori knowledge of the locations of the artifact blocks is almost certainly unavailable and therefore methods that depend on this knowledge do not work.
Methods for attenuating the appearance of video artifacts must not add significantly to the overall amount of data required to represent the compressed video data. This constraint is a major design challenge. For example, each of the three colors of each pixel in each frame of the displayed video is typically represented by 8 bits, therefore amounting to 24 bits per colored pixel. If pushed to the limits of compression where visibly-objectionable artifacts are evident, the H264 (DCT-based) video compression standard is capable of achieving compression of video data corresponding at its low end to approximately 1/40th of a bit per pixel. This therefore corresponds to an average compression ratio of better than 40×24=960. Any method for attenuating the video artifacts, at this compression ratio, must therefore add an insignificant number of bits relative to 1/40th of a bit per pixel. Methods are required for attenuating the appearance of block artifacts when the compression ratio is so high that the average number of bits per pixel is typically less than 1/40th of a bit.
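The arithmetic behind the figure of 960 can be checked with a short sketch. The function name below is hypothetical and is introduced only for illustration. Exact fractions are used so that the result is not affected by floating-point rounding.

```python
from fractions import Fraction

BITS_PER_COLOR = 8      # from the text: 8 bits per color
COLORS_PER_PIXEL = 3    # from the text: three colors per pixel


def compression_ratio(compressed_bits_per_pixel: Fraction) -> Fraction:
    """Ratio of uncompressed bits per pixel to compressed bits per pixel."""
    raw_bits_per_pixel = BITS_PER_COLOR * COLORS_PER_PIXEL  # 24 bits
    return Fraction(raw_bits_per_pixel) / compressed_bits_per_pixel


# 1/40th of a bit per pixel, the low-end figure quoted in the text.
ratio = compression_ratio(Fraction(1, 40))
print(ratio)
```

Any bits added by an artifact-attenuation method would enlarge the denominator of this ratio, which is why such additions must be small relative to 1/40th of a bit per pixel.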
For DCT-based and other block-based compression methods, the most serious visibly-objectionable artifacts are in the form of small rectangular blocks that typically vary with time, size and orientation in ways that depend on the local spatial-temporal characteristics of the video scene. In particular, the nature of the artifact blocks depends upon the local motions of objects in the video scene and on the amount of spatial detail that those objects contain. As the compression ratio is increased for a particular video, MPEG-based DCT-based video encoders allocate progressively fewer bits to the so-called quantized basis functions that represent the intensities of the pixels within each block. The number of bits that are allocated in each block is determined on the basis of extensive psycho-visual knowledge about the HVS. For example, the shapes and edges of video objects and the smooth-temporal trajectories of their motions are psycho-visually important and therefore bits must be allocated to ensure their fidelity, as in all MPEG DCT based methods.
As the level of compression increases, and in its goal to retain the above mentioned fidelity, the compression method (in the so-called encoder) eventually allocates a constant (or almost constant) intensity to each block and it is this block-artifact that is usually the most visually objectionable. It is estimated that if artifact blocks differ in relative uniform intensity by greater than 3% from that of their immediate neighboring blocks, then the spatial region containing these blocks is visibly-objectionable. In video scenes that have been heavily-compressed using block-based DCT-type methods, large regions of many frames contain such block artifacts.
The present invention is directed to systems and methods in which, for a given amount of data required to represent a compressed video signal, the quality of the uncompressed displayed real-time video, as perceived by a typical human viewer, is improved. Systems and methods herein achieve this improvement by attenuating the appearance of blocks without necessarily having a priori knowledge of their locations. In some embodiments, the methods described herein attenuate the appearance of these blocks such that the quality of the resultant real-time video, as perceived by the HVS, is improved.
In terms of the intensity difference between the compressed and uncompressed versions of a video, the blocky regions may not be the largest contributors to a mathematical metric of overall video distortion. There is typically significant mathematical distortion in the detailed regions of a video but advantage is taken of the fact that the HVS does not perceive that distortion as readily as it perceives the distortion due to block artifacts.
In the embodiments discussed herein, the first step of the method separates the digital representations of each frame into two parts referred to as the Deblock region and the Detail Region. The second step of the method operates on the Deblock region to attenuate the block artifacts resulting in a smoothed Deblock Region. The third step of the method recombines the smoothed Deblock region and the Detail Region.
In one embodiment, the identification of the Deblock region commences by selecting candidate regions and then comparing each candidate region against its surrounding neighborhood region using a set of criteria, such as:
a. Flatness-of-Intensity Criteria (F),
b. Discontinuity Criteria (D) and
c. Look-Ahead/Look-Behind Criteria (L).
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims. The novel features which are believed to be characteristic of the invention, both as to its organization and method of operation, together with further objects and advantages will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present invention.
For a more complete understanding of the present invention, reference is now made to the following descriptions taken in conjunction with the accompanying drawing, in which:
One aspect of the disclosed embodiment is to attenuate the appearance of block artifacts in real-time video signals by identifying a region in each frame of the video signal for deblocking using flatness criteria and discontinuity criteria. Additional gradient criteria can be combined to further improve robustness. Using these concepts, the size of the video file (or the number of bits required in a transmission of the video signals) can be reduced since the visual effects of artifacts associated with the reduced file size can be reduced.
One embodiment of a method to perform these concepts consists of three parts with respect to image frames of the video signal:
1. A process to identify a Deblock region (DEB) that distinguishes the Deblock region from a so-called Detail region (DET);
2. An operation applied to the Deblock region DEB for the purposes of attenuating, by spatial smoothing, the appearance of block artifacts in the Deblock Region; and
3. A process to combine the now smoothed Deblock region obtained in part 2 with the Detail Region.
In the method of this embodiment the spatial-smoothing operation does not operate outside of the Deblock Region: equivalently, it does not operate in the Detail Region. As will be discussed herein, methods are employed to determine that the spatial-smoothing operation has reached the boundaries of the Deblock region DEB so that smoothing does not occur outside of the Deblock Region.
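The three parts above can be sketched as follows, assuming that the Deblock region has already been identified and is supplied as a Boolean mask. The names `smooth_deblock_region`, `deb` and `radius` are hypothetical. The simple averaging window is a stand-in for whatever smoothing operation an embodiment actually uses, and the choice to average over Deblock pixels only is an assumption made here so that Detail intensities are neither modified nor read.

```python
from typing import List

Frame = List[List[float]]
Mask = List[List[bool]]


def smooth_deblock_region(frame: Frame, deb: Mask, radius: int = 1) -> Frame:
    """Smooth pixels inside the Deblock mask; copy Detail pixels unchanged."""
    rows, cols = len(frame), len(frame[0])
    # Part 3 (recombination): start from a copy so Detail pixels pass through.
    out = [row[:] for row in frame]
    for r in range(rows):
        for c in range(cols):
            if not deb[r][c]:
                continue  # Detail region: the smoothing does not operate here
            total, count = 0.0, 0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    if deb[rr][cc]:  # assumption: never read across the boundary
                        total += frame[rr][cc]
                        count += 1
            out[r][c] = total / count  # count >= 1 because (r, c) is in DEB
    return out


frame = [[100.0, 110.0, 100.0, 200.0]]
deb = [[True, True, True, False]]
print(smooth_deblock_region(frame, deb))
```

In this usage example the last pixel lies in the Detail region and keeps its value of 200, while the three Deblock pixels are averaged among themselves.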
Video signals that have been previously subjected to block-based types of video compression (e.g. DCT-based compression) and decompression, and possibly to re-sizing and/or reformatting and/or color re-mixing, typically contain visibly-objectionable residues of block artifacts that first occurred during previous compression operations. Therefore, the removal of block-induced artifacts cannot be completely achieved by attenuating the appearance of only those blocks that were created in the last or current compression operation.
In many cases, a priori information about the locations of these previously created blocks is unavailable and blocks at unknown locations often contribute to objectionable artifacts. Embodiments of this method identify the region to be de-blocked by means of criteria that do not require a priori knowledge of the locations of the blocks.
In one embodiment, flatness-of-intensity criteria are employed, together with intensity-discontinuity criteria and/or intensity-gradient criteria, to identify the Deblock region of each video frame which is to be de-blocked without specifically finding or identifying the locations of individual blocks. The Deblock region typically consists, in each frame, of many unconnected sub-regions of various sizes and shapes. This method only depends on information within the image frame to identify the Deblock region in that image frame. The remaining region of the image frame, after this identification, is defined as the Detail region.
Video scenes consist of video objects. These objects are typically distinguished and recognized (by the HVS and the associated neural responses) in terms of the locations and motions of their intensity-edges and the texture of their interiors. For example,
While not clearly visible in the image frame of
This method takes advantage of the psycho-visual property that the HVS is especially aware of, and sensitive to, those block artifacts (and their associated edge intensity-discontinuities) that are located in relatively large open areas of the image where there is almost constant intensity or smoothly-varying image intensity in the original image. For example, in
As another example of the sensitivity of the HVS to block artifacts, if the HVS perceives a video image of a uniformly-colored flat shaded surface, such as an illuminated wall, then block edge intensity-discontinuities of more than about 3% are visibly-objectionable whereas similar block edge intensity-discontinuities in a video image of a highly textured object, such as a highly textured field of blades of grass, are typically invisible to the HVS. It is more important to attenuate blocks in large open smooth-intensity regions than in regions of high spatial detail. This method exploits this characteristic of the HVS.
However, if the above wall is occluded from view except in small isolated regions, the HVS is again relatively unaware of the block artifacts. That is, the HVS is less sensitive to these blocks because, although located in regions of smooth-intensity, these regions are not sufficiently large. This method exploits this characteristic of the HVS.
As a result of applying this method to an image frame, the image is separated into at least two regions: the Deblock region and the remaining Detail region. The method can be applied in a hierarchy so that the above first-identified Detail region is then itself separated into a second Deblock region and a second Detail region, and so on recursively.
As described in the following, criteria are employed to ensure that the Deblock region is the region in which the HVS is most aware of and sensitive to block artifacts and is therefore the region that is to be de-blocked. The Detail region is then the region in which the HVS is not particularly sensitive to block artifacts. In this method, Deblocking of the Deblock region may be achieved by spatial intensity-smoothing. The process of spatial intensity-smoothing may be achieved by low pass filtering or by other means. Intensity-smoothing significantly attenuates the so-called high spatial frequencies of the region to be smoothed and thereby significantly attenuates the edge-discontinuities of intensity that are associated with the edges of block artifacts.
One embodiment of this method employs spatially-invariant low pass filters to spatially-smooth the identified Deblock Region. Such filters may be Infinite Impulse Response (IIR) filters or Finite Impulse Response (FIR) filters or a combination of such filters. These filters are typically low pass filters and are employed to attenuate the so-called high spatial frequencies of the Deblock region, thereby smoothing the intensities and attenuating the appearance of block artifacts.
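As a minimal illustration of the two filter families named above, the following sketch applies a moving-average FIR filter and a first-order IIR filter to a single row of intensities containing a small block-edge step. It is a one-dimensional simplification. The function names, the tap count and the coefficient `alpha` are assumptions chosen for illustration and are not taken from the disclosure.

```python
from typing import List


def fir_moving_average(x: List[float], taps: int = 3) -> List[float]:
    """Centered moving-average FIR filter; edges use the samples available."""
    half = taps // 2
    out = []
    for n in range(len(x)):
        window = x[max(0, n - half): n + half + 1]
        out.append(sum(window) / len(window))
    return out


def iir_first_order(x: List[float], alpha: float = 0.5) -> List[float]:
    """One-pole low pass filter: y[n] = alpha*x[n] + (1 - alpha)*y[n-1]."""
    out = []
    prev = x[0] if x else 0.0  # initialise the state with the first sample
    for sample in x:
        prev = alpha * sample + (1.0 - alpha) * prev
        out.append(prev)
    return out


# A row crossing a block edge: a step of 6 intensity levels.
step = [100.0, 100.0, 100.0, 106.0, 106.0, 106.0]
print(fir_moving_average(step))
print(iir_first_order(step))
```

Both filters replace the single abrupt step with a more gradual transition, which is the attenuation of high spatial frequencies described above. The FIR output is symmetric about the edge, whereas the causal IIR output lags in the direction of processing.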
The above definitions of the Deblock region DEB and the Detail region DET do not preclude further signal processing of either or both regions. In particular, using this method, the DET region could be subjected to further separation into new regions DET1 and DEB1 where DEB1 is a second region for Deblocking (DEB1εDET), possibly using a different Deblocking method or different filter than is used to Deblock DEB. DEB1 and DET1 are clearly sub-regions of DET.
Identifying the Deblock region (DEB) often requires an identifying algorithm that has the capability to process video in real-time. For such applications, high levels of computational complexity (e.g., identifying algorithms that employ large numbers of multiply-accumulate operations (MACs) per second) tend to be less desirable than identifying algorithms that employ relatively few MACs/s and simple logic statements that operate on integers. Embodiments of this method use relatively few MACs/s. Similarly, embodiments of this method ensure that the swapping of large amounts of data into and out of off-chip memory is minimized. In one embodiment of this method, the identifying algorithm for determining the region DEB (and thereby the region DET) exploits the fact that most visibly-objectionable blocks in heavily compressed video clips have almost-constant intensity throughout their interiors.
In one embodiment of this method, the identification of the Deblock region DEB commences by choosing Candidate Regions Ci in the frame. In one embodiment, these regions Ci are as small as one pixel in spatial size. Other embodiments may use candidate regions Ci that are larger than one pixel in size. Each Candidate region Ci is tested against its surrounding neighborhood region by means of a set of criteria that, if met, cause Ci to be classified as belonging to the Deblock region DEB of the image frame. If Ci does not belong to the Deblock Region, it is set to belong to the Detail region DET. Note, this does not imply that the collection of all Ci is equal to DEB, only that they form a sub-set of DEB.
In one embodiment of this method, the set of criteria used to determine whether Ci belongs to the Deblock region DEB may be categorized as follows:
a. Flatness-of-Intensity Criteria (F),
b. Discontinuity Criteria (D) and
c. Look-Ahead/Look-Behind Criteria (L).
If the above criteria (or any useful combination thereof) are satisfied, the Candidate Region Ci is assigned to the Deblock region (i.e., CiεDEB). If not, then the Candidate Region Ci is assigned to the Detail Region DET (i.e., CiεDET). In a particular implementation, such as when Deblocking a particular video clip, all three types of criteria (F, D and L) may not be necessary. Further, these criteria may be adapted on the basis of the local properties of the image frame. Such local properties might be statistical or they might be encoder/decoder-related properties, such as the quantization parameters or motion parameters used as part of the compression and decompression processes.
In one embodiment of this method, the Candidate Regions Ci are chosen, for reasons of computational efficiency, such that they are sparsely-distributed in the image frame. This has the effect of significantly reducing the number of Candidate Regions Ci in each frame, thereby reducing the algorithmic complexity and increasing the throughput (i.e., speed) of the algorithm.
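A sparse distribution of one-pixel Candidate Regions can be sketched as a regular grid. The function name and the regular-grid layout are assumptions made for illustration, and the spacing is left as a parameter.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]


def sparse_candidates(rows: int, cols: int, step: int) -> List[Pixel]:
    """Return (row, column) positions of candidate pixels on a regular grid."""
    return [(r, c) for r in range(0, rows, step) for c in range(0, cols, step)]


# Hypothetical frame size; only one pixel in each step-by-step tile is tested.
candidates = sparse_candidates(1080, 1920, 7)
print(len(candidates), "candidates out of", 1080 * 1920, "pixels")
```

With a spacing of `step` pixels in each direction, the number of candidates to be tested falls by a factor of roughly `step * step` relative to testing every pixel.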
In this illustrative example, applying the Deblocking criteria to
In one embodiment of this method, the entire Deblock region DEB is ‘grown’ from the abovementioned sparsely-distributed Candidate Regions CiεDEB into surrounding regions.
The identification of the Deblock region in
The above growing process spatially connects the sparsely-distributed CiεDEB to form the entire Deblock region DEB.
In one embodiment of this method, the above growing process is performed on the basis of a suitable distance metric that is the horizontal or vertical distances of a pixel from the nearest Candidate region pixel Ci. For example, with Candidate region pixels Ci chosen at 7 pixels apart in the vertical and horizontal directions, the resultant Deblock region is as shown in
As one enhancement, the growing process is applied to the Detail region DET in order to extend the Detail region DET into the previously determined Deblock region DEB. This can be used to prevent the crossed-mask of spatially invariant low-pass smoothing filters from protruding into the original Detail region and thereby avoid the possible creation of undesirable ‘halo’ effects. In doing so, the Detail region may contain in its expanded boundaries unattenuated blocks, or portions thereof. This is not a practical problem because of the relative insensitivity of the HVS to such block artifacts that are proximate to Detail Regions.
Alternate distance metrics may be employed. For example, a metric corresponding to all regions of the image frame within circles of a given radius centered on the Candidate Regions Ci may be employed.
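The growing process can be sketched for both kinds of metric. In the sketch below the horizontal-or-vertical metric is interpreted as a square window around each qualifying candidate pixel, which is an assumption, and the circular metric keeps only pixels within a given radius. The names `grow`, `reach` and `circular` are hypothetical.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def grow(candidates: List[Pixel], rows: int, cols: int,
         reach: int, circular: bool = False) -> Set[Pixel]:
    """Union of each qualifying candidate pixel and its grown surroundings."""
    region: Set[Pixel] = set()
    for r0, c0 in candidates:
        for r in range(max(0, r0 - reach), min(rows, r0 + reach + 1)):
            for c in range(max(0, c0 - reach), min(cols, c0 + reach + 1)):
                dr, dc = r - r0, c - c0
                # Circular metric: discard the corners of the square window.
                if circular and dr * dr + dc * dc > reach * reach:
                    continue
                region.add((r, c))
    return region


# Two qualifying candidates 7 pixels apart; a reach of 3 joins their windows.
deb = grow([(3, 3), (3, 10)], rows=7, cols=14, reach=3)
print(len(deb))
```

With candidates spaced 7 pixels apart, a reach of 3 pixels makes the square windows of adjacent qualifying candidates abut, so that sparsely-distributed candidates become a connected region.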
The Deblock Region that is obtained by the above or other growing processes has the property that it encompasses (i.e. spatially covers) the part of the image frame that is to be Deblocked.
Formalizing the above growing process, the entire Deblock region DEB (or the entire Detail region DET) can be determined by surrounding each Candidate Region Ci (that meets the criteria CiεDEB or CiεDET) by a Surrounding Grown region Gi whereupon the entire Deblock region DEB (or the entire Detail region DET) is the union of all Ci and all Gi.
Equivalently, the entire Deblock region can be written logically as
DEB = ∪i ((Ci ∪ Gi) | CiεDEB)
where ∪ is the union of the regions and where again DET is simply the remaining parts of the image frame. Alternatively, the entire Detail region DET may be determined from the qualifying Candidate Regions (using Ci∉DEB) according to
DET = ∪i ((Ci ∪ Gi) | Ci∉DEB)
If the Grown Surrounding Regions Gi (32-1 to 32-N in
One embodiment of this method is illustrated in
The specific identification algorithms used for these eight flatness criteria can be among those known to one of ordinary skill in the art. Satisfaction of the eight flatness criteria is denoted by the logical notations axεF, bxεF, . . . , dyεF. If a criterion is met, the corresponding region is ‘sufficiently-flat’ according to whatever flatness-of-intensity criterion has been employed.
The following example logical condition may be used to determine whether the overall flatness criterion for each Candidate Pixel x(r,c) is satisfied:
if
(axεF and bxεF) or (cxεF and dxεF) (1)
and
(ayεF and byεF) or (cyεF and dyεF) (2)
then
CiεFlat.
Equivalently, the above Boolean statement results in the truth of the statement CiεFlat under at least one of the following three conditions:
a) Crossed-mask 52 lies over a 9-pixel region that is entirely of sufficiently-flat intensity, therefore including sufficiently-flat regions where 52 lies entirely in the interior of a block
OR
b) Crossed-mask 52 lies over a discontinuity at one of the four locations (r+1,c) OR (r+2,c) OR (r−1,c) OR (r−2,c)
while satisfying the flatness criteria at the remaining three locations
OR
c) Crossed-mask 52 lies over a discontinuity at one of the four locations (r,c+1) OR (r,c+2) OR (r,c−1) OR (r,c−2)
while satisfying the flatness criteria at the remaining three locations.
In the above-described process, as required for identifying Candidate pixels, crossed-mask 52 spatially covers the discontinuous boundaries of blocks, or parts of blocks, regardless of their locations, while maintaining the truth of the statement CiεFlat. A more detailed explanation of the above logic is as follows.
Condition a) is true when all the bracketed statements in (1) and (2) are true. Suppose there exists a discontinuity at one of the locations given in b). Then statement (2) is true because one of the bracketed statements is true. Suppose there exists a discontinuity at one of the locations given in c). Then statement (1) is true because one of the bracketed statements is true.
Using the above Boolean logic, the flatness criterion is met when the crossed-mask 52 straddles the discontinuities that delineate the boundaries of a block, or part of a block, regardless of its location.
The employment of a specific algorithm for determining the Flatness Criteria F (that are applied to the Candidate Pixels Ci) is not crucial to the method. However, to achieve high throughput capability, one example algorithm employs a simple mathematical flatness criterion for ax, bx, cx, dx, ay, by, cy and dy that is, in words, ‘the magnitude of the first-forward difference of the intensities between the horizontally adjacent and the vertically adjacent pixels’. The first-forward difference in the vertical direction, for example, of a 2D sequence x(r, c) is simply x(r+1, c)−x(r, c).
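Statements (1) and (2) can be sketched with first-difference flatness tests. Several details below are assumptions made for illustration: the mapping of ax, bx, cx, dx (and ay, by, cy, dy) onto particular adjacent-pixel pairs of the 9-pixel crossed-mask, with a and b on one side of the center and c and d on the other; the flatness threshold `flat_thresh`; and the function name.

```python
from typing import List

Frame = List[List[int]]


def is_flat_candidate(x: Frame, r: int, c: int, flat_thresh: int) -> bool:
    """Evaluate statements (1) and (2) for the candidate pixel x(r, c)."""
    rows, cols = len(x), len(x[0])
    if not (2 <= r < rows - 2 and 2 <= c < cols - 2):
        raise ValueError("crossed-mask must lie fully inside the frame")

    def flat(p: int, q: int) -> bool:
        # Flatness test: magnitude of the first difference of adjacent pixels.
        return abs(p - q) <= flat_thresh

    # Assumed layout: a, b on one side of the center; c, d on the other.
    a_x = flat(x[r][c - 1], x[r][c - 2])
    b_x = flat(x[r][c], x[r][c - 1])
    c_x = flat(x[r][c + 1], x[r][c])
    d_x = flat(x[r][c + 2], x[r][c + 1])
    a_y = flat(x[r - 1][c], x[r - 2][c])
    b_y = flat(x[r][c], x[r - 1][c])
    c_y = flat(x[r + 1][c], x[r][c])
    d_y = flat(x[r + 2][c], x[r + 1][c])

    statement_1 = (a_x and b_x) or (c_x and d_x)
    statement_2 = (a_y and b_y) or (c_y and d_y)
    return statement_1 and statement_2


# A block edge between columns 2 and 3; the mask at (2, 2) straddles it.
edge_frame = [[100, 100, 100, 140, 140] for _ in range(5)]
print(is_flat_candidate(edge_frame, 2, 2, flat_thresh=2))
```

Because each statement needs only one flat side of the mask, a candidate whose mask straddles a single block edge still qualifies, whereas a candidate in finely textured content fails on both sides.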
The above-discussed flatness criteria are sometimes insufficient to properly identify the region DEB in every region of every frame for every video signal. Assume now that the above flatness condition CiεFlat is met for the Candidate Pixel at Ci. Then, in this method, a Magnitude-Discontinuity Criterion D may be employed to improve the discrimination between a discontinuity that is part of a boundary artifact of a block and a non-artifact discontinuity that belongs to desired detail that exists in the original image, before and after its compression.
The Magnitude-Discontinuity Criterion method sets a simple threshold D below which the discontinuity is assumed to be an artifact of blocking. Writing the pixel x(r, c) (61) at Ci in terms of its intensity x, the Magnitude Discontinuity Criterion is of the form
dx<D
where dx is the magnitude of the discontinuity of intensity at the center (r, c) of crossed-mask 52.
The required value of D can be inferred from the intra-frame quantization step size of the compression algorithm, which in turn can either be obtained from the decoder and encoder or estimated from the known compressed file size. In this way, transitions in the original image that are equal to or larger than D are not mistaken for the boundaries of blocking artifacts and thereby wrongly Deblocked. Combining this condition with the flatness condition gives the more stringent condition
CiεFlat and dx<D
Values for D ranging from 10% to 20% of the intensity range of x(r, c) have been found to yield satisfactory attenuation of block artifacts over a wide range of different types of video scenes.
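The combined flatness and discontinuity condition can be sketched as follows, with the flatness result and the measured discontinuity magnitude dx supplied as inputs. The function names are hypothetical, and the intensity range of 255 in the usage lines assumes 8-bit intensities.

```python
def threshold_from_range(intensity_range: int, percent: int) -> float:
    """Discontinuity threshold D as a percentage of the intensity range."""
    return intensity_range * percent / 100


def flat_and_small_discontinuity(is_flat: bool, dx: float, D: float) -> bool:
    """Combined condition: the candidate is flat and dx is strictly below D."""
    return is_flat and dx < D


# Assumed 8-bit intensities: the 10% to 20% band of the range 255.
d_low = threshold_from_range(255, 10)
d_high = threshold_from_range(255, 20)
print(d_low, d_high)
print(flat_and_small_discontinuity(True, 20, d_low))
```

The strict inequality matters: a transition whose magnitude equals D is treated as original detail rather than as a block boundary.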
There will almost certainly exist non-artifact discontinuities (that should therefore not be deblocked) because they were in the original uncompressed image frame. Such non-artifact discontinuities may satisfy dx<D and may also reside where the surrounding region causes CiεFlat, according to the above criterion, which thereby leads to such discontinuities meeting the above criterion and thereby being wrongly classified for deblocking and therefore wrongly smoothed. However, such non-artifact discontinuities correspond to image details that are highly localized. Experiments have verified that such false deblocking is typically not objectionable to the HVS. However, to significantly reduce the probability of such rare instances of false deblocking, the following Look-Ahead (LA) and Look-Behind (LB) embodiment of the method may be employed.
It has been found experimentally that, in particular video image frames, there may exist a set of special numerical conditions under which the required original detail in the original video frame meets both of the above local flatness and local discontinuity conditions and would therefore be falsely identified (i.e., subjected to false deblocking and false smoothing). Equivalently, a small proportion of the Ci could be wrongly assigned to DEB instead of to DET. As an example of this, a vertically-oriented transition of intensity at the edge of an object (in the uncompressed original image frame) can meet both the flatness conditions and the discontinuity conditions for Deblocking. This can sometimes lead to visibly-objectionable artifacts in the displayed corresponding real-time video signal.
The following LA and LB criteria are optional and address the above special numerical conditions. They do so by measuring the change in intensity of the image from crossed-mask 52 to locations suitably located outside of crossed-mask 52.
If the above criteria CiεFlat and dx<D are met but the change in intensity also meets or exceeds a ‘looking ahead LA’ threshold criterion or a ‘looking back LB’ threshold criterion L, then the candidate Ci pixel is not assigned to the Deblock Region. In terms of the magnitudes of derivatives, one embodiment of the LA and LB criteria is:
if
(dxA≧L) or (dxB≧L) or (dxC≧L) or (dxD≧L)
then
Ci∉DEB.
In the above, terms such as (dxA≧L) simply mean that the magnitude of the LA magnitude-gradient or change criterion dx as measured from the location (r,c) out to the location of pixel A in this case is greater than or equal to the threshold number L. The other three terms have similar meanings but with respect to pixels at locations B, C and D.
The effect of the above LA and LB criteria is to ensure that deblocking cannot occur within a certain distance of an intensity-magnitude change of L or greater.
These LA and LB constraints have the desired effect of reducing the probability of false deblocking. The LA and LB constraints are also sufficient to prevent undesirable deblocking in regions that are in the close neighborhoods of where the magnitude of the intensity gradient is high, regardless of the flatness and discontinuity criteria.
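The LA and LB test can be sketched as four probes measured from the candidate pixel. The probe positions are assumptions: the locations of pixels A, B, C and D are taken here to lie `reach` pixels to the right, left, below and above (r, c), just outside a mask whose arms extend two pixels. Skipping probes that fall outside the frame is also an assumption, as is the function name.

```python
from typing import List

Frame = List[List[int]]


def look_ahead_behind_vetoes(x: Frame, r: int, c: int,
                             L: int, reach: int = 3) -> bool:
    """True if any probed change is >= L, so Ci must not be assigned to DEB."""
    rows, cols = len(x), len(x[0])
    # Hypothetical positions for pixels A, B, C and D.
    probes = [(r, c + reach), (r, c - reach), (r + reach, c), (r - reach, c)]
    for pr, pc in probes:
        if 0 <= pr < rows and 0 <= pc < cols:  # ignore probes off the frame
            if abs(x[pr][pc] - x[r][c]) >= L:
                return True
    return False


frame = [[100] * 7 for _ in range(7)]
frame[3][6] = 200  # a strong edge three pixels to the right of (3, 3)
print(look_ahead_behind_vetoes(frame, 3, 3, L=50))
```

The comparisons use only integer subtraction, absolute value and a threshold test, which suits the fast logical operations on short integers that a real-time implementation favors.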
An embodiment of the combined criteria, obtained by combining the above three sets of criteria, for assigning a pixel at Ci to the Deblock region DEB, can be expressed as an example criterion as follows:
if
Ci∈Flat AND dx<D AND ((dxA<L) AND (dxB<L) AND (dxC<L) AND (dxD<L))
then
Ci∈DEB
As an embodiment of this method, the truth of the above may be determined in hardware using fast logical operations on short integers. Evaluation of the above criteria over many videos of different types has verified their robustness in properly identifying the Deblock Regions DEB (and thereby the complementary Detail Regions DET).
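The way the three sets of criteria combine can be sketched as a single predicate. This is a minimal illustration only: the function name and argument layout are hypothetical, the flatness test and the change magnitudes at pixels A, B, C and D are assumed to have been computed elsewhere, and the LA/LB test is written as a veto that applies whenever any one of the four magnitudes reaches L.

```python
def assign_to_deblock(is_flat, dx, look_magnitudes, D, L):
    """Return True if the candidate pixel Ci is assigned to DEB.

    is_flat         -- outcome of the flatness test (Ci in Flat), computed elsewhere
    dx              -- discontinuity magnitude measured at the candidate pixel
    look_magnitudes -- the four LA/LB change magnitudes (dxA, dxB, dxC, dxD)
    D, L            -- discontinuity threshold and LA/LB threshold
    """
    if not is_flat:
        return False
    if dx >= D:
        # Too large to be treated as a block-edge discontinuity.
        return False
    # LA/LB veto: a change of L or more near the mask blocks deblocking.
    if any(m >= L for m in look_magnitudes):
        return False
    return True
```

Because every test is a comparison of small integers followed by logical AND/OR, the same predicate maps naturally onto the short-integer logic mentioned above.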
Many previously-processed videos have ‘spread-out’ block edge-discontinuities. While being visibly-objectionable, spread-out block edge-discontinuities straddle more than one pixel in the vertical and/or horizontal directions. This can cause incorrect classification of block edge-discontinuities to the Deblock Region, as described by example in the following.
For example, consider a horizontal 1-pixel-wide discontinuity of magnitude 40 that separates flat-intensity regions satisfying Ci∈Flat, occurring from, say, x(r,c)=100 to x(r,c+1)=140, with the discontinuity threshold D=30. The discontinuity magnitude of 40 exceeds D, implying that the pixel x(r,c) does not belong to the Deblock region DEB. Now consider how this same discontinuity of magnitude 40 is classified if it is spread out, from, say, x(r,c)=100 to x(r,c+1)=120 to x(r,c+2)=140. In this case, the discontinuities at x(r,c) and x(r,c+1) are each of magnitude 20 and, because they fail to exceed D, false Deblocking occurs: both x(r,c) and x(r,c+1) would be wrongly assigned to the Deblock region DEB.
Similar spread-out edge discontinuities may exist in the vertical direction.
Most commonly, such spread-out discontinuities straddle 2 pixels although the straddling of 3 pixels is also found in some heavily-compressed video signals.
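The numerical example above (a step of 40 with D=30) can be checked with a few lines of code. The helper names are hypothetical and only the adjacent-pixel discontinuity test is modeled; the flatness and LA/LB tests are left out.

```python
def adjacent_differences(row):
    """Magnitude of the intensity step between each pixel and its right neighbor."""
    return [abs(b - a) for a, b in zip(row, row[1:])]


def below_discontinuity_threshold(row, D):
    """True wherever the step to the next pixel is smaller than D."""
    return [d < D for d in adjacent_differences(row)]


D = 30
sharp = [100, 140]        # 1-pixel-wide discontinuity of magnitude 40
spread = [100, 120, 140]  # the same discontinuity spread over 2 pixels

sharp_result = below_discontinuity_threshold(sharp, D)
spread_result = below_discontinuity_threshold(spread, D)
```

Here `sharp_result` contains a single False (the step of 40 is rejected), whereas `spread_result` contains two True values (each step of 20 passes), which is the false-deblocking case described above.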
One embodiment of this method for correctly classifying spread-out edge-discontinuities is to employ a dilated version of the above 9-pixel crossed-mask 52 which may be used to identify and thereby Deblock spread-out discontinuity boundaries. For example, all of the Candidate Regions identified in the 9-pixel crossed-mask 52 of
d) Crossed-mask 52 (M) lies over a 20-pixel region that is entirely of sufficiently-flat intensity, therefore including sufficiently-flat regions where M lies entirely in the interior of a block
OR
e) Crossed-mask 52 lies over a 2-pixel wide discontinuity at one of the four 1×2 pixel locations
(r+2:r+3,c) OR (r+4:r+5,c) OR (r−2:r−1,c) OR (r−4:r−3,c)
while satisfying the flatness criteria at the remaining three locations
OR
f) Crossed-mask 52 lies over a 2-pixel wide discontinuity at one of the four 2×1 pixel locations
(r,c+2:c+3) OR (r,c+4:c+5) OR (r,c−2:c−1) OR (r,c−4:c−3)
while satisfying the flatness criteria at the remaining three locations.
In this way, as required, the crossed-mask M is capable of covering the 1-pixel-wide boundaries as well as the spread-out 2-pixel-wide boundaries of blocks, regardless of their locations, while maintaining the truth of the statement Ci∈Flat. The minimum number of computations required for the 20-pixel crossed-mask is the same as for the 9-pixel version.
There are many variations in the details by which the above flatness and discontinuity criteria may be determined. For example, criteria for ‘flatness’ could involve such statistical measures as variance, mean and standard deviation as well as the removal of outlier values, typically at additional computational cost and slower throughput. Similarly, qualifying discontinuities could involve fractional changes of intensity, rather than absolute changes, and crossed-masks M can be dilated to allow the discontinuities to spread over several pixels in both directions.
A particular variation of the above criteria relates to fractional changes of intensity rather than absolute changes. This is important because it is well known that the HVS responds in an approximately linear way to fractional changes of intensity. There are a number of modifications of the above method for adapting to fractional changes and thereby improving the perception of Deblocking, especially in dark regions of the image frame. They include:
i. Instead of subjecting the image intensity x(r,c) directly to the flatness and discontinuity criteria as the Candidate Pixel Ci, the logarithm of intensity Ci=logb(x(r,c)) is used throughout, where the base b might be 10 or the natural base e=2.718...
OR
ii. Instead of employing magnitudes of intensity differences directly, fractional differences are used directly as all or part of the criteria for flatness, discontinuities, look ahead and look back. For example, the flatness criteria may be modified from the absolute intensity threshold e in
|x(r+1,c)−x(r,c)|<e
to a threshold containing a relative intensity term, such as a relative threshold eR of the form
where, in the example in the Appendix, we have used e=3 and IMAX=255 which is the maximum intensity that can be assumed by x(r,c).
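The effect of option i above can be illustrated numerically: in the logarithmic domain, equal fractional changes of intensity give equal differences regardless of brightness. The sketch below is an illustration under stated assumptions only. The function name is hypothetical, intensities are assumed to be strictly positive, and the handling of zero intensity is not addressed here.

```python
import math


def log_intensity(x, base=math.e):
    """Logarithm of a strictly positive intensity value."""
    if x <= 0:
        raise ValueError("intensity must be positive for the logarithm")
    return math.log(x, base)


# A doubling of intensity in a dark region and in a bright region.
dark_change = abs(log_intensity(20) - log_intensity(10))
bright_change = abs(log_intensity(200) - log_intensity(100))

# The same two changes measured as absolute differences.
dark_absolute = abs(20 - 10)
bright_absolute = abs(200 - 100)
```

Both logarithmic changes equal log(2), while the absolute changes differ by a factor of ten, so a single threshold applied in the logarithmic domain treats dark and bright regions alike.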
The Candidate Regions Ci must sample the 2D space of the image frame sufficiently densely that the boundaries of most of the block artifacts are not missed due to under-sampling. Given that block-based compression algorithms ensure that most boundaries of most blocks are separated by at least 4 pixels in both directions, it is possible with this method to sub-sample the image space at intervals of 4 pixels in each direction while still detecting almost all block boundary discontinuities. Sub-sampling at intervals of up to 8 pixels in each direction has also been found to work well in practice. This significantly reduces computational overhead. For example, sub-sampling by 4 in each direction leads to a disconnected set of points that belong to the Deblock Region. An embodiment of this method employs such sub-sampling.
Suppose the Candidate Pixels are L pixels apart in both directions (here L denotes the sampling interval, not the LA/LB threshold). Then the Deblock region may be defined, from the sparsely-distributed Candidate Pixels, as that region obtained by surrounding all Candidate Pixels by L×L square blocks. This is easy to implement with an efficient algorithm.
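One possible way to build the Deblock region from sparsely sampled Candidate Pixels is sketched below. The function name and the parameter `spacing` (the sampling interval) are hypothetical, and the placement of each square relative to its Candidate Pixel (starting `spacing // 2` pixels above and to the left) is an assumption, since the exact alignment is not specified here.

```python
def deblock_mask_from_candidates(candidates, spacing, height, width):
    """Mark a spacing x spacing square around every accepted Candidate Pixel.

    candidates -- iterable of (row, col) positions already assigned to DEB
    spacing    -- sub-sampling interval in pixels (same in both directions)
    Returns a height x width list of lists of booleans.
    """
    mask = [[False] * width for _ in range(height)]
    half = spacing // 2
    for r, c in candidates:
        r0, c0 = r - half, c - half  # assumed top-left corner of the square
        for rr in range(max(r0, 0), min(r0 + spacing, height)):
            for cc in range(max(c0, 0), min(c0 + spacing, width)):
                mask[rr][cc] = True
    return mask


# Two neighboring candidates, 4 pixels apart, in an 8 x 8 frame.
mask = deblock_mask_from_candidates([(2, 2), (2, 6)], 4, 8, 8)
```

With this alignment, squares around neighboring candidates tile the frame without gaps or overlap, so the disconnected sample points become a connected region.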
Once the Deblock Regions are identified, there is a wide variety of Deblocking strategies that can be applied to the Deblock region in order to attenuate the visibly-objectionable perception of blockiness. One method is to apply a smoothing operation to the Deblock Region, for example by using Spatially-Invariant Low Pass IIR Filters or Spatially-Invariant Low Pass FIR Filters or FFT-based Low Pass Filters.
An embodiment of this method down samples the original image frames prior to the smoothing operation, followed by up sampling to the original resolution after smoothing. This embodiment achieves faster overall smoothing because the smoothing operation takes place over a smaller number of pixels.
With the exception of certain filters such as the Recursive Moving Average (i.e. the Box) 2D filter, 2D FIR filters have computational complexity that increases with the level of smoothing that they are required to perform. Such FIR smoothing filters require a number of MACs/s that is approximately proportional to the level of smoothing.
Highly-compressed videos (e.g. having a quantization parameter q>40) typically require FIR filters of order greater than 11 to achieve sufficient smoothing effects, corresponding to at least 11 additions and up to 10 multiplications per pixel. A similar level of smoothing can be achieved with much lower order IIR filters, typically of order 2. One embodiment of this method employs IIR filters for smoothing the Deblock Region.
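To illustrate why a low-order IIR filter can be attractive, the sketch below applies a first-order recursive low-pass filter forward and then backward along a row, giving a second-order response overall. This is an assumed example, not the specific filter of any embodiment: the function name and the smoothing parameter `alpha` are hypothetical. The point is that the arithmetic per pixel stays fixed while `alpha` controls the amount of smoothing.

```python
def iir_smooth(signal, alpha):
    """Forward-backward first-order recursive smoothing of a 1D signal.

    alpha in (0, 1]: smaller values give stronger smoothing at the same cost.
    """
    def one_pass(values):
        out = []
        prev = values[0]  # initialize the state with the first sample
        for v in values:
            prev = alpha * v + (1.0 - alpha) * prev
            out.append(prev)
        return out

    forward = one_pass(list(signal))
    backward = one_pass(forward[::-1])  # second pass removes the phase lag
    return backward[::-1]


# A block-edge-like step of magnitude 100.
step = [0.0] * 10 + [100.0] * 10
smoothed = iir_smooth(step, 0.25)
```

Each pass costs two multiplications and one addition per sample whatever the value of `alpha`, in contrast to an FIR filter whose length must grow with the required level of smoothing.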
Another method for smoothing is similar to that described above except that the smoothing filters are spatially-varied (i.e., spatially-adapted) in such a way that the crossed-mask of the filters is altered, as a function of spatial location, so as not to overlap the Detail Region. In this method, the order (and therefore the crossed-mask size) of the filter is adaptively reduced as it approaches the boundary of the Detail Region.
The crossed-mask size may also be adapted on the basis of local statistics to achieve a required level of smoothing, albeit at increased computational cost. This method employs spatially-variant levels of smoothing in such a way that the response of the filters cannot overwrite (and thereby distort) the Detail region or penetrate across small Detail Regions to produce an undesirable ‘halo’ effect around the edges of the Detail Region.
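A one-dimensional sketch of such spatially-adapted smoothing follows. It assumes a simple moving average whose symmetric window shrinks until it no longer touches a Detail pixel or the frame boundary; the function and parameter names are hypothetical, and the adaptation based on local statistics is not modeled.

```python
def adaptive_smooth_row(row, is_detail, max_half_width):
    """Moving average over Deblock pixels with a window that avoids Detail pixels.

    row            -- list of intensities
    is_detail      -- list of booleans, True where the pixel belongs to DET
    max_half_width -- largest allowed half-width of the averaging window
    """
    n = len(row)
    out = list(row)
    for i in range(n):
        if is_detail[i]:
            continue  # Detail pixels are passed through unchanged
        half = 0
        # Grow the window symmetrically while it stays inside the row
        # and does not overlap any Detail pixel.
        while (half < max_half_width
               and i - (half + 1) >= 0
               and i + (half + 1) < n
               and not is_detail[i - (half + 1)]
               and not is_detail[i + (half + 1)]):
            half += 1
        window = row[i - half:i + half + 1]
        out[i] = sum(window) / len(window)
    return out


# A bright Detail pixel surrounded by a flat Deblock region.
row = [10, 10, 10, 10, 200, 10, 10, 10, 10]
det = [False, False, False, False, True, False, False, False, False]
result = adaptive_smooth_row(row, det, 3)
```

Because no window ever includes the Detail pixel, its value of 200 does not leak into the neighboring Deblock pixels, which is the ‘halo’ effect that a spatially-invariant filter would produce.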
A further improvement of this method applies a ‘growing’ process to the Detail region DET in a) above for all Key Frames such that DET is expanded around its boundaries. The growing method described herein may be used to expand the boundaries, or other methods known to one of ordinary skill in the art may be used. The resultant Expanded Detail region EXPDET is used in this further improvement as the Detail region for the adjacent image frames, where it overwrites the Canvas Images CAN of those frames. This increases throughput and reduces computational complexity because it is only necessary to identify the Detail region DET (and its expansion EXPDET) in the Key Frames. The advantage of using EXPDET instead of DET is that EXPDET covers fast-moving objects more effectively than DET. This allows the Key Frames to be spaced farther apart, for a given video signal, and thereby improves throughput and reduces complexity.
In this method, the Detail region DET may be expanded at its boundaries to spatially cover and thereby make invisible any ‘halo’ effect that is produced by the smoothing operation used to Deblock the Deblock region.
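Expanding the Detail region at its boundaries can be sketched as a binary dilation of the DET mask. The square neighborhood and the parameter `radius` are assumptions made for illustration, and the function name is hypothetical; other growing methods could be substituted.

```python
def grow_detail_mask(det, radius):
    """Expand a binary Detail mask by `radius` pixels in every direction.

    det    -- list of lists of booleans, True where the pixel belongs to DET
    radius -- number of pixels by which the region is grown (assumed square)
    """
    height, width = len(det), len(det[0])
    grown = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if not det[r][c]:
                continue
            # Mark the whole neighborhood of each Detail pixel.
            for rr in range(max(r - radius, 0), min(r + radius + 1, height)):
                for cc in range(max(c - radius, 0), min(c + radius + 1, width)):
                    grown[rr][cc] = True
    return grown


# A single Detail pixel in the middle of a 5 x 5 frame, grown by 1 pixel.
det = [[False] * 5 for _ in range(5)]
det[2][2] = True
expdet = grow_detail_mask(det, 1)
```

A larger `radius` covers more of any halo and more of the motion between Key Frames, at the cost of leaving a larger area unsmoothed.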
In an embodiment of this method, a spatially-variant 2D Recursive Moving Average Filter (i.e. a so-called 2D Box Filter) is employed, having the 2D Z transform transfer function
H(z1,z2) = [(1 − z1^(−L1))(1 − z2^(−L2))] / [(1 − z1^(−1))(1 − z2^(−1))]
which facilitates fast recursive 2D FIR filtering of 2D order (L1, L2). The corresponding 2D recursive FIR input-output difference equation is
y(r,c) = y(r−1,c) + y(r,c−1) − y(r−1,c−1) + x(r,c) − x(r−L1,c) − x(r,c−L2) + x(r−L1,c−L2)
where y is the output and x is the input. This embodiment has the advantage that the arithmetic complexity is low and is independent of the level of smoothing.
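The constant cost of a recursive 2D moving-sum filter can be seen in the sketch below. It assumes the standard unnormalized recurrence y(r,c) = y(r−1,c) + y(r,c−1) − y(r−1,c−1) + x(r,c) − x(r−L1,c) − x(r,c−L2) + x(r−L1,c−L2), with all values outside the frame taken as zero. The function names are hypothetical, and dividing the output by L1·L2 would turn the sum into an average.

```python
def box_filter_recursive(x, L1, L2):
    """Sum over an L1 x L2 window ending at (r, c), computed recursively."""
    height, width = len(x), len(x[0])

    def at(a, r, c):
        # Values outside the frame are treated as zero (assumed boundary rule).
        return a[r][c] if 0 <= r < height and 0 <= c < width else 0

    y = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            y[r][c] = (at(y, r - 1, c) + at(y, r, c - 1) - at(y, r - 1, c - 1)
                       + at(x, r, c) - at(x, r - L1, c)
                       - at(x, r, c - L2) + at(x, r - L1, c - L2))
    return y


def box_filter_direct(x, L1, L2):
    """Reference implementation: explicit sum over every window."""
    height, width = len(x), len(x[0])
    y = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            total = 0
            for rr in range(max(r - L1 + 1, 0), r + 1):
                for cc in range(max(c - L2 + 1, 0), c + 1):
                    total += x[rr][cc]
            y[r][c] = total
    return y


frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
recursive = box_filter_recursive(frame, 2, 2)
direct = box_filter_direct(frame, 2, 2)
```

The recursive version uses seven additions or subtractions per pixel for any (L1, L2), whereas the direct version needs on the order of L1·L2 additions per pixel.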
In a specific example of the method, the order parameters (L1, L2) of the above 2D FIR Moving Average filter are spatially varied (i.e., spatially adapted) to avoid overlap of the response of the smoothing filters with the Detail region DET.
Process 704 can then begin smoothing such that process 705 determines when the boundary of the Nth Deblock region has been reached and process 706 determines when smoothing of the Nth region has been completed. Process 708 indexes the regions by adding 1 to the value N, and processes 704 through 707 continue until process 707 determines that all Deblock regions have been smoothed. Then process 709 combines the smoothed Deblock regions with the respective Detail regions to arrive at an improved image frame. Note that it is not necessary to wait until all of the Deblock regions are smoothed before beginning the combining process, since these operations can be performed in parallel if desired.
If more than one video stream is delivered to decoder 84, then the various channels of the digital stream can be selected by tuner 84-2 for decoding according to the processes discussed herein. Processor 84-1 controls the decoding, and the output decoded video stream can be stored in storage 85 or displayed by one or more displays 86 or, if desired, distributed (not shown) to other locations. Note that the various video channels can be sent from a single location, such as from encoder 82, or from different locations (not shown). Transmission from the encoder to the decoder can be performed in any well-known manner using wireline or wireless transmission while conserving bandwidth on the transmission medium.
Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure of the present invention, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present invention. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.
This application is related to concurrently filed, co-pending, commonly owned patent applications SYSTEMS AND METHODS FOR IMPROVING THE QUALITY OF COMPRESSED VIDEO SIGNALS BY SMOOTHING THE ENTIRE FRAME AND OVERLAYING PRESERVED DETAIL, U.S. patent application Ser. No. ______, Attorney Docket No. 54729/P011US/10808778; and SYSTEMS AND METHODS FOR HIGHLY EFFICIENT VIDEO COMPRESSION USING SELECTIVE RETENTION OF RELEVANT VISUAL DETAILS, U.S. patent application Ser. No. ______, Attorney Docket No. 54729/P012US/10808779, which applications are hereby incorporated by reference herein.