The present disclosure relates in general to image processing. In particular, this disclosure relates to surround area detection and blending for image filtering.
Generally, images may include non-active areas surrounding the main subject of such images. Such non-active areas may be of arbitrary regular shapes (e.g. rectangular, circular, oval, or any other geometrical shape) or any arbitrary irregular shape. Moreover, non-active areas in images may contain text, labels, or captions. It is highly desirable to detect and blend the surround area to avoid filtering artifacts due to the surround area in image filtering. In existing filtering operations, surround areas are assumed to be noise-free rectangles and are excluded during image filtering. This may work well for noise-free in-house studio content. However, for other video content that has logos, text, or noise in the surround area, or non-rectangular surround areas, the existing basic method may create banding/halo artifacts near the surround area. Such filtering artifacts could be amplified and degrade the visual quality in further operations, such as local reshaping or other operations based on the filtering output.
The term “surround area” used herein refers to non-active (static) regions around an image or video frame (the image content itself is typically referred to as the “active area”). Examples of surround areas include the black bands known as letterbox or pillarbox, used to accommodate a variety of video/film aspect ratios within a typical 16:9 television frame. Surround areas can have an arbitrary shape, such as a rectangle, circle, or ellipse, or any irregular shape. Surround areas are typically distinguished by their “monotonic” color, say black or gray; however, in many cases, text (e.g., subtitles) or graphics (e.g., logos) may be overlaid over these areas.
The disclosed methods and devices provide an efficient framework to detect and blend the surround area to avoid filtering artifacts due to the surround area in image filtering.
Compared to existing methods that can only handle rectangular areas, such as letterboxes or pillarboxes, the described method can be applied to arbitrary images with padded dark, monochromatic, colored, or white areas of arbitrary shape, such as letterboxes, pillarboxes, ovals, or any other shape. Moreover, the disclosed methods are also applicable to surround areas that contain text, logos, and closed captions. As will be described in more detail, the disclosed method excludes such text, logos, and closed captions from the surround area.
The described methods detect the surround areas in an image with possible compression artifacts and noise, and then perform blending to minimize the effects of the surround areas for any image filtering operation.
An embodiment of the present invention is a method for detecting a surround area in an image, the method comprising: calculating a histogram of a boundary area of the image; finding a peak and a width of the histogram; based on the peak and the width of the histogram, classifying a presence of the surround area in the image, thereby generating a peak detection score; based on a ratio of the pixels that belong to the peak on minimum possible surround areas at the boundary area of the image, classifying the presence of the surround area in the image, thereby generating a boundary detection score; generating a total score based on a combination of the peak and boundary detection scores; and detecting the presence of the surround area based on the total score.
A method may be computer-implemented in some embodiments. For example, the method may be implemented, at least in part, via a control system comprising one or more processors and one or more non-transitory storage media.
Some or all of the methods described herein may be performed by one or more devices according to instructions (e.g. software) stored on one or more non-transitory media. Such non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, etc. Accordingly, various innovative aspects of the subject matter described in this disclosure may be implemented in a non-transitory medium having software stored thereon. The software may, for example, be executable by one or more components of a control system such as those disclosed herein. The software may, for example, include instructions for performing one or more of the methods disclosed herein.
At least some aspects of the present disclosure may be implemented via an apparatus or apparatuses. For example, one or more devices may be configured for performing, at least in part, the methods disclosed herein. In some implementations, an apparatus may include an interface system and a control system. The interface system may include one or more network interfaces, one or more interfaces between the control system and a memory system, one or more interfaces between the control system and another device, and/or one or more external device interfaces. The control system may include at least one of a general-purpose single- or multi-chip processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. Accordingly, in some implementations the control system may include one or more processors and one or more non-transitory storage media operatively coupled to the one or more processors.
Details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages will become apparent from the description, the drawings, and the claims. Note that the relative dimensions of the following figures may not be drawn to scale. Like reference numbers and designations in the various drawings generally indicate like elements, but different reference numbers do not necessarily designate different elements between different drawings.
As shown in the exemplary embodiment of
In step (10′), in order to avoid filtering artifacts near the surround area, the surround area is blended, as shown in step (16), based on a combination of the input image (11) and the generated surround area probability map (15). In accordance with an embodiment of the present disclosure, filtering operation (17) can be any sort of image filtering (e.g. edge-preserving filtering as described, for example, in U.S. Prov. App. Ser. No. 63/086,699, “Adaptive Local Reshaping For SDR-To-HDR Up-Conversion,” filed on Oct. 2, 2020 by the applicant of the present disclosure and incorporated herein by reference in its entirety). After filtering (17), the previously blended surround area in the initial filtered image is compensated, as shown in step (18), so that it is consistent with the original input image (11). Finally, the compensated filtered image is sent to further operations (19), such as local reshaping or other operations based on the filtering output. In what follows, exemplary embodiments of surround area detection (10) and blending (10′) are described in more detail.
A. Surround Area Detection
A.1. Surround Area Histogram
According to an embodiment of the present disclosure, two assumptions about the surround area may be made: 1) the surround area in an image has constant pixel value with some noise, and 2) the surround area is “stuck” to the boundary of the image. Based on these two assumptions, it can be expected that if a surround area exists, there will be a peak in the histogram of the boundary region of the image.
In what follows, the input image and its Y-channel are denoted as $S$ and $S_Y$, respectively. For illustration purposes,
With continued reference to
The following table shows an example of how the histogram of the boundary region of the image is calculated in accordance with the teachings of the present disclosure.
Upon calculating the histogram of the boundary region, the highest peak in the histogram and its width are calculated; the peak location gives the potential surround area pixel value. The peak is defined as the leftmost contiguous bins with the maximum value, and the width is defined by the bins $b_{S,L}$ and $b_{S,R}$ on the two sides of the peak where the histogram value drops to a ratio $r_{peak}$ of the maximum. As an example, $r_{peak}$ may be set as $r_{peak} = \exp(-0.5) \approx 0.6065$, so that when the peak is close to a Gaussian distribution, half of the width of the peak is close to the standard deviation.
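As an illustration, the following Python sketch computes the boundary histogram and locates the peak and its width. The boundary-region thickness (border) and the 8-bit value range are assumptions made here for illustration only; the disclosure leaves these details open.

```python
import numpy as np

def boundary_histogram_peak(S_y, border, n_bins=256, r_peak=np.exp(-0.5)):
    H, W = S_y.shape
    # Boundary region: a thin frame of `border` pixels around the image
    mask = np.zeros((H, W), dtype=bool)
    mask[:border, :] = True
    mask[-border:, :] = True
    mask[:, :border] = True
    mask[:, -border:] = True
    hist, edges = np.histogram(S_y[mask], bins=n_bins, range=(0, n_bins))
    # Peak: np.argmax returns the leftmost bin holding the maximum count
    b_peak = int(np.argmax(hist))
    h_max = hist[b_peak]
    # Width: expand to both sides while the count stays above r_peak * max
    b_L, b_R = b_peak, b_peak
    while b_L > 0 and hist[b_L - 1] >= r_peak * h_max:
        b_L -= 1
    while b_R < n_bins - 1 and hist[b_R + 1] >= r_peak * h_max:
        b_R += 1
    return hist, edges, b_peak, b_L, b_R
```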
A.2. Surround Area Classifier
In what follows, the term “classifier” refers to the module that takes the features of the histogram and predicts a score indicating whether surround areas exist or not. A score greater than zero means there are surround areas, and a score less than or equal to zero means there are no surround areas.
In view of what has been described in the previous section, a potential peak in the histogram may have been found, and the next step may be that of indicating whether such a peak is due to a surround area. In order to perform such a step, two classifiers according to the teachings of the present disclosure may be defined: 1) a peak classifier and 2) a boundary classifier. As will be described later, the output results from such classifiers may be combined for improved accuracy.
A.2.1. Peak Classifier
In an embodiment, the peak classifier predicts the surround area based on the width of the peak. Ideally, in the noise-free scenario, the width of the peak should be 0, i.e. $b_{S,R} = b_{S,L}$. However, in practical conditions, there might be compression artifacts or other noise in the image. The noise may be considered to act, for example, like additive white Gaussian noise with standard deviation $\sigma_{noise}$. In this case, if there is a surround area in the image, half of the width of the peak should be close to $\sigma_{noise}$. As such, given the observed noise (half of the width of the peak) $\Delta b_{peak} = (b_{S,R} - b_{S,L})/2$, the detection score of the peak property may be defined, for example, using a piecewise linear function, as shown in eq. 1 below. The larger the score $s_{peak}$, the more likely there is a surround area.
With reference to eq. 1, $\theta_{peak}$ is the decision threshold for the peak property. $\theta_{peak,pos}$ and $\theta_{peak,neg}$ are decision thresholds with additional margin on the positive and negative sides for the peak property, respectively. Moreover, $\theta_{peak,pos} < \theta_{peak} < \theta_{peak,neg}$. From eq. 1 it can be noticed that when $\Delta b_{peak} < \theta_{peak}$, the score gradually increases and reaches 1 at $\Delta b_{peak} = \theta_{peak,pos}$. On the other hand, when $\Delta b_{peak} \geq \theta_{peak}$, the score gradually decreases and reaches $-1$ at $\Delta b_{peak} = \theta_{peak,neg}$.
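Eq. 1 itself is not reproduced above; the following piecewise linear score is a reconstruction from the description (0 at $\theta_{peak}$, clipped to $+1$ at $\theta_{peak,pos}$ and to $-1$ at $\theta_{peak,neg}$) and should be read as a sketch rather than the exact formula:

```python
def peak_score(d_peak, th_pos, th, th_neg):
    # th_pos < th < th_neg; a narrower peak means "more likely surround"
    if d_peak < th:
        s = (th - d_peak) / (th - th_pos)   # rises to +1 at d_peak == th_pos
    else:
        s = -(d_peak - th) / (th_neg - th)  # falls to -1 at d_peak == th_neg
    return max(-1.0, min(1.0, s))
```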
A.2.2. Boundary Classifier
The boundary classifier predicts the surround area based on the boundary property. The more pixels on the image boundary belong to the peak, the more likely the peak is from a surround area. Therefore, with predefined minimum width and height of the surround area, $W_{L,min}$ and $H_{L,min}$, the ratio of the pixels that belong to the peak on the minimum possible surround areas at the four boundaries (top, bottom, left, and right) of the image can be found. In an embodiment, the maximum ratio from the four boundaries is taken into account, as the surround area can be at any of the boundaries. Empirically, the minimum width and height of the surround area may be set as, for example, $W_{L,min} = \mathrm{round}(0.01 \times W)$ and $H_{L,min} = \mathrm{round}(0.01 \times H)$. The following table is an example of how the ratio of pixels $r_{boundary}$ that belong to the peak on the minimum possible surround area is calculated.
With reference to the table above, the higher the ratio $r_{boundary}$, the more likely the peak is from a surround area. Therefore, the detection score of the boundary property may be defined, for example, using a piecewise linear function, shown in eq. 2 below. The larger the score, the more likely there is a surround area.
In eq. 2, $\theta_{boundary}$ is the decision threshold for the boundary property. $\theta_{boundary,pos}$ and $\theta_{boundary,neg}$ are decision thresholds with additional margin on the positive and negative sides for the boundary property, respectively. Moreover, $\theta_{boundary,pos} > \theta_{boundary} > \theta_{boundary,neg}$. From eq. 2 it can be noticed that when $r_{boundary} \geq \theta_{boundary}$, the score gradually increases and reaches 1 at $r_{boundary} = \theta_{boundary,pos}$. On the other hand, when $r_{boundary} < \theta_{boundary}$, the score gradually decreases and reaches $-1$ at $r_{boundary} = \theta_{boundary,neg}$.
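The following sketches illustrate the boundary classifier. The first shows one plausible way $r_{boundary}$ could be computed from the peak bins found earlier (the strip geometry is an assumption, since the table is not reproduced here); the second reconstructs the piecewise linear score of eq. 2 from its description, with the mirrored threshold ordering $\theta_{boundary,neg} < \theta_{boundary} < \theta_{boundary,pos}$:

```python
import numpy as np

def boundary_ratio(S_y, edges, b_L, b_R, W_min, H_min):
    # Pixels whose value falls inside the peak [edges[b_L], edges[b_R + 1])
    in_peak = (S_y >= edges[b_L]) & (S_y < edges[b_R + 1])
    strips = [in_peak[:H_min, :], in_peak[-H_min:, :],  # top, bottom
              in_peak[:, :W_min], in_peak[:, -W_min:]]  # left, right
    # Maximum over the four minimal strips, since the surround area
    # may sit at any one of the boundaries
    return max(float(s.mean()) for s in strips)

def boundary_score(r_b, th_pos, th, th_neg):
    # th_neg < th < th_pos; a larger ratio means "more likely surround"
    if r_b >= th:
        s = (r_b - th) / (th_pos - th)   # rises to +1 at r_b == th_pos
    else:
        s = -(th - r_b) / (th - th_neg)  # falls to -1 at r_b == th_neg
    return max(-1.0, min(1.0, s))
```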
A.2.3. Total Detection Score
In order to calculate the total detection score, the two weak classifiers of the previous sections are combined to obtain a more accurate classifier with a soft classification margin. The surround area may satisfy the criteria of both above-disclosed classifiers. In an embodiment, the minimum of the two scores is taken as the total detection score, as shown in eq. 3 below.
$s_{total} = \mathrm{clip3}(\min(s_{peak}, s_{boundary}), -1, 1)$  (3)
This means that when both classifiers predict high scores, a surround area is declared to be present. The larger the score, the more likely there is a surround area, and vice versa.
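As a sketch, eq. 3 maps directly to code; clip3(x, lo, hi) clips x to the interval [lo, hi]:

```python
def total_score(s_peak, s_boundary):
    # Both classifiers must agree before a high total score is produced
    return max(-1.0, min(1.0, min(s_peak, s_boundary)))
```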
A.3. Surround Area Modeling
With reference to
The standard deviations for the left and right sides, shown in eqs. 5a and 5b respectively, are defined as the distances from the center $\mu_b$ to the two sides of the peak. The minimum may be clipped to 0.5 for numerical stability:
$\sigma_{b,L} = \max(\mu_b - b_{S,L}, 0.5)$  (5a)
$\sigma_{b,R} = \max(b_{S,R} - \mu_b, 0.5)$  (5b)
As shown in eq. 6 below, the probability is modeled using the scaled piecewise Gaussian kernel:
The probability as calculated in eq. 6 above is back-projected to obtain the surround area probability map $M_L$. The following table shows an example of how the process of back-projection for the surround area probability map is performed.
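A minimal sketch of the modeling and back-projection steps: since eq. 6 is not reproduced above, the piecewise Gaussian kernel and, in particular, its scaling by the clipped total detection score are assumptions (the scaling is consistent with the observation below that the map is all 0's when $s_{total} \leq 0$):

```python
import numpy as np

def surround_probability_map(S_y, mu_b, sigma_L, sigma_R, s_total):
    if s_total <= 0:
        return np.zeros_like(S_y, dtype=float)  # no surround area detected
    # One-sided standard deviations: left of the peak center vs. right of it
    sigma = np.where(S_y < mu_b, sigma_L, sigma_R)
    # Back-projection: evaluate the piecewise Gaussian kernel at each
    # pixel's own value
    p = np.exp(-0.5 * ((S_y - mu_b) / sigma) ** 2)
    return min(s_total, 1.0) * p
```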
In what follows, some exemplary results are presented, showing the performance of the disclosed methods in the case of rectangular and oval-shaped surround areas, as well as in the case of surround areas including text.
According to embodiments of the present disclosure, if the total detection score $s_{total} \leq 0$, it is concluded that there is no surround area, and the surround area probability map $M_L$ may be filled with 0's.
B. Surround Area Blending
In view of the above-disclosed estimation of the surround area probability map, the filtering artifacts near the surround area can be reduced by avoiding the surround area in image filtering operations, such as the edge-preserving filter described in U.S. Prov. App. Ser. No. 63/086,699, incorporated herein by reference in its entirety. The filtering artifacts are mainly due to the inconsistent trend in pixel values between the image content area and the surround area. An efficient way to reduce the filtering artifacts in accordance with the teachings of the present disclosure is to blend the surround area with the nearby image content. In this case, there will be little inconsistency between the image content area and the surround area, and thus the filtering artifacts are reduced. From another perspective, this approach is similar to inpainting the missing image content in the surround area: if the image filter can see the missing image content during filtering, the filtering artifacts will be minimal. Although there exist some common methods to handle the filtering boundary, such as repeating boundary pixels, mirroring, or periodic padding, they are designed for rectangular image content and cannot handle surround area boundaries with arbitrary shapes.
B.1. Weighted Gaussian Filtering
As mentioned previously, an exemplary method of blending is to use weighted Gaussian filtering.
$Q = 1 - M_L$  (7)
Empirically, $\sigma_{blend}$ may be selected, for example, to be the maximum filtering kernel size that will be used in the following filtering operation for local reshaping. In addition, to avoid division by 0, $\tilde{S}_{Y,(w)}$ may be set equal to $\tilde{S}_Y$ at the pixel locations where the Gaussian-filtered weight map $G(Q)$ is 0. Alternatively, the minimum value in $Q$ may be clipped to a very small number (e.g. $10^{-6}$) so that $G(Q)$ is always positive. The implementation details of the approximated Gaussian filter can be found, for example, in the above-mentioned U.S. Prov. App. Ser. No. 63/086,699.
With further reference to
$\tilde{S}_{Y,(b)} = \tilde{S}_Y \odot Q + \tilde{S}_{Y,(w)} \odot (1 - Q)$  (9)
where $\odot$ denotes elementwise multiplication.
As a result, the image content will be preserved in the blended image (1206).
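A minimal sketch of the weighted Gaussian blending described above (eqs. 7 and 9), in Python with NumPy/SciPy. Since eq. 8 is not reproduced here, the normalized (weighted) filtering step below is an assumption consistent with the surrounding text, and gaussian_filter stands in for the approximated Gaussian filter of U.S. Prov. App. Ser. No. 63/086,699:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blend_weighted_gaussian(S_y, M_L, sigma_blend):
    Q = 1.0 - M_L                                # content weight map (eq. 7)
    num = gaussian_filter(S_y * Q, sigma_blend)  # filter the weighted image
    den = gaussian_filter(Q, sigma_blend)        # filter the weight map
    den = np.maximum(den, 1e-6)                  # keep the denominator positive
    S_w = num / den                              # weighted-filtered image
    # Preserve the content where Q is high; fill the surround area with
    # the smoothed nearby content (eq. 9)
    return S_y * Q + S_w * (1.0 - Q)
```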
B.2. Mirroring with Distance Transform
As mentioned previously, another exemplary method of surround area blending uses mirroring with a distance transform. The flow chart illustrating such a method is shown in
$\tilde{M}_F = \mathrm{close}_{se}(\mathrm{open}_{se}(\mathbb{1}(\tilde{M}_L < \theta_M)))$  (10)
The threshold $\theta_M$ may be set, for example, as $\theta_M = \exp(-0.5) \approx 0.6065$. The function $\mathbb{1}(x) = 1$ if condition $x$ is true, and $0$ if condition $x$ is false. The operators $\mathrm{open}_{se}$ and $\mathrm{close}_{se}$ are morphological opening and closing, respectively, with structuring element $se$. A rectangular structuring element of size, for example, 5×5 may be used for $se$. The distance transform (1304) of the binary image content mask $\tilde{M}_F$ may then be calculated. The distance transform (1304) finds the distance from each pixel to the nearest nonzero pixel in a binary image; see also reference [2], incorporated herein by reference in its entirety. The distance metric used may be, for example, the $L_1$ distance (also called city block distance or Manhattan distance).
Continuing with the flowchart of
$\tilde{S}_{Y,(b)}(i,j) = \tilde{S}_Y(2 I_y(i,j) - i,\ 2 I_x(i,j) - j)$  (11)
Moreover, if the pixel location after point reflection falls outside the image area, the nearest pixel on the image boundary is chosen instead. The table below is an example of how mirroring with the distance transform is performed:
In the case where $\tilde{M}_L$ is all 0's, the distance transform is not calculated, and the blended image is simply set as $\tilde{S}_{Y,(b)} = \tilde{S}_Y$.
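The mirroring step maps naturally onto SciPy's city-block distance transform, which can return, for every pixel, the indices $(I_y, I_x)$ of its nearest content pixel. The sketch below assumes a binary content mask as in eq. 10 and clips reflected coordinates to the image bounds as described above:

```python
import numpy as np
from scipy.ndimage import distance_transform_cdt

def blend_mirror(S_y, M_F):
    # M_F: binary mask, 1 = image content, 0 = surround area
    H, W = S_y.shape
    # For every surround pixel (nonzero input), find the indices of the
    # nearest content pixel (zero input) under the L1 (taxicab) metric;
    # content pixels map to themselves.
    _, (I_y, I_x) = distance_transform_cdt(
        M_F == 0, metric='taxicab', return_indices=True)
    ii, jj = np.indices((H, W))
    # Point reflection about the nearest content pixel (eq. 11), with
    # out-of-range locations clipped to the image boundary
    mi = np.clip(2 * I_y - ii, 0, H - 1)
    mj = np.clip(2 * I_x - jj, 0, W - 1)
    return S_y[mi, mj]
```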
C. Surround Area Compensation
With reference to eq. 11, in an embodiment, the blended image $\tilde{S}_{Y,(b)}$ may be sent to an image filtering operation to avoid the filtering artifacts and obtain the initial filtered image. This is shown in eq. 12 below:
$\tilde{S}_{Y,(i)} = IF(\tilde{S}_{Y,(b)})$  (12)
The operator $IF(\cdot)$ can be any image filtering operation, such as the edge-preserving filter in reference [1]. $\tilde{S}_{Y,(i)}$ represents the result after the filtering operation, and will be referred to in what follows as the initial filtered image. As the blended (pre-processed) image contains, in the surround area, extra image content that does not exist in the original input image, and so does the initial filtered image, the initial filtered image needs to be compensated (post-processed) before being sent to further operations, such as local reshaping [1].
$\tilde{S}_{Y,(ci)} = \tilde{S}_{Y,(i)} \odot (1 - M_L) + \tilde{S}_Y \odot M_L$  (13)
For the pixels in the surround area, $M_L$ is close to 1, and $\tilde{S}_{Y,(ci)}$ will be close to $\tilde{S}_Y$. In other words, the previously blended surround area will be compensated and become the same as in the original input image, ready for further operations, such as local reshaping.
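Putting the pieces together, a hypothetical end-to-end wrapper (reusing the blend_weighted_gaussian sketch above; image_filter stands in for any filtering operator $IF(\cdot)$):

```python
def filter_with_surround_handling(S_y, M_L, sigma_blend, image_filter):
    S_b = blend_weighted_gaussian(S_y, M_L, sigma_blend)  # blend (eqs. 7-9)
    S_i = image_filter(S_b)                               # filter (eq. 12)
    return S_i * (1.0 - M_L) + S_y * M_L                  # compensate (eq. 13)
```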
In the above-disclosed embodiments, the histogram of the Y channel is used for description purposes, and for surround areas that can be distinguished by luma. In some embodiments (e.g. cases where luma is not sufficient), histograms of the U and V channels may further be used for classification, in the same way as was disclosed for the histogram of the Y channel. In addition, the classification results from the Y, U, and V channels may be combined to obtain an improved result. As an example, the union of the surround areas detected by Y, U, and V may be considered as the overall detected surround area.
A number of embodiments of the disclosure have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the present disclosure. Accordingly, other embodiments are within the scope of the following claims.
The present disclosure is directed to certain implementations for the purposes of describing some innovative aspects described herein, as well as examples of contexts in which these innovative aspects may be implemented. However, the teachings herein can be applied in various different ways. Moreover, the described embodiments may be implemented in a variety of hardware, software, firmware, etc. For example, aspects of the present application may be embodied, at least in part, in an apparatus, a system that includes more than one device, a method, a computer program product, etc. Accordingly, aspects of the present application may take the form of a hardware embodiment, a software embodiment (including firmware, resident software, microcodes, etc.) and/or an embodiment combining both software and hardware aspects. Such embodiments may be referred to herein as a “circuit,” a “module”, a “device”, an “apparatus” or “engine.” Some aspects of the present application may take the form of a computer program product embodied in one or more non-transitory media having computer readable program code embodied thereon. Such non-transitory media may, for example, include a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. Accordingly, the teachings of this disclosure are not intended to be limited to the implementations shown in the figures and/or described herein, but instead have wide applicability.
The examples set forth above are provided to those of ordinary skill in the art as a complete disclosure and description of how to make and use the embodiments of the disclosure, and are not intended to limit the scope of what the inventor/inventors regard as their disclosure.
Modifications of the above-described modes for carrying out the methods and systems herein disclosed that are obvious to persons of skill in the art are intended to be within the scope of the following claims. All patents and publications mentioned in the specification are indicative of the levels of skill of those skilled in the art to which the disclosure pertains. All references cited in this disclosure are incorporated by reference to the same extent as if each reference had been incorporated by reference in its entirety individually.
It is to be understood that the disclosure is not limited to particular methods or systems, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. The term “plurality” includes two or more referents unless the content clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure pertains.
Number | Date | Country | Kind |
---|---|---|---|
21178935 | Jun 2021 | EP | regional |
This application is a U.S. National Stage application under 35 U.S.C. § 371 of International Application No. PCT/US2022/031795, filed on Jun. 1, 2022, which claims the priority benefit of U.S. Provisional Application No. 63/209,602, filed Jun. 11, 2021 and EP Application No. 21178935.9, filed Jun. 11, 2021, each of which is hereby incorporated by reference in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/031795 | 6/1/2022 | WO |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2022/260907 | 12/15/2022 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
5351135 | Saeger | Sep 1994 | A |
6714594 | Dimitrova | Mar 2004 | B2 |
6947097 | Joanblanq | Sep 2005 | B1 |
7538821 | Ahn | May 2009 | B2 |
9135720 | Huang | Sep 2015 | B2 |
9473792 | Srinivasamurthy | Oct 2016 | B2 |
10257487 | Tamatam | Apr 2019 | B1 |
10748235 | Tamatam | Aug 2020 | B2 |
20120206567 | Zafarifar | Aug 2012 | A1 |
20190045193 | Socek | Feb 2019 | A1 |
Number | Date | Country |
---|---|---|
0800311 | Oct 1997 | EP |
2108177 | Apr 2019 | EP |
2007052957 | May 2007 | WO |
2019217751 | Nov 2019 | WO |
2020219341 | Oct 2020 | WO |
2022072884 | Apr 2022 | WO |
Other Publications:
Viero and Neuvo, “Non-moving regions preserving median filters for image sequence filtering,” IEEE 1991 International Conference on Systems Engineering, IEEE, 1991.
Balisavira et al., “Real-time Object Detection by Road Plane Segmentation Technique for ADAS,” 2012 Eighth International Conference on Signal Image Technology and Internet Based Systems, 2012.
Akilan, Thangarajah, QM Jonathan Wu, and Yimin Yang, “Fusion-based foreground enhancement for background subtraction using multivariate multi-model Gaussian distribution,” Information Sciences 430 (2018): 414-431.
Maurer, Calvin, Rensheng Qi, and Vijay Raghavan, “A Linear Time Algorithm for Computing Exact Euclidean Distance Transforms of Binary Images in Arbitrary Dimensions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, No. 2, pp. 265-270, Feb. 2003, 6 pages.
Wells, William, “Efficient Synthesis of Gaussian Filters by Cascaded Uniform Filters,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, No. 2, pp. 234-239, Mar. 1986, 6 pages.
Number | Date | Country | |
---|---|---|---|
63209602 | Jun 2021 | US |