This invention concerns image processing, and in particular the segmentation into spatial regions of image data describing an image within a sequence of images.
Many image processing tasks are simplified if the image can be segmented into spatial regions where a region has some unifying property. For example a particular region may correspond with the extent of a portrayed object. Examples of processes that can benefit from image segmentation include: data compression, where similar coding decisions can be applied over a particular segment; motion compensated processing, where the same motion vector can be applied to a particular region; and, re-framing, where there is a need to ensure that a particular segment remains within the frame.
Known methods of image segmentation include the detection of edges by means of spatial filtering of the image and then bridging any gaps in the detected edges to form a segment boundary. These methods can be complex and detected edges may not correspond to the edges of image regions.
The principle of the invention is to analyse the differences in pixel values between a pair of related images, and to locate regions that are at least partially enclosed by high difference values.
The invention consists in a method and apparatus for identifying a region within a chosen image that is part of an image sequence from products of directionally-accumulated difference-image pixel values.
Suitably, pixel values are accumulated along straight lines of pixels.
In a preferred embodiment the difference-image is a DFD comprising displaced frame difference values of pixels derived from a motion estimation process applied to images in the sequence.
Advantageously, the difference-image is derived from at least an image earlier in the sequence than the said chosen image and at least an image later in the sequence than the said chosen image.
Co-located pixel values resulting from accumulation in opposing directions can be multiplied together.
In certain embodiments, co-located pixel values derived from values accumulated along intersecting lines are multiplied together.
Preferably, the magnitude of a product of accumulated pixel values for a pixel is used to identify that pixel as part of an image region.
And, the magnitude of a weighted sum of product values for a pixel is compared with a threshold and that pixel is considered to form part of an image region when the threshold is exceeded.
Advantageously, a product of pixel values is modified by a non-linear function prior to the comparison.
There are a number of difference measures that can be derived for a pair of images within an image sequence and which are suitable inputs to a segmentation process according to the invention. The simplest measure is the difference-image resulting from the subtraction of the values of pixels in one image from the values of co-located pixels in another image. Typically luminance values for the pixels are subtracted, but other pixel values, for example hue or saturation values, can be used.
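The simple difference-image can be sketched as follows. This is a minimal illustration, assuming images are held as lists of rows of luminance values; the function name `difference_image` is hypothetical, and taking the magnitude of the difference is an assumption made here so that later accumulation works on non-negative values.

```python
def difference_image(image_a, image_b):
    """Magnitude of the difference between co-located pixel values.

    Both images are lists of rows with identical dimensions.
    """
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]


frame_prev = [[10, 10, 10],
              [10, 50, 10]]
frame_curr = [[10, 10, 10],
              [10, 10, 50]]

# Non-zero values appear only where the two frames disagree.
diff = difference_image(frame_curr, frame_prev)
```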
Another particularly useful image difference to use is the ‘displaced frame difference’ (DFD). In motion compensated processing, pixels of a first temporal sample of a scene are shifted, that is to say displaced, by motion vectors that have been derived from a motion estimation process, so that they occupy positions they would occupy in a second, available, temporal sample. The image comprising the differences between the values of the displaced pixels of the first sample and the (unmodified) pixels of the second temporal sample is a DFD. The DFD will have large values at the boundaries of moving objects, and smaller values elsewhere. This is because pixels corresponding to freshly revealed or obscured parts of the background behind moving foreground objects will not be correctly described by the shifting of the foreground objects, even if the relevant motion vectors are accurate. The DFD is thus likely to indicate the boundaries of moving objects. However, the boundary is likely to be incomplete, and degraded by noise.
A motion estimator (3) derives motion vectors that describe the change in position of portrayed objects between frame N and frame N-1. The motion estimator (3) may use—for example—either of the well-known ‘block matching’ or ‘phase correlation’ methods, and outputs a vector for every pixel of frame N. A pixel shifter (4) shifts (displaces) every pixel of frame N according to its respective motion vector, and the resulting ‘motion compensated’ frame is stored in a displaced frame store (5). A subtractor (6) subtracts the pixel values of frame N-1 from the values of co-located, shifted pixels of frame N, and the resulting image is stored as a DFD in a DFD store (7), whose output is available to subsequent processes via terminal (8).
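The pixel shifter and subtractor can be sketched as below. This is an illustrative sketch only: the function name `backward_dfd`, the integer `(dx, dy)` vector format, the use of the difference magnitude, and the treatment of positions that receive no shifted pixel (they keep the unshifted frame N value) are all assumptions not taken from the description.

```python
def backward_dfd(frame_n, frame_prev, vectors):
    """Shift each pixel of frame N by its vector, then subtract frame N-1.

    vectors[j][i] is an integer (dx, dy) displacement for the pixel in
    column i, row j of frame N.
    """
    height, width = len(frame_n), len(frame_n[0])
    # Assumption: positions that no shifted pixel lands on keep frame N's value.
    shifted = [row[:] for row in frame_n]
    for j in range(height):
        for i in range(width):
            dx, dy = vectors[j][i]
            ti, tj = i + dx, j + dy
            if 0 <= ti < width and 0 <= tj < height:
                shifted[tj][ti] = frame_n[j][i]
    return [[abs(shifted[j][i] - frame_prev[j][i]) for i in range(width)]
            for j in range(height)]


# A one-row scene: an object of value 9 moves one pixel to the right.
frame_prev = [[0, 9, 0, 0]]
frame_n = [[0, 0, 9, 0]]
vectors = [[(0, 0), (0, 0), (-1, 0), (0, 0)]]

dfd = backward_dfd(frame_n, frame_prev, vectors)
```

In this toy case the vectors are exact, yet the position the object vacated in the shifted frame still differs from frame N-1, which is the kind of residual the DFD is expected to show around a moving object.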
The above-described process forms a ‘backward’ DFD between the current frame and the preceding frame. Analogous processing can be used to form a ‘forward’ DFD between the current frame and the succeeding frame. A combination of a forward DFD with a backward DFD is a particularly suitable input to the segmentation process that will be described. The combination can be an average, or a weighted average in which the forward and backward contributions to each pixel are determined from other image data, such as confidence values for forward and backward motion vectors.
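A weighted combination of forward and backward DFDs might look like the sketch below. The function name `combine_dfds` and the per-pixel confidence arrays are hypothetical; the fallback to a plain average when both confidences are zero is an assumption.

```python
def combine_dfds(forward, backward, conf_fwd, conf_bwd):
    """Per-pixel weighted average of forward and backward DFD values."""
    combined = []
    for f_row, b_row, cf_row, cb_row in zip(forward, backward, conf_fwd, conf_bwd):
        row = []
        for f, b, cf, cb in zip(f_row, b_row, cf_row, cb_row):
            total = cf + cb
            # Assumption: use a plain average when neither vector is trusted.
            row.append((cf * f + cb * b) / total if total > 0 else (f + b) / 2)
        combined.append(row)
    return combined


forward = [[4, 8]]
backward = [[2, 0]]
conf_fwd = [[1, 3]]
conf_bwd = [[1, 1]]

combined = combine_dfds(forward, backward, conf_fwd, conf_bwd)
```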
In the description which follows the input to the segmentation process will be referred to as a DFD, however other difference-images that describe differences between images at different points in an image sequence can be used. Suitable difference-images can be derived from either two or three images from an image sequence.
The input DFD is processed, as shown in
$P_{L}(i, j) = P_{DFD}(i, j) + P_{L}(i-1, j)$ [1]
In the accumulation process, pixel values at positions outside the image area (for example at negative values of i or j) are assumed to be zero.
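Equation [1] and its boundary condition can be sketched as a running sum along each row. This is a minimal illustration, assuming the DFD is stored as a list of rows so that `dfd[j][i]` holds the value at horizontal position i and vertical position j; the function name is hypothetical.

```python
def accumulate_left_to_right(dfd):
    """Apply equation [1]: each output is the DFD value plus the output to its left."""
    result = []
    for row in dfd:
        previous = 0  # value assumed for the position just outside the image
        out = []
        for value in row:
            previous = value + previous
            out.append(previous)
        result.append(out)
    return result


dfd = [[0, 1, 0, 0, 2, 0]]
p_l = accumulate_left_to_right(dfd)
```

Every position at or to the right of a non-zero DFD value receives a non-zero accumulated value.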
An example of this accumulation process will now be explained with reference to
In the process shown in
$P_{R}(i, j) = P_{DFD}(i, j) + P_{R}(i+1, j)$ [2]
The result of applying right to left accumulation to the image of
A downward accumulation process (203) derives accumulated values according to equation [3] below, and these values are stored in a Top→Bottom store (207).
$P_{D}(i, j) = P_{DFD}(i, j) + P_{D}(i, j-1)$ [3]
The result of applying downward accumulation to the image of
An upward accumulation process (204) derives accumulated values according to equation [4] below, and these values are stored in a Bottom→Top store (208).
$P_{U}(i, j) = P_{DFD}(i, j) + P_{U}(i, j+1)$ [4]
The result of applying upward accumulation to the image of
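The four accumulations of equations [1] to [4] can be sketched together using running sums over rows and columns. The helper names are hypothetical, and the layout assumption is that `dfd[j][i]` holds the value at horizontal position i and vertical position j, with j increasing downwards.

```python
from itertools import accumulate


def transpose(image):
    """Swap rows and columns so that columns can be processed as rows."""
    return [list(column) for column in zip(*image)]


def accumulate_lr(dfd):   # equation [1], left to right
    return [list(accumulate(row)) for row in dfd]


def accumulate_rl(dfd):   # equation [2], right to left
    return [list(accumulate(row[::-1]))[::-1] for row in dfd]


def accumulate_down(dfd):  # equation [3], top to bottom
    return transpose(accumulate_lr(transpose(dfd)))


def accumulate_up(dfd):    # equation [4], bottom to top
    return transpose(accumulate_rl(transpose(dfd)))


dfd = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]

p_d = accumulate_down(dfd)
p_u = accumulate_up(dfd)
```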
The stored pixel values resulting from accumulation in opposing directions are multiplied together to identify pixels lying between non-zero-value DFD pixels. A multiplier (209) multiplies the left to right values from the L→R store (205) by the respective values from the R→L store (206), and the resulting product pixel values are stored in an ‘L·R’ store (211). This process is described by equation [5].
$P_{LR}(i, j) = P_{L}(i, j) \times P_{R}(i, j)$ [5]
The result of applying this L·R multiplication to the image of
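The effect of equation [5] can be seen on a single row of pixels. This is a small sketch with a hypothetical function name; it recomputes both horizontal accumulations so that it is self-contained.

```python
from itertools import accumulate


def lr_product_row(dfd_row):
    """Multiply left-to-right and right-to-left accumulations of one row."""
    p_l = list(accumulate(dfd_row))
    p_r = list(accumulate(dfd_row[::-1]))[::-1]
    return [left * right for left, right in zip(p_l, p_r)]


dfd_row = [0, 1, 0, 0, 2, 0]
p_lr = lr_product_row(dfd_row)
```

The product is non-zero only at the two non-zero DFD positions and at the positions lying between them; positions outside that span have a zero accumulated value in one direction, so their product is zero.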
Similarly, a multiplier (210) multiplies the top to bottom values from the Top→Bottom store (207) by the respective values from the Bottom→Top store (208), and the resulting product pixel values are stored in a ‘U·D’ store (212). This process is described by equation [6].
$P_{UD}(i, j) = P_{U}(i, j) \times P_{D}(i, j)$ [6]
The result of applying this U·D multiplication to the image of
The contents of the L·R store (211) and the U·D store (212) are combined with each other, and with the contents of the L→R store (205), the R→L store (206), the Top→Bottom store (207) and the Bottom→Top store (208). These combination processes identify five different types of image region.
Regions enclosed by non-zero-value DFD pixels on all four sides, that is to say left, right, top and bottom, are identified by multiplying (215) the values of co-located pixels from the L·R store (211) and the U·D store (212):
$P_{CLD}(i, j) = P_{LR}(i, j) \times P_{UD}(i, j)$ [7]
The result of applying this UD·LR multiplication to the image of
Regions bounded by non-zero-value DFD pixels to the left, above and below, are identified by multiplying (213) the values of co-located pixels from the U·D store (212) and the L→R store (205):
$P_{E}(i, j) = P_{UD}(i, j) \times P_{L}(i, j)$ [8]
The result of applying this UD·L multiplication to the image of
Regions bounded by non-zero-value DFD pixels to the right, above and below, are identified by multiplying (214) the values of co-located pixels from the U·D store (212) and the R→L store (206):
$P_{W}(i, j) = P_{UD}(i, j) \times P_{R}(i, j)$ [9]
The result of applying this UD·R multiplication to the image of
Regions bounded by non-zero-value DFD pixels above, to the left and to the right, are identified by multiplying (216) the values of co-located pixels from the L·R store (211) and the Top→Bottom store (207):
$P_{S}(i, j) = P_{LR}(i, j) \times P_{D}(i, j)$ [10]
The result of applying this LR·D multiplication to the image of
And finally, regions bounded by non-zero-value DFD pixels below, to the left and to the right, are identified by multiplying (217) the values of co-located pixels from the L·R store (211) and the Bottom→Top store (208):
$P_{N}(i, j) = P_{LR}(i, j) \times P_{U}(i, j)$ [11]
The result of applying this LR·U multiplication to the image of
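The five products of equations [7] to [11] can be sketched in one function. The names `dfd_products`, `multiply` and `transpose` are hypothetical, and the layout assumption is that `dfd[j][i]` is the value at horizontal position i and vertical position j, with j increasing downwards.

```python
from itertools import accumulate


def transpose(image):
    return [list(column) for column in zip(*image)]


def multiply(image_a, image_b):
    """Multiply co-located pixel values."""
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]


def dfd_products(dfd):
    """Return the five products keyed by the subscripts used in [7] to [11]."""
    columns = transpose(dfd)
    p_l = [list(accumulate(row)) for row in dfd]                         # [1]
    p_r = [list(accumulate(row[::-1]))[::-1] for row in dfd]             # [2]
    p_d = transpose([list(accumulate(col)) for col in columns])          # [3]
    p_u = transpose([list(accumulate(col[::-1]))[::-1] for col in columns])  # [4]
    p_lr = multiply(p_l, p_r)                                            # [5]
    p_ud = multiply(p_u, p_d)                                            # [6]
    return {
        "CLD": multiply(p_lr, p_ud),  # [7]
        "E": multiply(p_ud, p_l),     # [8]
        "W": multiply(p_ud, p_r),     # [9]
        "S": multiply(p_lr, p_d),     # [10]
        "N": multiply(p_lr, p_u),     # [11]
    }


# Boundary present to the left, above and below the centre pixel, open to the right.
dfd = [[1, 1, 1],
       [1, 0, 0],
       [1, 1, 1]]
products = dfd_products(dfd)
```

For the centre pixel of this example only the product of equation [8] is non-zero, consistent with a region bounded to the left, above and below but not to the right.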
The five DFD products PCLD, PE, PW, PS, and PN all give indications that a pixel is part of a region enclosed by a non-zero-value DFD, or is part of the enclosing non-zero-value DFD. In the system of
As explained below, real DFD images rarely have zero-value DFDs; areas that ‘match’ each other after motion compensation will typically give a smaller DFD than other areas. However, the above-described processing of DFD values will identify enclosed areas by giving higher outputs in enclosed areas than in unenclosed areas.
Any of these DFD product values can therefore be used, either alone or in combination, as part of a segmentation process that identifies portrayed objects. Because they are products, the dynamic range of these five values is likely to be wide. However, the useful information in a DFD value is confined to the lower part of its dynamic range. The lowest part of the range will be occupied by noise, even if the motion vector is correct; but, when there is an occlusion or an incorrect vector, the magnitude of the DFD will depend on the image content and may have any value above the noise floor (limited only by the coding range of the video data).
It is therefore helpful to modify the dynamic range of either the DFD, or a DFD product, to reduce the effect of noise and to reduce the contribution of the highest values. This can be achieved by clipping, limiting and/or applying a non-linear function. One suitable function is a sigmoid function. Another suitable option is to take the logarithm of the DFD, or DFD product, and discard all values below a low-value threshold.
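The logarithmic option can be sketched as below. The function name `log_compress` and the parameter `floor` are hypothetical. Two assumptions are made here: `log(1 + v)` is used in place of a bare logarithm so that zero-valued inputs are defined, and "discarding" a value is taken to mean setting it to zero.

```python
import math


def log_compress(values, floor):
    """Take log(1 + v) of each value and zero anything below `floor`."""
    compressed = []
    for row in values:
        out = []
        for value in row:
            log_value = math.log1p(value)
            # Assumption: values under the low-value threshold are set to zero.
            out.append(log_value if log_value >= floor else 0.0)
        compressed.append(out)
    return compressed


product_values = [[0, 10, 100000]]
compressed = log_compress(product_values, floor=3.0)
```

The small value is suppressed as noise, while the very large value is reduced to a modest number, so a few extreme products no longer dominate any later sum.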
The five DFD products can be combined, for example in a weighted sum. PCLD may be given a higher weight because it is evidence of four directions of limitation, as compared to the three directions of limitation represented by the other four DFD products. The weighted sum of non-linearly-processed DFD products can be compared with a threshold, and contiguous pixel positions corresponding to values above the threshold identified as belonging to an image segment.
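A weighted sum followed by a threshold comparison might be sketched as follows. The function name `segment_mask`, the dictionary keys, the weight values and the threshold are all illustrative assumptions; the inputs are taken to be product images that have already been non-linearly processed.

```python
def segment_mask(products, weights, threshold):
    """Return a mask that is True where the weighted sum exceeds the threshold."""
    names = list(weights)
    height = len(products[names[0]])
    width = len(products[names[0]][0])
    return [[sum(weights[n] * products[n][j][i] for n in names) > threshold
             for i in range(width)]
            for j in range(height)]


products = {
    "CLD": [[1.0, 0.0]],
    "E": [[0.5, 0.5]],
    "W": [[0.5, 0.0]],
    "S": [[0.0, 0.0]],
    "N": [[0.0, 0.0]],
}
# Hypothetical weights: the four-sided product counts double.
weights = {"CLD": 2.0, "E": 1.0, "W": 1.0, "S": 1.0, "N": 1.0}

mask = segment_mask(products, weights, threshold=1.0)
```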
A suitable threshold value can be found by a ‘training’ process in which the extent of the regions corresponding to pixel values above a trial threshold value is compared with an observer's assessment of the image, and the threshold is adjusted to achieve an optimum segmentation.
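One way such a training step could be automated is sketched below. This is an assumption-laden illustration: the observer's assessment is represented by a hypothetical reference mask, agreement is scored by intersection over union (a measure chosen here, not taken from the description), and the trial thresholds are supplied as a list of candidates.

```python
def overlap_score(mask_a, mask_b):
    """Intersection over union of two boolean masks."""
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return intersection / union if union else 1.0


def train_threshold(scores, reference_mask, candidates):
    """Pick the candidate threshold whose mask best matches the reference."""
    def mask_for(threshold):
        return [[value > threshold for value in row] for row in scores]
    return max(candidates,
               key=lambda t: overlap_score(mask_for(t), reference_mask))


scores = [[0.1, 0.4, 0.9, 0.8]]
reference_mask = [[False, False, True, True]]

best = train_threshold(scores, reference_mask, candidates=[0.05, 0.5, 0.85])
```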
In the processing described above, cumulative summations of DFD values, from one image edge to the opposite image edge, are made along lines of horizontally adjacent pixels, and lines of vertically adjacent pixels. Other straight lines of pixels can be used; lines of pixels in more than two directions can be used; and/or, the summation may be limited to a region of interest within the image. Some suitable examples will now be described with reference to
Another example of the choice of lines of pixels for processing according to the invention is shown in
The pixel values of co-located pixels in the forward difference-image (174) and the backward difference-image (173) are combined in a combination process (175). This process may take a simple average of co-located pixel values, a non-linear combination (such as the greater of the two pixel values), or a weighted average (with weights derived from a motion estimation process for example).
The output of the combination process (175) is a set of pixel values of a difference-image where high values are indicative of the edges of image regions. This output is passed to a directional accumulation process (176) in which pixel values are directionally accumulated along straight lines of pixels, in opposing directions along each line.
The accumulated values of co-located pixels are combined in a combination process (177). A suitable combination method is the multiplication of pixel values as described above with reference to
The pixel values from the combination process (177) are input to a non-linear function (178), that reduces the dynamic range of the values (for example the logarithmic clipping process described above). The non-linearly processed values are compared with a threshold in a comparison process (179) and contiguous regions where the threshold is exceeded are identified as image regions in a region identification process (180).
It is possible that several regions are identified, with the different regions separated from each other by locations where the non-linearly processed pixel values are less than the threshold. It may be necessary to apply some other qualification to the detected regions; for example, regions of small extent or small area may be disregarded.
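Identifying contiguous regions in the thresholded result, and disregarding small ones, can be sketched with a flood fill. The function name `find_regions`, the `min_area` parameter and the use of 4-connectivity (horizontal and vertical neighbours only) are assumptions; pixel positions are reported as (i, j) pairs.

```python
from collections import deque


def find_regions(mask, min_area=1):
    """Group True pixels into 4-connected regions of at least `min_area` pixels."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for j in range(height):
        for i in range(width):
            if not mask[j][i] or seen[j][i]:
                continue
            seen[j][i] = True
            queue, region = deque([(i, j)]), []
            while queue:
                ci, cj = queue.popleft()
                region.append((ci, cj))
                for ni, nj in ((ci + 1, cj), (ci - 1, cj), (ci, cj + 1), (ci, cj - 1)):
                    if (0 <= ni < width and 0 <= nj < height
                            and mask[nj][ni] and not seen[nj][ni]):
                        seen[nj][ni] = True
                        queue.append((ni, nj))
            if len(region) >= min_area:  # disregard regions of small area
                regions.append(region)
    return regions


mask = [[True, True, False, False],
        [True, False, False, True]]

regions = find_regions(mask, min_area=2)
```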
The examples shown above have analysed pixels within rectangular regions. However the invention can be applied to regions of arbitrary shape by choosing lines of pixels that include at least a proportion of the pixels within the area to be analysed.
Images may be re-sampled prior to analysis. If only low resolution segmentation is required, then a sub-sampled image comprising fewer pixels can be used for the segmentation process.
The invention is applicable to both interlaced and progressive image sequences. In the case of interlace, adjacent images of the sequence have no co-located pixels. It is thus necessary to spatially re-sample one or both images of a pair of adjacent images to obtain co-located pixels.
Although processing of DFD images from a temporal sequence of images has been described, the invention is applicable to other image sequences, such as a sequence of changing viewpoints. Difference-images representing the respective value differences between co-located pixels in images at different positions in the sequence can be used. And, provided vectors can be derived that describe the displacement of objects between different images of the sequence, a DFD can be formed from two images in the sequence and this DFD can be processed according to the invention to identify one or more image regions.
Typically luminance values are used to form the DFD or difference-image, however other values, including colour separation values (RGB, CMY, etc.) or colour difference values (U, V, CB, CR, etc.) can be used.
Number | Date | Country | Kind |
---|---|---|---|
1309489.1 | May 2013 | GB | national |