The invention relates to a method and an apparatus for tracking superpixels between related images. More specifically, a method and an apparatus for micro tracking of superpixels in temporally or spatially related images are described, which make use of asymmetric pixel mapping.
Over the last decade superpixel algorithms have become a broadly accepted and applied method for image segmentation, providing a reduction in complexity for subsequent processing tasks. Superpixel segmentation provides the advantage of switching from a rigid structure of the pixel grid of an image to a semantic description defining objects in the image, which explains its popularity in image processing and computer vision algorithms.
Research on superpixel algorithms began with a processing intensive feature grouping method proposed in [1]. Subsequently, more efficient solutions for superpixel generation were proposed, such as the simple linear iterative clustering (SLIC) method introduced in [2]. While earlier solutions focused on still images, later developments aimed at applications of superpixels for video, which require their temporal consistency. In [3] an approach achieving this demand is described, which provides traceable superpixels within video sequences.
The temporal dimension in image processing and computer vision algorithms requires the tracking of objects in video by tracking superpixels over time. This is an easy task when a macro tracking of superpixels is needed, requiring a simple superpixel-to-superpixel mapping. More often, however, image processing requires a micro tracking, describing the pixel-to-pixel correspondence between temporally adjacent superpixels, i.e. between macro tracked superpixels.
The difficulty of micro tracking arises from the shape of the superpixels deforming over time. While this does not harm the macro tracking, it definitely eliminates the possibility for a straightforward pixel-to-pixel assignment between temporally corresponding superpixels. The changes in the superpixel shapes cause at least a relative pixel shift. Most often the superpixel shape deformation also changes the superpixel size, resulting in a superpixel pixel count difference between temporally adjacent superpixels. This requires an asymmetric pixel mapping instead of a one-to-one pixel mapping for the superpixel micro tracking.
The quality of superpixel micro tracking can be measured by the isogonic projection and the coverage achieved by the asymmetric pixel mapping. The isogonic projection describes the relative pixel order given by the mapping, and the coverage refers to the percentage of pixels in the target superpixel that the asymmetric mapping points at. With poor coverage, the asymmetric mapping excludes large parts of the target superpixel by linking more pixels of the origin superpixel than necessary to a single pixel located within the target superpixel. This leads to unnecessary holes in the map, which exclude pixel locations of the target superpixel.
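The coverage measure can be illustrated with a minimal sketch. The function name `coverage` and the representation of a mapping as a dictionary from origin pixel coordinates to target pixel coordinates are assumptions made for illustration only. The sketch returns a fraction between 0 and 1 rather than a percentage.

```python
def coverage(mapping, target_pixels):
    """Fraction of target superpixel pixels pointed at by the mapping.

    mapping: dict {origin (x, y): target (x, y)} (hypothetical representation)
    target_pixels: iterable of all (x, y) positions in the target superpixel
    """
    targets = set(target_pixels)
    # Count each target pixel once, no matter how many origin pixels map to it.
    hit = {t for t in mapping.values() if t in targets}
    return len(hit) / len(targets)


# Three origin pixels, two of which collapse onto the same target pixel.
example_mapping = {(0, 0): (5, 0), (1, 0): (5, 0), (2, 0): (6, 0)}
example_target = [(5, 0), (6, 0), (7, 0), (8, 0)]
example_coverage = coverage(example_mapping, example_target)
```

In this example two of the four target pixels are never pointed at, which corresponds to the holes in the map described above.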
Apart from the temporal superpixel assignment aspect in image processing, similar problems arise for a multi-view superpixel assignment, which is required in light field camera and other multi-view applications. Temporal superpixels and multi-view superpixels are interchangeable items. Therefore, the temporal aspects of object related pixel assignments can be transferred to multi-view aspects of them. Those multi-view aspects are extensively used in image processing applied for light field cameras, for example.
It is an object of the present invention to propose an improved solution for micro tracking of superpixels in temporally or spatially related images.
According to the invention, a method for pixel mapping between an origin superpixel in a first image and a target superpixel in a second image comprises:
Accordingly, a computer readable storage medium has stored therein instructions enabling pixel mapping between an origin superpixel in a first image and a target superpixel in a second image, which, when executed by a computer, cause the computer to:
Also, in one embodiment an apparatus configured to perform pixel mapping between an origin superpixel in a first image and a target superpixel in a second image comprises:
In another embodiment, an apparatus configured to perform pixel mapping between an origin superpixel in a first image and a target superpixel in a second image comprises a processing device and a memory device having stored therein instructions, which, when executed by the processing device, cause the apparatus to:
The proposed solution uses a geometrical asymmetric pixel mapping, which is based, for example, on relating the mass centers of the pair of mapped superpixels in the first image and the second image to each other. Depending on the application of the approach in the temporal domain or in the spatial domain, the first image and the second image are successive images of a sequence of images, multi-view images of a scene, or even sequences of multi-view images of a scene.
In one embodiment, a feature vector of a pixel comprises a relative distance of the pixel to a mass center of the superpixel and a topology value of the pixel. For each pixel location in the origin superpixel the following parameters are determined: the angle relative to its mass center, its relative distance to the mass center, and its topology value. The topology value of the pixel indicates a minimal distance of the pixel to a nearest superpixel border. Topology value and relative distance of the origin superpixel form a feature vector, whose best representative is searched for within the target superpixel.
In one embodiment, the target pixels are determined by:
The search for the best match of a similarly formed feature vector within the target superpixel begins from the target superpixel mass center and follows the ray having the same angle as determined in the origin superpixel. A matching quality measure is determined using the Euclidean vector distance. Preferably, the first minimum found for the feature vector distance while following the ray within the target superpixel is taken as the optimal mapping position.
In one embodiment, for determining the corresponding ray for the target superpixel a scaling of the target superpixel relative to the origin superpixel is taken into account. Preferably, the superpixel is divided into four quadrants and a horizontal and a vertical scaling is determined individually for each of the quadrants. An advanced technique applies an additional scaling along the search rays, which is called Quadrant Scaled Superpixel Mapping, to compensate for nonlinear distortions and size changes that are present in practice for superpixel pairs. The Quadrant Scaled Superpixel Mapping approximates nonlinear superpixel shape distortion, which is superior to the otherwise used Uniform Scaled Superpixel Mapping by improving the overall coverage of the asymmetric mapping algorithm.
The proposed solution leads to an isogonic pixel mapping between two related superpixels, i.e. temporally or spatially adjacent superpixels, with an optimal coverage. This ensures an optimal micro tracking, which is a prerequisite for efficient image processing in the temporal domain as well as for computer vision algorithms such as multiclass object segmentation, depth estimation, segmentation, body model estimation, and object localization. The proposed approach provides micro maps of high quality, handles arbitrary superpixel shapes, including concave shapes, and is robust by tolerating arbitrary superpixel distortions over time.
For a better understanding the invention shall now be explained in more detail in the following description with reference to the figures. It is understood that the invention is not limited to this exemplary embodiment and that specified features can also expediently be combined and/or modified without departing from the scope of the present invention as defined in the appended claims.
Superpixels represent an over-segmentation of image data which is useful for object detection and allows a reduction in complexity for subsequent processing tasks.
In the following the invention is explained with a focus on superpixels in temporally adjacent images, e.g. images of a video sequence. However, the described approach is likewise applicable to spatially related images, e.g. multi-view images and sequences of multi-view images.
The technique of temporally consistent superpixels extends the over-segmentation of single images to videos, allowing object tracking over time.
The superpixel segmentation—represented by a superpixel label map—can be utilized to generate a superpixel topology map. The topology map expresses for each pixel position the minimal distance to the nearest superpixel border.
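As a minimal sketch of how such a topology map could be derived from a label map, the following code marks every pixel that has a 4-neighbour with a different label as a border pixel and then propagates distances from those border pixels with a breadth-first search. The function name `topology_map`, the 4-neighbourhood, the city-block distance, and the choice not to treat the image edge as a superpixel border are all assumptions for illustration; the source does not specify these details.

```python
from collections import deque

import numpy as np


def topology_map(labels):
    """City-block distance of each pixel to the nearest superpixel border pixel."""
    labels = np.asarray(labels)
    h, w = labels.shape

    # ASSUMPTION: a border pixel has a horizontal or vertical neighbour
    # carrying a different superpixel label.
    border = np.zeros((h, w), dtype=bool)
    diff_h = labels[:, 1:] != labels[:, :-1]
    border[:, 1:] |= diff_h
    border[:, :-1] |= diff_h
    diff_v = labels[1:, :] != labels[:-1, :]
    border[1:, :] |= diff_v
    border[:-1, :] |= diff_v

    # Multi-source breadth-first search starting from all border pixels.
    # Pixels stay at -1 if the label map contains no border at all.
    dist = np.full((h, w), -1, dtype=int)
    queue = deque()
    for y, x in zip(*np.nonzero(border)):
        dist[y, x] = 0
        queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny, nx] < 0:
                dist[ny, nx] = dist[y, x] + 1
                queue.append((ny, nx))
    return dist


# Two superpixels side by side, each three pixels wide.
example_labels = np.array([[0, 0, 0, 1, 1, 1]] * 4)
example_topology = topology_map(example_labels)
```

For the example label map, the two columns adjacent to the label change receive the value 0, and the values grow towards the outer columns.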
As described above, temporal image processing and computer vision algorithms require a micro tracking describing a pixel-to-pixel correspondence between temporally adjacent superpixels. The difficulty of a correct micro tracking is depicted in
Keeping the constant grid position would result in an inaccurate mapping and would introduce additional distortions on top of those already present.
The approach disclosed here provides a method to generate correct pixel-to-pixel correspondences between temporally adjacent superpixels or multi-view superpixels and copes with translational superpixel motions as well as superpixel shape distortions. First a simple method called Uniform Scaled Superpixel Mapping (USM) is described, followed by an advanced method called Quadrant Scaled Superpixel Mapping (QSM). The detailed description starts with explaining the steps needed for the USM method. Subsequently follows the description of the additional steps which are required for utilizing the QSM method.
The Uniform Scaled Superpixel Mapping method works as follows. For each pixel pN(i) located in the origin superpixel SPN in image tn a corresponding pixel pM(j) is required, being located within the temporally consistent superpixel SPM in image tm:
In general the number of pixels I contained in SPN and the number of pixels J contained in SPM are different. Therefore, the resulting pixel mappings can be one-to-many, one-to-one, many-to-one, or a combination of them.
The following steps describe how to determine for each pixel pN(i) located within the origin superpixel SPN a single corresponding pixel pM(j) located within the target superpixel SPM. This can be done separately for all superpixels within an image:
STEP 1: Get the angle α defined by the polar coordinates for the origin pixel pN(i) by taking the mass center of SPN as the origin of the coordinate system.
STEP 2: Determine the topology value TPN(i) of the origin pixel pN(i).
STEP 3: Determine the relative distance of the origin pixel pN(i)=[xN(i),yN(i)] from the mass center MCN=[XN,YN], which is calculated as
STEP 4: Examine those pixels within the target superpixel SPM intersected by the ray with the same angle α beginning from the mass center MCM=[XM,YM], where the angle α is calculated as
STEP 5: Along the ray j(α), determine the topology value TPM(j) of pixel pM(j).
STEP 6: Along the ray j(α), determine the relative distance of the pixels pM(j)=[xM(j),yM(j)] from the mass center MCM=[XM, YM], which is calculated as
STEP 7: Along the ray j(α), determine the feature vector distance ΔV(i,j):
$$\Delta V(i,j)=\sqrt{\left(D_N(i)-D_M(j)\right)^2+\left(TP_N(i)-TP_M(j)\right)^2}.$$
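The feature vector distance of STEP 7 is a plain Euclidean distance between the two-component vectors formed by relative distance and topology value. A minimal sketch, with the function name `feature_distance` chosen here for illustration only:

```python
import math


def feature_distance(d_n, tp_n, d_m, tp_m):
    """Euclidean distance between the feature vectors (D_N, TP_N) and (D_M, TP_M)."""
    return math.hypot(d_n - d_m, tp_n - tp_m)


# Example values only: relative distances 3 and 0, topology values 0 and 4.
example_dv = feature_distance(3.0, 0.0, 0.0, 4.0)
```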
STEP 8: Find the minimum of all vector distances along the ray j(α) and take its coordinates as mapping location:
FIG. 8 depicts a simplified version of the superpixel mapping method, where the left part depicts the treatment of data within the origin superpixel, as described in steps 1 to 3, and the right part depicts the remaining steps 4 to 8 to find the appropriate match within the target superpixel. In this figure the black crosses mark the mass center positions, whereas the black circles and the dark grey circles designate the origin pixel position and the micro match pixel position, respectively.
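Steps 1 to 8 can be sketched for a single origin pixel as follows. This is an illustrative sketch under stated assumptions, not the exact formulation of the method: the relative distance is assumed to be the offset from the mass center divided by the horizontal and vertical extent of the superpixel, the ray is sampled in half-pixel steps with rounding to the nearest pixel, and the global minimum along the ray is taken. All function and parameter names are hypothetical.

```python
import math

import numpy as np


def mass_center(mask):
    """Mean (x, y) position of all pixels in a boolean superpixel mask."""
    ys, xs = np.nonzero(mask)
    return xs.mean(), ys.mean()


def uniform_weights(mask):
    """ASSUMPTION: weights are the horizontal and vertical extent (at least 1)."""
    ys, xs = np.nonzero(mask)
    return max(xs.max() - xs.min(), 1), max(ys.max() - ys.min(), 1)


def relative_distance(x, y, center, weights):
    """ASSUMPTION: offset from the mass center, scaled per axis by the weights."""
    return math.hypot((x - center[0]) / weights[0], (y - center[1]) / weights[1])


def usm_map_pixel(origin_mask, topo_n, target_mask, topo_m, x, y, step=0.5):
    """Return the target pixel (x, y) matched to origin pixel (x, y), or None."""
    mc_n, w_n = mass_center(origin_mask), uniform_weights(origin_mask)
    mc_m, w_m = mass_center(target_mask), uniform_weights(target_mask)

    # Steps 1 to 3: angle, topology value and relative distance in the origin.
    alpha = math.atan2(y - mc_n[1], x - mc_n[0])
    tp_n = topo_n[y, x]
    d_n = relative_distance(x, y, mc_n, w_n)

    # Steps 4 to 8: walk along the ray with the same angle in the target.
    h, w = target_mask.shape
    best, best_dv = None, math.inf
    for k in range(int(math.hypot(h, w) / step) + 1):
        r = k * step
        xm = math.floor(mc_m[0] + r * math.cos(alpha) + 0.5)
        ym = math.floor(mc_m[1] + r * math.sin(alpha) + 0.5)
        if not (0 <= xm < w and 0 <= ym < h):
            break  # the ray has left the image
        if not target_mask[ym, xm]:
            continue  # outside the target superpixel (e.g. concave shapes)
        d_m = relative_distance(xm, ym, mc_m, w_m)
        dv = math.hypot(d_n - d_m, tp_n - topo_m[ym, xm])
        if dv < best_dv:  # strict comparison keeps the first of equal minima
            best, best_dv = (xm, ym), dv
    return best


# Example: a 5x5 origin superpixel and a same-shaped target shifted by 5 pixels.
idx = np.arange(5)
edge = np.minimum(idx, 4 - idx)
square_topo = np.minimum.outer(edge, edge)  # distance to the square's edge
origin_mask = np.zeros((5, 10), dtype=bool)
target_mask = np.zeros((5, 10), dtype=bool)
topo_n = np.zeros((5, 10), dtype=int)
topo_m = np.zeros((5, 10), dtype=int)
origin_mask[:, 0:5] = True
target_mask[:, 5:10] = True
topo_n[:, 0:5] = square_topo
topo_m[:, 5:10] = square_topo
```

With identical shapes, each origin pixel in this example is expected to land on the equally positioned pixel of the shifted target superpixel, which reflects the translational case. Shape distortions change the extents and topology values and therefore the selected match.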
A calculated mapping example is given in
The Quadrant Scaled Superpixel Mapping method differs from the USM only in STEP 3, STEP 4, and STEP 6 by using a more sophisticated weight calculation. The principle of the operations described for determining the relative distances DN and DM remains the same, but the weight denominators are substituted by more adaptive versions.
The idea behind the QSM specific scaling is depicted in
For the QSM method the weights w are substituted by quadrant related weights qφ, where φ ∈ {1, 2, 3, 4} indicates the quadrant the calculation is executed for.
Thus, the QSM substitutes the weights in STEP 3 by
and, accordingly, in STEP 6 by
$$q_{x,M}^{\varphi}=\max_{j\in J}x_M(j,\varphi)-\min_{j\in J}x_M(j,\varphi)$$
$$q_{y,M}^{\varphi}=\max_{j\in J}y_M(j,\varphi)-\min_{j\in J}y_M(j,\varphi).$$
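The quadrant related weights can be sketched as the horizontal and vertical extent of the pixels falling into each quadrant. The sketch below assumes that the quadrants are defined relative to the mass center of the superpixel, that pixels lying on an axis are assigned to the quadrant with the larger coordinates, and that the quadrants are numbered as noted in the code comments. None of these conventions is fixed by the text above, and the function name `quadrant_weights` is hypothetical.

```python
import numpy as np


def quadrant_weights(mask):
    """Return {phi: (q_x, q_y)} with the per-quadrant extents of a superpixel."""
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()

    # ASSUMPTION: quadrants are split at the mass center; numbering is
    # 1: x >= cx, y >= cy   2: x < cx, y >= cy
    # 3: x < cx,  y < cy    4: x >= cx, y < cy
    right, lower = xs >= cx, ys >= cy
    selectors = {
        1: right & lower,
        2: ~right & lower,
        3: ~right & ~lower,
        4: right & ~lower,
    }

    weights = {}
    for phi, sel in selectors.items():
        if sel.any():
            # Extent as maximum minus minimum coordinate within the quadrant.
            q_x = int(xs[sel].max() - xs[sel].min())
            q_y = int(ys[sel].max() - ys[sel].min())
            weights[phi] = (q_x, q_y)
        else:
            weights[phi] = (0, 0)  # empty quadrant
    return weights


# Example: a 5x5 square superpixel with its mass center at (2, 2).
example_weights = quadrant_weights(np.ones((5, 5), dtype=bool))
```

Because the extents are measured separately per quadrant, a superpixel that grows on one side only changes the weights of the affected quadrants, which is what allows an approximation of nonlinear shape distortions.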
Further, the distance measures in STEP 3 and STEP 6 change to
respectively. Finally, the angle α for QSM in STEP 4 changes to
A method according to the invention for pixel mapping between an origin superpixel in a first image and a target superpixel in a second image is schematically illustrated in
Another embodiment of an apparatus 30 configured to perform the method according to the invention is schematically illustrated in
For example, the processing device 31 can be a processor adapted to perform the steps according to one of the described methods. In an embodiment said adaptation comprises that the processor is configured, i.e. for example programmed, to perform steps according to one of the described methods.
Number | Date | Country | Kind |
---|---|---|---|
14306126.5 | Jul 2014 | EP | regional |
14306706.4 | Oct 2014 | EP | regional |