The invention concerns a method and a device for generating, from a first sequence of images, called source sequence, a second sequence of images of reduced size, called reduced sequence, based on information characterizing the perceptual relevancy of pixels or groups of pixels within the images of the source sequence.
The invention relates to the generation of a reduced sequence from a source sequence. When browsing among a large number of image sequences, it is useful to be able to select a sequence at a glance, and reduced sequences serve this purpose. A reduced sequence is a sequence that has a smaller size than the source sequence from which it is generated. For example, in a context of sequence browsing, the selection of a sequence of images among various sequences may be eased and accelerated by displaying several reduced sequences on a given reproduction device (e.g. displaying a mosaic of reduced sequences on a TV set display). A classical method for generating a reduced sequence consists in downsampling the images of the source sequence. In this case, if the size reduction is too strong, the most important parts of the reduced sequence become very small, which degrades the user viewing experience.
The invention aims at improving the user viewing experience by generating a reduced sequence that takes into account the perceptual relevancy of pixels or groups of pixels, called regions hereinafter, within the images of the source sequence from which the reduced sequence is generated.
The invention relates to a method for generating, from a sequence of at least one source image, called source sequence, a sequence of at least one reduced image, called reduced sequence, the at least one reduced image having a size smaller than or equal to the size of the at least one source image and being generated by extracting from the at least one source image at least one image part delimited in the source image by an extraction window. The extraction window is defined by the following steps of:
According to a first embodiment, the step for defining the initial extraction window comprises the following steps of:
Preferentially, if after the step for displacing the initial extraction window, the extraction window is not fully included into the source image, the extraction window is translated to be fully included into the source image.
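According to a specific characteristic, the perceptual relevancy gravity center coordinates iGC and jGC are computed as follows:
$$i_{GC}=\frac{\sum_{p\in W_E} i_p\,s(i_p,j_p)}{\sum_{p\in W_E} s(i_p,j_p)},\qquad j_{GC}=\frac{\sum_{p\in W_E} j_p\,s(i_p,j_p)}{\sum_{p\in W_E} s(i_p,j_p)}$$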
where:
(ip,jp) are the coordinates of a pixel p of the extraction window WE;
s(ip,jp) is a perceptual relevancy value associated to the pixel p.
Advantageously, the step c) for defining the initial extraction window consists in the following four successive steps:
According to a specific embodiment, the sequence comprising at least two images, the method is applied successively on the images of the sequence. Preferentially, if the distance between the most conspicuous pixels of a first image and of a second image following the first image is below a predefined threshold, the preliminary window in the second image is centered on the pixel located at the same position as the most conspicuous pixel in the first image.
According to a second embodiment, the step for defining the initial extraction window comprises the following steps of:
Preferentially, a perceptual interest value is a saliency value.
The invention also relates to a method for generating a combined reduced sequence of several images. To this aim, a first reduced sequence of images is generated from a source sequence and a second reduced sequence of images is generated by downsampling the source sequence, the combined reduced sequence being generated from the first reduced sequence of images by replacing a predefined number of consecutive images of the first reduced sequence by corresponding images of the second reduced sequence.
Other features and advantages of the invention will appear with the following description of some of its embodiments, this description being made in connection with the drawings in which:
The invention relates to a method for generating a reduced sequence of images from a source sequence of images based on side information relating to the perceptual relevancy of pixels within the images of the source sequence. It may also be used for generating a single reduced image from a single source image. The side information may be provided by external means, for example within data files. It may also be provided by a method consisting in determining a perceptual relevancy value for each pixel in each image of the source sequence. The perceptual relevancy value associated with each pixel may be a saliency value. In this case, a saliency map is associated with each source image. A saliency map s is a two-dimensional topographic representation of the conspicuity of the image. This map is normalized, for example between 0 and 255. The saliency map thus provides a saliency value s(i,j) per pixel (where (i,j) denotes the pixel coordinates) that characterizes its perceptual relevancy: the higher the value s(i,j), the more relevant the pixel of coordinates (i,j). A saliency map for a given image may be obtained by the method described in application EP 1 544 792 and in the article by O. Le Meur et al. entitled “From low level perception to high level perception, a coherent approach for visual attention modeling”, published in the proceedings of SPIE Human Vision and Electronic Imaging IX (HVEI'04), San Jose, Calif. (B. Rogowitz, T. N. Pappas Ed.), January 2004. The saliency model is also explained in the article by O. Le Meur et al. entitled “Performance assessment of a visual attention system entirely based on a human vision modeling”, published in the proceedings of ICIP in October 2004. This method comprises the following steps:
projection of the image according to the luminance component if the image is a single color image and according to the luminance component and to the chrominance components if the image is a color image;
perceptual sub-bands decomposition of the projected components in the frequency domain according to a visibility threshold of a human eye; the sub-bands are obtained by carving up the frequency domain both in spatial radial frequency and orientation; each resulting sub-band may be regarded as the neural image corresponding to a population of visual cells tuned to a range of spatial frequency and a particular orientation;
extraction of salient elements of the sub-bands related to the luminance component and related to the chrominance components, i.e. the most important information from the sub-band;
contour enhancement of the salient elements in each sub-band related to the luminance and chrominance components;
calculation of a saliency map for the luminance component from the enhanced contours of the salient elements of each sub-band related to the luminance component;
calculation of a saliency map for each chrominance component from the enhanced contours of the salient elements of each sub-band related to the chrominance components;
creation of the saliency map as a function of the saliency maps obtained for each sub-band.
For the sake of clarity, the steps of the method are described for a single source image, but they may be applied in parallel or successively on each image of a source sequence in order to generate a reduced sequence.
The following notations are used hereinafter:
According to a first embodiment, the method comprises 4 steps referenced 10, 11, 12 and 13 on
If
the preliminary window 20 is fully included in the source image 2 and does not need to be translated. The coordinates (corx, cory) of the top left corner of the window are computed as follows:
If the preliminary window 20 is not fully included in the source image 2, it has to be translated by a vector
whose coordinates depend on the position of the pixel 21 of coordinates (imax,jmax). In this case, the coordinates of the top left corner of the translated preliminary window 20 are computed as follows:
Depending on the values of imax and jmax several cases occur. If the preliminary window 20 is out of the limits of the source image 2 in one direction, either horizontal or vertical direction, then:
If the preliminary window 20 is out of the limits of the source image 2 in both horizontal and vertical directions then:
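The exact corner formulas for the cases above are not reproduced in this text, so the following is only a minimal sketch of step 10 under stated assumptions: the saliency map is a NumPy array indexed [row, column], i is taken as the horizontal index and j as the vertical one, and the translation by a vector is implemented as clamping the top-left corner (corx, cory) so that the window lies fully inside the image. The function name `preliminary_window` is hypothetical.

```python
import numpy as np


def preliminary_window(saliency, redsx0, redsy0):
    """Sketch of step 10: center a redsx0 x redsy0 window on the most
    conspicuous pixel, then translate it to lie fully inside the image.

    Assumed conventions (not from the source): saliency[j, i] with i the
    column (horizontal) index and j the row (vertical) index.
    """
    height, width = saliency.shape
    if redsx0 > width or redsy0 > height:
        raise ValueError("preliminary window larger than the source image")
    # Most conspicuous pixel: highest saliency value (first one on ties).
    jmax, imax = np.unravel_index(np.argmax(saliency), saliency.shape)
    # Top-left corner of a window centered on (imax, jmax).
    corx = int(imax) - redsx0 // 2
    cory = int(jmax) - redsy0 // 2
    # Translation: clamp the corner so the window stays inside the image.
    corx = min(max(corx, 0), width - redsx0)
    cory = min(max(cory, 0), height - redsy0)
    return corx, cory
```

Clamping handles the one-direction and two-direction cases uniformly: each coordinate is corrected independently, so a window overflowing a corner of the image is translated both horizontally and vertically.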
The step 11 consists in first computing a saliency value, denoted SMreduced, associated with the preliminary window 20 (possibly translated). In order to estimate the perceptual interest of the preliminary window 20, SMreduced is compared to the saliency value SMimage associated with the source image 2 by computing the ratio
$$\Psi=\frac{SM_{reduced}}{SM_{image}}$$
According to a preferred embodiment, if the ratio Ψ is close to 0, i.e. if Ψ is lower than a second predefined threshold referenced T2 on
The step 12 consists in dynamically adapting the size of the preliminary window 20 to the image content by an iterative approach in order to generate an initial extraction window referenced 22 on
where:
δx and δy are arbitrarily set values; and
k is an integer number which represents the number of iterations.
The size of the current window is increased until k > itermax or until ΔΨk ≤ ε, where ε is a third predefined threshold which is arbitrarily set (e.g. ε = 10⁻³). The size of the initial extraction window 22 equals the size computed at the last iteration, i.e. the last current window is taken as the initial extraction window 22. Preferentially, δx and δy are multiples of 2 in order to ease the determination of the up/down sampling filters when the reduced sequence has to be displayed on a reproduction device whose size is a multiple of 2.
The step 13 consists in generating the reduced image by extracting the image part of the source image delimited by the initial extraction window 22. The first pixel of the reduced image corresponds to the top left pixel of the initial extraction window 22.
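Steps 11 to 13 can be sketched as below. This is a minimal illustration, not the patented implementation: it assumes that SMreduced and SMimage are the sums of the saliency values inside the window and over the whole image, that ΔΨk is the gain in Ψ between two iterations, and that the window grows by δx and δy per iteration around its current center while being clamped to the image. The names `psi`, `grow_window` and their default parameters (`dx`, `dy`, `eps`, `itermax`) are hypothetical.

```python
import numpy as np


def psi(saliency, corx, cory, w, h):
    """Psi = SMreduced / SMimage, both taken as sums of saliency (assumption)."""
    sm_image = float(saliency.sum())
    sm_reduced = float(saliency[cory:cory + h, corx:corx + w].sum())
    return sm_reduced / sm_image if sm_image > 0 else 0.0


def grow_window(saliency, corx, cory, w, h, dx=2, dy=2, eps=1e-3, itermax=50):
    """Step 12 sketch: enlarge the window by (dx, dy) per iteration and stop
    when k > itermax or when the gain Delta Psi_k is at most eps."""
    height, width = saliency.shape
    prev = psi(saliency, corx, cory, w, h)
    for _ in range(itermax):  # at most itermax iterations
        nw, nh = min(w + dx, width), min(h + dy, height)
        # Grow around the current center, then clamp inside the image.
        corx = min(max(corx - (nw - w) // 2, 0), width - nw)
        cory = min(max(cory - (nh - h) // 2, 0), height - nh)
        w, h = nw, nh
        cur = psi(saliency, corx, cory, w, h)
        if cur - prev <= eps:  # Delta Psi_k <= eps: keep this last window
            break
        prev = cur
    return corx, cory, w, h
```

Step 13 then amounts to slicing the source image with the returned window, e.g. `reduced = source[cory:cory + h, corx:corx + w]`, so that the first pixel of the reduced image is the top left pixel of the window.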
According to a preferred embodiment, the step 12 is followed by a step 12′ which consists in displacing the initial extraction window 22, also called WE in the sequel, to a new position in order to generate a displaced extraction window 23. This is achieved by computing the gravity center GC of the initial extraction window 22. The coefficients associated to the pixels in order to compute the gravity center are the perceptual interest values of the pixels (e.g. saliency values). Therefore the coordinates (iGC, jGC) of the gravity center GC are computed as follows:
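$$i_{GC}=\frac{\sum_{p\in W_E} i_p\,s(i_p,j_p)}{\sum_{p\in W_E} s(i_p,j_p)},\qquad j_{GC}=\frac{\sum_{p\in W_E} j_p\,s(i_p,j_p)}{\sum_{p\in W_E} s(i_p,j_p)}$$
where:
(ip,jp) are the coordinates of a pixel p of the initial extraction window WE; and
s(ip,jp) is the perceptual interest value (e.g. saliency value) associated with the pixel p.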
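The saliency-weighted gravity center and the displacement of step 12′ can be sketched as follows. The text only states that the window is displaced according to GC; re-centering WE on GC and then translating it back inside the image is an assumption of this sketch, as are the [row, column] indexing, the requirement that WE already lies inside the image, and the geometric-center fallback for a window with zero saliency. The names `gravity_center` and `displace_window` are hypothetical.

```python
import numpy as np


def gravity_center(saliency, corx, cory, w, h):
    """Saliency-weighted gravity center (iGC, jGC) of the window W_E, in image
    coordinates (i = column, j = row: an assumed convention)."""
    win = saliency[cory:cory + h, corx:corx + w].astype(float)
    total = win.sum()
    if total == 0:
        # No salient pixel in W_E: fall back to the geometric center (assumption).
        return corx + (w - 1) / 2, cory + (h - 1) / 2
    # Pixel coordinates of every position in W_E (W_E assumed inside the image).
    jj, ii = np.mgrid[cory:cory + h, corx:corx + w]
    return float((ii * win).sum() / total), float((jj * win).sum() / total)


def displace_window(saliency, corx, cory, w, h):
    """Hypothetical step 12': re-center W_E on its gravity center, then
    translate it so that it stays fully inside the source image."""
    height, width = saliency.shape
    i_gc, j_gc = gravity_center(saliency, corx, cory, w, h)
    new_x = int(np.floor(i_gc - (w - 1) / 2 + 0.5))  # round half up
    new_y = int(np.floor(j_gc - (h - 1) / 2 + 0.5))
    new_x = min(max(new_x, 0), width - w)
    new_y = min(max(new_y, 0), height - h)
    return new_x, new_y
```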
In the case of a source sequence of at least two images, the step 10 is advantageously modified such that the center of the window of size (redsx0,redsy0) in the current source image 2 is positioned on a pixel of coordinates Imaxcur and Jmaxcur which may be different from the most conspicuous pixel 21. Imaxcur and Jmaxcur are calculated in order to ensure a temporal coherency of the sequence by avoiding that the spatial location of the preliminary window 20 in the source image drastically changes from one source image to the next source image. To this aim, the displacement d between two most conspicuous pixels (i.e. pixels of highest saliency values) in two consecutive source images is computed by the following formula:
$$d=\sqrt{(\mathit{imaxcur}-\mathit{imaxprev})^2+(\mathit{jmaxcur}-\mathit{jmaxprev})^2}$$
where:
(imaxcur,jmaxcur) are the coordinates of the most conspicuous pixel in the current source image; and
(imaxprev,jmaxprev) are the coordinates of the most conspicuous pixel in the previous source image preceding the current source image.
In order to avoid wavering, when the displacement d between the two most conspicuous pixels is small, the center of the preliminary window 20 in the current source image is positioned on the pixel whose coordinates equal those of the most conspicuous pixel in the previous source image. Thus, if d ≤ ThStill, Imaxcur and Jmaxcur are set equal to imaxprev and jmaxprev, respectively. A value of 7 for ThStill appears suitable. ThStill may also depend on the size of the window.
For a large displacement, i.e. if d > ThMove, the temporal coherency is threatened. The value ThMove depends on the size of the preliminary window 20; for example, it is equal to the diagonal of the preliminary window 20, i.e. $\mathit{ThMove}=\sqrt{\mathit{redsx0}^2+\mathit{redsy0}^2}$. In this case, the center of the preliminary window 20 is positioned in the current source image on the pixel of coordinates Imaxcur and Jmaxcur computed as follows:
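The distance test on the most conspicuous pixels of two consecutive images can be sketched as below. The formula used by the source for the large-displacement case (d > ThMove) is not reproduced in this text, so the sketch only flags that case and returns no center for it; using the current most conspicuous pixel in the intermediate case is likewise an assumption (it is the default behaviour of step 10). The function name `preliminary_center` and its return labels are hypothetical; ThStill defaults to the value 7 suggested above and ThMove is the window diagonal.

```python
import math


def preliminary_center(cur, prev, redsx0, redsy0, th_still=7.0):
    """Choose the center (Imaxcur, Jmaxcur) of the preliminary window.

    cur, prev: (i, j) coordinates of the most conspicuous pixel in the
    current and previous source images.
    """
    (icur, jcur), (iprev, jprev) = cur, prev
    d = math.hypot(icur - iprev, jcur - jprev)  # displacement d
    th_move = math.hypot(redsx0, redsy0)  # diagonal of the preliminary window
    if d <= th_still:
        # Small move: keep the previous center to avoid wavering.
        return "still", (iprev, jprev)
    if d > th_move:
        # Large move: the source computes a specific center here (not reproduced).
        return "move", None
    # Intermediate move (assumption): use the current most conspicuous pixel.
    return "normal", (icur, jcur)
```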
According to a preferred embodiment, more than one window is used to generate the reduced image. In this case, the steps 10, 11, 12 and 13 are replaced by the following steps. With reference to
is close to 1, i.e. higher than or equal to a fifth predefined threshold (e.g. if Ψ0 ≥ 0.8), then the reduced image is generated by extracting the image part of the source image 2 which is delimited by the first window 70. If Ψ0 is not close to 1, i.e. lower than the fifth predefined threshold, a second window 71 is positioned in the source image 2 so that its center is located on the second most conspicuous pixel of the source image 2, i.e. the pixel whose saliency value is the highest after that of the most conspicuous pixel. The saliency value SMreduced
is close to 1, then the image part extracted from the source image 2 to generate the reduced image corresponds to the smallest window 7 that contains the first and the second windows 70 and 71. If the ratio Ψ1 is not close to 1, then a third window is positioned in the source image 2 so that its center is located on the third most conspicuous pixel of the source image 2, i.e. the pixel whose saliency value is the highest after that of the second most conspicuous pixel. The ratio between the sum of the three saliency values associated with the three windows and the saliency value SMimage is compared to 1. If it is close to 1, then the image part extracted from the source image 2 to generate the reduced image corresponds to the smallest window that contains the three windows. If it is not close to 1, then the process of positioning new windows is repeated until the ratio between the sum of the saliency values associated with the positioned windows and SMimage is close to 1. Thus, at each iteration k, a new window is positioned in the source image 2 so that its center is located on the k-th most conspicuous pixel, i.e. the pixel whose saliency value is just lower than that of the (k−1)-th most conspicuous pixel.
The reduced image is then generated by extracting from the source image 2 the image part delimited by the smallest window that contains all the positioned windows. Preferentially, before their saliency values are computed, the windows are translated to be fully included in the source image and/or displaced according to step 12′.
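The multi-window procedure can be sketched as follows. It is a literal, simplified reading under several assumptions: all windows share the preliminary size (redsx0, redsy0), "close to 1" is implemented as a ratio at least equal to a threshold (0.8, the example value of the fifth threshold), saliency in overlapping areas is counted once per window because the text sums the window saliency values, the step 12′ displacement is omitted, and `max_windows` is a hypothetical cap that guarantees termination. The function name `multi_window_box` is also hypothetical.

```python
import numpy as np


def multi_window_box(saliency, redsx0, redsy0, threshold=0.8, max_windows=10):
    """Return (corx, cory, width, height) of the smallest window containing
    all windows positioned on the k most conspicuous pixels."""
    height, width = saliency.shape
    sm_image = float(saliency.sum())
    # Flat pixel indices sorted by decreasing saliency (k-th most conspicuous first).
    order = np.argsort(saliency, axis=None)[::-1]
    corners, total = [], 0.0
    for flat in order[:max_windows]:
        j, i = np.unravel_index(flat, saliency.shape)
        # Center the window on the pixel, then translate it inside the image.
        corx = min(max(int(i) - redsx0 // 2, 0), width - redsx0)
        cory = min(max(int(j) - redsy0 // 2, 0), height - redsy0)
        corners.append((corx, cory))
        # Sum of the saliency values associated with each positioned window.
        total += float(saliency[cory:cory + redsy0, corx:corx + redsx0].sum())
        if sm_image > 0 and total / sm_image >= threshold:
            break
    # Smallest window containing all positioned windows.
    x0 = min(c[0] for c in corners)
    y0 = min(c[1] for c in corners)
    x1 = max(c[0] for c in corners) + redsx0
    y1 = max(c[1] for c in corners) + redsy0
    return x0, y0, x1 - x0, y1 - y0
```

Read literally, the second most conspicuous pixel is often a neighbour of the first one, so several windows may overlap heavily before a distant salient region is reached.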
According to another embodiment, steps 10, 11 and 12 are replaced by a single step which consists in positioning the initial extraction window 22 by identifying in the source image 2 the pixels whose saliency values are higher than a fourth predefined threshold. The initial extraction window 22 is thus defined by its top left corner and its bottom right corner. Among the identified pixels with the lowest abscissa, the pixel with the lowest ordinate is set as the top left corner of the initial extraction window 22. Among the identified pixels with the highest abscissa, the pixel with the highest ordinate is set as the bottom right corner of the initial extraction window 22.
The step 13 is then applied in order to generate the reduced image by extracting the image part of the source image delimited by the initial extraction window 22. The first pixel of the reduced image corresponds to the top left pixel of the initial extraction window 22.
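The corner rule of this embodiment can be sketched as below, implemented literally as stated. The sketch assumes that the abscissa is the column index x and the ordinate the row index y of a NumPy [row, column] array; the function name `threshold_window` is hypothetical, and returning `None` when no pixel exceeds the threshold is an added assumption.

```python
import numpy as np


def threshold_window(saliency, threshold):
    """Return ((x, y) top-left, (x, y) bottom-right) of the initial extraction
    window from the pixels whose saliency exceeds the threshold."""
    ys, xs = np.nonzero(saliency > threshold)  # identified pixels
    if xs.size == 0:
        return None  # no pixel above the threshold (assumption)
    # Top left: among pixels with the lowest abscissa, the lowest ordinate.
    x_min = xs.min()
    top_left = (int(x_min), int(ys[xs == x_min].min()))
    # Bottom right: among pixels with the highest abscissa, the highest ordinate.
    x_max = xs.max()
    bottom_right = (int(x_max), int(ys[xs == x_max].max()))
    return top_left, bottom_right
```

Because the ordinates are taken only among the extreme-abscissa pixels, an identified pixel lying elsewhere (for instance above the top left corner) can fall outside the resulting window; the rule does not compute a full bounding box.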
According to a specific embodiment, in the case of a source sequence of more than one image, the step 13 is advantageously followed by a step which consists in generating a combined reduced sequence by switching between the reduced sequence according to the invention, called first reduced sequence, and the source sequence reduced according to a classical method, called second reduced sequence. A classical method refers to a method that simply downsamples the images of the source sequence to generate the reduced sequence. This combined reduced sequence should improve the viewer's understanding of the scene by also showing the global scene (i.e. a downsampled version of the source images). The switching step is driven by the value of Ψ. To this aim, the ratio Ψ is computed for the current image. If the value Ψ is not close to 1, the classically reduced video (i.e. the downsampled source images) is transmitted for Δ seconds as depicted on
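The switching between the two reduced sequences can be sketched as follows. This is only an illustration under assumptions: frames are treated as opaque items, the frame rate `fps` is used to convert Δ seconds into a number of consecutive frames, "not close to 1" is implemented as Ψ below a hypothetical `threshold`, and a new switch is only triggered once the previous Δ-second period has ended. The function name `combine_sequences` is hypothetical.

```python
def combine_sequences(first, second, psis, fps, delta_s, threshold=0.8):
    """Build the combined reduced sequence frame by frame.

    first: frames of the saliency-driven reduced sequence;
    second: frames of the downsampled (classically reduced) sequence;
    psis: per-frame values of the ratio Psi.
    """
    n_hold = max(1, round(delta_s * fps))  # Delta seconds expressed in frames
    hold = 0
    out = []
    for f1, f2, p in zip(first, second, psis):
        if hold == 0 and p < threshold:
            hold = n_hold  # Psi not close to 1: switch to the global view
        if hold > 0:
            out.append(f2)
            hold -= 1
        else:
            out.append(f1)
    return out
```

For example, with `fps=2` and `delta_s=1`, a single frame with a low Ψ replaces two consecutive frames of the first reduced sequence by the corresponding downsampled frames, which matches replacing a predefined number of consecutive images.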
According to a particular embodiment, several regions of interest may be handled by dividing the reduced images into several parts (e.g. four sub-images), each part containing either the source sequence or one of the most conspicuous regions (i.e. the regions with the highest saliency values).
The parts corresponding to the most conspicuous regions are determined by the four following successive steps:
Obviously, the invention is not limited to the embodiments described above. The skilled person may combine all these embodiments. Information other than saliency values or saliency maps can be used as side information provided it characterizes the perceptual relevancy of pixels or groups of pixels.
Some of the advantages of building a reduced sequence centered on the most important regions (in other words, of driving the reduction by the perceptual interest of regions within the images) are listed below:
The different embodiments may offer some of the advantages among those listed below:
Number | Date | Country | Kind |
---|---|---|---|
05291615.2 | Jul 2005 | EP | regional |
05291938.8 | Sep 2005 | EP | regional |