Other particularities and advantages of the invention will also emerge from the following description, illustrated by the accompanying drawings, in which:
6a and 6b depict tables containing sequence intercorrelation values used in an example embodiment of the invention;
7a and 7b depict two examples of association of the sequences with views according to two coding methods of the invention; and
In describing the invention, reference will be made in particular to the MVC coding format, for which a standard is currently being drawn up. Nevertheless, it should be understood that the application of the invention is not limited to the MVC format: the invention can apply to any coding format using a plurality of views of a scene, taken by a plurality of cameras or generated synthetically and stored in a memory space, and in particular to the coding of video sequences in such a context. The invention could also apply to the coding of a plurality of fixed images.
There can be any number of cameras, in any configuration. For example, cameras can be disposed at the four corners of a rectangle and aimed at its center, or in several superimposed tiers in front of a scene. The number of cameras is not limited and can be much greater than five; however, the greater this number, the greater the calculating power necessary for generating the final bitstream.
The MVC coder for which the standard is currently being drawn up is based on the H.264 coding techniques for the compression of views and uses spatio-temporal prediction both for the coding of the images of a given sequence coming from the same view (intra-view coding) and for the coding of the sequences corresponding to the views (inter-view coding). In an H.264 coder there exist principally three types of image:
so-called “intra” images, denoted I, are divided into macro-blocks that are coded independently, without making reference to other images.
so-called “predicted” images, denoted P, can use images from the past, the macro-blocks being predicted by means of motion vectors from macro-blocks of previously encoded images, referred to as reference images. The macro-blocks are then coded either by temporal prediction (P) or in intra (I), so as to optimize the rate-distortion compromise.
so-called “bi-predicted” images, denoted B, can use past and future images for predicting macro-blocks. The macro-blocks are then coded either by bi-predicted temporal prediction (B), by mono-predicted temporal prediction (P), or in intra (I), the choice being made so as to improve the rate-distortion compromise. The H.264 standard also makes it possible to construct a hierarchical prediction by creating several levels of bi-predicted images.
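As an illustration of the three image types, the following minimal sketch shows a small group of pictures with one level of hierarchical bi-prediction. The frame indices, reference choices and coding order are illustrative assumptions, not data taken from the standard:

```python
# Illustrative sketch of H.264-style image types in a small group of
# pictures, with one hierarchical B level. Display order: I0 B1 B2 B3 P4;
# B2 is a mid-level B that serves as a reference for B1 and B3.
# The list below is given directly in coding order: every image appears
# after all of its reference images.
gop = [
    {"frame": 0, "type": "I", "refs": []},       # intra: no references
    {"frame": 4, "type": "P", "refs": [0]},      # predicted from the past
    {"frame": 2, "type": "B", "refs": [0, 4]},   # bi-predicted, mid level
    {"frame": 1, "type": "B", "refs": [0, 2]},   # lowest hierarchical level
    {"frame": 3, "type": "B", "refs": [2, 4]},
]

coding_order = [img["frame"] for img in gop]
print(coding_order)  # [0, 4, 2, 1, 3]
```

The coding order thus differs from the display order, which is precisely what makes bi-prediction from "future" images possible.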
As illustrated in
In addition, in MVC coding, the sequences corresponding to the different views are also coded predictively, in a structure called a multi-view group (denoted GOMV, the acronym for “Group of Multi-Views” in English). In the example in
With reference to
With regard to the second view V2, this contains images denoted B3 that are bi-predicted from the images B2 of the views V1 and V3 with the same temporal index, and images denoted B4 that are predicted from four reference images coming from the view itself and the adjoining views, V1 and V3.
It appears clearly from the description of this predictive coding structure that there exists a certain hierarchy between the input views of the structure, and that the choice of the positioning of the sequences in the structure has major consequences for the overall compression efficiency of the coding. This is because the view placed first, V1, is the principal view, on which the inter-view predictions are based. It should also be noted that this view is more easily accessible at the time of decoding, since it is not coded in a manner dependent on the other sequences, and will therefore be decoded first. Among the “secondary” views there is also a hierarchy, since the views whose first image is coded in mono-predicted mode P (views V3 and V5 in the example) also serve as a basis of prediction for the other sequences. In this example, it is possible to classify the views in order of dependence as follows: V1, V3, V5 and [V2 V4]. This is because views V2 and V4 are both coded with respect to the previously coded views V1, V3 and V5, without themselves serving as a basis for prediction in this example; their coding order can therefore be chosen arbitrarily.
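The dependence order just described can be sketched as a topological sort over per-view reference lists. The dependency lists below are an assumption derived from the structure described in the text, not data from the figures; any order in which each view follows all of its reference views is valid, the text's V1, V3, V5, [V2 V4] being one such order:

```python
# Hypothetical view dependencies for the 5-view structure described
# above: a view can only be decoded once every view it is predicted
# from has been decoded.
deps = {
    "V1": [],            # principal view: intra/temporal prediction only
    "V3": ["V1"],        # first image mono-predicted (P)
    "V5": ["V3"],        # first image mono-predicted (P)
    "V2": ["V1", "V3"],  # bi-predicted from its neighbours
    "V4": ["V3", "V5"],
}

def decode_order(deps):
    """Topological sort: emit each view after all of its references.
    Assumes the dependency graph is acyclic, as in a valid coding
    structure."""
    order, done = [], set()
    pending = sorted(deps)
    while pending:
        for v in pending:
            if all(d in done for d in deps[v]):
                order.append(v)
                done.add(v)
                pending.remove(v)
                break
    return order

print(decode_order(deps))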
Thus the positioning of the sequences in the multi-view predictive coding structure is very important, since it determines both the coding efficiency and the ease of subsequent access to the decoded sequences. The present invention proposes a solution to this problem which has the advantage of limited calculation complexity.
The first step E31 consists of the acquisition of the multi-view sequences Si corresponding to the various initial views of a scene, each view being able to be associated with a photographing camera. These sequences can be acquired directly from the cameras. Alternatively, the sequences can have been previously stored, for example on a hard disk, and step E31 then consists of obtaining them from the storage memory.
Step E31 is followed by a step E32 of analyzing the content of the various sequences. This analysis can use in particular the parameters of the cameras corresponding to the sequences Si, notably the position of these cameras in space. The position of the cameras 35 is useful for determining the positioning of the sequences in the predictive coding structure, and in particular for determining the principal view and the successive secondary views. The preferences of the user 36 can also optionally be taken into account. An implementation of this step according to the invention will be detailed below in the description of
The analysis step is followed by a step E33 of associating the initial views and their associated sequences Si with the input views of the predictive coding structure, according to the results of the analysis step.
For example, if five cameras are available in the example in
Step E33 is followed by step E34 of multi-view coding of the sequences of images, a step that comprises, in the embodiment described here, H.264-type coding steps known to persons skilled in the art, which will not be detailed here.
With reference to
According to the embodiment described here, the first step of the algorithm is step E41 of reading the user preferences, which makes it possible to define a coding mode favored by the user, among several modes envisaged.
The first mode consists of favoring a particular view, which is then considered to be the principal view and associated with the view V1 at the input of the coding structure. In the preferred embodiment, this view is chosen by a user. According to an alternative embodiment, in the case of a plurality of client entities that must receive the coded multi-view stream, the principal view can be chosen, for example, as the view requested by the largest number of clients. The arrangement of the sequences issuing from the other cameras in the coding structure is carried out according to a rate-distortion criterion with a view to optimizing the coding, as described below.
A second coding mode envisaged consists of seeking a global rate-distortion optimization of the coding of all the sequences, that is to say selecting both the sequence associated with the principal view and all the sequences associated with the secondary views according to a coding optimization criterion.
The following step E42 consists of testing the availability of the camera parameters. If these parameters are available, step E42 is followed by step E43.
During step E43 of determining the volumes photographed by each camera, the following parameters are taken into account in this embodiment:
the position of the camera in three-dimensional space;
the axis of sight of the camera, V;
the depth of field, PF;
the angle of the lens, α.
It should be noted that, for two given cameras, it is possible to determine an overlap zone Z12 which is the intersection zone between the two previously determined volumes, and which therefore corresponds to a part of the scene that is captured by the two cameras.
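The determination of such an overlap zone can be sketched as follows, under a deliberately simplified assumption: each camera is modeled in two dimensions as seeing a circular sector defined by its position, sight axis, depth of field PF and lens angle α, and the intersection zone Z12 is found by grid sampling. All numeric values and the sampling approach are illustrative, not part of the described method:

```python
import math

def in_view(cam, point):
    """True if `point` lies in the camera's 2-D viewing sector: within
    depth of field PF of its position, and within half the lens angle
    alpha of its sight axis."""
    px, py = cam["pos"]
    dx, dy = point[0] - px, point[1] - py
    dist = math.hypot(dx, dy)
    if dist == 0 or dist > cam["PF"]:
        return False
    angle_to_point = math.atan2(dy, dx)
    # signed angular difference wrapped into [-pi, pi)
    diff = abs((angle_to_point - cam["axis"] + math.pi) % (2 * math.pi) - math.pi)
    return diff <= cam["alpha"] / 2

def overlap_points(cam1, cam2, size, step):
    """Grid-sample a size x size scene; keep points seen by both
    cameras, i.e. the overlap zone Z12."""
    zone = []
    x = 0.0
    while x <= size:
        y = 0.0
        while y <= size:
            if in_view(cam1, (x, y)) and in_view(cam2, (x, y)):
                zone.append((x, y))
            y += step
        x += step
    return zone

# Two hypothetical cameras facing each other across a 10 x 10 scene
c1 = {"pos": (0.0, 5.0), "axis": 0.0, "PF": 8.0, "alpha": math.radians(60)}
c2 = {"pos": (10.0, 5.0), "axis": math.pi, "PF": 8.0, "alpha": math.radians(60)}
z12 = overlap_points(c1, c2, size=10.0, step=0.5)
print(len(z12) > 0)  # the two sectors intersect mid-scene
```

In practice the volumes are three-dimensional, but the principle is the same: the overlap zone is the set of scene points lying inside both viewing volumes.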
Step E43 is a step that can be carried out prior to the coding of the sequences captured by the cameras in question if the position of these cameras is fixed.
Returning to
According to circumstances, step E43 or step E47 is followed by step E44, which consists of determining the common parts between the views. The calculation spaces are thus limited to these determined common parts.
As explained previously with reference to
In the case where the determination step E44 follows the adjustment step E47, it consists, for each pair of views considered, of determining a spatial intersection zone between the views from the result of the adjustment.
In the preferred embodiment of the invention, all the pairs of views are considered. Thus the estimation of similarity between sequences of images or parts of sequences of images will be made solely from the image signal contained in the previously determined intersection zones. This has the effect of considerably reducing the calculation complexity of any resemblance determination method used subsequently.
Step E44 is followed by step E45 of calculating the similarity between the sequences. In the preferred embodiment, it is a case of the calculation of the intercorrelation between sequences of images, estimated solely on the previously determined intersection zones.
The intercorrelation between sequences is preferably calculated over one group of multi-view images (GOMV) of the sequence. According to the preferred embodiment, the surface area of the intersection zone is taken into account: the final intercorrelation IFi,j between the sequence Si and the sequence Sj is obtained by weighting the initial intercorrelation IIi,j, calculated on the signal contained within the intersection zone, by the ratio between the common surface area SC between the views corresponding to the sequences and the total surface area of a view, ST:
IFi,j = IIi,j × SC/ST   (eq 1)
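A minimal sketch of equation 1 follows, assuming the initial intercorrelation IIi,j is computed as a normalized cross-correlation over the signal of the intersection zone; the flat lists of pixel values and the surface areas are hypothetical inputs:

```python
import math

def final_intercorrelation(zone_i, zone_j, sc, st):
    """Eq. 1: IF_ij = II_ij * SC/ST. II_ij is sketched here as the
    normalized cross-correlation of the two zone signals; sc is the
    common surface area SC between the two views, st the total surface
    area ST of a view."""
    n = len(zone_i)
    mi = sum(zone_i) / n
    mj = sum(zone_j) / n
    num = sum((a - mi) * (b - mj) for a, b in zip(zone_i, zone_j))
    den = math.sqrt(sum((a - mi) ** 2 for a in zone_i) *
                    sum((b - mj) ** 2 for b in zone_j))
    ii = num / den if den else 0.0   # initial intercorrelation II_ij
    return ii * sc / st              # weighting by the surface ratio

# Perfectly correlated zone signals, with an overlap covering half of a
# view: the initial intercorrelation (1.0) is scaled down to 0.5.
print(final_intercorrelation([1, 2, 3, 4], [2, 4, 6, 8], sc=0.5, st=1.0))
```

The weighting ensures that a high correlation measured on a very small overlap does not outweigh a moderate correlation measured on a large one.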
In order to further reduce the number of calculations, it is possible to calculate the intercorrelation on a subset of the images of the GOMV, possibly on a single image of the GOMV. Likewise, it is possible to take into account in the calculation only a subset of the pixels contained in an intersection zone.
6a illustrates a numerical example of an intercorrelation matrix between sequences of images corresponding to five views. This matrix is symmetrical and its diagonal is composed of 1s. Sorting the values of the correlation matrix makes it possible to associate the sequences with the views at the input of the coding structure.
The table in
According to alternative embodiments, it is possible to use other methods for estimating the resemblance between sequences. For example, the calculation of the intercorrelation can be replaced by a motion estimation calculation between the sequences, usually referred to as a disparity calculation in the context of MVC coding.
Returning to
According to the first mode envisaged, the initial view corresponding to the principal view is defined by external constraints, such as for example the choice of the user or of the majority of client entities, and consequently the associated sequence is chosen as the principal view. By way of example, let us assume that the view corresponding to the sequence S4 is selected as the principal view. It is then necessary to allocate the other sequences to the various input views of the predictive coding structure. To this end, it is possible to use the intercorrelation matrix between sequences of
Next, the sequence most correlated with S3 is sought, and it is the sequence S2 (value 0.82) that is attributed to the view V5. Finally, the sequence, among the non-allocated sequences, that maximizes the correlation with the sequence S4 (the principal view) is the sequence S5 (value 0.83), which is attributed to the view V2. The remaining sequence is attributed to the view V4. The result of this example is illustrated in
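The greedy allocation of this first mode can be sketched as follows. The values 0.85, 0.82 and 0.83 reproduce those quoted in the example; the remaining matrix entries are hypothetical fillers chosen only so that the example runs:

```python
def allocate_mode1(corr, principal, seqs):
    """First coding mode (sketch): the principal sequence is imposed on
    view V1; V3 receives the free sequence most correlated with the
    principal, V5 the one most correlated with V3's sequence, V2 the
    free sequence most correlated with the principal, and V4 the rest."""
    def c(a, b):
        # symmetric lookup in an upper-triangle dictionary
        return corr.get((a, b), corr.get((b, a), 0.0))

    free = [s for s in seqs if s != principal]
    views = {"V1": principal}
    views["V3"] = max(free, key=lambda s: c(principal, s))
    free.remove(views["V3"])
    views["V5"] = max(free, key=lambda s: c(views["V3"], s))
    free.remove(views["V5"])
    views["V2"] = max(free, key=lambda s: c(principal, s))
    free.remove(views["V2"])
    views["V4"] = free[0]
    return views

# Hypothetical symmetric matrix: 0.85, 0.82 and 0.83 match the values
# quoted in the text; all other entries are illustrative only.
corr = {("S4", "S3"): 0.85, ("S3", "S2"): 0.82, ("S4", "S5"): 0.83,
        ("S4", "S1"): 0.60, ("S4", "S2"): 0.70, ("S3", "S5"): 0.75,
        ("S3", "S1"): 0.65, ("S2", "S5"): 0.55, ("S2", "S1"): 0.50,
        ("S1", "S5"): 0.45}
views = allocate_mode1(corr, "S4", ["S1", "S2", "S3", "S4", "S5"])
print(views)  # S4→V1, S3→V3, S2→V5, S5→V2, S1→V4
```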
According to the second coding mode envisaged, the allocation of all the sequences to the various views is guided by a global optimization criterion for the coding. In this case, the principal view is chosen as the sequence that, of all the sequences to be coded, has the highest total intercorrelation with the other sequences. The table in
Next the sequence most correlated with S3, that is to say S4 (value 0.85), is associated with the view V3, and then the not yet attributed sequence most correlated with S4, that is to say S5 (value 0.83) is associated with V5. Finally, there is sought, among the non-attributed sequences, the one that has the maximum intercorrelation with the sequence S3 and it is the sequence S1 (value 0.65) that is attributed to the view V2. There remains finally the sequence S2, which is associated with the view V4. These results are illustrated in
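The second mode differs from the first only in how the principal view is chosen; the following self-contained sketch selects it as the sequence with the highest total intercorrelation and then applies the same greedy chain. The values 0.85, 0.83 and 0.65 reproduce those quoted in the example; the other matrix entries are hypothetical fillers:

```python
def allocate_mode2(corr, seqs):
    """Second coding mode (sketch): the principal view is the sequence
    whose summed intercorrelation with all the others is highest; the
    secondary views then follow the greedy chain V3, V5, V2, V4."""
    def c(a, b):
        # symmetric lookup in an upper-triangle dictionary
        return corr.get((a, b), corr.get((b, a), 0.0))

    principal = max(seqs, key=lambda s: sum(c(s, t) for t in seqs if t != s))
    free = [s for s in seqs if s != principal]
    views = {"V1": principal}
    views["V3"] = max(free, key=lambda s: c(principal, s))
    free.remove(views["V3"])
    views["V5"] = max(free, key=lambda s: c(views["V3"], s))
    free.remove(views["V5"])
    views["V2"] = max(free, key=lambda s: c(principal, s))
    free.remove(views["V2"])
    views["V4"] = free[0]
    return views

# Hypothetical symmetric matrix: 0.85, 0.83 and 0.65 match the values
# quoted in the text; all other entries are illustrative only.
corr = {("S3", "S4"): 0.85, ("S3", "S5"): 0.80, ("S3", "S1"): 0.65,
        ("S3", "S2"): 0.60, ("S4", "S5"): 0.83, ("S4", "S1"): 0.55,
        ("S4", "S2"): 0.60, ("S5", "S1"): 0.45, ("S5", "S2"): 0.50,
        ("S1", "S2"): 0.50}
views = allocate_mode2(corr, ["S1", "S2", "S3", "S4", "S5"])
print(views)  # S3→V1, S4→V3, S5→V5, S1→V2, S2→V4
```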
A device able to implement the method of the invention is illustrated in
The device 100 comprises a communication interface 118 connected to the communication network 120, able to transmit coded digital data processed by the device. The device 100 also comprises a storage means 112 such as, for example, a hard disk. It also comprises a drive 114 for a disk 116. This disk 116 can be a diskette, a CD-ROM or a DVD-ROM, for example. The disk 116, like the hard disk 112, can contain data to be processed according to the invention, for example a set of digital video sequences, as well as the program or programs implementing the invention which, once read by the device 100, will be stored on the hard disk 112. According to a variant, the program Prog enabling the device to implement the invention can be stored in read-only memory 104 (called ROM in the drawing). In a second variant, the program can be received, and stored in an identical fashion to that described previously, by means of the communication network 120.
According to a variant, the device 100 can be connected to one or preferably several image acquisition devices such as the digital camera 101, which make it possible to acquire the data to be processed according to the invention.
This same device optionally possesses a screen 108 making it possible in particular to display the data processed or to serve as an interface with the user, who can thus parameterize the coding, for example in order to select the coding mode and if applicable the principal view chosen, by means of the keyboard 110 or any other pointing means, such as for example a mouse 111, an optical pen or a touch screen.
The central unit 103 (called CPU in the drawing) executes the instructions relating to the implementation of the invention, instructions stored in the read-only memory 104 or in the other storage elements. On powering up, the processing programs stored in a non-volatile memory, for example the ROM 104, are transferred into the random access memory RAM 106, which will then contain the executable code of the invention as well as registers for storing the variables necessary for implementing the invention.
In more general terms, an information storage means, able to be read by a computer or by a microprocessor, integrated or not into the device, possibly removable, stores a program implementing the method according to the invention.
The communication bus 102 affords communication between the various elements included in the device 100 or connected to it. The representation of the bus 102 is not limiting and in particular the central unit 103 is able to communicate instructions to any element of the device 100 directly or by means of another element of the device 100.
Number | Date | Country | Kind |
---|---|---|---
0654347 | Oct 2006 | FR | national |