The present invention relates to a method and an apparatus for computing a synthesized picture of a visual scene, in particular in the field of computer vision, three-dimensional (3D) video processing and 3D video synthesis.
3D video synthesis is utilized in applications that require rendering of a virtual view. This includes applications like Free-viewpoint Television (FTV), where the viewpoint can be selected according to the preferences of a viewer, or, e.g., 3D video coding of a multiview video, where some of the views are synthesized from others, increasing the compression of such content by limiting the number of views to be transmitted. In 3D video, view synthesis is a process of creating a virtual view based on the available reference views (physical views, herein also referred to as texture views) from which the visual scene was acquired.
The most commonly used approach to view synthesis is the so-called Depth Image-Based Rendering (DIBR) method, described by [L. McMillan, “An image-based approach to three-dimensional computer graphics”, Doctoral thesis, University of North Carolina, Chapel Hill, USA, April 1997], which utilizes depth maps defining the distance of scene points from the viewpoint in order to project the corresponding texture information into the virtual view position.
A depth map can be defined as information describing the distance of each part of the visual scene, e.g., represented in the form of a grayscale image; alternatively, a disparity map can be used, whose values are inversely proportional to those of the depth map. Among DIBR-based synthesis methods, two main approaches can be distinguished: forward- and backward-projection algorithms. In forward projection, the coordinates of each sample in the picture from the reference view are projected onto a synthesized picture, resulting in non-integer sample coordinates. This means that the actual values of samples in the synthesized picture must be estimated from the closest samples projected from the reference view. In the backward-projection approach, on the other hand, the coordinates of each sample in the synthesized picture are projected into a reference picture. Consequently, the value of the sample is determined based on the samples in the reference picture that are close to the position of this projected sample. The two methods differ in aspects such as disoccluded-area detection and handling, the possibility of simultaneously scanning more than one reference view to produce the synthesized output picture, and the required picture interpolation methods.
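As an illustration of the forward-projection approach described above, the following minimal Python sketch projects one row of reference samples into the virtual view using a per-sample disparity. The function name, the hole marker (None) and the rounding-based conflict resolution are assumptions of this sketch, not part of any standardized method.

```python
def forward_project_row(texture_row, disparity_row):
    """Forward-project one row of reference-view samples into the virtual view.

    Each sample at column x lands at x + disparity, rounded to the nearest
    integer grid position.  Positions never written to stay marked as
    disoccluded (None); on write conflicts the sample with the larger
    disparity (i.e., closer to the camera) wins.
    """
    width = len(texture_row)
    projected = [None] * width                 # None = disoccluded (hole)
    best_disp = [float("-inf")] * width
    for x in range(width):
        target = x + int(round(disparity_row[x]))
        if 0 <= target < width and disparity_row[x] > best_disp[target]:
            projected[target] = texture_row[x]
            best_disp[target] = disparity_row[x]
    return projected
```

Note how a sample shifted away from its column leaves a hole behind, which is exactly the disoccluded area the filling masks must identify.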
In the following sections, it is assumed that the reference views are aligned horizontally, i.e., left and right reference views can be distinguished that are displaced in the horizontal direction. The most efficient state-of-the-art virtual view synthesis algorithms based on DIBR rely on forward-projection algorithms that combine pictures synthesized from left and right reference pictures in order to produce the synthesized output picture. The current state-of-the-art solution, the High Efficiency Video Coding (HEVC) Test Model adopted by the Joint Collaborative Team on 3D Video (JCT-3V) (the joint Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T)/Moving Picture Experts Group (MPEG) standardization effort for 3D), as described by [H. Schwarz, K. Wegner, “Test Model under Consideration for HEVC based 3D video coding”, MPEG Doc. m12350, November 2011], uses a synthesis algorithm 100 as presented in
In the algorithm 100, the left and right view textures sT,l and sT,r and depths sD,l and sD,r are used to perform the forward-projection step 101a, 101b. In this step, samples from the reference views are projected into a synthesized view using the DIBR algorithm. The outputs of this step are the sT,l′, sT,r′, sD,l′ and sD,r′ pictures, i.e., the left and right textures and depths projected into the synthesized view. At the same time, the algorithm detects disoccluded areas in the output sT,l′, sT,r′, sD,l′ and sD,r′ pictures. These areas are represented in the form of filling masks sF,l′ and sF,r′, identifying the areas of the synthesized pictures from each reference view that need to be filled. The next step is a reliability map creation 105a, 105b, in which the sF,l′ and sF,r′ filling masks are modified to produce the sR,l′ and sR,r′ reliability maps. Each reliability map specifies the contribution of every sample in the picture synthesized from the reference picture to the final value of the sample in the output picture, based on an estimated probability that the value of the sample is correct. As a consequence, the values of the reliability maps can be manipulated to reduce synthesis artifacts, e.g., at the borders of the disoccluded area. In the prior art, reliability maps are used to reduce synthesis artifacts that result from inconsistency of texture and depth borders in a single reference view. The solution is applied to background areas neighboring the disoccluded area of the visual scene. The procedure was proposed in [“Description of 3D Video Technology Proposal by Fraunhofer HHI (HEVC compatible; configuration A)”, MPEG m22570, November 2011] and can be described as follows. For each background pixel neighboring the disoccluded-area border within a pre-defined interval, called a transition region 201, the reliability of the pixel 200 decreases according to a linear function 203, as can be seen in
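The derivation of a reliability map from a filling mask, with the linear reduction near the disoccluded-area border described above, might be sketched as follows. The transition width, the maximum reliability value and the row-wise distance computation are illustrative assumptions of this sketch.

```python
def reliability_from_filling_mask(fill_mask_row, transition=4, max_rel=255):
    """Derive a per-sample reliability row from a filling (disocclusion) mask row.

    fill_mask_row: 1 marks a disoccluded sample, 0 a projected sample.
    Inside the disoccluded area the reliability is 0; within `transition`
    samples of the border it ramps up linearly; elsewhere it is max_rel.
    """
    width = len(fill_mask_row)
    rel = [max_rel] * width
    for x in range(width):
        if fill_mask_row[x]:
            rel[x] = 0                 # disoccluded: no contribution
            continue
        # Distance to the nearest disoccluded sample along the row.
        dist = min((abs(x - i) for i in range(width) if fill_mask_row[i]),
                   default=transition)
        if dist < transition:
            rel[x] = max_rel * dist // transition   # linear ramp in the transition region
    return rel
```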
Before the final step of combining 107 the synthesized pictures from the left and right reference pictures, a plane discrimination map between the left and right views, sP,lr′, is calculated 103 based on pre-defined criteria. For the purpose of the combination step 107, a sample from each of the reference views is compared with its corresponding sample from the other available reference view in order to determine whether it belongs to the same plane of the visual scene or not. If a sample at the pixel position (x,y) synthesized from one reference view is much closer to the camera than the one at the same pixel position but synthesized from the other reference view, both samples are marked as belonging to different planes of the visual scene. The decision whether one sample is much closer to the camera than the other is made by comparing the difference between the depth values assigned to both samples with a pre-defined threshold. The combination step 107 uses weighted averaging with the reliability maps as weights for each combined sample to calculate the value of the synthesized output sample:
v(x,y)=wl(x,y)vl(x,y)+wr(x,y)vr(x,y)
where:
v(x,y) denotes the value of the sample in the synthesized picture at position (x,y),
vl(x,y) denotes the value of the sample in the picture synthesized from the left reference picture at position (x,y),
vr(x,y) denotes the value of the sample in the picture synthesized from the right reference picture at position (x,y),
wl(x,y) denotes the weight of the sample vl(x,y),
wr(x,y) denotes the weight of the sample vr(x,y).
However, in case of two samples belonging to different planes of the visual scene, the value of the output sample is calculated based on the value of only one of the input samples, that is, the one that is closer to the camera. The decision is made based on the plane discrimination map sP,lr′ and the depth maps sD,l′ and sD,r′.
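A sketch of the per-sample combination rule described above, assuming unnormalized reliability weights (hence the division by their sum), depth values that grow towards the camera, and a hypothetical same-plane threshold:

```python
def combine_sample(vl, vr, wl, wr, dl, dr, same_plane_threshold=8):
    """Combine co-located left/right synthesized samples into one output sample.

    dl, dr are the projected depth values (larger = closer to the camera is an
    assumption of this sketch).  If the depth difference exceeds the threshold,
    the samples belong to different planes and only the closer one is used;
    otherwise they are blended by reliability-weighted averaging.
    """
    if abs(dl - dr) > same_plane_threshold:
        return vl if dl > dr else vr      # different planes: keep the closer sample
    if wl + wr == 0:
        return 0                          # both unreliable: left for hole filling
    return (wl * vl + wr * vr) / (wl + wr)
```

For example, two same-plane samples with weights 1 and 3 are blended 1:3, while a large depth gap makes the closer sample win outright.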
In DIBR methods used for virtual view synthesis, inter-view inconsistency between the depth maps of different reference views may cause synthesis artifacts on scene objects, in the form of an additional object border, if more than one reference view is used in the combination step to produce the synthesized output picture. One way to overcome this problem is to use inter-view consistent depth maps. However, in existing multi-camera scenarios, the estimation or acquisition of inter-view consistent depth maps is very difficult or even practically unachievable with current technologies. The main problems are the large computational complexity and the extreme difficulty of achieving consistency in a fully automatic way, due to errors in disparity estimation between two views. Semi-automatic or manual methods can solve the problem, but they are not always applicable.
It is the object of the invention to provide an improved technique for 3D view synthesis.
This object is achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
The invention is based on the finding that the synthesis artifacts for scene objects produced by weighted averaging of pictures synthesized from reference pictures can be minimized by reducing the weights of samples neighboring the object borders when the object borders in the reference pictures are not aligned. Further improvement can be provided by an appropriate specification of the conditions for applying the weight reduction and of the pattern used to modify the weights.
In order to describe the invention in detail, the following terms, abbreviations and notations will be used.
According to a first aspect, the invention relates to a method for computing a synthesized picture of a visual scene, based on a left depth map of a left reference view of the visual scene and a right depth map of a right reference view of the visual scene, the method comprising projecting the left depth map into a left projected depth map and projecting the right depth map into a right projected depth map, and determining a left disoccluded area in the left projected depth map and a right disoccluded area in the right projected depth map; detecting object border misalignments between the left projected depth map and the right projected depth map; determining a left reliability map information based on the left disoccluded area, and the detected object border misalignments, and determining a right reliability map information based on the right disoccluded area, and the detected object border misalignments; and computing the synthesized picture by merging a left projected picture of the left reference view and a right projected picture of the right reference view using the left and right reliability map information.
By detecting object border misalignments between the left projected depth map and the right projected depth map and determining the left reliability map information based on the left disoccluded area and the detected object border misalignments, and determining the right reliability map information based on the right disoccluded area and the detected object border misalignments, the view synthesis errors resulting from inaccurate depth or disparity estimation between the two views can be reduced.
In a first possible implementation form of the method according to the first aspect, determining the left reliability map information and the right reliability map information comprises determining the left reliability map information based on the left disoccluded area and the right reliability map information based on the right disoccluded area; and modifying the left reliability map information and/or the right reliability map information when object border misalignments between the left projected depth map and the right projected depth map are detected.
By modifying the left reliability map information and/or the right reliability map information in case of object border misalignment detection, the quality of the synthesized picture can be improved.
In a second possible implementation form of the method according to the first implementation form of the first aspect, the method further comprises determining a plane discrimination map between the left projected depth map and the right projected depth map based on the left projected depth map and the right projected depth map; determining a left plane discrimination map for the left projected depth map based on the left projected depth map; and determining a right plane discrimination map for the right projected depth map based on the right projected depth map; wherein determining the left reliability map information is based on the left plane discrimination map and on the plane discrimination map, and determining the right reliability map information is based on the right plane discrimination map and on the plane discrimination map.
By determining the left and the right reliability maps based on the plane discrimination maps, view synthesis errors resulting from inaccurate depth or disparity maps can be reduced.
In a third possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, detecting object border misalignments comprises detecting whether samples in one of the left projected depth map and right projected depth map belong to an object border and at the same positions belong to a foreground plane in the other projected depth map.
By detecting the object border misalignment, visible and annoying view synthesis artifacts resulting from inaccurate and inter-view inconsistent depth or disparity maps can be significantly reduced.
In a fourth possible implementation form of the method according to the first aspect as such or any of the implementation forms of the first aspect, the object border misalignment is detected if samples in a first of the left projected depth map and right projected depth map belong to an object border and at the same positions belong to a foreground plane in the other second projected depth map of the left projected depth map and right projected depth map; wherein determining the left reliability map information comprises assigning a reduced weight for samples in the left projected picture for the computing of the synthesized picture if the samples in the left projected depth map belong to an object border and at the same positions belong to a foreground plane in the right projected depth map; and/or wherein determining the right reliability map information comprises assigning a reduced weight for samples in the right projected picture for the computing of the synthesized picture if the samples in the right projected depth map belong to an object border and at the same positions belong to a foreground plane in the left projected depth map.
By reducing the weights in the one of the left and right reliability maps that corresponds to the projected picture whose influence is to be suppressed, synthesis artifacts can be reduced.
In a fifth possible implementation form of the method according to the fourth implementation form of the first aspect, the reduced weights are assigned according to a monotonically increasing or decreasing function over a transition region determined based on the positions of the samples belonging to the object border.
Reducing the weights according to a monotonically increasing or decreasing function is easy to implement, e.g., by a lookup table.
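For example, a monotonically increasing transition function can be tabulated once and then applied per sample via a lookup. The table size, the maximum weight and the linear shape below are illustrative assumptions of this sketch.

```python
# Precomputed lookup table: weight as a function of distance (in samples)
# from the object border, rising linearly over a transition region of 4.
TRANSITION = 4
MAX_WEIGHT = 255
WEIGHT_LUT = [MAX_WEIGHT * d // TRANSITION for d in range(TRANSITION)] + [MAX_WEIGHT]

def border_weight(distance_to_border):
    """Map a border distance to a reduced reliability weight via the table."""
    return WEIGHT_LUT[min(distance_to_border, TRANSITION)]
```

The table lookup replaces a per-sample function evaluation, which suits a fixed-point or hardware implementation.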
In a sixth possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, determining the left reliability map information comprises assigning a reduced weight for samples in the left projected picture for the computing of the synthesized picture, if a first sample in the left projected depth map at a first position does not belong to the left disoccluded area, a second right neighboring sample to the first sample in the left projected depth map belongs to the left disoccluded area, the first sample in the left projected depth map and a first sample in the right projected depth map at the first position belong to a same plane of the visual scene, and the first sample in the right projected depth map and a second right neighboring sample to the first sample in the right projected depth map belong to the same plane of the visual scene.
Reducing the weights of the left reliability map information in such a way can be easily implemented using logical operations. No complex computational processing is required.
In a seventh possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, determining the left reliability map information comprises assigning a reduced weight for samples in the left projected picture for the computing of the synthesized picture, if a first sample in the left projected depth map at a first position and a second left neighboring sample to the first sample in the left projected depth map do not belong to a same plane of the visual scene, a point in the visual scene corresponding to the first sample in the left projected depth map is closer to a camera than a point in the visual scene corresponding to the second left neighboring sample in the left projected depth map, the first sample in the left projected depth map and a first sample in the right projected depth map at the first position belong to a same plane of the visual scene, and the first sample in the right projected depth map and a second left neighboring sample to the first sample in the right projected depth map belong to the same plane of the visual scene.
Reducing the weights of the left reliability map information in such a way can be easily implemented using logical operations. No complex computational processing is required.
In an eighth possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, object border misalignments are detected, and determining the right reliability map information comprises assigning a reduced weight for samples in the right projected picture for the computing of the synthesized picture, if a first sample in the right projected depth map at a first horizontal and a first vertical position does not belong to the right disoccluded area, a second left neighboring sample to the first sample in the right projected depth map belongs to the right disoccluded area, the first sample in the right projected depth map and a first sample in the left projected depth map at the first horizontal and the first vertical position belong to a same plane of the visual scene, and the first sample in the left projected depth map and a second left neighboring sample to the first sample in the left projected depth map belong to the same plane of the visual scene.
Reducing the weights of the right reliability map information in such a way can be easily implemented using logical operations. No complex computational processing is required.
In a ninth possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, determining the right reliability map information comprises assigning a reduced weight for samples in the right projected picture for the computing of the synthesized picture, if a first sample in the right projected depth map at a first horizontal and a first vertical position and a second right neighboring sample to the first sample in the right projected depth map do not belong to a same plane of the visual scene, a point in the visual scene corresponding to the first sample in the right projected depth map is closer to a camera than a point in the visual scene corresponding to the second right neighboring sample in the right projected depth map, the first sample in the right projected depth map and a first sample in the left projected depth map at the first horizontal and the first vertical position belong to a same plane of the visual scene, and the first sample in the left projected depth map and a second right neighboring sample to the first sample in the left projected depth map belong to the same plane of the visual scene.
Reducing the weights of the right reliability map information in such a way can be easily implemented using logical operations. No complex computational processing is required.
In a tenth possible implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the merging the left and right projected pictures comprises weighting a sample in the left projected picture by the weight of the left reliability map and weighting a sample in the right projected picture by the weight of the right reliability map.
When the merging of the left and right projected pictures is applied with the modified weights, object synthesis artifacts can be reduced.
In an eleventh possible implementation form of the method according to the tenth implementation form of the first aspect, the method comprises combining the weighted sample in the left projected picture and the weighted sample in the right projected picture to obtain a sample in the synthesized picture.
Combining the weighted samples can be easily performed, e.g., by using a simple addition operation.
In a twelfth possible implementation form of the method according to the eleventh implementation form of the first aspect, in case a sample in the left projected picture and a sample in the right projected picture belong to different planes of the visual scene, the sample in the synthesized picture is calculated based only on the one of the two samples which belongs to the closer plane.
By calculating the sample in the synthesized picture based on only one of the sample in the left projected picture and the sample in the right projected picture, the influence of errors in the depth or disparity estimation on the view synthesis can be reduced. By using the sample which is located closer to the camera position, the reliability of the sample in the synthesized picture is increased.
In a thirteenth possible implementation form of the method according to the first aspect as such or any of the implementation forms of the first aspect, the left and right projected pictures are projected texture pictures, projected depth map pictures, or projected disparity pictures.
In a fourteenth possible implementation form of the method according to the first aspect as such or any of the implementation forms of the first aspect, the left depth map of the left reference view of the visual scene is a left disparity map of the left reference view of the visual scene, and the right depth map of the right reference view of the visual scene is a right disparity map of the right view of the visual scene; and wherein the left projected depth map is a left projected disparity map and the right projected depth map is a right projected disparity map.
According to a second aspect, the invention relates to a computer program for performing the method of the first aspect as such or any of the implementation forms according to the first aspect, when executed on a processor or computer.
According to a third aspect, the invention relates to a computer program product comprising a computer readable storage medium storing program code thereon for use by a programmable processor or computer system, the program code comprising instructions for executing a method according to the first aspect as such or any of the implementation forms of the first aspect.
The computer program or program code can be provided in the form of source code or machine-readable code, e.g., as firmware, software or any combination thereof.
The computer program can be provided on a digital storage medium, for example a hard disc, compact disc (CD), digital versatile disc or digital video disc (DVD) or Blu-ray disc, having an electronically readable control signal stored thereon, which co-operates with the programmable processor or programmable computer system such that a method according to the first aspect as such or any of its implementation forms is performed. Alternatively, the computer program or program code can be provided by downloading via a network.
According to a fourth aspect, an apparatus comprising a processor configured to perform the method according to the first aspect as such or any of the implementation forms of the first aspect is provided.
The methods, systems and devices described herein may be implemented as software in a Digital Signal Processor (DSP), in a micro-controller or in any other side-processor or as hardware circuit within an application specific integrated circuit (ASIC).
The invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof, e.g., in available hardware of conventional mobile devices or in new hardware dedicated for processing the methods described herein.
Further embodiments of the invention will be described with respect to the following figures, in which:
Equal or equivalent elements are denoted in the following description of the figures by equal or equivalent reference signs.
The method 300 computes a synthesized picture, for example a synthesized texture picture sT′ as shown in
Projecting 301 the left depth map sD,l into a left projected depth map sD,l′ and projecting the right depth map sD,r into a right projected depth map sD,r′, and determining a left disoccluded area sF,l′ in the left projected depth map sD,l′ and a right disoccluded area sF,r′ in the right projected depth map sD,r′.
Detecting 302 object border misalignments between the left projected depth map sD,l′ and the right projected depth map sD,r′.
Determining 303 a left reliability map information sR,l′ based on the left disoccluded area sF,l′, and the detected object border misalignments, and determining a right reliability map information sR,r′ based on the right disoccluded area sF,r′, and the detected object border misalignments.
Computing 307 the synthesized picture sT′ by merging a left projected picture sT,l′ of the left reference view and a right projected picture sT,r′ of the right reference view using the left sR,l′ and right sR,r′ reliability map information.
In an implementation, determining 303 the left reliability map information sR,l′ and the right reliability map information sR,r′ comprises the following. Determining the left reliability map information sR,l′ based on the left disoccluded area sF,l′ and the right reliability map information sR,r′ based on the right disoccluded area sF,r′. Modifying the left reliability map information sR,l′ and/or the right reliability map information sR,r′ when object border misalignments between the left projected depth map and the right projected depth map are detected.
In an implementation, the method 300 further comprises the following. Determining a plane discrimination map sP,lr′ between the left projected depth map sD,l′ and the right projected depth map sD,r′ based on the left projected depth map sD,l′ and the right projected depth map sD,r′. Determining a left plane discrimination map sP,ll′ for the left projected depth map sD,l′ based on the left projected depth map sD,l′. Determining a right plane discrimination map sP,rr′ for the right projected depth map sD,r′ based on the right projected depth map sD,r′, wherein determining 303 the left reliability map information sR,l′ is based on the left plane discrimination map sP,ll′ and on the plane discrimination map sP,lr′, and determining the right reliability map information sR,r′ is based on the right plane discrimination map sP,rr′ and on the plane discrimination map sP,lr′.
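The three plane discrimination maps might be sketched as follows, assuming a simple absolute-difference test against a hypothetical threshold: the inter-view map compares co-located samples of the two projected depth maps (as sP,lr′ does), while the intra-view maps compare horizontally neighboring samples within one projected depth map (as sP,ll′ and sP,rr′ do).

```python
def inter_view_plane_map(depth_l, depth_r, threshold=8):
    """True where co-located left/right depth samples are assumed to belong
    to the same plane (their depth difference is within the threshold)."""
    return [[abs(a - b) <= threshold for a, b in zip(rl, rr)]
            for rl, rr in zip(depth_l, depth_r)]

def intra_view_plane_map(depth, threshold=8):
    """True where horizontally neighboring samples (x, x+1) of one projected
    depth map are assumed to belong to the same plane."""
    return [[abs(row[x + 1] - row[x]) <= threshold for x in range(len(row) - 1)]
            for row in depth]
```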
In an implementation, detecting 302 object border misalignments comprises detecting whether samples in one of the left projected depth map sD,l′ and right projected depth map sD,r′ belong to an object border and at the same positions (x,y) belong to a foreground plane in the other projected depth map.
In an implementation, an object border misalignment is detected if samples in a first of the left projected depth map sD,l′ and right projected depth map sD,r′ belong to an object border and at the same positions (x,y) belong to a foreground plane in the other second projected depth map of the left projected depth map sD,l′ and right projected depth map sD,r′; wherein determining 303 the left reliability map information sR,l′ comprises assigning a reduced weight for samples in the left projected picture sT,l′ for the computing of the synthesized picture sT′ if the samples in the left projected depth map sD,l′ belong to an object border and at the same positions (x,y) belong to a foreground plane in the right projected depth map sD,r′; and/or wherein determining 303 the right reliability map information sR,r′ comprises assigning a reduced weight for samples in the right projected picture sT,r′ for the computing of the synthesized picture sT′ if the samples in the right projected depth map sD,r′ belong to an object border and at the same positions (x,y) belong to a foreground plane in the left projected depth map sD,l′.
In an implementation, the reduced weights are assigned according to a monotonically increasing or decreasing function (603) over a transition region (601) determined based on the positions of the samples belonging to the object border as described below with respect to
In an implementation as described below with respect to
In an implementation as described below with respect to
In an implementation as described below with respect to
In an implementation as described below with respect to
In an implementation, merging the left sT,l′ and right sT,r′ projected pictures comprises weighting a sample vl(x,y) in the left projected picture sT,l′ by the weight of the left reliability map sR,l′ and weighting a sample vr(x,y) in the right projected picture sT,r′ by the weight of the right reliability map sR,r′.
In an implementation, the method 300 comprises combining the weighted sample vl(x,y) in the left projected picture sT,l′ and the weighted sample vr(x,y) in the right projected picture sT,r′ to obtain a sample v(x,y) in the synthesized picture.
In an implementation, in case a sample vl(x,y) in the left projected picture sT,l′ and a sample vr(x,y) in the right projected picture sT,r′ belong to different planes of the visual scene, the sample v(x,y) in the synthesized picture is calculated based only on the sample vl(x,y) in the left projected picture sT,l′ or the sample vr(x,y) in the right projected picture sT,r′, which belongs to the closer plane.
In an implementation, the sample v(x,y) in the synthesized picture is calculated based on the one of the sample vl(x,y) in the left projected picture sT,l′ and the sample vr(x,y) in the right projected picture sT,r′ which sample is closer to a camera.
Implementation forms may be adapted to compute a synthesized texture picture sT′, as synthesized picture, as for example depicted in
In further implementation forms of the method 300, disparity maps, which are depth maps with inverse values, are used instead of the depth maps as such. Accordingly, the left depth map sD,l of the left reference view of the visual scene is a left disparity map sD,l of the left reference view of the visual scene, and the right depth map sD,r of the right reference view of the visual scene is a right disparity map of the right view of the visual scene; and wherein the left projected depth map sD,l′ is a left projected disparity map and the right projected depth map sD,r′ is a right projected disparity map.
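For reference, the inverse relation between depth and disparity mentioned above can be written as disparity = f·b / depth for a horizontally aligned camera pair; the focal length f (in pixels) and baseline b (in metres) below are hypothetical camera parameters of this sketch.

```python
def depth_to_disparity(depth, focal_length=1000.0, baseline=0.1):
    """Convert a depth value (distance from the camera, metres) to disparity.

    Disparity is inversely proportional to depth, which is why a disparity
    map can substitute for a depth map in the method.
    """
    return focal_length * baseline / depth
```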
Inter-view inconsistency between depth maps of the different reference views 401, 403 may cause a misalignment between object borders in pictures synthesized or projected from left and right reference pictures: sT,l′ and sT,r′. sT,l′ is also referred to as left projected picture of the left reference view, and sT,r′ is also referred to as right projected picture of the right reference view. As sT,l′ and sT,r′ pictures are further used in the combination step 307 of the method 300 for computing the synthesized picture 405 as described above with respect to
In order to minimize this effect, the method 300 applies border misalignment detection for objects in the analyzed visual scene and suppresses the influence of samples in one of the sT,l′ or sT,r′ pictures in which the border 411 of the object corresponds to the area marked as foreground in the other picture. Samples from such a picture are assigned a smaller reliability in order to minimize their impact during the weighted averaging in the combination step for obtaining the synthesized output picture 405.
The following notation is applied: vl(x,y) denotes a sample in the picture synthesized from the left reference picture sT,l′ at position (x,y). vr(x,y) denotes a sample in the picture synthesized from the right reference picture sT,r′ at position (x,y).
In an implementation form, the conditions to determine if the modification of the reliability map sR,l′ or sR,r′ at position (x,y) is being applied are as follows:
For the left reliability map sR,l′ as depicted in
Case 1: (modification or assignment of a reduced value is applied only if all of the conditions are fulfilled), see
a. Sample vl(x,y) does not belong to disoccluded area.
b. Right neighboring sample vl(x+1,y) belongs to disoccluded area.
c. Samples vl(x,y) and vr(x,y) belong to the same plane of the visual scene.
d. Samples vr(x,y) and vr(x+1,y) belong to the same plane of the visual scene.
Case 2: (modification or assignment of a reduced value is applied only if all of the conditions are fulfilled), see
a. Left neighboring sample vl(x−1,y) does not belong to the same plane of the visual scene as the sample vl(x,y).
b. The point in the visual scene corresponding to the sample vl(x,y) is closer to the camera than the one represented by the left neighboring sample vl(x−1,y).
c. Samples vl(x,y) and vr(x,y) belong to the same plane of the visual scene.
d. Samples vr(x−1,y) and vr(x,y) belong to the same plane of the visual scene.
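The four conditions of each case for the left reliability map can be sketched as boolean predicates. The helper maps used below (`disoccluded_l`, `same_plane_ll`, `same_plane_lr`, `same_plane_rr`, `depth_l`) are hypothetical stand-ins for the filling masks, plane discrimination maps, and projected depth map described in this text; `same_plane_rr[y][x]` is assumed True iff the right-view samples at x and x+1 lie on the same plane:

```python
def left_case1(x, y, disoccluded_l, same_plane_lr, same_plane_rr):
    """Case 1 for the left reliability map s_R,l': all four
    conditions (a)-(d) must be fulfilled."""
    return (not disoccluded_l[y][x]             # (a) v_l(x,y) not disoccluded
            and disoccluded_l[y][x + 1]         # (b) right neighbor disoccluded
            and same_plane_lr[y][x]             # (c) v_l(x,y), v_r(x,y) same plane
            and same_plane_rr[y][x])            # (d) v_r(x,y), v_r(x+1,y) same plane

def left_case2(x, y, same_plane_ll, depth_l, same_plane_lr, same_plane_rr):
    """Case 2 for the left reliability map: conditions (a)-(d)."""
    return (not same_plane_ll[y][x - 1]         # (a) v_l(x-1,y) on a different plane
            and depth_l[y][x] < depth_l[y][x - 1]  # (b) v_l(x,y) closer to camera
            and same_plane_lr[y][x]             # (c) v_l(x,y), v_r(x,y) same plane
            and same_plane_rr[y][x - 1])        # (d) v_r(x-1,y), v_r(x,y) same plane
```

The predicates for the right reliability map follow by mirroring the neighbor direction.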
For the right reliability map sR,r′ as depicted in
Case 1: (modification or assignment of a reduced value is applied only if all of the conditions are fulfilled), see
a. Sample vr(x,y) does not belong to disoccluded area.
b. Left neighboring sample vr(x−1,y) belongs to disoccluded area.
c. Samples vr(x,y) and vl(x,y) belong to the same plane of the visual scene.
d. Samples vl(x−1,y) and vl(x,y) belong to the same plane of the visual scene.
Case 2: (modification or assignment of a reduced value is applied only if all of the conditions are fulfilled), see
a. Right neighboring sample vr(x+1,y) does not belong to the same plane of the visual scene as the sample vr(x,y).
b. The point in the visual scene corresponding to the sample vr(x,y) is closer to the camera than the one represented by the right neighboring sample vr(x+1,y).
c. Samples vr(x,y) and vl(x,y) belong to the same plane of the visual scene.
d. Samples vl(x,y) and vl(x+1,y) belong to the same plane of the visual scene.
Information whether a sample belongs to the disoccluded area is determined in the projection step 301, in which samples from the reference picture are projected into the synthesized picture. Such information, e.g., the left and right disoccluded areas, is usually represented in form of binary masks, e.g., the left and right filling masks sF,l′ and sF,r′ according to the HEVC Test Model as described above.
The decision whether two samples belong to the same plane of the visual scene is made based on plane discrimination criteria. For that purpose, in an implementation form, plane discrimination criteria introduced in the prior art are used. Consequently, for samples located at the same position (x,y) but belonging to different views, i.e., vl(x,y) and vr(x,y), the decision can be made based on the plane discrimination map sP,lr′ already calculated in the plane discrimination step according to the prior art synthesis algorithm. For neighboring samples from the same view, e.g., vl(x,y) and vl(x−1,y), the same plane discrimination criterion is used; however, the input of the decision function is only the one depth map of the analyzed view (sD,l′ or sD,r′) and, consequently, a plane discrimination map is computed independently for each view, producing plane discrimination maps for the left and right view, sP,ll′ and sP,rr′, also referred to as the left and right plane discrimination maps.
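One simple plane discrimination criterion, given here only as an illustrative assumption since the text defers to prior-art criteria, is to treat two samples as belonging to the same plane when their depth values differ by less than a threshold:

```python
def same_plane(z_a, z_b, threshold=8):
    """Illustrative plane discrimination: two scene points are assumed
    to lie on the same plane if their depth values are close.
    The threshold value is a hypothetical parameter."""
    return abs(z_a - z_b) < threshold

def plane_discrimination_map(depth_row, threshold=8):
    """Per-view plane discrimination map for one row of a depth map:
    entry x is True iff the samples at x and x+1 are judged to lie on
    the same plane."""
    return [same_plane(depth_row[x], depth_row[x + 1], threshold)
            for x in range(len(depth_row) - 1)]

# A depth discontinuity between values 101 and 40 yields a plane border.
print(plane_discrimination_map([100, 102, 101, 40, 41]))
# [True, True, False, True]
```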
Also, a distance of the point in the visual scene corresponding to each sample is determined based on the corresponding depth map: the left projected depth map sD,l′ is used for the modification or assignment of reduced values of the left reliability map sR,l′, and the right projected depth map sD,r′ is used for the modification or assignment of reduced values of the right reliability map sR,r′. In an alternative implementation form, disparity maps are used for the same purpose wherever depth maps would be utilized.
The modification or assignment of reduced values of the reliability map sR,l′ or sR,r′ according to the specified pattern is applied to all neighboring samples within the defined transition region ΔTR 601 if the appropriate conditions described above for the sample at position (x,y) are fulfilled.
A. For the sR,l′ reliability map:
Case 1: samples within range [x−ΔTR,x] are modified: Rmin reliability is assigned to sR,l′ at position (x,y) and Rmax reliability is assigned to sR,l′ at position (x−ΔTR,y).
Case 2: samples within range [x,x+ΔTR] are modified: Rmin reliability is assigned to sR,l′ at position (x,y) and Rmax reliability is assigned to sR,l′ at position (x+ΔTR,y).
B. For the sR,r′ reliability map:
Case 1: samples within range [x,x+ΔTR] are modified: Rmin reliability is assigned to sR,r′ at position (x,y) and Rmax reliability is assigned to sR,r′ at position (x+ΔTR,y).
Case 2: samples within range [x−ΔTR,x] are modified: Rmin reliability is assigned to sR,r′ at position (x,y) and Rmax reliability is assigned to sR,r′ at position (x−ΔTR,y).
In the above description, Rmin and Rmax are defined as the minimum and maximum reliability values that are assigned to the samples of the reliability map within the transition region 601.
In an implementation form, the pattern specifying the values of the reliability map sR,l′ or sR,r′ inside the transition region 601 is any monotonically increasing function 603 whose values are:
The first sample in the transition region 601 is the sample at position (x,y) for which the conditions for modifying the reliability map are fulfilled. The coordinates of the last sample in the transition region 601 are consequently equal to (x−ΔTR,y) or (x+ΔTR,y) depending on the case for which border misalignment was detected. ΔTR denotes the width of the transition region 601.
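A linear ramp is one monotonically increasing pattern satisfying these boundary values; the sketch below assumes Rmin is assigned at the detected border sample (offset 0) and Rmax at the far end of the transition region:

```python
def transition_reliability(delta_tr, r_min=0.0, r_max=1.0):
    """Reliability values over the transition region: a linear,
    monotonically increasing ramp from Rmin at offset 0 (the border
    sample at (x, y)) to Rmax at offset delta_tr (the last sample of
    the transition region)."""
    return [r_min + (r_max - r_min) * i / delta_tr
            for i in range(delta_tr + 1)]

print(transition_reliability(4))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Any other monotonically increasing function (e.g., a cosine ramp) with the same endpoint values would equally satisfy the description.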
In an implementation form, other steps of view synthesis are performed as in the prior art described above with respect to
The synthesized picture sT′ of a visual scene is computed starting from left sT,l and right sT,r reference pictures and their corresponding left sD,l and right sD,r depth maps. The apparatus 700 comprises a projector 701 configured for projecting the left reference picture sT,l into a left projected picture sT,l′ and projecting the right reference picture sT,r into a right projected picture sT,r′ and determining a left disoccluded area sF,l′ in the left projected picture sT,l′ and a right disoccluded area sF,r′ in the right projected picture sT,r′. The apparatus 700 comprises a determiner 705 configured for determining a left reliability map sR,l′ based on the left disoccluded area sF,l′ and a right reliability map sR,r′ based on the right disoccluded area sF,r′. The apparatus 700 comprises a modifier 805, see
In an implementation form, the apparatus 700 comprises a plane discriminator 703 configured for receiving outputs of the projector 701 and providing outputs to the processor 707.
In an implementation form, the projector 701, the determiner 705, the processor 707 and the plane discriminator 703 are functionally specified according to the description below:
In blocks 701a, 701b, the left and right reference pictures are projected into the synthesized picture, and disoccluded areas are detected. In block 703, a plane discrimination map between the left and right pictures is calculated. In blocks 705a, 705b, the information received from blocks 701a, 701b and block 703 is combined to build up a reliability map. In an implementation, the reliability map is built up as in the prior art as described above with respect to
The following symbols are used in
In an implementation form, the reliability map creation block 705 is functionally specified according to the following description: In block 801, for every sample of an input disoccluded area, e.g., an input filling mask, a reliability weight is computed according to conventional algorithms as described above with respect to the description of
Implementation forms may be adapted to first determine the left and right reliability maps or reliability map information according to conventional algorithms and afterwards to modify the left and right reliability maps or reliability map information, e.g., reduce the weights for the corresponding samples, when object border misalignments between the left and right projected depth map have been detected, as shown in
Implementation forms may be adapted to compute a synthesized texture picture sT′, as synthesized picture, as for example depicted in
In implementation forms for computing a synthesized texture picture sT′, the left and right projected pictures are projected texture pictures sT,l′, sT,r′ obtained from left and right reference texture pictures sT,l, sT,r by projection.
In implementation forms for computing a synthesized depth map picture sD′, the left and right projected pictures are projected depth map pictures sD,l′, sD,r′ obtained, for example, from left and right reference depth map pictures sD,l, sD,r by projection, or from left and right disparity map pictures by projection and inversion of the map values, or vice versa.
In implementation forms for computing a synthesized disparity map picture, the left and right projected pictures are projected disparity map pictures obtained from left and right reference disparity map pictures by projection, or from left and depth map pictures by projection and inversion of the map values or vice versa.
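The inversion of map values mentioned above can be sketched as follows; the scale factor is a hypothetical normalization constant, since a real system would derive disparity from camera baseline and focal length:

```python
def depth_to_disparity(depth_map, scale=255.0):
    """Invert map values: disparity is inverse-proportional to depth.
    `scale` is an illustrative normalization constant only."""
    return [[scale / z for z in row] for row in depth_map]

# A nearer point (smaller depth) maps to a larger disparity.
disp = depth_to_disparity([[85.0, 255.0]])
print(disp)  # [[3.0, 1.0]]
```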
Implementation forms may be adapted to determine the whole left and right reliability maps before computing the synthesized picture, or adapted, for example, to process entire pictures and corresponding maps, or to determine only those parts of the left and right reliability maps which are required for computing the corresponding part of the synthesized picture; i.e., implementation forms are adapted to determine left and right reliability map information.
From the foregoing, it will be apparent to those skilled in the art that a variety of methods, systems, computer programs on recording media, and the like, are provided.
The present disclosure also supports a computer program product including computer executable code or computer executable instructions that, when executed, causes at least one computer to execute the performing and computing steps described herein.
Many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art readily recognize that there are numerous applications of the invention beyond those described herein. While the present invention has been described with reference to one or more particular embodiments, those skilled in the art recognize that many changes may be made thereto without departing from the scope of the present invention. It is therefore to be understood that within the scope of the appended claims and their equivalents, the invention may be practiced otherwise than as specifically described herein.
This application is a continuation of International Application No. PCT/EP2013/059941, filed on May 14, 2013, which is hereby incorporated by reference in its entirety.
| Number | Date | Country
---|---|---|---
Parent | PCT/EP2013/059941 | May 2013 | US
Child | 14937046 | | US