1. Field of the Invention
The present invention relates to a method for generating a free viewpoint video image using divided local regions, and more particularly to a method for generating a free viewpoint video image from images of an object photographed with a plurality of video cameras (hereinafter referred to simply as cameras), each having a horizontal optical axis, disposed so as to surround the object.
2. Description of the Related Art
With the progress of video processing and video communication technology in recent years, three-dimensional free viewpoint video has been attracting attention as a next-generation video content. Accordingly, technologies for generating entire-circumference free viewpoint video from multi-viewpoint images photographed by video cameras disposed around an object have been studied and developed.
In this case, if cameras are disposed densely around the object in order to cope with an arbitrary viewpoint, the number of cameras increases and the cost rises, which is not achievable in practice. Thus, a method of disposing the cameras sparsely around the object is adopted; in this case, however, video images from viewpoints located between the cameras are not obtained.
To solve this problem, there has conventionally been proposed a method of generating a video image from a viewpoint at which the object is not photographed by any camera, by interpolating between the camera images using image based rendering.
A typical image based rendering method for interpolating between multi-viewpoint images is the "ray space representation method", which is described in Japanese Patent Application Laid-Open (JP-A) Nos. 2004-258775 and 10-111951 as technical documents concerning the generation of interpolated video images using the ray space representation method.
However, when objects are distributed over a wide range of the real zone and the parallax due to differences in depth (depth parallax) is large, correcting that depth parallax with the above-mentioned prior art makes the interpolation processing of ray information complicated, and is thus not a practical solution.
Accordingly, the applicant has filed an invention in which the real zone is divided into small local regions in which the depth parallax can be neglected; by applying the image based rendering method to each divided local region individually, a free viewpoint video image of each local region is generated, and a target free viewpoint video image is then generated by synthesizing those video images (Japanese Patent Application No. 2006-185648).
An outline of this invention will be described briefly with reference to
According to this invention, even if objects are distributed over a wide range of the real zone so that the parallax due to differences in depth is large, a video image close to reality can be generated from an arbitrary viewpoint in the real zone. Further, a video image can be generated from a virtual viewpoint located between one object and another, so as to achieve a walk-through experience.
However, in the above-mentioned invention, the position of the virtual viewpoint is confined to the plane where the cameras are disposed; if the virtual viewpoint is not located on that plane, for example if it is moved upward or downward in the vertical direction with respect to the plane, the free viewpoint video image cannot be generated. In other words, a free viewpoint video image corresponding to moving the camera in the vertical direction with respect to the plane cannot be generated.
An object of the present invention is to provide a method for generating a free viewpoint video image as seen from a virtual viewpoint even when the virtual viewpoint is not located on the plane where the cameras are disposed.
In order to accomplish the object, a feature of the present invention resides in a free viewpoint video image generating method which generates a video image of an arbitrary viewpoint using video images of an object photographed by a plurality of cameras, each having a horizontal optical axis, disposed to surround the object. The method comprises: a first step of dividing a real zone into local regions; a second step of transforming the camera coordinate system of each camera, using an internal parameter of the camera, so that the optical axis of the camera is directed to the local region; a third step of enlarging or contracting the video image within the local region, using information on the distance between each camera and the object, so that the local regions on the video images are arranged to an identical scale; a fourth step of generating a free viewpoint video image only within the local region, using an image based rendering method within each local region; a fifth step of enlarging or contracting the free viewpoint video image within each local region so that the local region on the video image is of a prescribed scale; a sixth step of transforming the coordinate system of each camera, using an internal parameter of the camera, so that the optical axis of the camera is directed to a prescribed direction, thereby obtaining the free viewpoint video image of the local region; and a seventh step of integrating the free viewpoint video images of the local regions. To generate a free viewpoint video image viewed from a virtual viewpoint not located on the plane where the cameras are disposed, either a processing of moving, in the seventh step, the position of the virtual viewpoint video image of each local region within the finally synthesized free viewpoint video image in correspondence with the position of the virtual viewpoint, or a processing of moving, in the fourth step, the position Q of the ray information to be read in correspondence with the position of the virtual viewpoint, is carried out.
According to the method for generating a free viewpoint video image of the present invention, even if the virtual viewpoint is moved three-dimensionally in any direction, a video image from that virtual viewpoint can be generated by the method of dividing into local regions, so that a free viewpoint video image having a higher realistic sensation is produced.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Hereinafter, the present invention will be described in detail with reference to the drawings.
As shown in the
Next, the structure of a free viewpoint video image generating apparatus 10 of the first embodiment of the present invention will be described with reference to a block diagram of
First, the free viewpoint video image generating apparatus 10 includes a local region dividing portion 11 for dividing a real zone 1 (see
A local region selecting portion 14 selects one of the local regions divided by the local region dividing portion 11. A sight line rotation transformation portion 15 obtains each camera image of the selected local region from the video image acquiring portion 13 and executes a rotational coordinate transformation of the video image coordinate system of each camera so that the video image of the local region is located in the center of the video image. An enlargement/contraction portion 16 executes enlargement or contraction processing of the video image of each camera in order to arrange the sizes of the objects to a uniform one, because the size of the object differs at each viewpoint as the distance from the camera to the object differs.
A free viewpoint video image generating portion 17 generates a free viewpoint video image only within the local region using an image based rendering method. Further, a video image at the virtual viewpoint position is extracted and stored. An outline of the operation of the free viewpoint video image generating portion 17 is represented in
An inverted enlargement/contraction portion 18 enlarges or contracts the free viewpoint video image so that the local region on the video image becomes a prescribed scale. An inverted sight line rotational transformation portion 19 transforms the coordinate system of each camera using the internal parameter of the camera so that the optical axis of each camera is directed to a prescribed direction. When the function of the inverted sight line rotational transformation portion 19 has been executed, another local region is selected by the local region selecting portion 14 and the same functions as described previously are repeated.
When the virtual viewpoint video images of all local regions have been obtained by the above-described operation, a local region synthesizing portion 20 synthesizes the free viewpoint video images of the virtual viewpoints of the plural local regions. In the present embodiment, the local region synthesizing portion 20 executes a calculation processing 20a for the travel distance of the display position of the free viewpoint video image and a video image generating processing 20b by synthesized representation. The calculation processing 20a calculates the travel distance of the display position of the free viewpoint video image caused by moving the cameras 3a, 3b, 3c . . . 3n and/or the virtual viewpoint 3x vertically; the details of the processings 20a and 20b will be described later. In the video image generating processing 20b, the display position is moved upward or downward by the amount obtained by the calculation processing 20a to generate the free viewpoint video image, and the generated video images are synthesized. A free viewpoint video image output portion 21 outputs the free viewpoint video image (virtual viewpoint video image) synthesized by the local region synthesizing portion 20.
Next, an outline of the function of the free viewpoint video image generating apparatus 10 will be described with reference to the flow chart of
In step S1, the position information of all the cameras, the internal parameter of the cameras, for example the pixel conversion amount f of the focal length, and the virtual viewpoint position are inputted. It is assumed that the respective cameras have the same internal parameter. In step S2, the video images of all the cameras are acquired. In step S3, the real zone 1 is divided into local regions. In step S4, if the optical axis of any camera is not horizontal, a rotational transformation of the video image is carried out so that the optical axis of that camera becomes horizontal. In step S5, a local region is selected. The processing of step S4 may instead be carried out after step S5.
In step S6, the video image coordinate system of each camera is subjected to a rotational coordinate transformation so that the optical axes of all the cameras are directed to the selected local region. In other words, the video image is subjected to a rotational transformation so that the local region is located in the center of the video image. In step S7, the video image of each camera is enlarged or contracted so that the local regions on the video images are arranged to an identical scale.
In step S8, a free viewpoint video image only within the local region is generated by executing interpolation processing using the image based rendering method. The details of this processing will be described with reference to
In step S9, a virtual viewpoint position video image is extracted from the free viewpoint video image, and the extracted virtual viewpoint position video image is enlarged or contracted so that the local region on the video image is of the prescribed scale. In step S10, the video image coordinate system of each camera is subjected to a rotational coordinate transformation so that the optical axes of all the cameras are directed to a prescribed direction. Consequently, the virtual viewpoint position video image of the selected local region is obtained.
Next, in step S11, whether or not the processing of all the local regions has been completed is determined; if any local region remains unprocessed, the procedure returns to step S5, in which another unprocessed local region is selected. When the processing of all the local regions has been completed (the determination of step S11 is affirmative), the virtual viewpoint position video image of each local region has been obtained. In the next step S12, to obtain a free viewpoint video image for the case where the camera is moved in the vertical direction, the display position of each virtual viewpoint position video image is moved by an amount corresponding to the travel distance of the camera, and the video images of all the local regions are then synthesized. In step S13, the synthesized video image is outputted as the virtual viewpoint position video image.
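The loop of steps S5 through S13 can be sketched in code. All function names below are hypothetical placeholders standing in for the processing portions of the apparatus, and the bodies are identity stubs so that only the control flow of the flow chart is shown; this is a structural sketch, not the disclosed implementation.

```python
# Structural sketch of the flow chart (steps S5-S13); every function is a
# hypothetical placeholder with an identity-stub body.

def rotate_toward(img, cam, region):      return img      # step S6 (stub)
def rescale_to_region(img, cam, region):  return img      # step S7 (stub)
def interpolate_ray_space(imgs, vp):      return imgs[0]  # step S8 (stub)
def undo_rescale(img, region):            return img      # step S9 (stub)
def undo_rotation(img, region):           return img      # step S10 (stub)

def synthesize(per_region, vp):                           # steps S12-S13 (stub)
    return per_region

def generate_free_viewpoint_image(cameras, images, vp, regions):
    per_region = []
    for region in regions:                                # steps S5 / S11 loop
        views = [rotate_toward(im, c, region) for im, c in zip(images, cameras)]
        views = [rescale_to_region(im, c, region) for im, c in zip(views, cameras)]
        fv = interpolate_ray_space(views, vp)
        fv = undo_rotation(undo_rescale(fv, region), region)
        per_region.append(fv)
    return synthesize(per_region, vp)                     # integrated output
```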
Next, generation processing of the free viewpoint video image or virtual viewpoint position video image will be described in detail below.
First, the real zone 1 is divided into local regions 4a, 4b, 4c, 4d, . . . 4n as shown in
where (a) is a radius of a local region, (k) and (l) are arbitrary integers, and (R) is a distance from the center to a camera.
As another example, considering a case where the cameras are disposed around a rectangular real zone 1, the real zone is preferably divided into local regions whose center coordinates are represented below.
where (a) is a radius of a local region, (k) and (l) are arbitrary integers, (H) is the width of the rectangular object area and (V) is the length of the rectangular object area. However, if (k) is odd, (l) is also odd, and if (k) is even, (l) is also even.
If the optical axis of a camera is not horizontal, the video image is subjected to a rotational transformation so that the optical axis becomes horizontal.
If it is assumed that:
f: pixel conversion amount of the camera focal length [pixel]
digital video image coordinate of a point on the original video image (home position, i.e. origin, is the center of the video image)
digital video image coordinate of a point on the transformation object video image (home position, i.e. origin, is the center of the video image)
φ: angle of elevation of the optical axis of the camera, the following relation is established.
Here, (s) is a scalar. As the above-mentioned equation, for example, the one described in "2.3 Projection Matrix and External Variables" on page 187 of "Three-Dimensional Vision" by Go Jo and Saburo Tsuji, published by KYORITSU SHUPPAN, may be used.
Next, a target local region (4m in
Here, assume that the cameras are disposed on a circumference or on a rectangle around the real zone, N is the number of cameras, (n) is the ID of a camera, Rn is the distance from the center to the nth camera, and Θn is the azimuth angle of the optical axis of the nth camera. If the cameras are disposed on the circumference, Rn is a constant value (
Generally, the direction of the optical axis of the camera does not agree with the direction (the direction of the dotted line arrow of
If it is assumed that as shown in
Therefore,
Accordingly, if the video image is subjected to a rotational transformation by the angle (θn−Θn) obtained from the above-mentioned equation, the local region 4m comes to be located in the center of a virtual circle l′ as shown in
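The rotation angle (θn−Θn) can be computed from the geometry described above. The sketch below assumes a horizontal XZ plane with azimuths measured from the +X axis by `atan2`; this coordinate convention is an assumption for illustration and is not taken from the original disclosure.

```python
import math

# Sketch: angle (theta_n - Theta_n) that turns camera n's optical axis
# toward the centre of the target local region 4m.
# cam_pos, region_center: (x, z) positions in the horizontal plane.
# cam_azimuth: azimuth Theta_n of the camera's current optical axis.
# Azimuth convention (atan2 from the +X axis) is an assumption.

def rotation_angle(cam_pos, cam_azimuth, region_center):
    dx = region_center[0] - cam_pos[0]
    dz = region_center[1] - cam_pos[1]
    theta_n = math.atan2(dz, dx)      # azimuth of the camera-to-region line
    return theta_n - cam_azimuth      # rotate the image by this angle
```

For a camera at (R, 0) already looking at the origin, the returned angle is zero, as expected.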
Next, a method for obtaining a video image subjected to rotational transformation by the angle (θn−Θn) will be described below.
If it is assumed that:
digital video image coordinate of a point on the original video image (home position is the center of the video image)
digital video image coordinate of a point on the transformation object video image (home position is the center of the video image), the following relation is established.
where (s) is a scalar. If the above-mentioned relational equation is solved (the scalar (s) is eliminated), the transformation equation for the digital video image coordinates is as follows.
The video image is subjected to a rotational transformation based on the above equation in order to generate a video image in which the object is moved to the center of the video image.
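The per-pixel form of this transformation, with the scalar s eliminated, can be sketched for a rotation of the camera about the vertical axis. This is the standard pinhole relation s·(u′, v′, f)ᵀ = R·(u, v, f)ᵀ; the sign convention of the rotation is an assumption, and f is the focal length in pixels as defined in the text.

```python
import math

# Sketch of the digital-image rotational transformation: a pure rotation
# of the camera about the vertical axis by angle alpha maps a pixel
# (u, v), with the origin at the image centre as in the text, to (u', v').
# Derived from s*(u', v', f)^T = R_y(alpha) * (u, v, f)^T with s removed.

def rotate_pixel(u, v, f, alpha):
    x = u * math.cos(alpha) + f * math.sin(alpha)
    y = v
    z = -u * math.sin(alpha) + f * math.cos(alpha)
    return f * x / z, f * y / z   # scalar s eliminated by dividing by z
```

With alpha = 0 the pixel is unchanged, and the image centre (0, 0) moves to u′ = f·tan(alpha), as expected for a pure pan.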
Next, even after the rotational transformation, the sizes of the objects at the respective viewpoints differ because the distance from the camera to the object differs. The video image is therefore enlarged or contracted, with its home position (the center of the video image) fixed, so as to arrange the sizes of the objects to an identical one.
Because the size of the object within the video image of each viewpoint is inversely proportional to the distance from the camera to the object, the enlargement/contraction ratio can be calculated according to the following equation.
If
is the digital video image coordinate of a point on the enlarged/contracted video image, the following equation is established.
If the above-mentioned video image is subjected to rotational transformation and enlarged or contracted, a following equation is obtained.
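The scaling step can be sketched directly from the inverse-proportionality stated above: a view from a camera at distance d_n is scaled by d_n/d_ref so that all regions appear at a common reference scale. The reference distance d_ref is an assumed parameter for illustration, not a symbol from the original equations.

```python
# Sketch: apparent object size is inversely proportional to the camera's
# distance, so a view taken from distance d_n is scaled by d_n / d_ref to
# match an assumed common reference distance d_ref.  Scaling is performed
# about the image centre (the home position), as in the text.

def scale_factor(d_n, d_ref):
    return d_n / d_ref            # > 1 enlarges the farther camera's view

def scale_pixel(u, v, d_n, d_ref):
    s = scale_factor(d_n, d_ref)
    return u * s, v * s           # centre-fixed enlargement/contraction
```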
Next, a cylindrical record ray space is constructed using the transformed video image. First, the ray space will be described.
Consider a case where the real zone is divided into a "described zone" and a "visual zone" by a boundary surface S as shown in
Further, a projection method that does not depend on the shape of the boundary surface S can be defined. First, if an axis is taken along the propagation direction of a ray, the changes accompanying the propagation of the ray are recorded along this axis. This axis is called the R axis, and a PQR coordinate system containing the R axis is considered as a position coordinate system instead of the XYZ coordinate system. Information on the passage position of a ray can then be described along the R axis.
More specifically, as shown in
P=X cos θ−Z sin θ
Q=−X sin θ tan φ+Y−Z cos θ tan φ
By this transformation, the information of the orthographic projection of the zone is recorded in the PQ plane. If P and Q of the transformation equations are used, the five-dimensional ray space f(X, Y, Z, θ, φ) is transformed into the four-dimensional ray space f(P, Q, θ, φ).
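The two transformation equations above translate directly into code; this is a straightforward implementation of the formulas as given, mapping a ray through (X, Y, Z) with direction (θ, φ) to its (P, Q) coordinates in the four-dimensional ray space.

```python
import math

# Direct implementation of the two transformation equations in the text:
#   P = X cos(theta) - Z sin(theta)
#   Q = -X sin(theta) tan(phi) + Y - Z cos(theta) tan(phi)

def ray_space_coords(X, Y, Z, theta, phi):
    P = X * math.cos(theta) - Z * math.sin(theta)
    Q = (-X * math.sin(theta) * math.tan(phi)
         + Y
         - Z * math.cos(theta) * math.tan(phi))
    return P, Q
```

For a horizontal ray (φ = 0) the tan(φ) terms vanish and Q reduces to Y, consistent with the orthographic-projection interpretation in the text.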
Video image information can be regarded as a "collection of information of rays passing through a point in the real zone". Thus, to store the information of a video image photographed at a camera position (Xc, Yc, Zc) in the ray space, the ray information recorded in the region represented by f(θ, φ)|X=Xc, Y=Yc, Z=Zc is cut out. Photographing a video image can be regarded as "sampling of ray information", and synthesis of a video image can be regarded as "cutting out ray information from the ray space".
For the ray space f(P, Q, θ, φ) projected into four dimensions, the camera photographing position (Xc, Yc, Zc) and the ray direction (θc, φc) are substituted into the above-mentioned two transformation equations to obtain (Pc, Qc), and the photographed ray information is stored at (Pc, Qc, θc, φc).
The ray information of a video image photographed at a certain fixed point is stored in the form of a sinusoidal curve on the Pθ plane, as is evident if X and Z are assumed to be constant in the first of the above-mentioned two transformation equations. The photographed images used for synthesis of the ray space of
To synthesize a video image from an arbitrary virtual viewpoint, ray information of an appropriate region is cut out from the constructed ray space. The region to be cut out is expressed as f(P0, Q0, θ0, φ0) using the (P0, Q0) obtained by substituting the viewpoint position (X0, Y0, Z0) and the ray direction (θ0, φ0).
In theory, a video image from an arbitrary viewpoint can be synthesized using the above-described methods. In reality, it is difficult to photograph the information of all rays, and an actual photographing yields only a sparse ray space as shown in the left diagram of
A result of interpolation of the Pθ section of the ray space of the left diagram of
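The disclosure does not spell out the interpolation scheme used on the Pθ section here, so the following is only an illustrative stand-in: a minimal nearest-neighbour fill along the P axis of one θ row, where unrecorded rays are marked `None`.

```python
# Illustrative stand-in for interpolating a sparse Ptheta section: each
# row (fixed theta) is a list of pixel values with None where no ray was
# recorded; missing entries are filled from the nearest recorded sample.
# The actual interpolation method of the disclosure is not specified here.

def fill_row(row):
    known = [(i, v) for i, v in enumerate(row) if v is not None]
    if not known:
        return row                      # nothing recorded in this row
    out = []
    for i, v in enumerate(row):
        if v is not None:
            out.append(v)               # keep recorded ray information
        else:
            _, val = min(known, key=lambda kv: abs(kv[0] - i))
            out.append(val)             # copy nearest recorded sample
    return out
```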
After the interpolation using the ray space representation method as described above, an inverse transformation to a video image of the local region as seen from the virtual viewpoint is carried out. More specifically, a transformation according to the following equation is carried out.
where Θ is the azimuth angle of the virtual viewpoint, θ is the azimuth angle of the central axis of the cylinder of the local region to which attention is paid, Ri is the distance from the home position of the object zone to the virtual viewpoint, and ri=r/Ri. Moreover,
is the digital video image coordinate of a point on the interpolated video image (home position is the center of the video image), and
is the digital video image coordinate of a point on the virtual viewpoint video image (home position is the center of the video image).
The ratio of the distance from the plane where the cameras are disposed to the virtual viewpoint (the height of the virtual viewpoint when the horizontal plane where the cameras are disposed is taken as a reference) to the distance, in the sight line direction, from the center of the object to which attention is paid (the central axis of the cylinder of the local region to which attention is paid) to the virtual viewpoint is multiplied by the pixel conversion amount of the focal length of the camera. This yields the travel distance of the coordinate of the center of the virtual viewpoint video image of the local region to which attention is paid (its travel direction being the reverse), and the vertical position of that coordinate is displayed after being moved by this travel distance.
That is, the travel distance of the virtual viewpoint video image of each local region is obtained based on the distance, in the sight line direction, from the center of the object to which attention is paid (the central axis of the cylinder of the local region to which attention is paid) to the virtual viewpoint, and the distance from the horizontal plane where the cameras are disposed to the virtual viewpoint (the height of the virtual viewpoint when that horizontal plane is taken as a reference). This travel distance can be obtained from the following equation.
where Δvi is the travel distance of the virtual viewpoint video image of the local region to which attention is paid, (f) is the pixel conversion amount of the focal length of the camera, yi=Yi/Ri where Ri is the distance from the home position of the object zone to the virtual viewpoint and Yi is the distance from the horizontal plane where the cameras are disposed to the virtual viewpoint, ri=r/Ri where (r) is the distance from the home position of the object zone to the central axis of the cylinder of the local region to which attention is paid, Θi is the azimuth angle of the virtual viewpoint, and θ is the azimuth angle of the central axis of the cylinder of the local region to which attention is paid.
From the above, it is evident that for the virtual viewpoint video image of each local region, this travel distance Δvi is inversely proportional to the distance, in the sight line direction, from the center of the object to which attention is paid to the virtual viewpoint, and proportional to the distance from the horizontal plane where the cameras are disposed to the virtual viewpoint (the travel distance of the camera in the vertical direction). The direction of travel, however, is the reverse.
Next, if the coordinate of the center of the virtual viewpoint video image of the local region to which attention is paid, within the free viewpoint video image to be synthesized finally, is assumed to be (ui, vi), the free viewpoint video image is synthesized by overwriting from the local region deeper from the virtual viewpoint toward the nearer local regions, while the v coordinate is moved by Δvi to obtain (ui, vi+Δvi). By omitting from the beginning the drawing of a local region which is expected to be overwritten later, the processing can be accelerated. By substituting equation 15 into equation 13, the following equation can be obtained.
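The verbal description of Δvi above can be sketched in code. The patent's own numbered equation is not reproduced here; in particular, the expression Ri − r·cos(Θi−θ) for the sight-line distance from the cylinder axis to the virtual viewpoint is an assumption chosen to be consistent with the normalized variables yi = Yi/Ri and ri = r/Ri defined in the text, and the sign reflects the stated reversal of the travel direction.

```python
import math

# Sketch of the display-position shift: travel distance = f * (height of
# the virtual viewpoint above the camera plane) / (sight-line distance
# from the region's cylinder axis to the virtual viewpoint), direction
# reversed.  The sight-line distance R_i - r*cos(Theta_i - theta) is an
# assumption, not the patent's numbered equation.

def travel_distance(f, Y_i, R_i, r, Theta_i, theta):
    sight_line_dist = R_i - r * math.cos(Theta_i - theta)
    return -f * Y_i / sight_line_dist   # negative: travel direction reversed
```

As the text states, the result is proportional to Yi and inversely proportional to the sight-line distance, and vanishes when the virtual viewpoint lies on the camera plane (Yi = 0).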
Next, the second embodiment of the present invention will be described.
Then, the processing of the free viewpoint video image generating portion 17 of the present embodiment will be described below. If the virtual viewpoint is located at a position apart from the plane where the cameras are disposed, the ratio of the distance from the plane where the cameras are disposed to the virtual viewpoint (the height of the virtual viewpoint when the horizontal plane where the cameras are disposed is taken as a reference) to the distance, in the sight line direction, from the center of the object to which attention is paid (the central axis of the cylinder of the local region to which attention is paid) to the virtual viewpoint is multiplied by the ratio of the transformation from the real zone to the ray space. As a result, the travel distance of the reading position in the ray space of the local region to which attention is paid is obtained, and the ray information is read from the ray space of the local region at a position moved by this travel distance.
That is, for the virtual viewpoint video image of each local region, the reading position is obtained based on the distance from the center of the object to which attention is paid (the central axis of the cylinder of the local region to which attention is paid) to the virtual viewpoint and the distance from the plane (horizontal plane) where the cameras are disposed to the virtual viewpoint, i.e. the height of the virtual viewpoint when that horizontal plane is taken as a reference. This reading position can be obtained according to the following equation.
where Q′ is the reading position in the ray space of the local region to which attention is paid when the virtual viewpoint is not located on the plane where the cameras are disposed, Q is the reading position in that ray space when the virtual viewpoint is located on the plane where the cameras are disposed, Yi is the distance from the plane where the cameras are disposed to the virtual viewpoint, Θi is the azimuth angle of the virtual viewpoint, θ is the azimuth angle of the central axis of the cylinder of the local region to which attention is paid, and ri=r/Ri where (r) is the distance from the home position of the object zone to the central axis of the cylinder of the local region to which attention is paid and Ri is the distance from the home position of the object zone to the virtual viewpoint.
From the above, it is evident that for the virtual viewpoint video image of each local region, the reading position must be moved by an amount that is inversely proportional to the distance, in the sight line direction, from the center of the object to which attention is paid to the virtual viewpoint, and proportional to the distance from the plane where the cameras are disposed to the virtual viewpoint (the travel distance of the camera in the vertical direction).
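The shifted reading position Q′ of this second embodiment can likewise be sketched from the verbal description. The exact form below, including the sign of the shift and the use of Ri − r·cos(Θi−θ) for the sight-line distance, is an assumption reconstructed from the description rather than the patent's own equation.

```python
import math

# Sketch of the second embodiment's shifted read-out point: Q' is
# displaced from Q by an amount proportional to the virtual viewpoint's
# height Y_i above the camera plane and inversely proportional to the
# sight-line distance to the region's cylinder axis.  The sight-line
# distance expression and the sign are assumptions.

def shifted_reading_position(Q, Y_i, R_i, r, Theta_i, theta):
    sight_line_dist = R_i - r * math.cos(Theta_i - theta)
    return Q - Y_i / sight_line_dist
```

When the virtual viewpoint lies on the camera plane (Yi = 0), Q′ reduces to Q, matching the first case described in the text.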
After the interpolation according to the ray space representation method as described above, an inverse transformation to a video image of the local region as viewed from the virtual viewpoint is executed (steps S18 and S19 of
where Θi is the azimuth angle of the virtual viewpoint, θ is the azimuth angle of the central axis of the cylinder of the local region to which attention is paid, Ri is the distance from the home position of the object zone to the virtual viewpoint, and ri=r/Ri.
The processing of the second embodiment other than that described above is the same as in the previously mentioned invention (Japanese Patent Application No. 2006-185648), and a description thereof is therefore omitted.
As described above, according to the present invention, a virtual viewpoint video image viewed from a virtual viewpoint not located on the plane where the cameras are disposed can be obtained. Consequently, a video image from a virtual viewpoint moved three-dimensionally in any direction can be generated.
Number | Date | Country | Kind |
---|---|---|---|
2007-199053 | Jul 2007 | JP | national |
Number | Name | Date | Kind |
---|---|---|---|
5742331 | Uomori et al. | Apr 1998 | A |
5990935 | Rohlfing | Nov 1999 | A |
6118414 | Kintz | Sep 2000 | A |
6757422 | Suzuki et al. | Jun 2004 | B1 |
6774898 | Katayama et al. | Aug 2004 | B1 |
6809887 | Gao et al. | Oct 2004 | B1 |
7260274 | Sawhney et al. | Aug 2007 | B2 |
7420750 | Kuthirummal et al. | Sep 2008 | B2 |
7889196 | Nomura et al. | Feb 2011 | B2 |
7948514 | Sato et al. | May 2011 | B2 |
20010043737 | Rogina et al. | Nov 2001 | A1 |
20020110275 | Rogina et al. | Aug 2002 | A1 |
20030011597 | Oizumi | Jan 2003 | A1 |
20030048354 | Takemoto et al. | Mar 2003 | A1 |
20030071891 | Geng | Apr 2003 | A1 |
20050041737 | Matsumura et al. | Feb 2005 | A1 |
20060192776 | Nomura et al. | Aug 2006 | A1 |
20060215018 | Fukushima et al. | Sep 2006 | A1 |
20070109300 | Li | May 2007 | A1 |
20070126863 | Prechtl et al. | Jun 2007 | A1 |
20090315978 | Wurmlin et al. | Dec 2009 | A1 |
20100079577 | Matsumura et al. | Apr 2010 | A1 |
20110157319 | Mashitani et al. | Jun 2011 | A1 |
Number | Date | Country |
---|---|---|
10-111951 | Apr 1998 | JP |
2004-258775 | Sep 2004 | JP |
2008-15756 | Jan 2008 | JP |
Number | Date | Country | |
---|---|---|---|
20090033740 A1 | Feb 2009 | US |