The present invention is concerned with the field of imaging systems which may be used to collect and display data for production of 3D images. The present invention may also be used to generate data for 2D and 3D animation of complex objects.
The field of 3D image production has largely been hampered by the time it takes to capture the data needed to produce a 3D film. Previously, 3D films have generally been perceived as a novelty as opposed to a serious recording format. Now, 3D image generation is seen as being an important tool in the production of CG images.
One established method of producing 3D image data has been photometric stereo (see for example R. Woodham, “Photometric method for determining surface orientation from multiple images”, Optical Eng., Number 1, pages 139-144, 1980), where photographs are taken of an object from different illumination directions. A single photograph is taken for each illumination direction. Thus, this is not a technique which can be used for capturing video of a moving object in real time.
The present invention addresses the above problem and in a first aspect provides an imaging system for imaging a moving three dimensional object, the system comprising:
A. Petrov, “Light Color and Shape”, Cognitive Processes and their Simulation, pages 350-358, 1987, discusses the use of colour for computing surface normals.
However, there has been no realisation that colour could be used to address the issue of recording 3D video in real time.
Further, the technique can be applied to recording data for complex objects such as cloth, clothing, knitted or woven objects, sheets etc.
When recording data from a moving object, self shadowing will occur and this will affect the data. Therefore, preferably, said processor is configured to determine the position of shadows arising as said object moves. The position of shadows is determined by locating sharp changes in the intensity of the signal measured from each of said light sources.
In a preferred embodiment, the processor is configured to determine the position of shadows before determining the position of surface normals for said object.
In a preferred embodiment, the apparatus further comprises a memory configured to store calibration data, said calibration data comprising data from a sample with a same surface characteristic as the object stored with information indicating the orientation of the surface of the sample. The processor may then be configured to determine the depth map for the object from the collected radiation using the calibration data.
The above may be achieved by using a calibration board and a mounting unit configured to mount said calibration board, said calibration board having a part of its surface with the same surface characteristics as the object and said mounting unit comprising a determining unit configured to determine the orientation of the surface of the calibration board.
Although the data gathering apparatus can stand alone, it may be incorporated in part of a 3D image generation apparatus further comprising a displaying unit configured to display a three dimensional moving image from said depth map.
The system may also be used in 2D or 3D animation where the system comprises a moving unit configured to move said generated depth map.
The system may also further comprise an applying unit configured to apply a pattern to the depth map, the applying unit configured to form a 3D template of the object from a frame of the depth map, to determine the position of the pattern on said object of said frame and to deform said template with said pattern to match subsequent frames. The template may be deformed using a constraint that the deformations of the template must be compatible with the frame to frame optical flow of the original captured data. Preferably the template is deformed using the further constraint that the deformations be as rigid as the data will allow.
In a second aspect, the present invention provides a method for imaging a moving three dimensional object, the method comprising:
The method may be applied to animating cloth or other flexible materials.
The present invention will now be described with reference to the following non-limiting embodiments in which:
a is a frame from a video of a moving object wearing a jumper with texture being collected using a video camera and three different colour light sources and
In this particular example, first light source 3 is a source of red (R) light, second light source 5 is a source of green (G) light and third light source 7 is a source of blue (B) light. However other frequencies may be used. It is also possible to use non-visible radiation such as UV or infrared.
In this embodiment, the system is either provided indoors or outside in the dark to minimise background radiation affecting the data. The three lights 3, 5 and 7 are arranged laterally around the object 1 and are vertically positioned at levels between floor level and the height of the object 1. The lights are directed towards the object 1.
The angular separation between the three light sources 3, 5 and 7 is approximately 30 degrees in the plane of rotation about the object 1. Greater angular separation can make orientation dependent colour changes more apparent. However, if the light sources are too far apart, concave shapes in the object 1 are more difficult to distinguish since shadows cast by such shapes will extend over larger portions of the object making data analysis more difficult. In a preferred arrangement each part of the object 1 is illuminated by all three light sources 3, 5 and 7.
Camera 9, which is positioned vertically below second light source 5, is used to record the object as it moves while being illuminated by the three lights 3, 5 and 7.
To calibrate the system, a calibration board of the type shown in
To determine the shape, it is first necessary to determine the orientation of the normals to the surface for all points on the surface of the object to be imaged. This embodiment assumes that the three light sources 3, 5 and 7 induce a colour cue on every surface point which is dependent on the orientation of that surface point.
Thus, there is a one-to-one mapping M between the surface colour I and the orientation n:
$I = M(n)$ or $n = M^{-1}(I)$
To determine M, photometric-stereo techniques assume that the surface is a Lambertian surface and that the camera sensor response is linear.
Where I is the RGB colour observed on the image, b is a constant vector that accounts for ambient light, n is the unit normal at the surface location and L is a 3×3 matrix where every column represents a 3D vector directed towards the light source and scaled by the light source intensity times the object albedo. The object albedo is the ratio of the reflected to incident light.
To simplify this example, it is assumed that the ratios of the colours are constant, i.e. the ratios R/B and B/G should be the same for each pixel in the image. This will allow the mapping between colours and surface orientation to be determined by estimating the 3×4 matrix $[L^T\ b]$ up to a scale factor.
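As a minimal sketch of this estimation, assuming the linear relation $I = L^T n + b$ implied by the 3×4 matrix $[L^T\ b]$, the mapping can be fitted by least squares from calibration pairs of known normals and observed colours, and then inverted to recover a normal from a colour. The function names, the array layouts and the use of NumPy are illustrative assumptions and are not part of the described system.

```python
import numpy as np

def fit_linear_mapping(normals, colours):
    """Least-squares fit of the 3x4 matrix [L^T b] (hypothetical helper).

    normals: (K, 3) known unit normals, colours: (K, 3) observed RGB values.
    At least 4 orientations in general position are needed.
    """
    normals = np.asarray(normals, dtype=float)
    colours = np.asarray(colours, dtype=float)
    # Append a column of ones so that the ambient term b is fitted as well.
    A = np.hstack([normals, np.ones((len(normals), 1))])   # (K, 4)
    X, *_ = np.linalg.lstsq(A, colours, rcond=None)        # (4, 3)
    return X.T                                             # (3, 4) = [L^T b]

def normal_from_colour(colour, mapping):
    """Invert I = L^T n + b for one pixel and renormalise to unit length."""
    L_T, b = mapping[:, :3], mapping[:, 3]
    n = np.linalg.solve(L_T, np.asarray(colour, dtype=float) - b)
    return n / np.linalg.norm(n)
```

Because the result is renormalised, an overall scale error in the fitted matrix does not change the recovered direction when the ambient term is negligible.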
For many practical situations, it will be more difficult to calculate the mapping since the camera response is non-linear and the surface will not be a Lambertian reflector. However, it is possible to use a calibration tool of the type shown in
The initial calibration routine, in which an image is captured for various known orientations of the board, does not need to be performed for every possible board orientation, as nearest neighbour interpolation can be used to determine suitable data for all orientations. It is possible to capture data from just 4 orientations in order to provide calibration data for a 3×4 matrix. Good calibration data is achieved from around 50 orientations. However, since calibration data is easily collected, it is possible to obtain data from thousands of orientations.
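The nearest neighbour step can be sketched as a lookup over the stored calibration samples. The sketch assumes that each sample is an observed RGB colour paired with the known board normal, and that distance is measured in plain RGB space; the function name and both of these choices are illustrative assumptions.

```python
import numpy as np

def nearest_normal(colour, calib_colours, calib_normals):
    """Return the calibration normal whose stored colour is closest to `colour`.

    calib_colours: (K, 3) RGB values captured from the calibration board.
    calib_normals: (K, 3) board normals recorded for those captures.
    """
    calib_colours = np.asarray(calib_colours, dtype=float)
    calib_normals = np.asarray(calib_normals, dtype=float)
    # Squared Euclidean distance in RGB space to every calibration sample.
    d2 = np.sum((calib_colours - np.asarray(colour, dtype=float)) ** 2, axis=1)
    return calib_normals[np.argmin(d2)]
```

With thousands of calibration orientations, a spatial index such as a k-d tree would be a natural replacement for the linear scan.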
Although the technique of using the calibration board can be used to determine complex mappings for non-Lambertian reflectors and cameras with non-linear response functions, it is still necessary to assume that the object albedo has constant chromaticity. If this is not assumed, the mapping M is non-invertible and there will be several valid surface orientations for the same surface colour.
The object may also shadow itself during filming.
In the absence of a shadow, the reflected illumination from one channel, i.e. red, green or blue, would be expected to vary smoothly. A sharp variation indicates the presence of an edge. These edges are determined for each channel by using a Laplace filter. The results from this analysis, which is carried out per channel, are shown in
The pixels which are determined to be edge pixels are then further analysed to determine gradient orientation. The pixels are analysed along each of the eight cardinal directions (i.e. north, south, east, west, north-west, south-west, north-east, south-east). Pixels whose gradient magnitude falls below a threshold τ are rejected. Adjoining pixels whose gradient directions agree are grouped into connected components.
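The per-channel edge test and the direction binning can be sketched as follows. This is a simplified illustration: the 5-point Laplace stencil, the parameter names `lap_thresh` and `tau`, and the numbering of the eight direction bins are assumptions, and the final grouping into connected components is not shown.

```python
import numpy as np

def laplace(channel):
    """5-point discrete Laplacian with replicated borders."""
    p = np.pad(channel, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * channel)

def shadow_edge_candidates(channel, lap_thresh, tau):
    """Find edge pixels in one colour channel and bin their gradient direction.

    Returns a boolean edge mask and an integer map holding a direction bin
    0..7 (steps of 45 degrees, bin 0 along increasing column index) for edge
    pixels and -1 elsewhere.
    """
    channel = np.asarray(channel, dtype=float)
    gy, gx = np.gradient(channel)        # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    # Keep pixels with a strong Laplace response and gradient magnitude >= tau.
    edges = (np.abs(laplace(channel)) > lap_thresh) & (magnitude >= tau)
    bins = np.round(np.arctan2(gy, gx) / (np.pi / 4)).astype(int) % 8
    return edges, np.where(edges, bins, -1)
```

Running the function once per channel gives three edge maps, which can then be compared to tell shadow edges from other intensity changes.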
The algorithm could also be used to determine the difference between boundary edges of the object and shadows. This is shown in
From the above a look up shadow mask can be determined.
The surface may then be reconstructed by first determining the position of the shadows using the above technique and then estimating the normal for all pixels where there is a good signal from all three lights, i.e. there is no shadow. The normal is estimated as described above.
If the signal from only two lights can be used, then the data can still be processed but constant albedo must be presumed, i.e. constant chromaticity and constant luminance.
Once the 2D grid of surface normals is produced, each frame of normals is integrated using a 2D Poisson solver or the like, for example a Successive Over-Relaxation (SOR) solver, to produce a video of depth maps or a surface mesh for each frame.
The generation of the surface mesh for each frame is subject to the boundary conditions of the shadow mask, which is used as the boundary condition for the Poisson solver. Frame to frame coherency of silhouettes is also taken as a boundary condition.
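A minimal sketch of the integration step for a single frame is given below. It assumes unit pixel spacing, normals of the form (-∂z/∂x, -∂z/∂y, 1) up to scale, and known depth values on the image border that are held fixed. This fixed border is a simplification standing in for the shadow mask and silhouette boundary conditions; the function name and default parameters are illustrative.

```python
import numpy as np

def integrate_normals_sor(normals, boundary, omega=1.5, iters=300):
    """Integrate a normal map into a depth map with successive over-relaxation.

    normals:  (H, W, 3) unit normals with positive z component.
    boundary: (H, W) array whose border values are kept fixed; its interior
              values are only used as the starting guess.
    """
    nx, ny, nz = normals[..., 0], normals[..., 1], normals[..., 2]
    p, q = -nx / nz, -ny / nz                     # depth gradients dz/dx, dz/dy
    f = np.gradient(p, axis=1) + np.gradient(q, axis=0)   # divergence of (p, q)
    z = np.array(boundary, dtype=float)
    H, W = z.shape
    for _ in range(iters):
        for r in range(1, H - 1):
            for c in range(1, W - 1):
                # Gauss-Seidel value for the 5-point Poisson stencil.
                gs = 0.25 * (z[r - 1, c] + z[r + 1, c]
                             + z[r, c - 1] + z[r, c + 1] - f[r, c])
                z[r, c] += omega * (gs - z[r, c])  # over-relaxed update
    return z
```

The relaxation factor `omega` must lie between 1 and 2 for over-relaxation; values closer to 2 suit larger grids.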
To verify the accuracy of the technique a MacBeth colour chart was used. The chart was illuminated with each of the coloured lights in turn.
It was found that the technique compensated for impurities in the colours of the lights e.g. the red light produced small amounts of blue and green light in addition to the red light. Also, the technique compensated for colour balance functions that are often used in modern video cameras.
a shows the dancer of
This can be compared with the method of the present invention as shown in
Although the images of
However, clothing will often have a pattern which is provided by colour on the surface either in addition to or instead of texture.
In the results shown in
Frame (i)—Frame no. 0
Frame (ii)—Frame no. 250
Frame (iii)—Frame no. 340
Frame (iv)—Frame no. 380
Frame (v)—Frame no. 427
Frame (vi)—Frame no. 463
Frame (vii)—Frame no. 508
In the first method of superimposing a colour pattern onto the dancer, the colour image, which consists of the words ICCV 07 and a green and yellow flag, is generated using the depth map data as described above. This can be seen to work well for frames (i) to (iii). However, in frame (iv) both the flag and the pattern stay at the same vertical level even though the dancer is moving down. In frame (iv), the flag is seen to deform well with the dancer's jumper. However, the pattern stays at the same vertical level even though the dancer is moving down. Thus the pattern appears to be moving upwards relative to the dancer's jumper. This problem continues in frames (v) to (vii).
This is done by letting $z^t(u,v)$ be the depth map at frame t. A deformable template is set which corresponds to the depth map at frame 0. The template is a triangular mesh with vertices:
$x_i^0 = (u_i^0, v_i^0, z^0(u_i^0, v_i^0)), \quad i = 1 \dots N$
and a set of edges $\varepsilon$.
At frame t, the mesh is deformed to fit the t-th depth map by applying a translation $T_i^t$ to each vertex $x_i^0$, so that the i-th vertex at frame t moves to $x_i^0 + T_i^t$.
The images generated in
Frame-to-frame optical flow is first computed using a video of normal maps. A standard optical flow algorithm is used (see for example M. Black and P. Anandan, “The robust estimation of multiple motions: parametric and piecewise smooth flow fields”, Computer Vision and Image Understanding, volume 63(1), pages 75 to 104, January 1996), in which every pixel location (u,v) in frame t predicts the displacement $d^t(u,v)$ of that pixel in frame t+1. Let $(u^t, v^t)$ denote the position in frame t of a pixel which in frame 0 was at $(u^0, v^0)$. $(u^t, v^t)$ can be estimated by advecting $d^t(u,v)$ using:
$(u^j, v^j) = (u^{j-1}, v^{j-1}) + d^{j-1}(u^{j-1}, v^{j-1})$, where $j = 1 \dots t$
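The advection recurrence can be sketched as a loop over the per-frame flow fields. The sketch assumes that each flow field is stored as an (H, W, 2) array of (du, dv) displacements indexed by [row v, column u], and that the flow is sampled at the nearest pixel; the function name, the storage layout and the sampling choice are all illustrative assumptions.

```python
import numpy as np

def advect(u0, v0, flows):
    """Track one pixel from frame 0 through successive flow fields.

    flows[j][v, u] holds the displacement (du, dv) from frame j to frame j+1.
    Returns the estimated sub-pixel position in frame len(flows).
    """
    u, v = float(u0), float(v0)
    for d in flows:
        # Sample the flow at the nearest pixel, clamped to the image bounds.
        r = int(round(min(max(v, 0.0), d.shape[0] - 1)))
        c = int(round(min(max(u, 0.0), d.shape[1] - 1)))
        du, dv = d[r, c]
        u, v = u + du, v + dv
    return u, v
```

Bilinear sampling of the flow would reduce the drift that nearest-pixel sampling accumulates over long sequences.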
If there was no error in the optical flow and the template from frame zero was deformed to match frame t, then vertex $x_i^0$ in frame t is displaced to point:
$y_i^t = (u_i^t, v_i^t, z^t(u_i^t, v_i^t))$
This constraint can be formulated as an energy term comprising the sum of squared differences between the displaced vertex locations $x_i^0 + T_i^t$ and the positions $y_i^t$ predicted by the advected optical flow at frame t:
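One form consistent with this description, stated here as an assumption, is the following, where the sum runs over all N template vertices and the name $E_D$ is taken from the combined energy used later in the text:

```latex
E_D(T_1^t, \dots, T_N^t) = \sum_{i=1}^{N} \left\| x_i^0 + T_i^t - y_i^t \right\|^2
```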
The results of the above process are seen in
To address this issue a further constraint is added to bring rigidity into the picture. To regularise the deformation of the template mesh, translations applied to nearby vertices need to be kept as similar as possible. This is achieved by adding energy term $E_R$:
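A term of this kind can be written, as an assumed form, as a sum over the mesh edges $\varepsilon$ of the squared difference between the translations at the two end vertices:

```latex
E_R(T_1^t, \dots, T_N^t) = \sum_{(i,j) \in \varepsilon} \left\| T_i^t - T_j^t \right\|^2
```

Under this form the term is zero when every vertex receives the same translation, which corresponds to a rigid shift of the whole template.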
The above two terms are then combined:
$E_{TOT}(T_1^t, \dots, T_N^t) = \alpha E_D + (1-\alpha) E_R$
which is optimised with respect to $T_1^t, \dots, T_N^t$ for every frame t. For optimisation an iterated scheme is used where each $T_i^t$ is replaced with the optimal translation $\hat{T}_i^t$, given that every other translation is held constant. This leads to:
Where N(i) is the set of neighbours of vertex i and α is a parameter indicating the degree of rigidity of the mesh. The results of this calculation are shown in
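The iterated scheme can be sketched as a coordinate-wise update. The sketch assumes that the data term is a sum of squared distances between $x_i^0 + T_i^t$ and $y_i^t$, and that the rigidity term is a sum of squared differences between translations of neighbouring vertices. Setting the derivative of the combined energy with respect to one translation to zero then gives a weighted average of the data-driven translation and the neighbours' translations. The function name and the fixed iteration count are illustrative.

```python
import numpy as np

def deform_template(x0, y, neighbours, alpha, iters=100):
    """Estimate per-vertex translations by repeated single-vertex updates.

    x0:         (N, 3) template vertices at frame 0.
    y:          (N, 3) positions predicted by the advected optical flow.
    neighbours: list of N lists; neighbours[i] holds the indices in N(i).
    alpha:      weight of the data term; (1 - alpha) weights the rigidity term.
    """
    x0 = np.asarray(x0, dtype=float)
    y = np.asarray(y, dtype=float)
    T = np.zeros_like(x0)
    for _ in range(iters):
        for i, nbrs in enumerate(neighbours):
            # Optimal T[i] with every other translation held constant.
            pull = alpha * (y[i] - x0[i])
            smooth = (1.0 - alpha) * sum(T[j] for j in nbrs)
            T[i] = (pull + smooth) / (alpha + (1.0 - alpha) * len(nbrs))
    return T
```

In this assumed form, a value of `alpha` close to 1 makes each vertex follow the optical flow prediction, while a value close to 0 makes neighbouring vertices move together.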
The data described with reference to
Skinning algorithms are well known in the art of computer animation. To generate the character of
The matrix $S_i^t$ represents the transformation from the joint's local space to world space at time instant t.
The mesh was attached to the skeleton by first aligning a depth pattern of the fixed dress with a fixed skeleton and then finding, for each mesh vertex, a set of nearest neighbours on the skeleton. The weights are set inversely proportional to the distances to these neighbours. The skeleton is then animated using publicly available mocap data (Carnegie Mellon mocap database http://nocap.cs.cmu.edu). The mesh is animated by playing back one of the captured cloth sequences.
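The attachment step can be sketched with a standard linear blend skinning formulation. This is a common textbook form used here as an assumption, not a reproduction of the formula used for the figures. Each vertex takes weights inversely proportional to its distance to its k nearest joints, normalised to sum to one, and is moved by the weighted joint transforms. The function names, the choice of k, the normalisation and the 4×4 homogeneous matrices are illustrative.

```python
import numpy as np

def skin_weights(vertex, joint_positions, k=2):
    """Pick the k nearest joints and weight them by inverse distance."""
    d = np.linalg.norm(np.asarray(joint_positions, dtype=float) - vertex, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / np.maximum(d[idx], 1e-9)      # guard against zero distance
    return idx, w / w.sum()                 # normalise so weights sum to 1

def skin_vertex(v0, idx, w, S0, St):
    """Move a bind-pose vertex with linear blend skinning.

    S0, St: (J, 4, 4) joint local-to-world matrices at the bind pose and at
    time t. Each selected joint maps the vertex into its local space with
    inv(S0[j]) and back to world space with St[j].
    """
    vh = np.append(np.asarray(v0, dtype=float), 1.0)   # homogeneous coordinates
    out = sum(wi * (St[j] @ np.linalg.inv(S0[j]) @ vh) for j, wi in zip(idx, w))
    return out[:3]
```

Inverting the bind-pose matrices once, outside the per-vertex call, would be the usual optimisation for a full mesh.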
Number | Date | Country | Kind |
---|---|---|---|
0718316.3 | Sep 2007 | GB | national |