1. Field of the Invention
This invention relates to the recording and display of three-dimensional (3D) images, notably in real-time video capture. This invention focuses on the ability to alter vergence instantaneously through software in order to amend disparities caused by differences between focal length and vergence distance. The ability to vary vergence allows selected images to be made to appear behind, level with, or in front of a display screen. While the invention is useful for consumer displays in television and films, the invention may also be used, for example, for visualizing weather, for mapping, for seismic surveys or for aircraft flight control. It is applicable, in real time, throughout the electro-magnetic spectrum particularly in the visible, infra-red and microwave portions. It may also be used for sonar.
2. Description of the Related Art
Vergence is usually thought of as the distance from a pair (or more) of cameras to a viewed object on which their optic axes are made to coincide. In related art these cameras, like eyeballs, are linked together for the purpose of creating 3D images. In this analogy, most 3D cameras adjust vergence mechanically by rotating their axes from zero degrees (for objects at infinity) to a greater angle of convergence when objects of interest are nearby.
In the related art there are many mechanical means by which the necessary vergence may be achieved. In one example, a system of prisms and beam splitters may be used to combine the light from two or more vergent streams onto a single camera (or lens), with light being chopped alternately to differentiate inputs from various orientations, such as left from right. The vergence angle here can only be changed mechanically, by adjusting certain optical components.
In another example, the camera itself may be made to commutate between two or more adjacent positions, with the vergence angle being adjusted continuously to compensate for different viewpoints. However, this arrangement makes it hard to achieve the frame rate required for video.
Because of limitations like the foregoing, what is sought here is a non-mechanical means of adjusting vergence which is quick, convenient, and (as an added benefit) can be applied anywhere in the data stream. In particular, we seek a solution that allows the viewer to adjust the vergence level in the display to the viewer's comfort level; in other words, to push back obtrusive content, or to pull objects of interest into prominence, if and when deemed desirable.
For a number of reasons, including comfort and emphasis, there is a need to control vergence in 3D photographs and videos. The present invention shows a means of controlling vergence both at the image capture end and at the viewing end of the image stream, and at any point in between. The issue of comfort has come up with viewers, either singly or in theater audiences, and this invention shows a means to favorably alter what a viewing audience sees, either on a personal screen or at a movie theater. This can be achieved in our present invention in real time, with no moving parts, in software.
There are several major causes of audience viewing headaches. One is the disparity between the eyes' focal point, which naturally falls on the screen, and the vergence point, which may lie too far behind or in front of the screen. This disparity is a source of strain because our eyes are designed by nature to focus and converge at the same point.
Another cause of viewer headaches may be the excessive horizontal distance between left and right projected images. While these may appear normal on small displays—say 0.5″ apart on a home television screen—when images are projected onto a theater screen (which may be many times larger) they could easily exceed the distance between a normal pair of eyes (2.5″), forcing the viewer to try to see images wall-eyed. This too can be amended through software.
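As an illustrative sketch of this scaling (the screen widths and the disparity below are assumed values, not measurements), the separation can be checked numerically:
% illustrative check of how horizontal disparity scales with screen size
% (assumed values: 42-inch home screen, 40-foot theater screen, 0.5-inch disparity)
home_width = 42; % home screen width, inches
theater_width = 40*12; % theater screen width, inches
disparity_home = 0.5; % left-right image separation on the home screen, inches
scale = theater_width/home_width; % magnification from home screen to theater screen
disparity_theater = disparity_home*scale; % same content projected on the larger screen
fprintf('Theater disparity: %.1f inches (normal eye separation is about 2.5 inches)\n', disparity_theater);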
Another cause of viewer headaches may be vertical disparity. This has to do with camera pose, or pitch. For multiple cameras the optic axes should all converge in the same plane. If one or more cameras is out of that plane, the resulting vertical offset is magnified as the image moves from small screens to large ones.
The situation becomes more complex when the number of cameras increases to (say) eight or nine, such as are required for filming 3D for display on glasses-free screens. (Some schemes require a hundred or more cameras.)
Our practice is enabled through the ability to reassign (or warp) apparent pixel positions on the detectors to other than their previous (nominal) positions, so that the image positions on the detectors, and the projected image positions on the screen, are shifted sufficiently (i) to move the vergence points closer to or further away from the cameras and (ii) to amend horizontal and vertical disparities.
The ability to control vergence and disparities means that viewing comfort can be increased to an acceptable level no matter (within limits) how the videos have been taken. As will be shown, this can be done in software and in virtually real-time during recording, transport and display.
This invention (with further advantages noted below) may be best understood by reference to the following drawings, in which like reference numerals identify like elements, and in which the following descriptions, taken together with the accompanying several figures, illustrate the substance of the invention.
In
In the case of two or more cameras, the cameras and the vergence points 11, 12 and 13 will usually, but not always, be coplanar.
In
By moving the vergence point digitally the camera system has no mechanical points of failure, because there are no moving parts. An additional benefit is that, through software, vergence can be controlled continuously and instantly.
The illustration in
For reasons we will discuss, unless the focal plane is the same as the vergence plane the resulting video could give a viewer great discomfort. As a rule of thumb this can occur when the disparity is more than plus or minus half a diopter. So the issue we address is: how can we take away this disparity, and how can we increase the viewer's comfort?
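A rough numerical sketch of this rule of thumb follows; the viewing distances are assumed values, and the disparity is expressed in diopters as the difference of the reciprocal distances (in meters):
% accommodation-vergence disparity in diopters (assumed example distances)
screen_distance = 3.0; % meters; where the eyes focus (the screen)
vergence_distance = 1.2; % meters; where the eyes converge (the apparent object)
disparity_diopters = abs(1/vergence_distance - 1/screen_distance);
if disparity_diopters > 0.5
fprintf('Disparity of %.2f diopters exceeds the half-diopter comfort limit\n', disparity_diopters);
else
fprintf('Disparity of %.2f diopters is within the comfort limit\n', disparity_diopters);
end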
Referring now to
In the present invention we approach this through the following steps, which are designed for speed and efficacy, reducing computing time to a minimum and allowing vergence changes within milliseconds:
The first step is primary alignment. By bringing the images from the two cameras manually into close correspondence we can reduce the computing power needed later. We note that the cameras have seven degrees of freedom: focal length and zoom, x, y and z, and pitch, roll and yaw. Focal length and yaw (or vergence) are the most important to us in this instance. Manual alignment of x, y and z may be good enough to bring the two detectors onto almost identical points (0,0) on their local x-y planes, differing only by their global offsets a and −a on the z-axis in
We note that in all the transforms below we use the local coordinates of the detectors (rather than the global coordinates discussed later for image mapping). That is, when our alignments are carried out to a sufficient degree, point (xi, yi) of detector 31 will correspond (almost) exactly to point (xi, yi) of detector 32.
The adjustments then apply to the focal length and zoom, pitch, roll and yaw. It may be convenient here to place the two images of a scene side by side on a display containing cross-hairs or fiducial marks. The roll can be adjusted manually by aligning a left and a right vertical image (say of a tree) against corresponding vertical lines on the screen. The vertical position (or pitch) can now be manually set against some common horizontal projection—such as a branch on the tree. These two actions will also define the yaw, or vergence, of the two cameras upon the tree.
The final (manual) adjustment applies to focal length and zoom. By adjusting two (or more) corresponding (widely separated) features the two images near or on the primary plane 25 may be made to appear the same size to the eye.
With proper alignment tools this process can typically be done in a few minutes.
We should now have the images lined up (by eye) to within a few pixels. For the next step, secondary alignment, we may use a typical feature-based approach. For feature selection, any of a number of edge detection algorithms can be used, depending on our purpose (e.g. J. Canny, “A Computational Approach to Edge Detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, No. 6, 1986, pp. 679-698). However, since we already know (by eye) which feature(s) we have chosen—such as the golfer 22—this process can be accomplished quite quickly.
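As a brief sketch of such feature selection (assuming the edge function of the Image Processing Toolbox is available; the file name is the same illustrative one used in the program listing below):
% Canny edge map as a feature-selection aid (Image Processing Toolbox assumed)
I = rgb2gray(imread('left.jpg')); % illustrative file name
edges = edge(I, 'canny'); % binary edge map per Canny (1986)
[rows, cols] = find(edges); % candidate feature locations
fprintf('%d candidate edge pixels found\n', numel(rows));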
The simplest way to line up the corresponding images such as those of the golfer 35 and 36 is with a Euclidean similarity transform, simply put:
X′=TX
where X is a group of pixels representing the first image and X′ is the similar group in the second image (using local coordinates). T is then the transformation matrix
T = [s cos α   s sin α   tx; −s sin α   s cos α   ty; 0   0   1]
where (in this case, because of previous efforts in alignment) we assume that there is only a small difference in size s = 1+ε, a small angle α between the two images, and only small horizontal and vertical image displacements tx and ty.
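As a sketch of how such a similarity transform may be applied to a single pixel (the values of ε, α, tx and ty below are assumed, for illustration only):
% apply a small Euclidean similarity transform to one pixel position (illustrative values)
s = 1.01; % s = 1 + epsilon, small scale difference
alpha = 0.2*pi/180; % small rotation angle, radians
tx = 1.5; ty = -0.8; % small horizontal and vertical offsets, pixels
T = [s*cos(alpha) s*sin(alpha) tx; -s*sin(alpha) s*cos(alpha) ty; 0 0 1];
X = [120; 85; 1]; % a pixel of the first image in homogeneous local coordinates
Xp = T*X; % X' = T X, the corresponding pixel in the second image
fprintf('(%.1f, %.1f) maps to (%.1f, %.1f)\n', X(1), X(2), Xp(1), Xp(2));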
If we want to do a really fine alignment between the two images (realizing that, because the images are taken from different perspectives, the error function will only approach zero) we may use the following (see Richard Szeliski, December 2006). We may consider the minimum of the sum of squares function ESSD(u) for individual (noted) features on detectors 31 and 32:
ESSD(u) = Σi [I1(xi + u) − I0(xi)]^2 = Σi ei^2
where u = (u, v) is the feature displacement and ei = I1(xi + u) − I0(xi) is the error function, or feature displacement offset, within the detecting areas (I0 being the reference feature image and I1 the subsequent sample).
We note also that the sum of squared differences function ESSD(u) above can be expanded and written in terms of a correlation that can be evaluated with a Fourier transform:
ESSD(u) = Σi [I0(xi)^2 + I1(xi + u)^2] − 2 Σi I0(xi) I1(xi + u),   where   Σi I0(xi) I1(xi + u) = F^−1{F{I0}* F{I1}}
In this way ESSD(u) can be computed by subtracting twice the correlation function from the sum of the energies of the two images. We may use this approach for correlating larger pixel areas because it is faster.
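A minimal sketch of this Fourier route on two small synthetic patches follows; it assumes circular (wrap-around) boundaries, which is enough to verify that the FFT-based correlation locates the displacement of smallest ESSD:
% FFT-based evaluation of the SSD between a reference patch I0 and a shifted patch I1
% (synthetic data; circular boundaries assumed)
I0 = rand(64,64);
I1 = circshift(I0, [3 5]) + 0.01*randn(64,64); % shifted, slightly noisy copy
corrmap = real(ifft2(conj(fft2(I0)).*fft2(I1))); % correlation for every circular displacement u
ssdmap = sum(I0(:).^2) + sum(I1(:).^2) - 2*corrmap; % ESSD(u) = energies minus twice the correlation
[~, idx] = min(ssdmap(:)); % displacement with the smallest ESSD
[row, col] = ind2sub(size(ssdmap), idx);
fprintf('Best circular shift: %d rows, %d columns\n', row-1, col-1);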
A corollary of this is that having lined up (say) two features (relatively far apart) on the two detectors we can now put the small angle α and the small difference in distance s between the two features into the transformation T above to bring the registration between the images to one pixel or less.
Having determined that the images are well-registered allows us to perform the following on-axis transform on both images simultaneously—in a general case as a homography or perspective transform. We know for on-axis transforms that we have to move the apparent images the same amount in opposite directions on their local x-axes in order to change vergence. (This, as may be seen on
This is also illustrated in
In the simplest case, with well-correlated two-dimensional images on the detectors 31 and 32, and a small change in vergence, we can do a simple pixel shift along their local x-axes. Using homogeneous coordinates, with the small change ±δ, the transformation matrices will be
T1 = [1 0 δ; 0 1 0; 0 0 1]   and   T2 = [1 0 −δ; 0 1 0; 0 0 1]
T1 will be applied to create the apparent pixel shift 35 to 33 (on detector 31) and T2 will be applied to create the (opposite) apparent pixel shift 34 to 36 (on detector 32). In this particular illustration the vergence plane will move forward and the image will recede as we increase δ.
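A minimal sketch of this opposite pixel shift follows, with a circular column shift standing in for the full warp; δ is an assumed value, and the wrapped border columns would in practice be cropped or filled:
% opposite horizontal pixel shifts of the left and right images by +/- delta
left = rand(480,640); % stand-ins for the images on detectors 31 and 32
right = rand(480,640);
delta = 4; % vergence change in pixels (assumed value)
left_shifted = circshift(left, [0 delta]); % T1: shift by +delta along the local x-axis
right_shifted = circshift(right, [0 -delta]); % T2: shift by -delta along the local x-axis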
For larger shifts a projective (or perspective) transformation may be needed, to make foreground images larger and background smaller, or vice-versa. Other possible transformations include similarity, affine, Euclidean and non-Euclidean (for amending barrel and pin-cushion distortions). (For off-axis situations we will make a case below for skewed (or unequal) transformations.)
A transformation of particular interest is the inverse transformation
X = T^−1 X′
which can undo previous transformations. In the case above we can apply this to T1 and T2 since they are both square and of full rank. As we have noted this can be applied at the display end also, where a viewer will have control to alter vergence to the viewer's own comfort.
However, because we are dealing with real numbers x1, y1, etc. we do not need complex matrix multiplications (which may include determinants approaching zero). Instead we can simply invert individually the parameters of T1 to give T1^−1; for example, if
T1 = [1 0 δ; 0 1 0; 0 0 1]
then
T1^−1 = [1 0 −δ; 0 1 0; 0 0 1]
We may call this special case parametric inversion (rather than matrix inversion).
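A small sketch comparing the two, for the shift matrix above (with an assumed δ), shows that negating the parameter reproduces the matrix inverse without a general inversion routine:
% parametric inversion: negate delta instead of inverting the matrix numerically
delta = 4; % assumed pixel shift
T1 = [1 0 delta; 0 1 0; 0 0 1];
T1_param = [1 0 -delta; 0 1 0; 0 0 1]; % parameters inverted individually
T1_matrix = inv(T1); % conventional matrix inversion, for comparison
fprintf('Largest element difference: %g\n', max(abs(T1_param(:) - T1_matrix(:))));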
This leads to a relative simplicity of coding. If the parameters s, α, tx, and ty can be changed relatively easily at the source, then they can be changed relatively easily at the destination, and in fact at any point in between. For coding purposes this can be done unequally to the left and right images and quasi-randomly in timing. Knowing the timing and the code, the recipient simply has to decode it, although perfect synchronicity is needed to avoid jitter.
Referring now to
The golfer 22 in
with a perspective projection P0
From this equation, if we know the value of d0, we can map it back to the coordinate p, since
p ~ P0^−1 E0^−1 X0
where P0^−1 and E0^−1 are the inverse transforms of P0 and E0, and X0 = (x0, y0, z0, d0) is the image point 35 in frame 31. We then project p back to frame 32 with
X1 ~ P1 E1 p ~ P1 E1 P0^−1 E0^−1 X0
(P1 and E1 being the corresponding projection and camera matrices for frame 32), which gives us point 36, X1 = (x1, y1, z1, d1), in frame 32. If, in our case, we have a small angle α and if the feature, such as the golfer 22, is compact (i.e. not too deep) at the scene, we can simplify by considering the scene to be flat as it is imaged onto the frames 31 and 32. Therefore the last row of matrix P0 can be replaced with an equation that maps points on the plane d0 = 0, which reduces the last equation to
x1 ~ H10 x0
That is, we can reduce the equation from a 4×4 homography matrix in 3D to a 3×3 projective matrix in 2D. In addition since the scene is not too close and the translation is small this can be reduced still further to a 2×3 affine matrix A10 to substitute for H10 in the equation above.
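As a sketch of this reduction (H10 below is an assumed, nearly affine projective matrix chosen only for illustration), dropping the perspective row yields the 2×3 affine approximation A10:
% reduce a (nearly affine) 3x3 projective matrix H10 to a 2x3 affine matrix A10
H10 = [1.02 0.01 3.0; -0.01 1.02 -2.0; 1e-5 2e-5 1.0]; % small perspective terms, assumed
x0 = [250; 140; 1]; % a point of frame 31 in homogeneous coordinates
x1p = H10*x0; % full projective mapping
x1p = x1p(1:2)/x1p(3); % divide out the homogeneous scale
A10 = H10(1:2,:); % drop the perspective row: the 2x3 affine approximation
x1a = A10*x0; % affine mapping of the same point
fprintf('projective: (%.2f, %.2f)   affine: (%.2f, %.2f)\n', x1p(1), x1p(2), x1a(1), x1a(2));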
Similarly we can map the tree 21 in
Knowing the relative locations and orientations of the sets of images 33 to 35 and 34 to 36 a homography (or perspective) transform can be performed to bring the golfer 22 to the position of the tree 21 with the entire image skewed to follow.
To illustrate a simple calculation the following is an instance of a vergence program in Matlab, for a simple on-axis Euclidean transform, for a unique vergence point. In this case the parameters s, tx, ty and α (alpha) are fairly small:
vergence.m
% load input images
I1=double(imread('left.jpg'));
[h1 w1 d1]=size(I1);
I2=double(imread('right.jpg'));
[h2 w2 d2]=size(I2);
% show input images and prompt for correspondences
figure; subplot(1,2,1); image(I1/255); axis image; hold on;
title('first input image');
[X1 Y1]=ginput(2); % get two points from the user
subplot(1,2,2); image(I2/255); axis image; hold on;
title('second input image');
[X2 Y2]=ginput(2); % get two points from the user
% estimate parameter vector (t)
Z=[X2' Y2'; Y2' -X2'; 1 1 0 0; 0 0 1 1]';
xp=[X1; Y1];
t=Z \ xp; % solve the linear system
a=t(1); %=s cos(alpha)
b=t(2); %=s sin(alpha)
tx=t(3);
ty=t(4);
% construct transformation matrix (T)
T=[a b tx; −b a ty; 0 0 1];
% warp incoming corners to determine the size of the output image (in to out)
cp=T*[1 1 w2 w2; 1 h2 1 h2; 1 1 1 1];
Xpr=min([cp(1,:) 0]):max([cp(1,:) w1]); % min x:max x
Ypr=min([cp(2,:) 0]):max([cp(2,:) h1]); % min y:max y
[Xp,Yp]=ndgrid(Xpr,Ypr);
[wp hp]=size(Yp); %=size(Yp)
% do backwards transform (from out to in)
X=T \ [Xp(:) Yp(:) ones(wp*hp,1)]'; % warp
% re-sample pixel values with bilinear interpolation
clear Ip;
xI=reshape(X(1,:),wp,hp)′;
yI=reshape(X(2,:),wp,hp)′;
Ip(:,:,1)=interp2(I2(:,:,1), xI, yI, '*bilinear'); % red
Ip(:,:,2)=interp2(I2(:,:,2), xI, yI, '*bilinear'); % green
Ip(:,:,3)=interp2(I2(:,:,3), xI, yI, '*bilinear'); % blue
% offset and copy original image into the warped image
offset=−round([min([cp(1,:) 0]) min([cp(2,:) 0])]);
Ip(1+offset(2):h1+offset(2),1+offset(1):w1+offset(1),:)=double(I1(1:h1,1:w1,:));
% show the result
figure; image(Ip/255); axis image;
title(‘vergence image’);
We can introduce into this program a function to change the angle β (beta) from
As an added benefit it can be seen above that the program also allows compensation for chromatic aberration as the angle β changes.
The angle β deserves further comment in that it may be more convenient in our fixed solid-state systems to physically align the optics to some intermediate angle so that the pixel changes are not too far off-axis. We can do this by picking a point 12 at which the angle β is intermediate between 0° (at infinity, 13) and 2β (close up, 11). This will reduce processing software and time by minimizing distortions.
We refer now to
The fix referred to will be ty1=f tan γ added to ty in the transformation, where f is the back focal length and γ is the pitch angle 61 in
This fix, which may be achieved automatically in an alignment algorithm as above, is very useful if the cameras are dropped, warped by heat, or dislocated in some other way. It should not need to be altered otherwise.
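A minimal sketch of this correction follows, with assumed values for the back focal length (in pixels) and the pitch error; the computed offset is simply added to the vertical translation of the alignment transform:
% vertical disparity correction ty1 = f*tan(gamma) (assumed example values)
f_pixels = 2400; % back focal length expressed in pixels
gamma = 0.15*pi/180; % pitch error of the camera, radians
ty1 = f_pixels*tan(gamma); % vertical offset to add to ty in the transformation
T_fix = [1 0 0; 0 1 ty1; 0 0 1]; % pure vertical shift appended to the alignment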
(Note: We are following MPEG-4, which is a collection of methods defining compression of audio and visual (AV) digital data, introduced in 1998. It was designated a standard for a group of audio and video coding formats and related technology agreed upon by the ISO/IEC Moving Picture Experts Group (MPEG) under the formal standard ISO/IEC 14496. Uses of MPEG-4 include compression of AV data for web (streaming media) and CD distribution, voice (telephone, videophone) and broadcast television applications.) We could easily use any other protocol suitable for transferring high-speed data over airwaves or land-lines.
While the invention has been described and illustrated (in general) as a dual imaging device in which the vergence distance to objects of interest is mutable in software, and in which the vergence distance to screen images is mutable by the viewer, those skilled in the art will understand that the techniques of this invention can be used as tools for creating and perfecting three-dimensional imaging systems with variable vergence throughout the electro-magnetic spectrum and beyond. It may be understood that although specific terms are employed, they are used in a generic and descriptive sense and must not be construed as limiting. The scope of the invention is set out in the appended claims.
Other Publications
Chandrakanth et al., "Novel Architecture for Highly Hardware Efficient Implementation of Real Time Matrix Inversion Using Gauss Jordan Technique," ISVLSI, 2010.
Zilly et al., "STAN—An Assistance System for 3D Productions," 2011 14th ITG Conference on Electronic Media Technology (CEMT), 2011, pp. 294-298.
Kwon, "Vergence Control of Binocular Stereoscopic Camera Using Disparity Information," Journal of the Optical Society of Korea, Vol. 13, No. 3, Sep. 2009, pp. 379-385.
Canny, "A Computational Approach to Edge Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-8, No. 6, Nov. 1986, pp. 679-698.