The present disclosure relates to computer vision and image processing, and more particularly to stabilizing fisheye video imagery.
The use of automated vision systems in traffic surveillance and monitoring applications has increased drastically over the last several years. Such applications include traffic flow monitoring, vehicle counting, queue detection, and intersection management. Surveillance cameras are often fixed to a stationary mount, such as a pole beside the road, to capture images of moving scenes, e.g., vehicles, pedestrians, and other motion. Cameras may also be mounted on a structure that itself moves, for example, a traffic light hung from a cable that can swing due to wind, other weather conditions, or any other disturbance. When a camera is installed on such a moving mount, the images taken from frame to frame need to be compensated for the motion of the camera. That is, the image recognition or computer vision techniques need to be able to distinguish objects moving in the scene from apparent object displacements that occur from frame to frame because the camera itself is moving. Therefore, it is desirable to have a technique for accounting for this motion.
A method and system for stabilizing fisheye video imagery are provided. The method, in one aspect, may include selecting a reference image from a series of fisheye video imagery and selecting a plurality of landmark points in the reference image. The method may also include tracking the plurality of landmark points in a new image from the series of fisheye video imagery. The method may further include computing a coordinate system mapping between the new image and the reference image using the tracked landmark points, and resampling the new image to align with the reference image.
The step of computing, in one aspect, may include transforming fisheye image coordinates of the reference image to reference world space coordinates using an understanding of the camera lens; transforming the tracked landmark points in the new image to new image world space coordinates using the understanding of the camera lens; and analyzing the plurality of landmark points in the reference world space coordinates and the tracked landmark points in the new image world space coordinates in pairs to define a homography between a plane in reference world space and a plane in new image world space, the plurality of landmark points including four or more landmark points.
The step of computing, in one aspect, may further include transforming pixel coordinates in the reference world space to the new image world space using the homography; and transforming the transformed pixel coordinates to the fisheye image coordinates of the new image.
The step of resampling, in one aspect, may include copying data closest to the transformed pixel coordinates in the fisheye image coordinates of the new image to corresponding reference pixel coordinates in the fisheye image coordinates of the reference image.
A method of stabilizing fisheye video imagery, in another aspect, may include scanning a reference fisheye image to obtain a pixel coordinate location (u, v) and selecting a new fisheye image. The method may also include transforming the reference fisheye image coordinate location (u, v) to the reference image's world space coordinate (x, y) and transforming the reference image's world space coordinate (x, y) to the new image's world space coordinate (x̂, ŷ). The method may further include transforming the new image's world space coordinate (x̂, ŷ) to the new image's fisheye space coordinate (û, v̂) and mapping new image data found closest to the new image's fisheye space coordinate (û, v̂) to the reference fisheye image coordinate location (u, v). In yet another aspect, these steps may be performed for each pixel in the reference fisheye image and the new fisheye image.
The step of transforming the reference image's world space coordinate (x, y) to the new image's world space coordinate (x̂, ŷ), in one aspect, may be performed using a homography computed from tracked points in the reference fisheye image and the new fisheye image.
A system for stabilizing fisheye video imagery, in one aspect, may include a processor and a module operable to execute on the processor. The module may be operable to scan a reference fisheye image to obtain a pixel coordinate location (u, v). The module may be further operable to select a new fisheye image. The module may be further operable to transform the reference fisheye image coordinate location (u, v) to the reference image's world space coordinate (x, y) and further operable to transform the reference image's world space coordinate (x, y) to the new image's world space coordinate (x̂, ŷ). The module may be yet further operable to transform the new image's world space coordinate (x̂, ŷ) to the new image's fisheye space coordinate (û, v̂) and map data found closest to the new image's fisheye space coordinate (û, v̂) to the reference fisheye image coordinate location (u, v).
A system for stabilizing fisheye video imagery, in another aspect, may include a processor and a module operable to execute on the processor. The module may be further operable to select a reference image and select a plurality of landmark points in the reference image. The module may be also operable to track the plurality of landmark points to a new image and compute a coordinate system mapping between the new image and the reference image using the tracked landmark points. The module may be further operable to resample the new image to align with the reference image.
A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform one or more methods described herein may also be provided. Still yet, a computer readable storage medium storing a program of instructions executable by a machine to perform one or more methods described herein may be provided.
Further features as well as the structure and operation of various embodiments are described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements.
A single, large-angle fisheye camera mounted underneath an otherwise standard traffic light takes pictures of the area.
At 104, a plurality of landmark points in the reference image are selected and labeled. Preferably, four or more landmark points are selected. The following description explains, in one embodiment, how salient image points on stationary structures in the reference image are tracked through the video using the Lucas-Kanade algorithm. Briefly, the Lucas-Kanade method is an optical flow estimation method. The system and method of the present disclosure in one embodiment make use of landmark points in the scene to understand and characterize the camera movement. For these landmarks, the system and method of the present disclosure may select naturally occurring background points on fixed structures with image edge information. Some examples of such landmark points might include the corners of road signs or intersecting lines on the road surface. Ideally, for videos of roads with moving vehicles, points that are not frequently affected by vehicle movement are selected. The landmark points may be selected manually, and for example, may be input as one or more parameters to the algorithm of the present disclosure. In another embodiment, the landmark points may be selected using an automated method that detects and maintains landmark points. One example of such a methodology is described by Jianbo Shi and Carlo Tomasi, "Good Features to Track," IEEE Conference on Computer Vision and Pattern Recognition, 1994. A sketch of such an automated selection step follows.
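As a concrete but non-authoritative illustration, the automated selection just mentioned might use the Shi-Tomasi detector as implemented in OpenCV. The following Python sketch is one plausible realization; the function name `select_landmarks` and all parameter values are assumptions for illustration, not values taken from the disclosure.

```python
import cv2
import numpy as np

def select_landmarks(reference_image, max_points=50):
    """Detect corner-like landmark points in the reference image."""
    gray = cv2.cvtColor(reference_image, cv2.COLOR_BGR2GRAY)
    # goodFeaturesToTrack returns an (N, 1, 2) float32 array of (u, v)
    # corners ranked by the Shi-Tomasi minimum-eigenvalue score.
    corners = cv2.goodFeaturesToTrack(
        gray,
        maxCorners=max_points,
        qualityLevel=0.01,   # reject corners below 1% of the best score
        minDistance=20,      # keep detections spatially spread out
        blockSize=7,
    )
    return corners
```

In practice, a mask argument could be supplied to `goodFeaturesToTrack` to exclude lane areas where vehicles frequently pass, in keeping with the preference above for points not affected by vehicle movement.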
At 106, the landmark points are then tracked from image to image, for instance, across frames taken from the camera as it swings. In one embodiment, the tracking may be performed using the Lucas-Kanade algorithm; other algorithms may also be used. In one embodiment of the present disclosure, landmark points are tracked forward through sequential frames using the Lucas-Kanade method, which is briefly described herein. The point tracking finds pixel displacements δx and δy that reduce the error in registered image intensities, E(·), as defined by:
E(x, y, t, t+δt) = [I(x+δx, y+δy, t+δt) − I(x, y, t)]² (Eq. 1)
where I is a sequence of two-dimensional image frames indexed by spatial dimensions x and y and time dimension t. Optical flow techniques, which include the Lucas-Kanade method, model intensity changes by their total derivative. Taking the Taylor expansion around a point in the image sequence and setting the higher-order terms to zero, we obtain

I(x+δx, y+δy, t+δt) ≈ I(x, y, t) + (∂I/∂x)δx + (∂I/∂y)δy + (∂I/∂t)δt. (Eq. 2)
By assuming that intensity is constant along its traveled path, the displaced intensities can be set equal to one another to obtain the optical flow equation

(∂I/∂x)vx + (∂I/∂y)vy + ∂I/∂t = 0, (Eq. 3)

where vx = δx/δt and vy = δy/δt are the displacements normalized to time.
The optical flow equation, however, cannot be solved for an individual point because there are two unknowns and only one equation; this is commonly known as the "aperture problem" of optical flow. The Lucas-Kanade algorithm resolves it by examining a neighborhood around each desired displacement vector, thereby defining multiple equations and yielding a least-squares solution. In short, the Lucas-Kanade algorithm strives to match a region around each pixel of interest. However, since the method relies on local gradient information, it can become less reliable for large pixel displacements. This issue is usually addressed by solving the least-squares problem iteratively, warping I between iterations to successively increase the accuracy of the solution. Additionally, the iterative search can be performed in a coarse-to-fine fashion, beginning with lower image resolutions.
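The pyramidal, iterative variant just described is available off the shelf; as a hedged stand-in for the disclosure's tracker, the sketch below uses OpenCV's `calcOpticalFlowPyrLK`. The window size, pyramid depth, and termination criteria are assumed values, not parameters from the disclosure.

```python
import cv2
import numpy as np

# Illustrative settings for pyramidal (coarse-to-fine) Lucas-Kanade tracking.
lk_params = dict(
    winSize=(21, 21),   # neighborhood used to build the least-squares system
    maxLevel=3,         # number of pyramid levels for the coarse-to-fine search
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01),
)

def track_landmarks(prev_gray, next_gray, prev_points):
    """Track landmark points (float32, shape (N, 1, 2)) into the next frame."""
    next_points, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, prev_points, None, **lk_params
    )
    ok = status.ravel() == 1  # keep only points that were successfully found
    return prev_points[ok], next_points[ok]
```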
The tracked points as described above are used to align each acquired fisheye image to the coordinate system of the reference fisheye image. That is, in one embodiment of the present disclosure, the tracked points are used to compute the transformation between the current image and the reference image. This transformation is then used to compute a representation of the current fisheye image in the coordinate system of the reference fisheye image. Thus, at 108, using a simplifying planar world assumption, the tracks of these landmark points are used to compute a coordinate system mapping between each image and the reference image directly in the fisheye image space.
At 110, this coordinate system mapping between each image and the reference image is used to warp (resample) each image so that it is aligned with the reference image. The method and system of the present disclosure work directly in the fisheye image space. Working directly in the fisheye image space removes the computational complexity of fisheye-to-perspective transformation and also ensures that the image pixels are used at their native resolution.
The following notation is used in the description below. Let ᵖIq(u, v) represent image q in the coordinate system of image p, where (u, v) represents a pixel location in the digital image. The image stabilization task is to compute ʳIₙ(u, v), which is the image at time n in the coordinate system of the reference image, indicated by r. The observed image at time n is represented by ⁿIₙ(u, v), which is image n in coordinate system n.
In one embodiment of the present disclosure, the swinging camera or the like is approximated by a pure rotation. Under pure rotation, scene points in different (perspective) images are related by a plane projective transformation, also known as a homography. In general, the three-dimensional (3D) motion of the camera imparts a non-trivial perspective distortion between each image frame and the reference. Under pure camera rotation, however, all of the scene points, even though they are 3D, can be considered to lie on a plane, which in this disclosure is referred to as the world plane. The mapping between two such views can be exactly represented by a plane projective transformation, or homography. The homography matrix may be computed between each frame and the reference frame; this transform can be computed from four or more point correspondences. This transformation may be used, along with the known camera model, to compute a representation of each image in the coordinate system of the reference image.
Applying the known camera model, each pixel location (u, v) in the reference fisheye image is first mapped to a point (x, y) on the reference world plane. Point (x, y) is then transformed in homogeneous coordinates by the 3×3 homography matrix Hₙ (for image frame n) to the point (λx̂, λŷ, λ), which corresponds to (x̂, ŷ) on the world plane of image n. Applying the camera model again, we can then project the transformed point (x̂, ŷ) back to the image plane to find (û, v̂) in coordinate system n. To compute image n in the reference coordinate system, we use
ʳIₙ(u, v) = ⁿIₙ(û, v̂) (Eq. 4)
for all (u, v). With a known camera model, the only unknown in this process is the homography matrix Hₙ. This matrix is given by the mapping that takes the landmark points in the world plane coordinates of the reference frame, (xₗ, yₗ), to their corresponding world plane coordinates in frame n, (x̂ₗ, ŷₗ). One way this estimate might be computed is sketched below.
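As a hedged realization of this step, the homography can be fitted with OpenCV from the four or more world-plane correspondences. The RANSAC option shown here is an implementation choice for robustness against occasionally mistracked landmarks, not something the method requires; a plain direct linear transform over all correspondences would serve equally.

```python
import cv2
import numpy as np

def estimate_homography(ref_world_pts, new_world_pts):
    """Fit H_n from (N, 2) arrays of world-plane correspondences, N >= 4."""
    H, inlier_mask = cv2.findHomography(
        np.asarray(ref_world_pts, dtype=np.float64),
        np.asarray(new_world_pts, dtype=np.float64),
        method=cv2.RANSAC,           # robust to mistracked points (a choice)
        ransacReprojThreshold=3.0,
    )
    # H is the 3x3 matrix mapping the reference world plane to the
    # frame-n world plane, as used in Eq. 4 above.
    return H
```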
In one aspect, steps 102 to 108 may be considered estimation steps in which the parameters needed to stabilize the images are computed. Step 110 may be considered the "warping" stage, in which a new image is registered with the reference image in the coordinate system of the reference image to stabilize the images.
In the following description, the notion of a "world space" is used. This is a flattening of what is seen in the fisheye image. The world space refers to the coordinate system of a perfect plane that is perpendicular to the camera's optical axis. A minimal sketch of one idealized camera model that realizes this flattening is given below.
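The sketch assumes an ideal equidistant fisheye model (r = f·θ) with a known principal point (cx, cy) and focal constant f in pixels per radian, both taken from calibration; the disclosure's actual camera model may differ. Note that the planar world space is only defined for rays less than 90 degrees off the optical axis, where the tangent is finite.

```python
import numpy as np

def fisheye_to_world(u, v, f, cx, cy):
    """Map a fisheye pixel (u, v) to (x, y) on a unit-distance world plane."""
    du, dv = u - cx, v - cy
    r = np.hypot(du, dv)              # radial distance from the principal point
    theta = r / f                     # equidistant model: angle grows linearly with r
    scale = np.tan(theta) / r if r > 0 else 0.0
    return du * scale, dv * scale     # planar (x, y); direction is preserved

def world_to_fisheye(x, y, f, cx, cy):
    """Inverse mapping: world-plane (x, y) back to a fisheye pixel (u, v)."""
    R = np.hypot(x, y)
    theta = np.arctan(R)              # angle of the ray through (x, y, 1)
    r = f * theta                     # equidistant model again
    scale = r / R if R > 0 else 0.0
    return cx + x * scale, cy + y * scale
```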
As described above, the estimation steps may include defining landmark points in the reference image, for instance as described with reference to step 104, tracking the landmark points from image to image as at step 106, and computing the coordinate system mapping as at step 108.
Warping registers, or aligns, the new image with the reference image. In one embodiment, each pixel location (u, v) in the reference fisheye image is scanned; the location is transformed to the reference image's world space coordinate (x, y), transformed by the homography to the new image's world space coordinate (x̂, ŷ), and transformed to the new image's fisheye space coordinate (û, v̂); the new image data found closest to (û, v̂) is then mapped to location (u, v). A complete, hedged warping sketch is given below.
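Putting the pieces together, the following sketch chains the assumed camera-model helpers from the earlier sketch with the homography Hₙ and applies the nearest-neighbor copy of Eq. 4. The per-pixel Python loop is written for clarity; a production implementation would precompute the (û, v̂) map once and apply it with a vectorized remap, since the mapping is fixed for each frame.

```python
import numpy as np

def stabilize_frame(new_image, H_n, f, cx, cy):
    """Resample new_image into the reference coordinate system (Eq. 4)."""
    height, width = new_image.shape[:2]
    stabilized = np.zeros_like(new_image)
    for v in range(height):
        for u in range(width):
            # Reference fisheye pixel -> reference world plane.
            x, y = fisheye_to_world(u, v, f, cx, cy)
            # Homography in homogeneous coordinates:
            # reference world plane -> frame-n world plane.
            p = H_n @ np.array([x, y, 1.0])
            x_hat, y_hat = p[0] / p[2], p[1] / p[2]
            # Frame-n world plane -> frame-n fisheye pixel.
            u_hat, v_hat = world_to_fisheye(x_hat, y_hat, f, cx, cy)
            # Nearest-neighbor copy, keeping pixels at native resolution.
            ui, vi = int(round(u_hat)), int(round(v_hat))
            if 0 <= ui < width and 0 <= vi < height:
                stabilized[v, u] = new_image[vi, ui]
    return stabilized
```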
The system and method of the present disclosure may transform images acquired, for example, using an equidistant-projecting fisheye lens with a 185 degree viewing angle coupled to a 2592×1920 pixel CMOS imaging sensor, at a rate of, for instance, 5 frames per second.
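For orientation only, the equidistant assumption above lets one estimate the focal constant implied by those figures. The calculation below assumes the image circle is inscribed in the 1920-pixel sensor dimension, which the disclosure does not state.

```python
import math

# Back-of-the-envelope focal constant for an equidistant lens (r = f * theta).
# The 960 px image-circle radius is an assumption (half the 1920 px sensor
# dimension); the true radius depends on the optics and is not stated above.
half_fov = math.radians(185.0 / 2.0)    # 92.5 degrees ~ 1.614 rad
image_circle_radius = 1920 / 2.0        # assumed inscribed image circle
f = image_circle_radius / half_fov      # ~595 pixels per radian
print(f"focal constant ~ {f:.1f} px/rad")
```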
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements, if any, in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Various aspects of the present disclosure may be embodied as a program, software, or computer instructions embodied in a computer or machine usable or readable medium, which causes the computer or machine to perform the steps of the method when executed on the computer, processor, and/or machine. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform various functionalities and methods described in the present disclosure is also provided.
The system and method of the present disclosure may be implemented and run on a general-purpose computer or special-purpose computer system. The computer system may be any type of known or later-developed system and may typically include a processor, memory device, a storage device, input/output devices, internal buses, and/or a communications interface for communicating with other computer systems in conjunction with communication hardware and software, etc.
The terms "computer system" and "computer network" as may be used in the present application may include a variety of combinations of fixed and/or portable computer hardware, software, peripherals, and storage devices. The computer system may include a plurality of individual components that are networked or otherwise linked to perform collaboratively, or may include one or more stand-alone components. The hardware and software components of the computer system of the present application may include and may be included within fixed and portable devices such as desktops, laptops, and servers. A module may be a component of a device, software, program, or system that implements some "functionality", which can be embodied as software, hardware, firmware, electronic circuitry, etc.
The embodiments described above are illustrative examples and it should not be construed that the present invention is limited to these particular embodiments. Thus, various changes and modifications may be effected by one skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.
Other Publications

Price et al., "Stabilizing Fisheye Video From a Light-Mounted Camera," 15th World Congress on Intelligent Transport Systems and ITS America's 2008 Annual Meeting, New York, NY, Nov. 16-18, 2008.

Potucek, "Automatic Image Stabilization for Omni-Directional Systems," Proceedings of the Fifth IASTED International Conference on Visualization, Imaging, and Image Processing, 2005, pp. 338-342.

Shi et al., "Good Features to Track," IEEE Conference on Computer Vision and Pattern Recognition (CVPR '94), Seattle, Jun. 1994.
US Patent Application Publication No. 2011/0091131 A1, Apr. 2011, United States.