This disclosure relates to methods and systems for determining a camera's extrinsic parameters. In particular, the disclosure relates to the determination of a camera's six degree-of-freedom pose using image features.
A method for determining a camera's location uses fiducial markers with specialized designs from which known features can be extracted. For example, this can be done with QR-code-like markers or a checkerboard pattern. Another method for determining a camera's location does not require any markers but instead employs a moving camera to map a scene and estimate the camera poses concurrently. An example of this latter method is visual simultaneous localization and mapping (VSLAM).
In the case where specialized markers are not desirable (e.g., for esthetic reasons) and it is not possible to move the camera, it may be useful to have a system that can calibrate the camera from a static viewpoint by capturing some known arbitrary graphic pattern.
A method of determining extrinsic parameters of a camera is disclosed. The method involves obtaining a digital calibration image and generating a plurality of synthetic views of the calibration image, each synthetic view having a set of virtual camera parameters. The method also includes identifying a set of features from each of the plurality of synthetic views, obtaining a digital camera image of a representation of the digital calibration image, and identifying a set of features in the digital camera image. The method includes comparing each feature in the set of features of the digital camera image with each feature in each of the sets of features of the synthetic views and identifying a set of matching features. The method includes computing virtual 3D positions of the matched synthetic features using the virtual camera parameters. The method concludes by computing the extrinsic camera parameters through solving the perspective-n-points problem using the virtual 3D positions and their matched captured features.
In drawings which illustrate by way of example only a preferred embodiment of the disclosure
This disclosure is directed to a camera calibration method and system for computing extrinsic camera parameters. A calibration image of arbitrary but asymmetric design may be embedded onto or printed on a substantially planar surface visible to the camera being calibrated. A synthetic pattern generator may produce synthetic views of the calibration image with virtual camera parameters. A feature detection and matching module may correlate two-dimensional (2D) points in the captured image with virtual three-dimensional (3D) points in the synthetic views. A calibration solver may then compute the extrinsic parameters from the 2D-3D correspondences.
The extrinsic parameters of a camera are typically composed of a translation component t=(X, Y, Z) and a rotation component R. In 3-space, the former may be represented as a 3-vector and the latter may be represented as a vector of Euler angles. The rotation component may alternatively be represented as a 3×3 rotation matrix, an angle-axis vector, or similar. Extrinsic calibration is the process of obtaining R and t. Intrinsic parameters of a camera are generally known from the camera and may include the field of view, focal length, and any lens distortion. Some intrinsic parameters may be changeable based on settings of the camera, such as the focal length of a zoom lens, but they are assumed to be known for the purposes of the calibration.
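By way of illustration only, the following minimal sketch (not part of the disclosure) shows one common way to hold the extrinsic parameters in code, assuming an OpenCV/NumPy environment; the function name and the example pose are purely illustrative.

```python
import numpy as np
import cv2

def extrinsics_to_matrix(rvec, tvec):
    """Build a 4x4 world-to-camera transform from an angle-axis rotation
    vector and a 3-vector translation (X, Y, Z)."""
    R, _ = cv2.Rodrigues(np.asarray(rvec, dtype=np.float64))  # angle-axis -> 3x3 rotation matrix
    T = np.eye(4)
    T[:3, :3] = R                                   # rotation component R
    T[:3, 3] = np.asarray(tvec, dtype=np.float64)   # translation component t
    return T

# Example (illustrative values): a camera rotated 30 degrees about the y-axis,
# positioned 2 units back along the z-axis.
T = extrinsics_to_matrix([0.0, np.radians(30.0), 0.0], [0.0, 0.0, 2.0])
```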
With reference to
The digital calibration image 20 may be of arbitrary design, although preferably with an asymmetry along at least one axis. The asymmetry may assist with avoiding ambiguous solutions. This digital calibration image 20 may be embedded on a plane with a known physical size to comprise the planar calibration pattern 30. Example embodiments include a flat moveable board with the printed design or a surface such as a wall or floor with a pasted decal. An embodiment may use one or more of these planar calibration patterns, with the requirement that each pattern contains a distinct design. The digital calibration image 20 and therefore the planar calibration pattern 30 may be a logo, background image or other design that may already normally appear in the camera's 10 field of view.
The image capture module 40 may convert a video signal from a video source into data suitable for digital image processing. The video source may be a digital camera or some other stream of video, such as a video stream over the internet. The image capture module may provide an application programming interface, such as one supplied by the manufacturer of the camera 10 or by a third-party software library.
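As an illustrative sketch only, an image capture step of this kind could be realised with OpenCV's VideoCapture; the device index below is a placeholder, and a network stream URL could be used instead, as noted above.

```python
import cv2

cap = cv2.VideoCapture(0)   # 0 = first local camera; a stream URL string would also work
ok, frame = cap.read()      # `frame` is a BGR NumPy array suitable for digital image processing
cap.release()
if not ok:
    raise RuntimeError("failed to grab a frame from the video source")
```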
With reference to
Multiple synthetic views may be generated so that more candidate feature points are available to the feature detection module. Having additional synthetic views may allow for additional sets of features. The feature extraction algorithm may not be invariant to changes in perspective, and therefore may produce different features from different viewing angles. Synthetic views may be generated by choosing virtual camera parameters such that the intrinsic parameters mirror the known camera intrinsic parameters. Extrinsic parameters may be selected from a space of translations and rotations where the calibration pattern is contained in the synthetic field of view. Synthetic views may be selected evenly from the space, or may be selected based on information on common positions of a camera. In one example embodiment, nine synthetic views evenly cover the hemisphere in front of the calibration pattern while keeping the virtual cameras' local y-axis approximately aligned with the world's y-axis. These synthetic views may correspond to common camera positions with both the camera and calibration pattern mounted relative to the same horizontal orientation. In another example, synthetic views may be selected from camera locations known a priori or from commonly used positions where the camera is generally in front of the calibration pattern rather than at a highly oblique angle.
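A hedged sketch of how such synthetic views might be rendered is shown below. It assumes the calibration image lies on the world plane z = 0 with a known scale, and warps the image into the view of a virtual camera with intrinsics K and pose (R, t); names such as `metres_per_pixel` are assumptions for the sketch, not terms from the disclosure.

```python
import numpy as np
import cv2

def synthesize_view(calib_img, K, R, t, metres_per_pixel, out_size):
    """Render the calibration image as seen by a virtual camera (K, R, t).
    `out_size` is the (width, height) of the synthetic view."""
    # Pixel (u, v) of the calibration image -> world point (u*s, v*s, 0) on the z = 0 plane.
    S = np.diag([metres_per_pixel, metres_per_pixel, 1.0])
    # For points on z = 0, projection by K [R | t] reduces to the homography
    # K [r1 r2 t], where r1 and r2 are the first two columns of R.
    H = K @ np.column_stack((R[:, 0], R[:, 1], t)) @ S
    return cv2.warpPerspective(calib_img, H, out_size)

# Virtual poses may, for example, be sampled on the hemisphere in front of the
# pattern; each sampled pose yields one synthetic view plus its virtual parameters.
```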
The feature detection module may comprise two sub-modules: a feature extraction module and a feature matching module. A feature in this context may be an image patch that can be identified by an accompanying descriptor. The descriptor may be an encoding of the salient patch information in a lower-dimensional space (for example, an N-dimensional vector) that allows for some similarity measure between patches, for example the L2 norm of the difference between two descriptors. The feature extraction module may find features in the synthetic views and in the captured image. An embodiment may use any algorithm that identifies features in a fashion invariant to scaling and rotation, such as Speeded Up Robust Features (SURF) or Maximally Stable Extremal Regions (MSER).
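The extraction step might look like the sketch below. The disclosure names SURF or MSER; SIFT is used here only as a readily available scale- and rotation-invariant stand-in, and `captured_image` is a placeholder for the frame obtained from the image capture module.

```python
import cv2

detector = cv2.SIFT_create()   # illustrative stand-in for SURF/MSER

gray = cv2.cvtColor(captured_image, cv2.COLOR_BGR2GRAY)
keypoints, descriptors = detector.detectAndCompute(gray, None)
# `keypoints` carry the 2D locations; `descriptors` is an N x 128 array on which a
# similarity measure (e.g., the L2 norm of the difference) can be computed.
```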
With reference to
With reference again to the example of
This process is then repeated for the rest of the synthetic views 100a-100d. With the features from synthetic view 100b, five matches are found; with the features from synthetic view 100c, two matches are found; and one match is found in synthetic view 100d. In this example, no additional matches are made for a particular feature at the bottom right corner of the “F”. In this example, for each synthetic view, there are a number of features that are not matched.
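The per-view matching described above could be sketched as follows, using a brute-force L2 matcher with a ratio test; this particular matcher and threshold are assumptions for the example, and `desc_capt` / `desc_synth` stand for the descriptors of the captured image and of one synthetic view.

```python
import cv2

matcher = cv2.BFMatcher(cv2.NORM_L2)               # brute-force matching on descriptor distance
knn = matcher.knnMatch(desc_capt, desc_synth, k=2)
# Keep a match only when it is clearly better than the second-best candidate.
tentative_matches = [p[0] for p in knn
                     if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
```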
In addition to matching by feature descriptor, the feature matching module may be made robust to false matches by enforcing a homography (a 3×3 matrix that relates points on a plane under a perspective transformation). The homography may be obtained with an outlier-rejecting method such as Random Sample Consensus (RANSAC).
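A hedged sketch of this homography-based outlier rejection, assuming `pts_synth` and `pts_capt` are the 2D positions of the tentatively matched features (float32 arrays of shape (N, 1, 2)):

```python
import cv2

# Estimate a plane-to-plane homography with RANSAC and keep only inlier matches.
H, inlier_mask = cv2.findHomography(pts_synth, pts_capt, cv2.RANSAC, 3.0)
if H is not None:
    good_matches = [m for m, keep in zip(tentative_matches, inlier_mask.ravel()) if keep]
```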
To further increase the robustness of the matches, an embodiment of the feature matcher may consider only those matches that are contained within a region of interest (ROI) in the captured image. The region of interest may be represented as a bounding box or as a 2D contour. This ROI may be obtained from an initial guess based on “a priori” knowledge, or from a provisional estimate of the extrinsic parameters obtained without the ROI. In the latter case, an ROI may be obtained by projecting the contour of the extents of the calibration image using the provisional extrinsic parameters.
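One possible realisation of the ROI filter, with illustrative variable names, projects the four corners of the calibration pattern using the provisional extrinsic estimate and keeps only matches inside the resulting contour:

```python
import numpy as np
import cv2

# Corners of the pattern on the z = 0 world plane (w_m, h_m: physical width and height).
corners_3d = np.float32([[0, 0, 0], [w_m, 0, 0], [w_m, h_m, 0], [0, h_m, 0]])
roi, _ = cv2.projectPoints(corners_3d, rvec_prov, tvec_prov, K, dist_coeffs)
roi = roi.reshape(-1, 1, 2).astype(np.float32)

def inside_roi(pt):
    # pointPolygonTest >= 0 means the point is on or inside the contour.
    return cv2.pointPolygonTest(roi, (float(pt[0]), float(pt[1])), False) >= 0

filtered = [m for m, p in zip(good_matches, matched_pts_capt) if inside_roi(p)]
```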
The calibration solver 54 may take as inputs the set of feature matches and the virtual camera parameters associated with all synthetic views. For each matching feature, it may first obtain the 2D image coordinate of the feature in the captured image. For the same matching feature, it may then compute the virtual 3D coordinate from the 2D image coordinate in the synthetic view from which the feature originated, via the projective transform of the virtual camera. This virtual 3D coordinate corresponds to a point on the planar calibration pattern; thus, it can be considered a real-world 3D coordinate.
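A minimal sketch of this back-projection, assuming the calibration pattern lies on the world plane z = 0 and that `K_v`, `R_v`, `t_v` are the virtual camera's intrinsic matrix, rotation and translation:

```python
import numpy as np

def backproject_to_plane(uv, K_v, R_v, t_v):
    """Map a 2D feature coordinate in a synthetic view to its 3D point on the
    calibration plane z = 0, using the virtual camera that rendered the view."""
    ray_cam = np.linalg.inv(K_v) @ np.array([uv[0], uv[1], 1.0])  # viewing ray in camera frame
    ray_world = R_v.T @ ray_cam                                   # same ray in world frame
    centre_world = -R_v.T @ t_v                                   # virtual camera centre
    s = -centre_world[2] / ray_world[2]                           # intersection with z = 0
    return centre_world + s * ray_world
```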
From a set of 2D (captured) to 3D (world) point correspondences, the calibration solver 54 may compute an estimate of the extrinsic parameters R and t. This is known as the “perspective-n-points” problem, and in most cases this problem is over-determined. An embodiment may use a method that minimizes the reprojection error, such as Levenberg-Marquardt optimization. Alternatively, an embodiment may use a RANSAC approach that samples subsets of four points and uses a direct solution such as Efficient Perspective-n-Point (EPnP) at each iteration.
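One possible realisation of this solve, sketched with OpenCV's RANSAC PnP using the EPnP solver followed by an iterative (Levenberg-Marquardt style) refinement; `points_3d`, `points_2d`, `K` and `dist_coeffs` are placeholders for the matched 3D/2D points and the known intrinsics:

```python
import numpy as np
import cv2

pts3 = np.asarray(points_3d, dtype=np.float32).reshape(-1, 1, 3)
pts2 = np.asarray(points_2d, dtype=np.float32).reshape(-1, 1, 2)

ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3, pts2, K, dist_coeffs,
                                             flags=cv2.SOLVEPNP_EPNP)
if ok:
    inl = inliers.ravel()
    # Optional refinement that minimises the reprojection error over the inliers.
    rvec, tvec = cv2.solvePnPRefineLM(pts3[inl], pts2[inl], K, dist_coeffs, rvec, tvec)
    R, _ = cv2.Rodrigues(rvec)   # extrinsic rotation matrix; tvec is the translation t
```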
In one possible embodiment, the features of the synthetic views may be precomputed at the time the calibration image is selected. In this case, the synthetic pattern generation and feature extraction steps may happen “offline”, in advance of the camera calibration. Once the synthetic features are computed, the calibration image can be discarded. During the camera calibration procedure, the camera's intrinsic parameters, the precomputed synthetic features and the captured image may be used for the calibration, proceeding from the feature matching module 53 onward.
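An illustrative sketch of such an offline step, in which the synthetic features and virtual parameters are serialised for later use; the file name and stored fields are assumptions for the example:

```python
import numpy as np
import cv2

detector = cv2.SIFT_create()   # stand-in extractor, as in the earlier sketch
kps, descs = detector.detectAndCompute(synthetic_view, None)
pts_2d = np.float32([kp.pt for kp in kps])

# The calibration image itself can now be discarded; only the descriptors,
# their 2D positions and the virtual camera parameters are kept.
np.savez("synthetic_features_view0.npz",
         descriptors=descs, points_2d=pts_2d, K_v=K_v, R_v=R_v, t_v=t_v)
```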
In one embodiment, each of the feature detector module 52, feature matching module 53, synthetic pattern generator 51 and calibration solver 54 may be provided with at least one respective processor or processing unit, a respective communication unit and a respective memory. In another embodiment, at least two of the group consisting of the feature detector module 52, feature matching module 53, synthetic pattern generator 51 and calibration solver 54 share a same processor, a same communication unit and/or a same memory. In this case, the feature detector module 52, feature matching module 53, synthetic pattern generator 51 and/or calibration solver 54 may correspond to different modules executed by the processor of a computer machine such as a server, a personal computer, a laptop, a tablet, a smart phone, etc.
A calibration module may include one or more Central Processing Units (CPUs) and/or Graphics Processing Units (GPUs) for executing modules or programs and/or instructions stored in memory and thereby performing processing operations, memory, and one or more communication buses for interconnecting these components. The communication buses optionally include circuitry (sometimes called a chipset) that interconnects and controls communications between system components. The memory includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices, and may include non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. The memory optionally includes one or more storage devices remotely located from the CPU(s). The memory, or alternately the non-volatile memory device(s) within the memory, comprises a non-transitory computer readable storage medium. In some embodiments, the memory, or the computer readable storage medium of the memory, stores the programs, modules, and data structures described above, or a subset thereof.
Each of the elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing functions described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various embodiments. In some embodiments, the memory may store a subset of the modules and data structures identified above. Furthermore, the memory may store additional modules and data structures not described above.
In an embodiment, a calibration system may be integrated with and/or attached to a moveable camera system. As described above, the calibration system may determine the location and direction, i.e., the translation and rotation, of the camera. This determination may be done in real time, or near real time, as the camera is operated. The camera may be hand held or positioned on a dolly or tripod. The camera translation and rotation may be included with the captured images or video, such as in embedded metadata. The translation and rotation information may be provided to other systems that handle or receive the output from the camera, such as image or video recognition systems or virtual reality systems.
Various embodiments of the present disclosure having been thus described in detail by way of example, it will be apparent to those skilled in the art that variations and modifications may be made without departing from the disclosure. The disclosure includes all such variations and modifications as fall within the scope of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
3046609 | Jun 2019 | CA | national |
This application is a continuation of International Patent Application No. PCT/IB2020/052938, filed on Mar. 27, 2020, which claims priority to Canadian Patent Application No. 3,046,609, filed on Jun. 14, 2019. All the aforementioned patent applications are hereby incorporated by reference in their entireties.
Relationship | Number | Date | Country
---|---|---|---
Parent | PCT/IB2020/052938 | Mar 2020 | US
Child | 17644269 | | US