The following example embodiments relate to a three-dimensional (3D) calibration method and apparatus for multi-view phase shift profilometry, and more particularly, to a 3D calibration method and apparatus for multi-view phase shift profilometry that may improve quality of a 3D reconstruction task in a multi-view system.
Acquiring geometric data of objects present in a three-dimensional (3D) space, that is, 3D scanning, is a technology of great value. In particular, 3D scanning has wide utility, for example, as equipment to acquire and inspect information on the 3D shape of an object, such as a semiconductor.
Representative 3D scanning technology includes the phase measuring profilometry (PMP) method, which acquires 3D information of an object by projecting a predefined structured light image through a projector and capturing it with a camera. However, when performing 3D scanning of an object with this method, there are restrictions that the camera needs to be able to view a point to be acquired and light emitted from the projector needs to reach that point. For example, when the camera is assumed to look downward from the top of a system, the side data of an object that may be acquired as a result of 3D scanning may be very insufficient. Therefore, to perform 3D scanning of the object in all directions, the projector needs to emit light from as many different directions as possible and capturing needs to be performed with cameras placed at various positions.
In addition, to acquire high-quality results, it is necessary to combine 3D data, to discard unnecessary points, and to process the data based on more reliable points, rather than simply acquiring the 3D data from a single camera-and-projector pair.
Example embodiments describe a three-dimensional (3D) calibration method and apparatus for multi-view phase shift profilometry, and more particularly, provide technology for optimizing various parameters including position information of a projector and a camera in a system with a plurality of top and side cameras.
Example embodiments provide a 3D calibration method and apparatus for multi-view phase shift profilometry that may perform calibration of a phase measuring profilometry (PMP) system for 3D reconstruction with only one target, by shooting a calibration target, adding a depth map of a side camera to a depth map of a main camera, and acquiring a final 3D point cloud in which a depth map fusion is performed, and that may perform the calibration and 3D reconstruction with only one scene, without performing additional shooting while moving the target.
According to an example embodiment, there is provided a 3D calibration method for multi-view phase shift profilometry, performed by a computer device, the 3D calibration method including acquiring a phase that is data required for 3D reconstruction from parameters of at least one camera and a projector based on a phase measuring profilometry (PMP) method; defining a phase-to-depth function for each camera-and-projector combination and performing calibration of optimizing the parameters of the camera and the projector; and acquiring a point cloud that includes depth information using the optimized parameters of the camera and the projector.
The acquiring of the phase may include projecting a series of structured light images using the projector; detecting projected light that is reflected from an object using a sensor of the at least one camera; and acquiring the phase required for the 3D reconstruction using a single camera-and-projector pair.
The acquiring of the phase may include configuring a main camera located at the top and a side camera located on at least one side surface, and acquiring the phase based on the PMP method using the projector located between the main camera and the side camera.
The 3D calibration method may further include performing transformation from world coordinates to camera coordinates for all points in the point cloud, storing a z value of the camera coordinates, and acquiring a depth map for each camera-and-projector pair; and computing pixel-wise confidences of all the depth maps and performing the 3D reconstruction by acquiring a final point cloud through a confidence-based depth map fusion that integrates the respective depth maps into a single depth map based on the confidences.
The performing of the 3D reconstruction by acquiring the final point cloud may include performing a depth map fusion through a weighted sum with a depth map of the main camera along with a corresponding confidence for points visible to the main camera, with respect to points in a 3D space acquirable through a depth of each pixel when computing the pixel-wise confidences of the depth maps; and performing the depth map fusion through a weighted sum in a depth map of a remaining side camera for points not visible to the main camera.
According to another example embodiment, there is provided a 3D calibration apparatus for multi-view phase shift profilometry, the 3D calibration apparatus including a phase acquisition unit configured to acquire a phase that is data required for 3D reconstruction from parameters of at least one camera and a projector based on a PMP method; a calibration unit configured to define a phase-to-depth function for each camera-and-projector combination and to perform calibration of optimizing the parameters of the camera and the projector; and a point cloud acquisition unit configured to acquire a point cloud that includes depth information using optimized parameters of the camera and the projector.
The phase acquisition unit may be configured to acquire the phase required for the 3D reconstruction using a single camera-and-projector pair in such a manner that the projector projects a series of structured light images and projected light is reflected from an object and detected at a sensor of the at least one camera.
The phase acquisition unit may be configured to configure a main camera located at the top and a side camera located on at least one side surface, and to acquire the phase based on the PMP method using the projector located between the main camera and the side camera.
The 3D calibration apparatus may further include a depth map acquisition unit configured to perform transformation from world coordinates to camera coordinates for all points in the point cloud, to store a z value of the camera coordinates, and to acquire a depth map for each camera-and-projector pair; and a depth map fusion unit configured to compute pixel-wise confidences of all the depth maps and to perform the 3D reconstruction by acquiring a final point cloud through a confidence-based depth map fusion that integrates the respective depth maps into a single depth map based on the confidences.
The depth map fusion unit may be configured to perform a depth map fusion through a weighted sum with a depth map of the main camera along with a corresponding confidence for points visible to the main camera, with respect to points in a 3D space acquirable through a depth of each pixel when computing the pixel-wise confidences of the depth maps, and to perform the depth map fusion through a weighted sum in a depth map of a remaining side camera for points not visible to the main camera.
According to example embodiments, it is possible to provide a 3D calibration method and apparatus for multi-view phase shift profilometry that may optimize various parameters including position information of a projector and a camera and may improve quality of a 3D reconstruction task in a system with a plurality of top and side cameras.
Also, according to some example embodiments, it is possible to provide a 3D calibration method and apparatus for multi-view phase shift profilometry that may perform calibration of a PMP system for 3D reconstruction with only one target, by shooting a calibration target, adding a depth map of a side camera to a depth map of a main camera, and acquiring a final 3D point cloud in which a depth map fusion is performed, and that may perform the calibration and 3D reconstruction with only one scene, without performing additional shooting while moving the target.
Hereinafter, example embodiments will be described with reference to the accompanying drawings. However, the example embodiments may be modified in various other forms and the scope of the present invention is not limited to the example embodiments described below. Also, various example embodiments are provided to more fully explain the present invention to one of ordinary skill in the art. Shapes and sizes of elements in the drawings may be exaggerated for clear explanation.
The example embodiments aim to perform three-dimensional (3D) reconstruction of an object using a main camera located at the top of a system, a side camera located at the side of the system, and a projector located therebetween. To this end, 3D data of the object is to be acquired using a commonly available phase measuring profilometry (PMP) method.
First, in the PMP method, a projector projects a series of structured light images and projected light is reflected from an object and sensed by a sensor of a camera. In this case, data required for 3D reconstruction, that is, a phase may be acquired using a single camera-and-projector pair. A process of generating the structured light image is called encoding, and a process of transforming an image captured from the camera to a phase map is called decoding. Here, a final goal of the PMP method is to define and optimize a phase-to-depth function for each camera.
The phase-to-depth function is to transform phase to depth and may be expressed in various types of models. In the example embodiment, a stereo model is adopted. A stereo triangulation model models the projector with a single pinhole camera and acquires 3D data in the same manner as the existing stereo matching. In the stereo triangulation model, the phase-to-depth function may be expressed as a function of an intrinsic parameter and an extrinsic parameter of the camera and the projector. Therefore, a main goal of the example embodiment may be to establish a calibration process of optimizing parameters of the camera and the projector.
The phase-to-depth function may be defined for every camera-and-projector combination at the same time of performing calibration of the camera and the projector.
However, through the above process, 3D data is acquired for each camera-and-projector combination in the form of an independent point cloud or depth map, and errors may occur between them. Therefore, the 3D reconstruction is finally completed through a confidence-based depth map fusion that integrates the respective depth maps into a single depth map. In this confidence-based depth map fusion process, a confidence is computed for each pixel of each depth map. Here, points in the 3D space that may be acquired through the depth of each pixel may be divided into two cases: a case in which a point is visible to the main camera located at the top and a case in which it is not visible.
For points visible to the main camera, a depth map fusion is performed through a weighted sum with a depth map of the main camera along with a corresponding confidence. For points not visible to the main camera, the depth map fusion is performed through a weighted sum in a depth map of a remaining side camera.
Hereinafter, a 3D calibration method and device for multi-view phase shift profilometry according to an example embodiment will be described in more detail.
Table 1 shows the notation used herein.
Camera parameters may be divided into an extrinsic parameter, such as R and t, and an intrinsic parameter including K and lens distortion coefficients (k1, k2, k3). The extrinsic parameter refers to a parameter for a relationship between a world coordinate system and a camera coordinate system. The following relationship is established between a world coordinate Xw of a point present in a 3D space and a camera coordinate Xc.
Here, R and t are a 3×3 matrix and a 3×1 vector, respectively, which together express a rigid-body transform in the 3D space and may be expressed as follows.
In [Equation 2], the vector q includes a total of six elements: three translation components and three rotation components from a quaternion parameterization.
In a pinhole camera model, pc is projected onto the normalized image plane, that is, the plane of z=1 in the camera coordinate system.
In [Equation 3], {circumflex over (x)}c denotes a homogeneous coordinate of xc. Meanwhile, for a camera using a telecentric lens, such as the main camera at the top of the system, the projection is performed in a slightly different manner.
An intrinsic parameter of the camera specifies a relationship between xn and a pixel coordinate p. First, radial distortion correction by a camera lens is performed at normalized image coordinates.
xc′ and yc′ denote the normalized coordinates distorted in the x axis and the y axis, respectively. Then, the camera matrix K describes a relationship between the distorted normalized camera coordinate xn′ and the pixel coordinate p.
Through the above process, the pixel coordinate may be computed from an arbitrary 3D point Xw, or vice versa.
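For illustration only, this projection chain (world-to-camera transform, projection onto the normalized plane, radial distortion, and the camera matrix) may be sketched in Python roughly as follows. The parameter values, the exact distortion polynomial, and the function name are assumptions added for clarity and are not part of the embodiment.

```python
import numpy as np

def project_pinhole(X_w, R, t, K, dist):
    """Project a 3D world point to pixel coordinates (pinhole model with radial distortion)."""
    # World -> camera coordinates (cf. [Equation 1]).
    X_c = R @ X_w + t
    # Perspective division onto the normalized image plane z = 1 (cf. [Equation 3]).
    x_n, y_n = X_c[0] / X_c[2], X_c[1] / X_c[2]
    # Radial distortion with coefficients (k1, k2, k3) in normalized coordinates
    # (a common radial polynomial is assumed here).
    k1, k2, k3 = dist
    r2 = x_n ** 2 + y_n ** 2
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d, y_d = x_n * radial, y_n * radial
    # The camera matrix K maps the distorted normalized coordinates to pixel coordinates.
    p = K @ np.array([x_d, y_d, 1.0])
    return p[:2] / p[2]

# Hypothetical parameter values, for illustration only.
K = np.array([[1200.0, 0.0, 640.0],
              [0.0, 1200.0, 480.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 500.0])
print(project_pinhole(np.array([10.0, -5.0, 20.0]), R, t, K, (1e-7, 0.0, 0.0)))
```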
When it is assumed that there is no distortion by the camera lens, a relationship between the 3D point Xw and the corresponding pixel p in the pinhole camera model is expressed as follows.
s denotes a scale factor of the homogeneous coordinate {circumflex over (p)}, ƒx and ƒy denote focal lengths, and u0 and v0 denote the principal point coordinates. Meanwhile, the case of a telecentric model is as follows.
Here, R and t denote the rotation and translation from a checkerboard coordinate system to the camera coordinate system. A relationship between a checkerboard present on one plane and a corresponding pixel may be explained by adding the condition zw=0.
Likewise, in the case of a telecentric lens camera, it may be expressed as follows.
Here, a closed-form solution for the homography H may be acquired from the coordinates of the predefined checkerboard corners and the pixel coordinates recognized from an image. The homography is a product of K and the extrinsic parameters R and t. However, to inversely compute the parameters from H, a plurality of homographies needs to be secured by capturing checkerboard images of various orientations. Theoretically, at least three homographies are required to compute solutions for all parameters.
Therefore, a camera calibration process generally starts with acquiring images while variously changing the angle of a plane on which a checkerboard pattern is engraved. However, a novel calibration target object is proposed to take advantage of the multi-view system and to perform a projector calibration, which is described below.
The proposed calibration target is designed to meet the following conditions while performing the example embodiments.
The IDs of the Aruco markers on the target do not overlap. Therefore, when a single marker is recognized by a camera, the plane on which a corner of the marker is present may be identified. Also, the world coordinate system may be defined based on the bottom plane, and a coordinate system on each remaining plane may be transformed to the world coordinate system through R and t. For example, if the transformation from a local coordinate system on the ith plane to the world coordinate system is Rplane,i, tplane,i, it may be expressed as follows.
In a typical calibration method of capturing a checkerboard pattern once for each image, one homography is computed for each image and the checkerboard pattern at this time is captured over wide pixels. On the other hand, in a method according to an example embodiment of capturing a plurality of checkerboard planes in one image, a single checkerboard pattern effectively occupies only a narrow area within the image. In an actual camera, a distortion phenomenon by the lens is present, and this is a nonlinear rather than a linear problem. Therefore, it is difficult to optimize K, R, and t through a plurality of homographies, and it is easy to fall into local minima.
Falling into local minima indicates that not only the camera parameters but also the coordinate system transformation of each plane in the calibration target may be erroneously optimized. However, by using the fact that one plane is visible to a plurality of cameras, the camera parameters may be computed without falling into local optima, taking advantage of the multi-view system.
Initially, since which ID marker is present on each plane is known and the local coordinates of each marker corner are predefined, the world coordinates of each marker corner may be expressed as a function that uses Rplane,i, tplane,i as parameters. Then, the pixel p predicted through [Equation 1], [Equation 3], [Equation 4], [Equation 5], and [Equation 6] may be expressed as a function that uses the world coordinates of the marker corner and the intrinsic and extrinsic parameters of the camera as parameters, and may be compared to the actual pgt (undistorted). Accordingly, by optimizing the parameters that simultaneously minimize the sum of the errors over the marker corners recognized by all cameras, not only each parameter of all cameras but also Rplane,i, tplane,i for all planes of the calibration target may be optimized. Since a total of five cameras capture a target including 18 planes, the system used in the example embodiment may be expressed as follows.
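While the embodiment's own expression of this objective is given by its equation, a rough, non-authoritative sketch of such a joint reprojection-error optimization is shown below. The parameter layout, the simplified projection without lens distortion, and the use of scipy.optimize.least_squares are illustrative assumptions, not the embodiment's actual implementation.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

# Illustration-only parameter layout (hypothetical): per camera, a 6-vector pose
# (rotation vector + translation) plus fx, fy, u0, v0; per target plane, a 6-vector
# pose mapping plane-local coordinates to world coordinates.
N_CAM, N_PLANE, CAM_SZ, PLANE_SZ = 5, 18, 10, 6

def pose(v):
    """6-vector -> (R, t)."""
    return Rotation.from_rotvec(v[:3]).as_matrix(), v[3:6]

def project(X_w, cam):
    """Simplified pinhole projection (lens distortion omitted to keep the sketch short)."""
    R, t = pose(cam[:6])
    fx, fy, u0, v0 = cam[6:10]
    X_c = R @ X_w + t
    return np.array([fx * X_c[0] / X_c[2] + u0, fy * X_c[1] / X_c[2] + v0])

def residuals(params, obs):
    """Reprojection residuals over every marker corner recognized by every camera.

    obs: list of (cam_idx, plane_idx, corner_local (3,), pixel_gt (2,)).
    """
    cams = params[:N_CAM * CAM_SZ].reshape(N_CAM, CAM_SZ)
    planes = params[N_CAM * CAM_SZ:].reshape(N_PLANE, PLANE_SZ)
    out = []
    for ci, pi, corner, pix_gt in obs:
        R_pl, t_pl = pose(planes[pi])
        X_w = R_pl @ corner + t_pl          # plane-local corner -> world coordinates
        out.extend(project(X_w, cams[ci]) - pix_gt)
    return np.asarray(out)

# sol = least_squares(residuals, x0, args=(obs,))  # x0: a sensible initial guess
```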
Pixel intensity of an image projected from the projector follows a cosine-wave form of various frequencies in the horizontal axis direction or the vertical axis direction. The frequencies actually used in the example embodiment are ƒ=4^0, 4^1, 4^2, and 4^3. A total of 4×4=16 images are projected in each of the vertical direction and the horizontal direction by shifting the phase by 2π/4 (i.e., π/2) for the waves generated at the total of four frequencies.
Light projected from the projector is reflected from an object and detected by a camera. When the intensity of the ith image projected with the frequency ƒ at a pixel (u, v) is Iƒ,i(u, v), it may be expressed as follows.
Here, by combining all pixel intensities at i=0, 1, 2, and 3, phase ϕƒ(u, v) at the pixel (u, v) when the frequency is ƒ may be acquired as follows.
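While the embodiment's own expressions define the projected patterns and the decoding exactly, the following is a non-authoritative Python sketch using the common four-step convention; the sign convention, the intensity normalization, and the function names are assumptions added for illustration.

```python
import numpy as np

def encode(f, i, width=1024, height=768, horizontal=True):
    """Generate one phase-shifted cosine fringe image (values in [0, 1]).

    f: number of fringe periods across the image; i: phase-shift index (0..3).
    """
    u = np.arange(width) if horizontal else np.arange(height)
    phase = 2 * np.pi * f * u / len(u) + i * np.pi / 2   # shift by pi/2 per step
    line = 0.5 + 0.5 * np.cos(phase)
    return np.tile(line, (height, 1)) if horizontal else np.tile(line[:, None], (1, width))

def decode(I0, I1, I2, I3):
    """Recover the wrapped phase from four captured images (one common convention)."""
    return np.arctan2(I3 - I1, I0 - I2)

# Tiny self-check on synthetic data: decode() recovers the wrapped encoding phase.
imgs = [encode(4, i) for i in range(4)]
wrapped = decode(*imgs)
```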
However, ϕƒ(u, v) is a wrapped phase, not the unwrapped phase that is actually used for the 3D reconstruction. The wrapped phase at ƒ=4^0 covers the entire area of the object within a single wavelength, but may not contain detailed information. The wrapped phase at ƒ=4^3 contains locally detailed information, but does not cover the entire area within a single wavelength.
Therefore, after computing the wrapped phase for all frequencies ƒ, the unwrapped phase ψ(u, v) that encompasses all the wrapped phases from ƒ=4^0 to ƒ=4^3 needs to be computed.
(a) of
When ψj is an unwrapped phase that encompasses the wrapped phases from ƒ=4^0 to ƒ=4^j, it may be expressed as follows.
Also, a process of computing ψj+1(u, v) using ϕj+1(u, v) and ψj(u, v) is as follows.
The final unwrapped phase ψ(u, v)=ψ3(u, v) is used as an input value for the 3D reconstruction.
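A common hierarchical formulation of this temporal unwrapping is sketched below purely for illustration; the embodiment's exact update rule is given by the equations referenced above, and the frequency-ratio handling here is an assumption.

```python
import numpy as np

def unwrap_step(psi_coarse, phi_fine, ratio):
    """One temporal-unwrapping step (a common formulation, shown for illustration).

    psi_coarse: unwrapped phase at the previous (coarser) frequency level.
    phi_fine  : wrapped phase at the next frequency level.
    ratio     : frequency ratio between the two levels (here 4).
    """
    k = np.round((ratio * psi_coarse - phi_fine) / (2 * np.pi))   # integer fringe order
    return phi_fine + 2 * np.pi * k

def unwrap_all(wrapped_phases, ratio=4):
    """Combine wrapped phases at f = ratio**0 ... ratio**J into one unwrapped phase."""
    psi = np.mod(wrapped_phases[0], 2 * np.pi)   # f = 4^0 already covers the whole range
    for phi in wrapped_phases[1:]:
        psi = unwrap_step(psi, phi, ratio)
    return psi
```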
In the example embodiment, the projector is modeled with a pinhole camera model, like the side cameras. However, unlike a side camera capable of directly acquiring pixel coordinates, the projector needs to indirectly use the unwrapped phase captured by the camera, and thus a slight change in the intrinsic parameter of the projector is required. Basically, the unwrapped phase has a linear relationship with the pixel coordinates (u, v)projector in the pinhole camera model.
Substituting this into [Equation 6], it may be expressed as follows.
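For illustration, one hypothetical way to realize such a linear phase-to-projector-coordinate mapping is sketched below; the assumption that the unwrapped phase spans 0 to 2π·ƒmax across the projector width, and the omission of projector lens distortion, are simplifications not taken from the embodiment.

```python
import numpy as np

def phase_to_projector_coord(psi, f_max, proj_width, K_proj):
    """Map an unwrapped phase to a normalized projector x-coordinate (illustrative only).

    Assumes (hypothetically) that the unwrapped phase grows linearly from 0 to
    2*pi*f_max across the projector width of proj_width pixels.
    """
    u_p = psi / (2.0 * np.pi * f_max) * proj_width      # phase -> projector column
    fx, u0 = K_proj[0, 0], K_proj[0, 2]
    return (u_p - u0) / fx                              # pixel -> normalized coordinate

# e.g. x_np = phase_to_projector_coord(psi, f_max=64, proj_width=1920, K_proj=K_proj)
```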
In [Equation 18], Xn,p denotes distorted coordinates present on a normalized projector plane in the projector. Also, assuming that all parameters of a camera and a projector are known in any one camera-and-projector pair, a 3D point acquired by using phase and pixel coordinates as input values needs to meet the following equation.
When geometrically analyzed, the first two equations of [Equation 19] represent a straight line that passes through the point [xn, yn, 1] on the normalized camera plane and the camera origin, and the last equation represents a plane that includes the projector origin and the straight line x=xn,p (or y=yn,p) on the normalized projector plane. The point pintersect=[xw, yw, zw] reconstructed in the 3D space is the intersection of the straight line and the plane.
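As a rough sketch of this ray-plane intersection (written for the horizontal-phase case; the vertical case would use the normal [0, 1, -yn,p] instead), the reconstruction could be implemented as follows. All function and variable names are illustrative assumptions.

```python
import numpy as np

def triangulate(x_n, y_n, x_np, R_cam, t_cam, R_proj, t_proj):
    """Intersect the camera ray with the projector plane (illustrative sketch).

    (x_n, y_n) : undistorted normalized camera coordinates of the pixel.
    x_np       : normalized projector coordinate decoded from the unwrapped phase.
    R_*, t_*   : world -> camera / world -> projector rigid transforms.
    """
    # Camera ray in world coordinates: origin C plus direction d.
    C = -R_cam.T @ t_cam
    d = R_cam.T @ np.array([x_n, y_n, 1.0])
    # Projector plane x = x_np on the normalized projector plane, expressed in world
    # coordinates as n . X_w + c = 0.
    m = np.array([1.0, 0.0, -x_np])      # plane normal in projector coordinates
    n = R_proj.T @ m
    c = m @ t_proj
    # Ray-plane intersection.
    s = -(n @ C + c) / (n @ d)
    return C + s * d
```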
Arranging [Equation 19], pintersect (Xintersect in
Also, since each Aruco marker ID of the calibration target is unique, the area of the image in which each plane is located may be extracted from the camera image. A phase (ψx, ψy) corresponding to a pixel (u, v) within an area in which a plane i is captured may be transformed to
Therefore, an optimal parameter of the projector may be acquired through an optimization process that uses (ni·pintersect+ci)^2 as a loss and uses the intrinsic and extrinsic parameters of the projector as parameters.
Here, ni denotes the normal vector of the plane i.
A point cloud may be acquired through [Equation 20] using parameters of the camera and the projector optimized through the above process. Also, through [Equation 1], transformation from world coordinates to camera coordinates may be performed for all points in the point cloud and a depth map may be acquired for each camera-and-projector pair by storing a z value of the camera coordinates. However, due to a measurement error in the camera, an error may occur in the depth map. Therefore, pixel-wise confidences of all depth maps need to be computed and a process of integrating the point cloud based on the same needs to be performed.
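One plausible, illustration-only way to rasterize such a per-camera depth map from the point cloud is sketched below; the z-buffer handling of overlapping points and the nearest-neighbor pixel rounding are assumptions, not the embodiment's prescribed procedure.

```python
import numpy as np

def depth_map_from_points(points_w, R, t, K, width, height):
    """Rasterize a point cloud into a per-camera depth map (z-buffer style sketch).

    points_w: (N, 3) world coordinates; R, t, K: camera parameters.
    """
    depth = np.full((height, width), np.inf)
    for X_w in points_w:
        X_c = R @ X_w + t                            # [Equation 1]: world -> camera
        if X_c[2] <= 0:
            continue
        p = K @ (X_c / X_c[2])                       # project to pixel coordinates
        u, v = int(round(p[0])), int(round(p[1]))
        if 0 <= u < width and 0 <= v < height:
            depth[v, u] = min(depth[v, u], X_c[2])   # store the z value (keep nearest)
    return depth
```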
To acquire the pixel-wise confidence of a depth map by using an arbitrary camera as a reference, the depth dref(u, v) corresponding to each pixel (u, v) may be acquired and the world coordinates of the corresponding point in the 3D space may be acquired. Also, each remaining camera excluding the reference camera may transform the world coordinates of the corresponding point p(u, v) to its own camera coordinates through [Equation 1]. Here, a camera j (j≠ref) acquires the values of the pixel coordinates uj and vj corresponding to p(u, v) and then performs a visibility test using the depth map of the camera j.
The visibility vj(p(u, v)) is determined by comparing the actual depth zj of p(u, v) with dj(uj, vj) acquired through the depth map of the camera j: it is 1 when dj(uj, vj) is larger, and is 0 when dj(uj, vj) is smaller, in which case p is determined to not be visible from the camera j. For every camera j with vj(p)=1, the confidence Cref,j(u, v) may be computed as follows.
Confidence Cref(u, v) of a pixel (u, v) of the reference camera may be represented as follows for every camera j with vj(p(u, v))=1.
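The visibility test above may be sketched as follows. Note that the confidence function in this sketch is only a placeholder: the embodiment's actual confidence formula is given by the preceding equations and is not reproduced here, so a simple depth-consistency score stands in for Cref,j.

```python
import numpy as np

def visibility(p_w, R_j, t_j, K_j, depth_map_j):
    """Visibility test of a 3D point p against camera j's depth map, per the description above."""
    X_c = R_j @ p_w + t_j
    z = X_c[2]
    p = K_j @ (X_c / z)
    u, v = int(round(p[0])), int(round(p[1]))
    h, w = depth_map_j.shape
    if not (0 <= u < w and 0 <= v < h):
        return 0
    # v_j(p) = 1 when the stored depth d_j(u_j, v_j) is larger than the point's depth z_j.
    return 1 if depth_map_j[v, u] >= z else 0

def confidence(p_w, R_j, t_j, depth_map_value):
    """Placeholder for C_ref,j (the embodiment's formula is not reproduced here)."""
    z = (R_j @ p_w + t_j)[2]
    return float(np.exp(-abs(z - depth_map_value)))   # hypothetical consistency score
```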
By computing the confidences for all depth maps and then integrating them into one through the aforementioned process, the 3D reconstruction is finally completed. For an arbitrary camera j, a depth and a confidence are given for each pixel of its depth map. Here, a 3D point pj acquired through the depth map may or may not be visible to the main camera. When it is visible to the main camera, the depth and confidence (d, C) of pj may be stored in an array Lmain(u, v) for the pixel (u, v) of the main camera corresponding to pj. When this process is performed for the depth maps of all cameras, a 3D point array is given for each pixel (u, v) of the main camera. Accordingly, the depth map of the main camera is computed in the form of a weighted sum as follows.
When pj is not visible to the main camera, the same process is performed with the side cameras, and the confidence-based depth map fusion is finally completed.
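As an illustrative reading of the weighted sum described above, a plain confidence-weighted average over the (depth, confidence) pairs gathered at one pixel could be sketched as follows; the embodiment's exact weighting is given by its own equation, so this is only one plausible realization.

```python
import numpy as np

def fuse_depth(entries):
    """Confidence-weighted fusion of the (depth, confidence) pairs gathered at one pixel.

    entries: list of (d, C) pairs stored in Lmain(u, v) for the main camera, or in the
    corresponding side-camera list for points not visible to the main camera.
    """
    d = np.array([e[0] for e in entries])
    c = np.array([e[1] for e in entries])
    return float((c * d).sum() / c.sum()) if c.sum() > 0 else np.nan

# e.g. fuse_depth([(500.2, 0.9), (500.6, 0.4), (501.0, 0.1)])
```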
Referring to
It can be seen that depth and confidence variously appear due to a plurality of variables, including view vectors of the projector and the camera and an angle of reflected light.
Referring to
Also, the 3D calibration method for multi-view phase shift profilometry may further include operation S140 of performing transformation from world coordinates to camera coordinates for all points in the point cloud, storing a z value of the camera coordinates, and acquiring a depth map for each camera-and-projector pair, and operation S150 of computing pixel-wise confidences of all the depth maps and performing the 3D reconstruction by acquiring a final point cloud through a confidence-based depth map fusion that integrates the respective depth maps into a single depth map based on the confidences.
According to example embodiments, camera calibration may be performed with a single target, without moving the target, and calibration of a camera and a projector may be performed simultaneously with the single target. Also, a phase-to-depth function may be defined through an inverse camera model of the projector.
Hereinafter, the 3D calibration method for multi-view phase shift profilometry according to an example embodiment is further described.
The 3D calibration method for multi-view phase shift profilometry according to an example embodiment will be described using a 3D calibration apparatus for multi-view phase shift profilometry as an example.
Referring to
In operation S110, the phase acquisition unit 1110 may acquire a phase that is data required for 3D reconstruction from parameters of at least one camera and a projector based on a PMP method.
In detail, the phase acquisition unit 1110 may acquire the phase required for the 3D reconstruction from a single camera-and-projector pair in such a manner that the projector projects a series of structured light images and projected light is reflected from an object and detected at a sensor of the at least one camera. Here, the phase acquisition unit 1110 may configure a main camera located at the top and a side camera located on at least one side surface, and may acquire the phase based on the PMP method using the projector present between the main camera and the side camera.
In operation S120, the calibration unit 1120 may define a phase-to-depth function for each camera-and-projector combination and may perform calibration of optimizing the parameters of the camera and the projector. Here, the phase-to-depth function relates to transforming the phase to a depth or a depth map.
That is, the calibration unit 1120 may perform calibration of forming a phase-to-depth map together with the camera parameters using a fixed target. In the prior art, calibration is performed by shooting while physically turning a flat target several times. With such a dynamically moving target, the phase-to-depth map of the PMP method cannot be calibrated. The example embodiments intentionally keep the target static and use it to project images of various phases onto the target, so that the camera parameters may be acquired simultaneously through the phase-to-depth map calibration of the PMP method.
In operation S130, the point cloud acquisition unit 1130 may acquire a point cloud that includes depth information using optimized parameters of the camera and the projector.
In operation S140, the depth map acquisition unit 1140 may perform transformation from world coordinates to camera coordinates for all points in the point cloud, may store a z value of the camera coordinates, and may acquire a depth map for each camera-and-projector pair.
In operation S150, the depth map fusion unit 1150 may compute pixel-wise confidences of all the depth maps and may perform the 3D reconstruction by acquiring a final point cloud through a confidence-based depth map fusion that integrates the respective depth maps into a single depth map based on the confidences.
Referring to
For points visible to the main camera 1201, a depth map fusion may be performed through a weighted sum 1220 with a depth map 1210 of the main camera along with the corresponding confidence 1250. For points not visible to the main camera 1201, the depth map fusion may be performed through a weighted sum 1270 in a depth map 1280 of a side camera 1202. Through this, a final 3D point cloud 1230 may be acquired.
As a result, it can be seen that, through the example embodiments, calibration of a PMP system for 3D reconstruction is successfully performed with only one target, and that the calibration and the 3D reconstruction may be performed with only one scene without performing additional shooting while moving the target. Also, the phase-to-depth function may be configured using only the parameters of the camera and the projector without additional parameters.
The aforementioned apparatuses may be implemented through hardware components, software components, and/or combination of the hardware components and the software components. For example, the apparatuses and the components described in the example embodiments may be implemented using one or more general-purpose or special purpose computers, for example, a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of responding to and executing instructions in a defined manner.
Although the example embodiments are described with reference to some specific example embodiments and accompanying drawings, it will be apparent to one of ordinary skill in the art that various alterations and modifications in form and details may be made in these example embodiments without departing from the spirit and scope of the claims and their equivalents. For example, suitable results may be achieved if the described techniques are performed in different order, and/or if components in a described system, architecture, device, or circuit are combined in a different manner, and/or replaced or supplemented by other components or their equivalents.
Therefore, other implementations, other example embodiments, and equivalents of the claims are to be construed as being included in the claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| 10-2022-0152975 | Nov 2022 | KR | national |