The present invention relates to the field of computation and covers methods to find one to one mapping between fiducial markers on a tracked object and fiducial marker projections on the image plane captured by a camera in optical object tracking systems.
It is known that there are methods and models to track a three dimensional object in an environment and compute its position and orientation (pose) with respect to a predetermined coordinate system. These kinds of tracker systems are used, for example, in aircraft to determine the orientation of the pilot's head. Once the orientation is acquired with respect to the coordinate system of, say, the display devices, it is possible to generate graphics on these accordingly. There are different methods to track an object in the scene using magnetic, mechanical or optical means. Currently, the spatial relations of objects may also be determined using magnetic sensors or laser beams, but this invention relates specifically to systems using camera-based (day-tv, thermal, IR, Time of Flight etc.) trackers.
In one of the optical camera-based systems the pilot wears a helmet with patterns (fiducial markers) and at least one tracker camera determines the helmet's position and orientation using geometric calculations based on these patterns. The patterns used in camera-based tracker systems are either graphical (generally black and white) patterns (passive marker) tracked by visible light cameras or arrays of light sources (e.g. light emitting diodes or LEDs) (active marker). These light sources can be chosen to be in the infrared range of the electromagnetic spectrum with suitable selection of camera sensor and filter set. Other arrangements are also possible, but the most convenient among them is the one with the infrared LEDs, since these systems can work under poor lighting conditions. Computing the spatial relation between an object having a tracking pattern and a camera is therefore well known in the state of the art. Throughout the document, whenever a spatial relation is mentioned, it should be understood that the relation of one entity's predetermined reference system with respect to the other's is meant. This reference system is generally based on the respective pattern of the object under consideration. Since the position and orientation of the tracker camera with respect to the other coordinate systems is known (or can be calculated or measured) in a tracker system, it is also possible to compute the helmet's spatial relation with the tracker camera's sensor and then with other coordinate systems. In this context, “tracked object” means an object having a tracking pattern (fiducial marker) and being tracked by a tracker system. It may be either a helmet as in a helmet-mounted tracker system or any other object.
The pose estimation problem for said optical camera-based tracking systems can be stated as follows: Given a set of N feature correspondences between three dimensional (3D) points of an object and the two dimensional (2D) projection of that object onto the image plane, find the rotation and translation of the object with respect to the reference system of the camera. The objective is to find the rotation and translation between the camera and the 3D object so that the object's 3D location and orientation is known. A formal definition of the pose estimation problem requires a correct solution to its dual, namely the correspondence problem. The correspondence problem can be stated as follows: Given (or not given) the rotation and translation of the object with respect to the reference system of the camera, find the N feature correspondences between 3D points of an object and the 2D projection of that object onto the image plane. The objective is to find correspondences between 3D points of an object and the 2D projection of that object onto the image plane so that these can be used to find the object's 3D location and orientation. This problem is also well known in the state of the art. However, each of these two problems requires a solution to its dual in order to be solved correctly. Thus, a tracker system requires an initial solution to either the pose estimation problem or the correspondence problem.
The correspondence problem is called an initial correspondence problem if the rotation and translation of the object with respect to the reference system of the camera is not given. If one opts to solve the initial correspondence problem, the completion time for solving it should also be made as short as possible. Such effort is necessary since the tracked object may leave the view frustum of the camera and then come back into it. In such cases the tracking process must be restarted, so solving the initial correspondence problem quickly shortens the startup time. In one preferred application, helmet tracking, fast startup significantly reduces blind time, especially if the pilot's head leaves the camera's view frustum many times.
There are some currently used methods to determine initial correspondences between 3D points of the tracked object and the 2D projection of that object onto the image plane. In one of the methods, a number of consecutive images are used, where the number of lit LEDs increases by one at each iteration. Then 3D to 2D matching is calculated with proximity calculations, under the assumption that the pose cannot change much between consecutive captured frames. The newly lit LED will be the one that has not been matched to any 2D point, and it is then added to the matched list. Solving the correspondence problem is easier when the pose of the tracked object is known with small error. Then similar proximity calculations can be carried out to find the 3D to 2D matching.
Solving the initial correspondence problem with current methods requires many frame captures and therefore a long time, whereas methods that rely on the pose of the tracked object cannot be used, since rotation/translation data of the tracked object is not available initially. The current methods do not offer an effective way of solving the initial correspondence problem. To provide a solution to this problem, a new methodology should be introduced which solves the problem in a much more efficient way. Furthermore, the proposed method can be used to solve the correspondence problem in general (not only the initial case) and thus offers an all-around solution to the correspondence problem.
The United States patent document US005828770A (Leis, Ristau), an application in the state of the art, discloses a method based on proximity calculations which uses a number of consecutive frame captures to solve initial correspondence problem. In the same document, a similar proximity based method that solves correspondence problem when pose of the tracked object is known is also presented.
An objective of the present invention is to find one to one mapping between fiducial markers on a tracked object and fiducial marker projections on the image plane captured by a camera in optical object tracking systems.
A system and method realized to fulfill the objective of the present invention is illustrated in the accompanying FIGS., in which:
100. Method for solving correspondence problem.
200. Method for selecting LED groups to be used for solving correspondence problem.
C. Camera
O0. Object elevation (0)
O1. Object elevation (−45)
O2. Object elevation (45)
B0. Bottom LED projection with object elevation 0 degrees.
B1. Bottom LED projection with object elevation −45 degrees.
B2. Bottom LED projection with object elevation 45 degrees.
T0. Top LED projection with object elevation 0 degrees.
T1. Top LED projection with object elevation −45 degrees.
T2. Top LED projection with object elevation 45 degrees.
C0. Center LED projection with object elevation 0 degrees.
C1. Center LED projection with object elevation −45 degrees.
C2. Center LED projection with object elevation 45 degrees.
F0. Flank LED projection with object elevation 0 degrees.
F1. Flank LED projection with object elevation −45 degrees.
F2. Flank LED projection with object elevation 45 degrees.
A method for solving correspondence problem (100) essentially comprises the steps of,
In step 101, we turn on the LEDs in the current LED group so that their projections in 2D space can be seen. Each group preferably consists of four LEDs selected in a particular pattern on the tracked object. The selection of four LEDs to be lit stems from two main factors. First, some geometric rules need to be defined to find correspondences, and simpler rules can be defined and executed using fewer LEDs. Second, the theoretical minimum number of correspondences necessary to calculate the pose of an object is four. Thus in step 101, as few LEDs as possible are lit while the lit LEDs still allow calculation of the pose of the tracked object.
Projections of lit LEDs in 2D space will have an area effect: several pixels will have high intensity values, and thus some form of image processing needs to be applied. This may be in the form of simple thresholding, calculating connected components, calculating the center of gravity of close components, etc. These processes are well known in the state of the art and will not be discussed here. At the end of this preprocessing, we will have four pairs of (x,y) values, each of which corresponds to one of the visible LEDs.
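As an illustration only, the preprocessing described above can be sketched as follows. The function name `led_centroids`, the 4-connected flood fill and the intensity-weighted centroid are assumptions chosen for this sketch; the description does not prescribe a particular image processing routine, and merging of nearby components is not shown.

```python
from collections import deque

def led_centroids(image, threshold):
    """Return one (x, y) centroid per bright blob in a 2D list of intensities.

    Hypothetical helper: pixels with intensity >= threshold (threshold > 0)
    are grouped into 4-connected components, and each component is reduced
    to its intensity-weighted center of gravity.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Flood fill one connected component starting at (r, c).
            queue = deque([(r, c)])
            seen[r][c] = True
            total = sum_x = sum_y = 0.0
            while queue:
                y, x = queue.popleft()
                weight = image[y][x]
                total += weight
                sum_x += weight * x
                sum_y += weight * y
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            centroids.append((sum_x / total, sum_y / total))
    return centroids
```

With four lit and visible LEDs, the returned list would be expected to hold four (x, y) pairs; any other count indicates that the visibility check of the following step should fail.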
In step 102, we determine whether all lit LEDs are visible from the camera's point of view. It is possible that the tracked object is placed in a pose that blocks LEDs from being seen by the camera, or that some LEDs on the tracked object are outside the camera's view frustum. In such cases the defined geometric rules are not applicable. Furthermore, even if the correspondence calculation is successful, pose estimation cannot be carried out since fewer correspondences are known than the theoretical limit of four. In such a case we switch to a new LED set and try again.
LEDs in a LED group are placed in a special pattern. Sample pattern can be seen in
In step 103, we find the correspondence regarding the center LED. The geometric rule to identify the center LED is created with the assumption that the center LED has the smallest total distance to the other lit LEDs. This is a reasonable assumption since the 3D position of the center LED is more or less in the middle of the other LEDs, and thus it is safe to assume the same holds for the 2D projections of the LEDs. The distance between two LEDs is calculated in 2D space as the Euclidean distance. Then we identify the center LED's projection as the one having the smallest total distance to the other LEDs.
However, it is not always possible to identify the center LED without doubt. When the two smallest total distances to the other LEDs are very close to each other, we say that confidence in the center LED decision is low. Such cases may happen if the camera is placed in a sideways position relative to the LEDs; then it is possible to confuse the center LED with the flank LED, or with the top/bottom LED. In theory, if the 3D to 2D projection is perfect and the LEDs are coplanar, we do not need such a confidence metric. However, optical distortion, errors introduced by image processing techniques and LEDs being placed in a non-coplanar fashion make such a metric necessary. Such cases are called ambiguous identifications. In step 104, we determine whether the center LED identification is ambiguous. Such determination can be achieved via simple relative thresholding (if the second smallest total distance is within a predetermined percentage of the smallest total distance, we say the LED identification is ambiguous). If an ambiguous identification is present, we switch to a new LED set and try again.
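Steps 103 and 104 can be sketched as below. The function name `find_center_led` and the default `ambiguity_ratio` of 10 percent are assumptions made for illustration; the description only states that a predetermined percentage is used.

```python
import math

def find_center_led(points, ambiguity_ratio=0.10):
    """Return (index of center LED, ambiguous flag) for a list of 2D points.

    The center LED is taken as the point with the smallest total Euclidean
    distance to the other points. The identification is flagged ambiguous
    when the second smallest total lies within `ambiguity_ratio` (an assumed,
    tunable percentage) of the smallest total.
    """
    totals = [sum(math.dist(p, q) for q in points) for p in points]
    order = sorted(range(len(points)), key=totals.__getitem__)
    best, runner_up = order[0], order[1]
    ambiguous = totals[runner_up] <= totals[best] * (1.0 + ambiguity_ratio)
    return best, ambiguous
```

For an undistorted frontal view of a 'T' pattern the center LED wins by a clear margin; raising `ambiguity_ratio` makes the test stricter and causes more LED groups to be rejected.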
In step 105, identification of the flank LED takes place. The geometric rules defined to identify the flank LED require us to construct lines from the center LED to the other three LEDs in the image. Then for each line we calculate the orthogonal distance of the remaining 2 LEDs to the line; thus we calculate 6 (3×2) values in total. The LED corresponding to the smallest of the six calculated values is identified as one of the top/bottom LEDs. The LED that was used (along with the center LED) to construct the line yielding that smallest value is identified as the other top/bottom LED. Since the center LED and the top/bottom LEDs are identified, the remaining LED becomes the flank LED.
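A minimal sketch of step 105 follows, assuming four distinct 2D points and an already identified center index. The function name `split_flank` is hypothetical.

```python
import math

def split_flank(points, center):
    """Return ((end_a, end_b), flank) as indices into `points`.

    For each line from the center LED to one of the other three LEDs, the
    orthogonal distance of the two remaining LEDs to that line is computed
    (six values in total). The pair producing the smallest value forms the
    top/bottom pair; the LED left over is the flank LED.
    """
    cx, cy = points[center]
    others = [i for i in range(len(points)) if i != center]
    best = None  # (distance, line-defining LED, LED closest to that line)
    for a in others:
        dx, dy = points[a][0] - cx, points[a][1] - cy
        length = math.hypot(dx, dy)  # non-zero because points are distinct
        for b in others:
            if b == a:
                continue
            bx, by = points[b]
            # Point-to-line distance via the 2D cross product.
            distance = abs(dx * (by - cy) - dy * (bx - cx)) / length
            if best is None or distance < best[0]:
                best = (distance, a, b)
    _, a, b = best
    flank = next(i for i in others if i not in (a, b))
    return (a, b), flank
```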
At this point it should be noted that two LEDs are identified as top/bottom LEDs; however, there is not yet a one to one mapping of these two LEDs to the top and bottom positions. In step 106, calculation of the top and bottom LEDs takes place. The geometric rule defined to identify the top and bottom LEDs requires us to construct a line from the flank LED to the center LED. Then we can identify the LED on the right side of the line as the top LED and the LED on the left side of the line as the bottom LED (assuming the flank LED is selected on the right side; otherwise switch the bottom LED with the top LED). It should be noted that the special placement of the LEDs does not allow confusion in correspondence calculations aside from the center LED ambiguity; thus the explained method presents a robust way of calculating correspondences.
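The side-of-line test in step 106 can be sketched with a 2D cross product. The sketch below assumes image coordinates with x pointing right and y pointing down, so that a larger cross product value means "further to the right" when travelling from the flank LED towards the center LED; with a different axis convention the comparison flips. The function name `label_top_bottom` and the `flank_on_right` parameter are hypothetical.

```python
def label_top_bottom(points, center, flank, pair, flank_on_right=True):
    """Return (top, bottom) indices for the two top/bottom candidates.

    Assumes image coordinates (x right, y down). The candidate lying on the
    right side of the directed line flank -> center is labelled top when the
    flank LED was selected on the right side of the pattern; otherwise the
    labels are swapped.
    """
    fx, fy = points[flank]
    dx, dy = points[center][0] - fx, points[center][1] - fy

    def side(index):
        px, py = points[index]
        # Signed area: larger means further to the right of the line
        # under the assumed y-down image convention.
        return dx * (py - fy) - dy * (px - fx)

    a, b = pair
    right, left = (a, b) if side(a) > side(b) else (b, a)
    return (right, left) if flank_on_right else (left, right)
```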
In step 107, calculation of the pose using the four calculated correspondences takes place. It is possible to calculate the pose using four correspondences. However, for some cases there is no one to one mapping between the 2D view of the LEDs and the pose (i.e. from a particular 2D view of the LEDs it is possible to calculate 2 poses, both of which are correct). This case is called pose ambiguity and it is an inherent problem of pose estimation. An instance of the pose ambiguity problem is illustrated in
We should also note that the problem is not bound to a single axis; it is present whenever the poses that correspond to a single 2D view are symmetrical.
In step 108, we determine whether the pose calculation is ambiguous. If it is not, the calculated correspondences are correct and they can be used to calculate the pose. We can then start the normal iteration of pose estimation and correspondence calculation. However, if the pose is ambiguous, the correspondences cannot be used in the pose estimation process even though their calculation is correct. Thus, the calculated correspondences are not sufficient to start the normal iteration of pose estimation and correspondence calculation. In such a case, we switch to a new LED set and try again.
If the method for solving correspondence problem (100) is executed using only one group of four LEDs, there will be cases where correspondences cannot be calculated or the pose of the object cannot be found because of pose ambiguity. To avoid such cases, we propose to use several LED groups. However, the LED groups should be selected carefully, so that a minimal number of LED groups is selected while covering all the possible poses that the object can take. The method for selecting LED groups (200) to be used for solving correspondence problem (100) specifies such a selection process, which uses poses of the tracked object under working conditions so that the selected LED groups better suit the use case.
A method for selecting LED groups (200) to be used for solving correspondence problem (100) essentially comprises the steps of,
Positions of active fiducials on the tracked object (say a helmet for a head tracking system with infrared LEDs) are represented with a three dimensional coordinate for each position on the object. It is important to note that these coordinates on the mesh (model) are determined with respect to a common coordinate system and should be relatable to the camera location. At the same time, pose data representing possible poses of the tracked object under working conditions should also be introduced. In a preferred configuration of the present invention, these data are acquired using inertial measurement units (IMU) placed on the real object under real working conditions, and movements of the object are recorded to be used as the pose data.
In step 202, initialization of the uncovered pose set takes place. The uncovered pose set is the set of currently uncovered poses, where uncovered means that none of the currently selected groups of LEDs can be seen fully in that pose. This set is initially the whole input pose data. However, this set will be reduced, ideally to an empty set, as more poses get covered as the method progresses.
In step 203, enumeration (generation) of all LED groups with a ‘T’ shaped pattern takes place. This step is highly dependent on the positions of active fiducials on the tracked object. However, generation of LED groups is fairly straightforward. A simple example can be given using an n×n grid of LEDs. If we consider a 3×3 grid of LEDs on the n×n grid, the middle LED in the 3×3 grid can be used in 8 LED groups in which the middle LED is the center LED of the ‘T’ pattern (we can construct 4 straight lines of size 3 in the 3×3 grid, in the shapes of ‘+’ and ‘x’, and it is possible to place the flank LED on the right side or the left). Similar strategies can be found for the specific input on active fiducial positions; however, any generation process should be expected to result in a reasonably small number of LED groups.
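The grid example can be made concrete with a short sketch. It assumes LEDs sit on integer (row, column) grid positions and that the flank LED is the grid neighbor perpendicular to the three-LED line; the function name `enumerate_t_groups` and the dictionary layout are hypothetical.

```python
def enumerate_t_groups(n):
    """Enumerate 'T' shaped LED groups on an n x n grid of LEDs.

    Every interior grid position serves as a center LED. Through it pass
    four straight lines of three LEDs ('+' and 'x' shapes), and for each
    line the flank LED can be placed on either perpendicular side, giving
    eight groups per center.
    """
    directions = [(1, 0), (0, 1), (1, 1), (1, -1)]
    groups = []
    for r in range(1, n - 1):
        for c in range(1, n - 1):
            for dr, dc in directions:
                ends = ((r + dr, c + dc), (r - dr, c - dc))
                for sign in (1, -1):
                    # Perpendicular of (dr, dc) is (-dc, dr).
                    flank = (r - sign * dc, c + sign * dr)
                    groups.append({"center": (r, c), "ends": ends, "flank": flank})
    return groups
```

For n = 3 this yields the 8 groups mentioned above; for larger grids the count grows as 8·(n−2)², which remains small enough to evaluate exhaustively.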
Knowing the relative spatial positions of the tracked object and the camera, it is possible to translate and rotate the LED groups mathematically using the input pose data, and the visibilities of the fiducials can be calculated. In step 204, the visibility of each non-selected LED group on the mesh from the input camera position/orientation is calculated for every pose in the uncovered pose set; and in step 205, a visibility value list is generated representing every non-selected LED group. The visibility value represents how many times all the nodes in an LED group were visible from the camera viewport, considering the current uncovered pose set.
In a preferred configuration, an occlusion model is used to estimate the visibility of 3D model points (active markers) given the pose (Davis et al., 2004). It is based on the ray tracing technique developed in computer graphics. In the case of LED fiducials, the visibility computation is based on the LED's normal with respect to the object coordinate system, the LED's illumination cone angle and the known pose of the object. The angle between the LED's normal and the camera plane normal in the camera coordinate system defines how perpendicularly the LED is directed towards the camera (LED direction angle). The LED's illumination cone angle defines a minimum LED direction angle threshold for the LED to be visible to the camera. For a given pose, the LED direction angle can be computed for each LED to determine its visibility. The same computation, using the marker's normal with respect to the object coordinate system, the marker's illumination cone angle and the known pose of the object, can equivalently be applied to any active marker tracking system.
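The LED direction angle test can be sketched as follows. Several conventions here are assumptions, not taken from the description: the camera looks along +z in its own coordinate system, the pose rotation `R` is a 3×3 matrix mapping object coordinates to camera coordinates, and an LED counts as visible when the angle between its rotated normal and the direction facing the camera is at most half of its illumination cone angle. Occlusion by the object itself, which the cited occlusion model handles, is not modelled.

```python
import math

def rotate(R, v):
    """Apply a 3x3 rotation matrix (nested lists) to a 3-vector."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def led_direction_angle(R, normal_obj, camera_axis=(0.0, 0.0, 1.0)):
    """Angle in degrees between the rotated LED normal and the direction
    pointing back at the camera (the negated viewing axis)."""
    n = rotate(R, normal_obj)
    dot = -sum(a * b for a, b in zip(n, camera_axis))
    norm = math.sqrt(sum(a * a for a in n)) * math.sqrt(sum(a * a for a in camera_axis))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def is_led_visible(R, normal_obj, cone_angle_deg):
    """Assumed rule: visible when the direction angle is within half the cone."""
    return led_direction_angle(R, normal_obj) <= cone_angle_deg / 2.0
```

An LED group would then be counted as visible for a pose only if `is_led_visible` holds for all four of its LEDs.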
Then the non-selected LED group with highest visibility count is selected to be used in the method for solving correspondence problem (100) in step 206. The LED group with the highest visibility count, which was not determined as a LED group to be used in the method for solving correspondence problem (100) previously, is selected since it is fully visible for most of the determined pose data. In step 207, we eliminate poses covered by recently selected LED group from uncovered pose set and delete recently selected LED group from non-selected LED group set that is used for further calculations. In this configuration, number of elements in uncovered pose set is decreasing in each iteration since at least one pose is covered by newly selected LED group. Also, in this configuration, number of selected LED groups increases by one in each iteration.
In step 208, the termination condition is checked. The main termination condition occurs when every pose in the initial pose data is covered by at least one selected LED group. This case corresponds to a successful trained selection of LED groups with respect to the input pose data. In some exceptional cases, the input pose data, camera position and active marker positions do not allow coverage of all poses. In this case, it is possible to terminate if the method fails to cover at least one additional pose with the addition of a new LED group. The LED groups produced by the method for selecting LED groups to be used for solving correspondence problem (200) will then not be able to produce valid correspondence results for every pose in the training set when used in the method for solving correspondence problem (100). A different termination condition can be specified as the selection of a predetermined number of LED groups. In this configuration, some poses will not be covered; however, the method for solving correspondence problem (100) will have fewer LED groups to work with.
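Taken together, steps 202 to 208 behave like a greedy covering loop, which can be sketched as below. The sketch assumes the visibility computation has already been reduced to a mapping from each candidate LED group to the set of poses in which all of its LEDs are visible; the names `select_led_groups`, `group_visibility` and `max_groups` are hypothetical.

```python
def select_led_groups(all_poses, group_visibility, max_groups=None):
    """Greedily pick LED groups until every pose is covered.

    group_visibility: dict mapping a group identifier to the set of poses
    in which the whole group is visible (assumed precomputed).
    Returns (selected groups in order, poses left uncovered).
    """
    uncovered = set(all_poses)            # step 202: initially all poses
    candidates = dict(group_visibility)   # non-selected LED groups
    selected = []
    while uncovered and candidates:
        if max_groups is not None and len(selected) >= max_groups:
            break                         # optional limit on group count
        # Steps 204-205: visibility count over the uncovered poses only.
        counts = {g: len(v & uncovered) for g, v in candidates.items()}
        best = max(counts, key=counts.get)  # step 206: highest count
        if counts[best] == 0:
            break                         # no group covers a new pose
        selected.append(best)
        uncovered -= candidates.pop(best)   # step 207: update both sets
    return selected, uncovered
```

An empty second return value corresponds to the main termination condition; a non-empty one corresponds to the exceptional case or to the predetermined group limit.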
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB2013/054769 | 6/11/2013 | WO | 00 |