The present invention relates generally to optical systems, and more specifically to electro-optical systems that determine the camera orientation and position (collectively known as pose) and capture 3D models relative to the photographed scene, in order to extract correct dimensions and positions of physical objects from photographic images.
The task of capturing the 3D information in a scene consists of first acquiring a set of range measurements from the measurement device(s) to each point in the scene, then converting these device-centric range measurements into a set of point locations on a single common coordinate system often referred to as “world coordinates”. Methods to acquire the range measurements may rely heavily on hardware, such as 2D time-of-flight laser rangefinder systems, which directly measure the ranges to an array of points within the measurement field-of-view. Other systems exist that rely heavily on computing power to determine ranges from a sequence of images as a camera is moved around the object or scene of interest. These latter systems are commonly called Structure From Motion (SFM) systems. Hardware-intensive solutions have the disadvantages of being bulky and expensive. SFM systems have the disadvantage of requiring extensive computing resources or extended processing times in order to create the 3D representation, thus making them unsuitable for small mobile consumer devices such as smart phones.
Existing Structure from Motion (SFM) systems involve two computation paths, one to track the pose (orientation and position) of the camera as it captures a sequence of 2D images, the other to create a 3D map of the object or environment the camera is moving in or around. These two paths are interdependent in that it is difficult to track the motion (pose) of the camera without some knowledge of the 3D environment through which it is moving, and it is difficult to create a map of the environment from a series of moving camera images without some knowledge of the motion (pose) of the camera.
This invention introduces a method and system for capturing 3D objects and environments that is based on the SFM methodology, but with the addition of a simplified method to track the pose of the camera. This greatly reduces the computational burden and provides a 3D acquisition solution that is compatible with low-computing-power mobile devices. This invention provides a straightforward method to directly track the camera's motion (pose detection) thereby removing a substantial portion of the computing load needed to build the 3D model from a sequence of images.
For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following description taken in conjunction with the accompanying drawings in which like reference numerals indicate like features and wherein:
Preferred embodiments of the present invention are illustrated in the FIGUREs, like numerals being used to refer to like and corresponding parts of the various drawings.
Before proceeding with the description, it would be helpful to define a few terms for the purpose of this written specification. Some of these terms are used loosely from time to time and may mean different things. For example, the term “pose” of the camera is sometimes used to refer to the “orientation” of the camera independent of the “position” of the camera. In other cases “pose” is used to include both the orientation and the position of the camera. Sometimes the context makes it clear, and sometimes it does not. In this specification the distinction between the two meanings is important, so we provide clarifying definitions.
Glossary
3D Mapping or 3D Modeling means a 3D model of an object or scene in world coordinates from which accurate measurements may be derived.
Orientation means the direction in which the central optical axis of the camera is pointed.
Pose means the orientation and position of the camera.
Position means the location of the camera relative to the fixed origin of the world coordinates used in the system.
Proportional 3D Model means a relational data set of the spatial relationship of key features of an object or scene where the scale or absolute size of the model is arbitrary and unspecified.
Range means distance from the point of observation to the point being observed.
SFM means Structure From Motion as further described below.
World Coordinates is a fixed coordinate system for the 3D Mapping which has an absolute position, orientation and scale relative to the scene or object being captured and from which physical measurements of constituent parts of the 3D scene can be extracted.
Structure From Motion (SFM) is a well-known technique, or category of techniques, for determining the 3D mapping of a scene from a sequence of 2D images. Each point of the object or environment being mapped is captured in a minimum of two 2D images. SFM uses the principle of triangulation to determine the range to any point based on how the position of the point shifts in the 2D image from one camera position to another. It is necessary to accurately know the pose (position and orientation) of the camera for each of the 2D images in order to correctly calculate the range. Given a sufficient number of 2D views of the scene, existing SFM techniques can determine the pose of the camera in order to create a structural map of the object or scene. However, reliance on the image content alone for determining both the pose of the camera and then a structural model and/or structural map is computationally intensive.
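By way of illustration and not limitation, the triangulation principle referred to above can be sketched in a few lines of code. The sketch below assumes the projection matrices for two camera poses are already known; the function and variable names are illustrative only and do not form part of the invention.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of a single scene point from two views.

    P1, P2 : 3x4 camera projection matrices for the two known camera poses.
    x1, x2 : pixel coordinates [u, v] of the same scene point in each image.
    Returns the 3D point in the coordinate frame shared by P1 and P2.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector associated
    # with the smallest singular value of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```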
Existing SFM methods, whether feature-based or optical-flow-based, must determine corresponding pixels between successive video frames in order to track the camera motion. For example, Klein and Murray (“KM”) (identified below) start by creating an initial 3D map based roughly on a stereo pair of images and then estimate the camera's pose relative to this map. For each new frame of video, KM first extrapolates from the camera's prior motion to estimate the camera's new pose. Based on that assumed pose, KM calculates where key scene features should have moved to in the new image, then detects these features in the new image and adjusts the pose to match the image. KM runs the tracking and mapping operations in parallel, making extensive and intensive use of a Graphics Processing Unit (“GPU”) and a Central Processing Unit (“CPU”).
The present 3D Mapping System uses an SFM engine but relieves the SFM engine from having to use this iterative technique for determining the pose of the camera for each image, thereby removing a substantial portion of the computing load needed to build the 3D model from a sequence of images and allowing for the potential for much more accurate results.
The improved mapping system can also be used to produce a non-contact measurement tool and profiler, a robotic vision module, an artificial vision system, and products to help the visually impaired. These products could be integrated as apps and/or accessories with existing mobile devices such as smart phones, tablets, or notebook computers, or provided as stand-alone products. In another embodiment, the digital camera could remain stationary while the scene being recorded is moved to expose views from multiple viewpoints.
The first embodiment of the 3D model capture device employs a smart phone with an auxiliary tool, an accessory to be attached to and used with the smart phone camera or integrated into the smart phone. The accessory contains a laser rangefinder and also uses the smart phone's inertial measurement unit (IMU), consisting of an accelerometer, gyroscope, and compass. In various versions, the IMU may be included in the accessory or it may be built into the smart phone. The laser rangefinder beam is aligned along or near the axis of the camera in a fixed and known position so that range measurements can be directly and accurately associated with the range from the camera to the rangefinder spot in the scene. The laser rangefinder provides an accurate range measurement with each image. Any laser distance measuring technology could be used, including triangulation, phase-shift, time-of-flight, and interferometric methods. The essential characteristic of the distance measuring tool used in this embodiment is that the measurement locations must be visible in the images captured by the camera. In this way, the system knows the precise location in the image to which the range measurement applies. Therefore, other distance measurement technologies, such as those based on LEDs or other light sources, could also be used. The IMU is used to provide the pointing information of the camera as it is moved. Any type of inertial measurement system containing at least a 3-axis gyroscope could be used. The inertial motion and the distance measurements are combined in the Pose Engine to provide an accurate pose estimate.
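By way of example only, the data gathered by this embodiment for each frame can be pictured as the simple record sketched below. The sketch assumes the laser beam is coincident with the camera's optical axis; all names are illustrative rather than part of the invention.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class FrameCapture:
    """One sample from the accessory: camera image, IMU orientation, laser range."""
    image: np.ndarray          # camera frame
    orientation: np.ndarray    # 3x3 rotation matrix from the IMU processing
    laser_range_m: float       # distance reported by the laser rangefinder

def laser_point_in_camera_coords(frame: FrameCapture) -> np.ndarray:
    """Place the laser-illuminated point in local camera coordinates,
    assuming the beam lies along the camera's optical axis (+Z)."""
    return np.array([0.0, 0.0, frame.laser_range_m])
```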
Turning now to
The object to be mapped in the embodiment illustrated in
The embodiment illustrated in
Optical Flow is a technique used to track the movement of points or features in a scene from one image of the scene to another. Mathematically, this can be described as follows. Given a point $[u_x, u_y]$ in image $I_1$, find the point $[u_x+\delta_x, u_y+\delta_y]$ in image $I_2$ that minimizes the error $\varepsilon$ in a neighborhood around the point, i.e., minimize

$$\varepsilon(\delta_x, \delta_y) = \sum_{x=u_x-w_x}^{u_x+w_x} \; \sum_{y=u_y-w_y}^{u_y+w_y} \bigl( I_1(x, y) - I_2(x+\delta_x,\, y+\delta_y) \bigr)^2$$

where $w_x$ and $w_y$ define the size of the neighborhood (window) over which the error is summed.
This technique was originally developed by Horn and Schunck as described in the following reference.
In the present invention, optical flow is one of the techniques that may be used to determine the motion of features in the scene from one image to the next in the series of images. The result of this optical flow computation is combined with the camera pose information obtained from the pose engine to calculate the distance to points in the scene based on SFM triangulation.
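As an illustrative sketch only, the feature tracking step could be realized with an off-the-shelf pyramidal Lucas-Kanade optical flow routine, such as the one provided by the OpenCV library. The use of OpenCV here is an assumption for illustration, not a requirement of the invention.

```python
import cv2
import numpy as np

def track_features(img1, img2):
    """Track feature points from one color video frame to the next using
    pyramidal Lucas-Kanade optical flow; returns matched point pairs that
    can be fed to the SFM triangulation step."""
    gray1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
    pts1 = cv2.goodFeaturesToTrack(gray1, maxCorners=500,
                                   qualityLevel=0.01, minDistance=7)
    pts2, status, _err = cv2.calcOpticalFlowPyrLK(gray1, gray2, pts1, None)
    ok = status.ravel() == 1
    return pts1[ok].reshape(-1, 2), pts2[ok].reshape(-1, 2)
```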
In the embodiment discussed, a 3D Modeling Engine 158 converts the 3D mapping output from the SFM Engine 156, expressed in local camera coordinates, into a 3D model in world coordinates. The 3D Modeling Engine takes the 3D map points generated by the SFM Engine in local camera coordinates and assembles the data from the complete sequence of 2D images into a single 3D model of the scene in world coordinates. It uses the ranges from the SFM Engine along with the known camera pose for each local range data set to map the points into the fixed world coordinates. This is done with a coordinate transformation whose transformation matrix values are determined from the pose and which maps the points from the local camera coordinates into world coordinates. The 3D Modeling Engine may also include data processing routines for rejecting outliers and filtering the data to find the model that best fits the collection of data points.
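The coordinate transformation performed by the 3D Modeling Engine can be sketched as follows, assuming the pose for a frame is available as a rotation matrix and a camera position in world coordinates; the names below are illustrative only.

```python
import numpy as np

def camera_to_world(points_cam, R_world_from_cam, cam_position_world):
    """Map points expressed in local camera coordinates into world
    coordinates using the camera pose for that frame.

    points_cam         : (N, 3) array of points from the SFM engine.
    R_world_from_cam   : 3x3 rotation taking camera axes to world axes
                         (from the pose engine orientation).
    cam_position_world : (3,) camera position in world coordinates.
    """
    return points_cam @ R_world_from_cam.T + cam_position_world
```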
Finally, the modeling device contains User Interface and Data Storage functions that provide the user with choices as to how the 3D model and data are to be displayed and stored. The user can request a printer-ready 3D file, dimensional measurements extracted from the model, and other operations specific to the application. The 3D model is stored together with application-specific data in a Quantified Image/Data File.
Inertial measurement units (IMUs), which consist of gyroscopes, accelerometers, and optionally a compass, are multi-sensor units that can provide rotational measurements along with instantaneous acceleration about multiple axes determined by the configuration of the unit. It is well known that position estimation based on the IMU is plagued with drift and inaccuracy. This is because position is estimated by twice integrating the acceleration data, i.e.,

$$v(t) = v(0) + \int_0^t a(\tau)\,d\tau, \qquad x(t) = x(0) + \int_0^t v(\tau)\,d\tau$$
where x is position, v is velocity, and a is acceleration. Any noise at all in the acceleration data, as well as any inaccuracy in estimating the gravity vector, results in large errors in the position estimate. (Note: The gravity vector must be accurately subtracted from the accelerometer signals so that only the motion acceleration data remains. This is complicated by the fact that the direction of gravity relative to the IMU device changes with the orientation of the IMU.) For this reason, estimating the camera position based on tracking the motion using the IMU data is inherently inaccurate. (Note: The technique described in [Tanskanen (2013)] uses an IMU alone to estimate the camera pose. This accounts for the significant difference between the ground truth pose and the estimated pose even after substantial iterative processing. The present embodiment avoids this error source by using a laser rangefinder in conjunction with the IMU.)
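The magnitude of this drift is easy to illustrate numerically. In the sketch below, the bias value, sample rate, and duration are arbitrary example values chosen only to show that even a tiny residual acceleration error grows quadratically into a large position error.

```python
import numpy as np

# Assumed values for illustration only: a 0.01 m/s^2 residual bias left
# over after imperfect gravity subtraction, sampled at 100 Hz for 10 s.
bias = 0.01
dt = 0.01
t = np.arange(0.0, 10.0, dt)
accel = np.full_like(t, bias)

velocity = np.cumsum(accel) * dt     # first integration
position = np.cumsum(velocity) * dt  # second integration

# After 10 s the position error is roughly 0.5 * bias * t^2, about 0.5 m,
# even though the bias itself is tiny; this is the drift problem.
print(position[-1])
```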
On the other hand, the IMU, with proper processing of the sensor data, can provide a very accurate and stable measurement of orientation (or pointing direction).
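For illustration, the core of that orientation processing can be sketched as a first-order integration of the gyroscope rates; in a practical system this raw integration is stabilized with the accelerometer gravity reference and the compass, and the names below are illustrative only.

```python
import numpy as np

def integrate_gyro(R, gyro_rad_s, dt):
    """Propagate a 3x3 orientation matrix by one gyroscope sample using a
    small-angle (first-order) update. In practice this raw integration is
    corrected with the accelerometer and compass data to keep the
    orientation estimate stable over time.

    R          : current rotation matrix (camera/body to world).
    gyro_rad_s : angular rates [wx, wy, wz] in rad/s from the gyroscope.
    dt         : sample interval in seconds.
    """
    wx, wy, wz = gyro_rad_s
    omega = np.array([[0.0, -wz,  wy],
                      [ wz, 0.0, -wx],
                      [-wy,  wx, 0.0]])   # skew-symmetric rate matrix
    return R @ (np.eye(3) + omega * dt)
```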
The laser rangefinder is designed to provide accurate distance measurements but provides no information about the direction of the laser beam. For example, a commercial laser measurement tool made by Bosch (model DLR130K) claims an accuracy of 1.5 mm at distances up to 40 m.
The IMU-engine-generated orientation data feeds into the pose engine together with information from the laser rangefinder to determine the pose of the camera, including both the orientation and the position of the camera relative to the point illuminated by the laser, for each image 194. In some embodiments the camera image data is employed by the pose engine to determine the range. Using the IMU data and range data, the pose engine is capable of setting the pose in world coordinates 194. The position portion of the pose in world coordinates is determined by adding, to the laser rangefinder vector, an offset vector that connects the laser-illuminated point to the world coordinate origin or to another reference point in the scene whose spatial relationship to the world coordinate origin is known. Two axes of this offset vector are taken directly from the position of the world coordinate origin or reference point in each image relative to the laser-illuminated point. The third axis is determined in the 3D modeling engine as the model is assembled from the multiple viewpoints, by solving for the value that gives a resulting model with the best consistency from all viewpoints.
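By way of example and not limitation, the position computation described above can be sketched as follows. The sketch assumes the laser beam is aligned with the camera's optical axis and that the offset vector is expressed as the position of the laser-illuminated point relative to the world-coordinate origin; the sign conventions and names are illustrative only.

```python
import numpy as np

def camera_position_world(R_world_from_cam, laser_range_m, offset_vector_world):
    """Estimate the camera position in world coordinates.

    R_world_from_cam    : 3x3 camera orientation from the IMU/pose engine.
    laser_range_m       : rangefinder distance to the laser-illuminated point.
    offset_vector_world : vector from the world-coordinate origin to the
                          laser-illuminated point (two axes read from the
                          image, the third resolved by the modeling engine).
    """
    # Laser vector from the camera to the illuminated point, assumed along
    # the +Z optical axis in camera coordinates, rotated into world axes.
    laser_vec_world = R_world_from_cam @ np.array([0.0, 0.0, laser_range_m])
    # The illuminated point sits at offset_vector_world; the camera sits
    # one laser vector behind it along the beam.
    return offset_vector_world - laser_vec_world
```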
The laser rangefinder may emit its laser beam coincident with the axis of the camera, parallel to the axis of the camera with a known offset, or at a known offset and angle to the axis of the camera. The IMU, which is also affixed to the camera/laser rangefinder assembly, provides the orientation of the camera. The laser rangefinder provides the distance of the camera from a point in the scene that is illuminated by the laser spot. Together, these directly give the camera pose relative to the illuminated spot. This position vector of the laser-illuminated point is added to the offset vector as described previously to give the camera position in world coordinates. The 3D model of the scene is then assembled frame by frame using the triangulated SFM values, adjusting the third coordinate of the offset vector with the constraint that the points illuminated by the laser in each image map to points in the 3D model consistent with the camera pose.
Once information for at least two images has been gathered and determined by the pose engine, the camera image data and pose determination are shared with the SFM engine, which builds a 3D mapping in local camera coordinates 196. The 3D modeling engine then creates the model in world coordinates 198. As the process proceeds, the information captured and generated is stored, and the user interface provides the user with access to the information and allows the user to initiate queries concerning real-world measurements related to the object or scene 199.
The camera is employed to take a video or other sequence of images of the scene or object from a variety of camera poses (positions and orientations). In other words, the camera is moved around, scanning the scene or object from many viewpoints. The objective of the movement is to capture all of the scene or object. It should be noted that the efficiency of creating the 3D model and the accuracy of the 3D model depend on the image sequence. Less data may result in greater efficiency but less accuracy. More data may be less efficient but provide more accuracy. After a certain amount of data, additional data will result in diminishing returns in increased accuracy. Depending on the scene or object to be modeled, different movement paths will result in greater efficiency and accuracy. Generally, a greater number of camera images (video frames) from a wide range of camera poses, in particular a wide range of camera positions, should be used on areas of the object or scene where accuracy is of the greatest interest.
The origin of the world coordinate system is a specific point from which a range has been determined by the range finder in combination with the pose engine. Further range findings are always referenced to the single world coordinate origin.
The SFM engine automatically detects points of interest in the images of the object or scene and estimates the motion of the points of interest from one image to the next using optical flow or other known SFM techniques, thus building a structural mapping of the object or scene. For this reason, it has been found preferable to begin the camera scanning process in a manner that targets one of these expected points of interest with the laser rangefinder. The modeling engine then takes the structural model estimates from the SFM engine output and the pose (camera position and orientation) output of the pose engine to begin creating a 3D model of the object or scene in world coordinates. The 3D modeling engine weights the information and makes determinations as to the best fit of the available data. The 3D modeling engine also monitors the progression of changes to the model as new information is evaluated. Based on this progression, the 3D modeling engine can estimate that certain accuracy thresholds have been met and that continued processing or gathering of data would have diminishing returns. It may then, depending on the application, notify the user that the user-selected accuracy threshold has been achieved. The resultant 3D model is saved, and the user is provided with a display of the results and with a user interface that allows the user to request and receive specific measurement data regarding the modeled object or scene.
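One simple way to picture the diminishing-returns check described above is sketched below. The metric (mean displacement of corresponding model points between successive updates) and the example threshold are assumptions for illustration and do not represent the specific method of the invention.

```python
import numpy as np

def model_has_converged(previous_points, updated_points, threshold_m=0.002):
    """Crude convergence check: mean displacement of corresponding model
    points between successive model updates, compared against a
    user-selected accuracy threshold (2 mm here is an arbitrary example)."""
    displacement = np.linalg.norm(updated_points - previous_points, axis=1)
    return float(np.mean(displacement)) < threshold_m
```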
The second embodiment of the auxiliary tool is also an accessory to be attached to and used with the camera, or may be integrated into the camera device. In this embodiment, the accessory also contains a laser rangefinder device. The laser rangefinder beam is aligned along or near the axis of the camera in a fixed and known position so that range measurements can be directly and accurately associated with the range from the camera to the rangefinder spot in the scene. The same considerations for the distance measurement technology that were stated previously for the first embodiment also apply to this embodiment. Again, the essential characteristic of the distance measuring tool used in this embodiment is that the measurement locations must be visible in the images captured by the camera. In this way, the system knows the precise location in the image to which the range measurement applies. In this embodiment, it is not necessary that a range measurement be made for each image in the image sequence. It is sufficient that as few as one range measurement is made for the complete sequence of images.
While the disclosure has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate that other embodiments may be devised which do not depart from the scope of the disclosure as disclosed herein. Although the disclosure has been described in detail, it should be understood that various changes, substitutions and alterations could be made hereto without departing from the spirit and scope of the disclosure.
This application is a utility application claiming priority of U.S. provisional application(s) Ser. No. 61/732,636 filed on 3 Dec. 2012; Ser. No. 61/862,803 filed 6 Aug. 2013; Ser. No. 61/903,177 filed 12 Nov. 2013; and Ser. No. 61/948,401 filed on 5 Mar. 2014; and of U.S. Utility application Ser. No. 13/861,534 filed on 12 Apr. 2013; Ser. No. 13/861,685 filed on 12 Apr. 2013; Ser. No. 14/308,874 filed 19 Jun. 2014; Ser. No. 14/452,937 filed 6 Aug. 2013; and Ser. No. 14/539,924 filed 12 Nov. 2004.