The present application generally relates to electronic object recognition, and more particularly to improved techniques for constructing models of three dimensional objects and tracking three dimensional objects based on the models.
Augmented reality (AR) is a live direct or indirect view of a physical, real-world environment whose elements are augmented by computer-generated sensory input such as sound, video, graphics, and the like using a variety of techniques. One problem that arises in connection with AR functionality is that it may be difficult to orient the camera such that augmented content, such as overlaid graphics, properly aligns with a scene within the field of view of a camera. Marker-based AR techniques have been developed in an attempt to overcome this problem. In marker-based AR, an application is configured to recognize markers present in a real-world environment, which helps orient and align a camera. Markers may be two dimensional, such as a barcode or other graphic, or may be three dimensional, such as a physical object.
Regardless of whether the marker-based AR application utilizes two dimensional markers, three dimensional markers, or both, the application must be programmed to recognize the markers. Typically, this is accomplished by providing the application with a model of the marker. Generation of models for two dimensional markers is simpler than for three dimensional markers. For example, three dimensional markers often require use of specialized software (e.g., three dimensional modelling software) or three dimensional scanners to generate three dimensional models of the object. The process of generating three dimensional models for use in marker-based AR systems is time consuming and requires significant amounts of resources (e.g., time, cost, computing, etc.), particularly if a large library of markers is to be used.
The present disclosure describes systems, methods, and computer-readable storage media for constructing and using a model for tracking a three dimensional object. In embodiments, a model of a three dimensional object may be constructed using a plurality of two dimensional images. The images of the three dimensional object used to construct the model may be captured by a monocular camera from a plurality of positions. The three dimensional object may be resting on a pedestal as the images are captured by the camera, and a coordinate system may be defined with respect to the pedestal. The pedestal may include a plurality of markers, and the coordinate system may be defined based at least in part on the plurality of markers. The coordinate system may allow the model to be used to determine a position of the camera relative to the three dimensional object based on a captured image. In embodiments, the model may comprise information associated with one or more features of the three dimensional object, such as information associated with feature points identified from images containing the three dimensional object. For a particular image, the features of the three dimensional object may be identified through image processing techniques.
In addition to identifying features or feature points of the object, the image processing techniques may analyze the images to identify any markers of the pedestal that are present within the image(s). The markers may be used to provide camera position information that indicates the position of the camera when the image was captured. The camera position information may be stored in association with the corresponding features. In this manner, the position of a camera may be determined by first matching features or feature points identified in an image of the three dimensional object to features or feature points of the model, and then mapping the feature points to the corresponding camera position determined during construction of the model. This may enable the model to provide information descriptive of a camera position relative to the three dimensional object based on an image of the three dimensional object when it is not resting on the pedestal.
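The feature-to-camera-position mapping described above can be sketched as follows. This is an illustrative sketch only: the entry layout and the function name `lookup_camera_position` are assumptions, and random vectors stand in for real feature descriptors (e.g., ORB or SIFT descriptors produced by an image processing library).

```python
import numpy as np

# Each model entry pairs a feature descriptor with the camera position
# recorded (via the pedestal markers) when that feature was observed.
rng = np.random.default_rng(0)

model_entries = [
    {"descriptor": rng.normal(size=32), "camera_position": (0.0, -1.0, 0.3)},
    {"descriptor": rng.normal(size=32), "camera_position": (1.0, 0.0, 0.3)},
    {"descriptor": rng.normal(size=32), "camera_position": (0.0, 1.0, 0.3)},
]

def lookup_camera_position(query_descriptor, entries):
    """Match a query descriptor to the closest stored descriptor and return
    the camera position recorded with it during model construction."""
    best = min(entries,
               key=lambda e: np.linalg.norm(e["descriptor"] - query_descriptor))
    return best["camera_position"]

# A query descriptor close to the second stored descriptor (small noise
# added) maps back to that entry's recorded camera position.
query = model_entries[1]["descriptor"] + 0.01
position = lookup_camera_position(query, model_entries)
```

In a full system the query descriptors would come from feature points detected in a live camera image, and the match would typically use an approximate nearest-neighbor index rather than a linear scan.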
In embodiments, the model may be configured to enable tracking of the three dimensional object using a monocular camera. The coordinate system may enable the model to provide information associated with the camera position relative to the three dimensional object during tracking. During tracking operations, an image or stream of images may be received from a camera. The image or stream of images may be analyzed to identify features present in the image(s). The features may then be compared to the features of the model to determine whether the three dimensional object corresponding to the model is present in the image(s). If the three dimensional object is determined to be present in the image(s), the position of the camera relative to the object may be determined based on the model (e.g., by matching the features determined from the image to features included in the model and then mapping the features to a camera position based on the model). In embodiments, the position of the camera relative to the three dimensional object may allow an AR application to direct a user regarding how to move the camera into a target camera position, such as a position suitable for performing AR operations (e.g., overlaying one or more graphics on the image in a proper alignment with respect to a scene depicted in the image).
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description that follows may be better understood. Additional features and advantages will be described hereinafter which form the subject of the claims. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present application. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the application as set forth in the appended claims. The novel features which are believed to be characteristic of embodiments described herein, both as to their organization and method of operation, together with further objects and advantages, will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present embodiments.
For a more complete understanding, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
Referring to
In embodiments, a plurality of markers may be present on the pedestal. For example, and referring to
In embodiments, the markers 132, 142, 152, 162 may comprise two dimensional markers. For example, the markers 132, 142, 152, 162 may comprise barcodes, sequences of alphanumeric characters, dots, colors, patterns of colors, or other marks that may be recognized through image analysis. In embodiments, each of the markers 132, 142, 152, 162 may uniquely correspond to, and identify, a particular one of the sides of the pedestal 100. It is noted that in some embodiments, one or more markers may be placed on top surface 110 of the pedestal 100. The one or more markers placed on the top surface of the pedestal 100, when detectable in the image(s) of the three dimensional object, may indicate that the object is being imaged from a higher angle relative to the pedestal 100 than when those markers are not detectable in the image(s). For example, if all of the markers present on the top of the pedestal are detected in an image of the three dimensional object, this may indicate that the image was captured in an orientation where the camera was looking down on the object (e.g., from substantially directly above the pedestal). As another example, when one or more of the markers placed on the top surface 110 of the pedestal 100 and one or more of the markers 132, 142, 152, 162 are detected in an image of the three dimensional object, this may indicate that the image was captured in an orientation where the camera was looking down on the object (e.g., from above the pedestal 100 at an angle), but not looking directly down on the three dimensional object and the top surface 110 of the pedestal 100. When the one or more markers are placed on the top of the pedestal 100, they may be arranged such that they are at least partially unobstructed by the object.
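The marker-visibility reasoning above can be expressed as a small heuristic. The function name, the top-marker identifiers, and the classification labels below are illustrative assumptions, not part of the disclosure:

```python
def infer_view_elevation(detected_markers, side_markers, top_markers):
    """Classify the camera's rough elevation from which pedestal markers
    are visible in an image (assumed heuristic for illustration)."""
    sides_seen = detected_markers & side_markers
    top_seen = detected_markers & top_markers
    if top_seen and not sides_seen:
        return "overhead"          # looking substantially straight down
    if top_seen and sides_seen:
        return "elevated-oblique"  # above the pedestal, at an angle
    if sides_seen:
        return "side"              # roughly level with the pedestal
    return "unknown"

# Side markers from the figures; top-marker names are hypothetical.
SIDE = {"132", "142", "152", "162"}
TOP = {"top-a", "top-b"}

view = infer_view_elevation({"top-a", "142"}, SIDE, TOP)
```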
Referring to
To provide the position information, a coordinate system may be defined relative to the pedestal 100. For example, and referring to
The coordinate system may enable the model to provide directional information for orienting a camera into a target orientation (e.g., an orientation in which a graphical overlay or other AR functionality is properly aligned with the environment depicted in the image of the object). For example, assume that a front portion of the three dimensional object faces in the direction of the positive Y-axis. Now assume that the target orientation of the camera indicates that the image of the environment or three dimensional object should be captured from the left side of the three dimensional object (e.g., the side of the object along the negative X-axis). If an image of the three dimensional object received from a camera is analyzed and it is determined that the camera is oriented towards the front of the three dimensional object (e.g., the camera is oriented along the Y-axis and is viewing the object in the direction of the negative Y-axis), the model may be used to determine that, in order to properly orient the camera to view the three dimensional object from the left side, the camera needs to be moved in a negative direction along both the X-axis and the Y-axis while maintaining the camera pointed towards the three dimensional object. It is noted that in embodiments, feature matching techniques may be implemented by the AR application to identify the three dimensional object, such as to identify which model corresponds to the three dimensional object, and to track the three dimensional object as the camera is moved, as described in more detail below.
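The directional guidance described above can be sketched as a simple vector difference in the pedestal-local coordinate system. This is an illustrative sketch only; the disclosure does not fix an API, and the camera positions used below are hypothetical:

```python
def guidance_vector(current_cam, target_cam):
    """Direction the camera should move, expressed in the pedestal-local
    coordinate system, to reach the target viewing position."""
    return tuple(t - c for c, t in zip(current_cam, target_cam))

# Camera in front of the object (on the positive Y-axis), target view from
# the left side (on the negative X-axis): the required move has a negative
# component along both the X-axis and the Y-axis, matching the example above.
current = (0.0, 2.0, 0.0)   # front of the object (assumed position)
target = (-2.0, 0.0, 0.0)   # left side of the object (assumed position)
move = guidance_vector(current, target)
```

A full implementation would also compute the rotation needed to keep the camera pointed at the object while moving, which this sketch omits.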
Referring to
In embodiments, the plurality of captured images may be captured at different points surrounding the three dimensional object 200. For example, in
In embodiments, each of the plurality of images may be captured at substantially the same angle. In embodiments, the plurality of images of the three dimensional object 200 may be captured at a plurality of angles. For example, and referring to
Capturing the plurality of images of the three dimensional object from different angles may improve the capabilities of the model. For example, as briefly described above, the model may be utilized during tracking of the three dimensional object 200. During tracking of the three dimensional object 200 using the model, an image or stream of images depicting the three dimensional object 200 may be analyzed to identify features of the three dimensional object 200 from within the image or stream of images. The model may be utilized to identify information, within the model, corresponding to the identified features, and then provide information associated with an orientation of the camera based on the model, such as based on the coordinate system 400 and/or based on other information included in the model, as described in more detail below. Thus, acquiring images from different angles during construction of the model may enable the features of the three dimensional object 200 to be identified more easily (e.g., because there are more angles in which the features of the three dimensional object can be identified).
In embodiments, the plurality of images may include at least 100 images of the three dimensional object 200 while it is placed on the pedestal 100. In additional or alternative embodiments, the plurality of images may include more than 100 images or fewer than 100 images of the three dimensional object 200 while it is placed on the pedestal 100. The particular number of images included in the plurality of images may be based on a number of features or feature points that are identifiable for the three dimensional object, a strength of the identifiable features or feature points that are identifiable for the three dimensional object, a size of the three dimensional object, a complexity of patterns identifiable for the three dimensional object, other factors, or a combination thereof, as described in more detail below.
Referring to
Referring to
As shown in
The memory 820 may store instructions 822 that, when executed by the one or more processors 812, cause the one or more processors to perform operations for generating models of three dimensional objects in accordance with embodiments. Additionally, in embodiments, the memory 820 may store a database 824 of one or more models. Each of the models included in the database 824 may correspond to a model constructed in accordance with embodiments. For example, each of the different models may be constructed by placing a different three dimensional object on the pedestal 100, and then capturing a plurality of images of the three dimensional object while placed on the pedestal, as described above with reference to
During processing of the images, each of the images may be analyzed to determine camera position information and feature points. The camera position information may be determined by identifying one or more of the markers 132, 142, 152, 162 of the pedestal 100 that are present within each of the images. The feature points identified in an image may be stored in correspondence with the camera position information such that identification of a set of feature points from an image of the three dimensional object while the object is not on the pedestal 100 can be matched to a set of feature points of the model and then mapped to a camera position corresponding to the matched set of feature points, thereby enabling a camera position relative to the object to be determined based on images of the three dimensional object without requiring the three dimensional object to be placed on the pedestal.
For example, and briefly referring to
Referring back to
As described above, a model of a three dimensional object may be constructed. The electronic device 830 may comprise an application, which may be stored at the memory 840 as instructions 842 that, when executed by the one or more processors 832, cause the one or more processors 832 to perform operations for tracking a three dimensional object using models constructed by the model generation device according to embodiments. Additionally, the operations may further include determining a position of the camera 860 relative to the three dimensional object based on the models constructed by the model generation device according to embodiments and based on one or more images captured by the camera 860 while the three dimensional object is not placed on the pedestal.
For example, and referring back to
As shown above, the system 800 provides for generation of models of three dimensional objects based on a plurality of images of the three dimensional object captured using a monocular camera. Thus, the system 800 of embodiments enables models to be constructed for three dimensional markers or objects without the need to utilize specialized three dimensional modelling software or a three dimensional scanner. Additionally, the models constructed by the system 800 enable tracking of a three dimensional object and are configured to provide camera position information when the three dimensional object is not placed on the pedestal, which may be particularly useful for many AR applications.
Referring to
As shown in
In
In embodiments, the directions and rotations indicated by the outputs 1012, 1014, 1016 may be determined using the local coordinate system of the model. For example, the features 1024 and 1026, when first identified in the image of
As shown above, models constructed according to embodiments may enable tracking of a three dimensional object based on images captured using a monocular camera. Additionally, models constructed according to embodiments enable camera position information to be determined relative to a three dimensional object based on feature points identified within an image, which may simplify implementation of various AR functionality.
Further, models constructed according to embodiments may be smaller in size than those constructed with three dimensional scanners and/or three dimensional modelling software. Typically, a three dimensional model comprises a very dense point cloud that contains hundreds of thousands of points. This is because in a three dimensional model constructed using a three dimensional scanner and/or three dimensional modelling software, the three dimensional object is treated as if it is made of countless points on its body (e.g., the three dimensional model utilizes every part of the surface of the object). In contrast, models constructed according to embodiments include only certain points on the object's body, namely, feature points (e.g., information comprising distinguishing features or aspects of the object's body). For example, referring back to
By including only information associated with distinct features of the three dimensional object, and excluding information that does not facilitate identification of the three dimensional object, such as information associated with smooth and texture-less portions of the object's body, models constructed according to embodiments contain fewer points than three dimensional models generated using three dimensional scanners and/or three dimensional modelling software, and are therefore smaller in size. This allows the models constructed according to embodiments to be stored using a smaller amount of memory than would otherwise be required (e.g., using traditional models constructed using three dimensional scanners and/or three dimensional modelling software), which may enable a device to store a larger library of three dimensional object models while utilizing less memory capacity. This may also facilitate identification of a larger number of three dimensional objects by an AR application configured to use the library of models. Additionally, by only including information in the model associated with distinct and/or identifiable features, a model constructed according to embodiments may facilitate faster identification and tracking of the three dimensional object in a real-time environment. For example, when matching a live camera-fed image with a template image or information stored in a model constructed according to embodiments, the matching process will be faster because it compares far fewer points than three dimensional models constructed using a three dimensional scanner and/or three dimensional modelling software. Further, it is noted that accuracy of the tracking and/or three dimensional object identification is not compromised because the information stored in the model comprises the most distinct features of the three dimensional object.
Referring to
At 1110, the method 1100 includes receiving, by the processor, a plurality of images. The plurality of images may correspond to images of a three dimensional object (e.g., the three dimensional object 200) while the three dimensional object is placed on a pedestal (e.g., the pedestal 100), as described above with reference to
The method 1100 also includes, at 1120, defining, by the processor, a coordinate system with respect to the pedestal upon which the three dimensional object is placed. In embodiments, the coordinate system (e.g., the coordinate system 400 of
In embodiments, identifiable portions of the pedestal may be assigned positions within the coordinate system. For example, and referring back to
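As an illustrative sketch of assigning coordinate-system positions to identifiable portions of the pedestal, the snippet below places the corners of a marker on the front face in a pedestal-local frame. The pedestal dimensions, the choice of origin (center of the top surface), and the corner ordering are all assumptions for illustration:

```python
# Assumed pedestal dimensions, in metres.
W, H = 0.30, 0.10  # width of a face, height of the pedestal

def front_marker_corners(width=W, height=H):
    """Corners of a marker covering the front face (positive Y side),
    listed counter-clockwise in the pedestal coordinate system, with the
    origin at the center of the pedestal's top surface."""
    half = width / 2.0
    return [
        (-half, half, -height),  # bottom-left corner of the face
        ( half, half, -height),  # bottom-right
        ( half, half, 0.0),      # top-right
        (-half, half, 0.0),      # top-left
    ]

corners = front_marker_corners()
```

Because these corner coordinates are fixed and known, detecting the marker's corners in an image gives 2D-3D correspondences from which the camera pose can be estimated.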
As described above, during model construction, a plurality of images may be captured while the three dimensional object is situated on the pedestal 100. During the capturing of the plurality of images, the three dimensional target object may be fixated on the pedestal 100 so that it remains static relative to the pedestal 100, and thus static relative to the coordinate system 400. In each of the plurality of images, at least one of the markers (e.g., at least one of the markers 132, 142, 152, 162 of
The camera pose may be estimated by minimizing the reprojection error of the marker corners. For example, let x_i and p_i be, respectively, the three dimensional coordinates of a marker corner in the coordinate system 400, denoted as coordinate system "C" below, and the subpixel position of that corner in the picture. Under the pinhole camera model, with R and t denoting the rotation and translation of the camera pose, the projection proj(x_i) = (u, v) may be given by:

(X, Y, Z)^T = R x_i + t (Equation 1)

u = f_x (X / Z) + c_x (Equation 2)

v = f_y (Y / Z) + c_y (Equation 3)

proj(x_i) = (u, v) (Equation 4)

where f_x and f_y denote the focal lengths of the camera and (c_x, c_y) denotes the principal point. The per-corner reprojection error and the pose estimate may then be expressed as:

e_i = || proj(x_i) - p_i ||^2 (Equation 5)

(R, t) = argmin over (R, t) of the sum of e_i over all corners i (Equation 6)
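A minimal numerical sketch of this reprojection-error objective follows. The intrinsics, pose, and marker-corner coordinates are made-up values for illustration (not taken from the disclosure); the sketch projects marker corners with a known pose and confirms that the true pose yields a smaller reprojection error than a perturbed one. A real implementation would minimize this objective with a pose solver (e.g., a PnP algorithm).

```python
import numpy as np

# Assumed camera intrinsics (focal lengths and principal point).
fx = fy = 800.0
cx, cy = 320.0, 240.0

def project(points_c):
    """Pinhole projection of 3-D points already in camera coordinates."""
    u = fx * points_c[:, 0] / points_c[:, 2] + cx
    v = fy * points_c[:, 1] / points_c[:, 2] + cy
    return np.stack([u, v], axis=1)

def reprojection_error(R, t, x_world, p_observed):
    """Sum of squared distances between projected marker corners and their
    observed subpixel positions -- the quantity minimized for the pose."""
    points_c = x_world @ R.T + t
    return float(np.sum((project(points_c) - p_observed) ** 2))

# Marker corners in the pedestal coordinate system "C" (assumed layout).
x = np.array([[-0.1, -0.1, 0.0], [0.1, -0.1, 0.0],
              [0.1, 0.1, 0.0], [-0.1, 0.1, 0.0]])
R_true = np.eye(3)
t_true = np.array([0.0, 0.0, 1.0])
p = project(x @ R_true.T + t_true)  # synthetic "observed" corner positions

err_true = reprojection_error(R_true, t_true, x, p)
err_off = reprojection_error(R_true, t_true + np.array([0.01, 0.0, 0.0]), x, p)
```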
By including the markers of the pedestal 100 in the images captured of the three dimensional object, the camera pose for each picture may be determined. With the camera poses known from Equations 1-6 above, corresponding points identified on different images may be triangulated, enabling the three dimensional coordinates of the feature points on the three dimensional object's surface to be determined, which may facilitate identification of relationships between different ones of the plurality of images (e.g., identification of a particular image as being captured from a particular direction relative to one or more other images of the plurality of images).
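The triangulation step can be sketched with a standard linear (DLT) two-view triangulation. The camera intrinsics and poses below are synthetic values for illustration only; a noiseless point projected into both views is recovered exactly:

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one point seen in two views with known
    3x4 camera matrices P1 and P2. Returns the 3-D point in world coordinates."""
    A = np.array([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]          # null vector of A (homogeneous solution)
    return X[:3] / X[3]

def project(P, X):
    """Project a 3-D world point with camera matrix P."""
    h = P @ np.append(X, 1.0)
    return h[:2] / h[2]

# Two synthetic cameras: identical intrinsics, second camera translated.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.array([[0.0], [0.0], [0.0]])])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

X_true = np.array([0.05, -0.02, 1.5])  # a feature point on the object surface
X_rec = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
```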
Defining a local coordinate system with respect to the pedestal, and then capturing images of the three dimensional object while placed on the pedestal, may enable spatial relationships between the camera and the three dimensional object to be determined from the model using image analysis techniques, as described above. As a further example, when a particular marker of the pedestal is present in the image of the three dimensional object, it may be determined that the object is being viewed by the camera from a particular direction within the coordinate system. Storing the camera position information in correspondence with features or feature points identified in a particular image enables a camera position to be determined in relation to the three dimensional object within the coordinate system when the pedestal is not present, as described above.
At 1130, the method 1100 includes constructing, by the processor, the model for tracking the three dimensional object. In embodiments, the model may be constructed based on the plurality of images of the object that were captured while the object is placed on the pedestal and based on the coordinate system. The model for tracking the three dimensional object may be configured to provide information representative of a position of a camera (e.g., the camera 860 of
In embodiments, the model may be stored in a library of models comprising a plurality of models (e.g., the library of models 824 of
In embodiments, constructing the model, at 1130, may include analyzing each of the plurality of images to identify features of the three dimensional object within each image. The features may comprise: lines, shapes, patterns, colors, textures, edge features (e.g., boundary/edge between two regions of an image, such as between the object and the background), corner features (e.g., perform edge detection and then analyze the detected edges to find rapid changes in direction, which may indicate corners), blob features (e.g., features that are focused on regions of the three dimensional object, as opposed to corners which focus more on individual points), other types of features, or a combination thereof. Many implementations exist for detecting each type of feature. In embodiments, a corner detector may primarily be used, as corner detection is fast, accurate, and suitable for a real-time environment; however, whatever feature detector is used, the same approach may be applied to construct the model.
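As a sketch of the corner-detection idea, the snippet below computes a minimal Harris-style corner response in plain NumPy on a synthetic image of a bright square; the response peaks at the square's corners, not along its edges. All parameter values are illustrative, and a production system would more likely use a tuned detector from an imaging library:

```python
import numpy as np

def harris_response(img, k=0.05):
    """Minimal Harris corner response: large where gradients vary strongly
    in both directions (corners), low along edges and flat regions."""
    Iy, Ix = np.gradient(img.astype(float))   # image gradients
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def box(a):
        """Sum each value's 3x3 neighborhood (simple box filter)."""
        p = np.pad(a, 1)
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3))

    Sxx, Syy, Sxy = box(Ixx), box(Iyy), box(Ixy)  # structure tensor sums
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace ** 2

# Synthetic image: a bright square; its corners should score highest.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
peak = np.unravel_index(np.argmax(R), R.shape)
```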
In embodiments, the pedestal used to capture the plurality of images may be disposed in a room or chamber having a particular color of walls or some other characteristic that simplifies identification of the features of the three dimensional object (e.g., enables the image analysis algorithm to distinguish between the three dimensional object on the pedestal and background information). In embodiments, a strength of the identifiable features of the object may be determined. Strong features may correspond to features that may be easily identified, or that may be consistently identified repeatedly using image analysis techniques. For example, lines may have a strong contrast with respect to the three dimensional object, such as the textures on the cup of
In embodiments, the markers on the pedestal may be analyzed to determine relationships between different ones of the plurality of images. For example, during the image analysis, markers present on the pedestal may be identified. As described above, the markers on the pedestal provide information that may provide an indication of the position of the camera. For example, and referring back to
As explained below, during tracking of the three dimensional object, one or more additional images of the three dimensional object may be captured and analyzed to identify feature points in the one or more additional images, and the feature points identified within the one or more additional images may be compared to the model to determine corresponding feature points of the model. After the corresponding feature points of the model are identified, the model may be used to determine the position of the camera used to capture the one or more additional images. The camera position corresponding to the corresponding feature points of the model may indicate the position of the camera used to capture the one or more additional images. For example, by comparing the feature points identified in a particular one of the one or more additional images with the feature points included in the model, it may be determined that the particular image was captured from a camera position that corresponds to a camera position used to capture one of the plurality of images used to construct the model.
The method 1100 provides several advantages over existing techniques for generating models suitable for use in AR applications. For example, the method 1100 enables models of three dimensional objects to be constructed using only two dimensional images, such as images captured using a monocular camera. This eliminates the need to utilize special software, such as three dimensional modelling software, to generate the models, and may enable the models to be constructed more easily. Apart from utilizing specialized software, other three dimensional modelling techniques require images to contain depth information, which can be obtained from specialized tools, such as three dimensional scanners, or using two monocular cameras. The former is not commonly available, and the latter requires two individual cameras working together, increasing the complexity of the modelling process. As explained above, embodiments of the present disclosure enable model construction using a single monocular camera, such as may be commonly found on mobile phones or tablet computing devices. Thus, embodiments enable three dimensional models to be constructed without the cost of specialized tools, such as three dimensional scanners or modelling software, and without requiring coordination of multiple cameras or other devices.
Referring to
At 1210, the method 1200 includes storing a model of an object. In embodiments, the model may be constructed according to the method 1100, as described above, and may be configured to enable the application to track the three dimensional object. In embodiments, the application may be configured to utilize a library comprising a plurality of models (e.g., the library of models 844 of
At 1220, the method 1200 includes receiving an image of the object from a camera of the electronic device. In embodiments, the image may be received as a single image. In additional or alternative embodiments, the image may be received as a part of a stream of images. For example, the camera may be operated in a video mode and the image may be received as part of a stream of images corresponding to video content captured by the camera.
At 1230, the method 1200 includes determining a position of the camera relative to the three dimensional object based on the model. For example, the camera position relative to the three dimensional object may be determined based on the model using the position information defined by the model. In embodiments, the camera position may be determined by correlating feature points of the three dimensional object identified within the image captured by the camera to feature point information defined within the model, where the feature points defined within the model are mapped to camera position information derived during construction of the model, as described above.
At 1240, the method 1200 may include performing one or more AR operations based on the position of the camera relative to the three dimensional object. For example, in embodiments, the AR operations may include providing a graphical overlay that appears within the scene depicted by the image from which the position and/or orientation of the camera was determined. To ensure that the graphical overlay is properly aligned within the scene, the three dimensional object may be used as a three dimensional marker. The proper alignment may be achieved by placing the camera in a target camera position. In embodiments, the target camera position for the camera relative to the object may be determined based on information defined within the model and/or information associated with the particular graphical overlay to be applied (e.g., different graphical overlays may have different target camera positions with respect to the three dimensional object), as described above with respect to
In embodiments, the method 1200 may further include determining a quality metric representative of a strength of the correlation of the image of the object to one of the plurality of images of the object used to construct the model, and determining whether the quality metric satisfies a tracking threshold. The graphical overlay may be provided, based at least in part, on a determination that the quality metric satisfies the tracking threshold. For example, a determination that the quality metric does not satisfy the threshold may indicate that the object is not being tracked by the camera. In such a case, the AR operation may not be performed. Utilizing the quality metric may assist with identifying the three dimensional object when a portion of the three dimensional object is not visible within the image. For example, a particular set of feature points may provide a strong indication that the three dimensional object is present within an image while another set of feature points may provide a weak indication that the three dimensional object is present within the image. When the particular set of strong feature points is identified within the image, the three dimensional object may be identified as present within the image even when the other set of weak feature points is not identified in the image.
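One possible form of the quality metric, sketched here as the fraction of model feature points matched in the live image, is shown below. The disclosure leaves the exact metric open, so the function, the threshold value, and the return shape are assumptions for illustration:

```python
def tracking_quality(matched_points, model_points, threshold=0.6):
    """Return an illustrative quality metric (fraction of model feature
    points matched in the live image) and whether it satisfies the
    tracking threshold."""
    if not model_points:
        return 0.0, False
    quality = len(matched_points) / len(model_points)
    return quality, quality >= threshold

model_pts = ["p%d" % i for i in range(10)]       # hypothetical model points
quality_hi, tracked_hi = tracking_quality(model_pts[:8], model_pts)
quality_lo, tracked_lo = tracking_quality(model_pts[:3], model_pts)
```

With 8 of 10 model points matched the threshold is satisfied and AR operations may proceed; with only 3 of 10 matched, tracking is considered lost and the overlay would not be rendered.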
As shown above, the method 1200 may enable a camera position relative to a three dimensional object to be determined from a model constructed from a plurality of images of the three dimensional object while the object is positioned on a pedestal. Additionally, the method 1200 facilitates tracking of a three dimensional object using a model constructed from a plurality of images of the three dimensional object while the object is positioned on a pedestal. AR applications and functionality are increasingly seeking to operate in a real-time environment in which the camera (usually a handheld device) is constantly moving. As described above, the methods of embodiments for tracking position and orientation of a camera and identification of a three dimensional object using models constructed according to embodiments may be relatively faster than other techniques. This is because models constructed according to embodiments are constructed as a relatively sparse points-based model of the three dimensional object containing only the feature points (e.g., the identified distinct or identifying features), whereas other three dimensional models (e.g., models constructed using three dimensional scanners and/or modelling software) comprise dense models that include large point clouds with points corresponding to non-distinct and non-distinguishing features of the three dimensional object. This enables methods for tracking of camera position/orientation and identification of three dimensional objects according to embodiments to be performed faster, since there are fewer points in the model to compare to points identified in real-time images received from a camera.
Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure of the present invention, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present invention. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.