Embodiments of the invention are in the field of deriving 3D measurements of objects from 2D photos using mobile computing devices, such as smartphones or tablet devices.
The statements in the background of the invention are provided to assist with understanding the invention and its applications and uses, and may not constitute prior art.
Photogrammetry relates to obtaining three-dimensional (3D) information about 3D objects through recording and synthesizing data from two-dimensional (2D) images. In some implementations, photogrammetry enables obtaining 3D information from 2D photographs or images. Photogrammetry makes use of methodologies drawn from different disciplines, such as optics, projective geometry, and so on, in order to synthesize 3D information from 2D images.
However, photogrammetry requires a known scale factor or other dimensionality data in order to generate an accurate 3D model with proper dimensional information. Therefore, it would be an advancement in the state of the art to determine a scale factor of 2D photos, and use the scale factor during the photogrammetry process to generate a 3D model of the object. Furthermore, it would be a further advancement in the state of the art to allow mobile computing devices to obtain 3D measurements of 3D objects from 2D photos.
It is against this background that the present invention was developed.
This summary of the invention provides a broad overview of the invention, its application, and uses, and is not intended to limit the scope of the present invention, which will be apparent from the detailed description when read in conjunction with the drawings.
Photogrammetry enables obtaining measurements for 3D objects from 2D images. A scale factor is the ratio of the actual real-world size of a feature to its size on an image (e.g., millimeters per pixel). If the scale factor of an image is known, the actual real-world measurements, distances, or lengths of objects can be calculated by measuring the corresponding distances on the photo in pixel dimensions and multiplying them by the scale factor. As object detection and recognition technologies improve, mobile computing devices can be used to obtain measurements of different objects, provided a physical scale exists to serve as a size reference. Generally, reference objects with standardized dimensions, such as A4 sheets or credit cards, are employed for obtaining the scale factor. However, having to ensure the availability of such objects at the time of measurement increases user resistance to employing mobile computing devices for measurements. Therefore, obtaining the scale factor without the use of reference objects of known size would be highly desirable from the user's perspective and would encourage adoption of the user application.
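Once the scale factor is expressed as real-world units per pixel, the conversion described above reduces to a single multiplication. The following Python sketch assumes that convention; the function name `pixel_to_real` and the card-based numbers are illustrative only and are not part of the described method.

```python
def pixel_to_real(pixel_length: float, scale_factor: float) -> float:
    """Convert a length measured in pixels to real-world units.

    Assumes scale_factor is expressed as real-world units per pixel (e.g. mm/px).
    """
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return pixel_length * scale_factor


# Illustrative numbers only: an ID-1 card (85.6 mm wide) spanning 428 px
# in the photo gives a scale factor of 0.2 mm per pixel.
mm_per_px = 85.6 / 428
print(round(pixel_to_real(1300.0, mm_per_px), 1))  # 260.0
```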
A scale factor scales the dimensions of a 2D image of a target object from pixel dimensions to real world dimensions. There exists a distinct scale factor for each of the distinct 2D images of the target object captured by a device. The terms scale reference and scale information refer to one or more such scale factors enabling the generation of a scaled 3D model.
Various methods and algorithms are within the scope of the present invention for determining a scale reference. In one embodiment, a software library on the mobile computing device is used to detect at least two feature points of a ground plane on which the target object is placed. The feature points thus detected are used to determine a scale reference, as detailed below. In another embodiment, a depth sensor is used to determine the scale reference. In yet another embodiment, a lidar sensor (i.e., a light detection and ranging device) is used to determine the scale reference. A depth sensor or lidar may be used to generate depth information (e.g., how far the target object in the 2D image is from the camera) for one or more captured 2D images. Depth information may in turn be used to scale the pixel dimensions of the target object to its real-world dimensions.
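For the depth-sensor and lidar embodiments, one common way to turn depth into a scale factor is the pinhole camera model: a segment of real length L at depth Z that spans p pixels satisfies L = p * Z / f, where f is the focal length in pixels. The sketch below assumes a roughly fronto-parallel object, a known focal length (e.g., from the camera intrinsics), and a list of depth readings over the object; the helper names and the numbers are hypothetical.

```python
import statistics


def scale_from_depth(depth_m: float, focal_length_px: float) -> float:
    """Metres per pixel for a surface roughly parallel to the image plane.

    Pinhole model: a segment of real length L at depth Z projects to
    p = f * L / Z pixels, so L = p * Z / f and the scale factor is Z / f.
    """
    if depth_m <= 0 or focal_length_px <= 0:
        raise ValueError("depth and focal length must be positive")
    return depth_m / focal_length_px


def object_length_m(pixel_length, depth_samples_m, focal_length_px):
    # Median of the sensor's depth readings over the object damps outliers.
    depth = statistics.median(depth_samples_m)
    return pixel_length * scale_from_depth(depth, focal_length_px)


# Hypothetical readings: object about 0.5 m away, focal length 1500 px.
print(round(object_length_m(780, [0.49, 0.50, 0.50, 0.51, 0.95], 1500), 3))  # 0.26
```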
Measuring regular objects is easier than measuring irregular objects. Even when a person is physically in possession of the irregular object, obtaining the measurements of irregular objects can require multiple measurements from different angles and usage of different types of measuring tools. Systems and methods are disclosed herein for obtaining measurements of regular and irregular objects using mobile computing devices.
In one embodiment, the mobile computing device can download and execute a measurement application for measuring the object. In an embodiment of the invention, the measurement application is an Augmented Reality (AR) guided scanning application. The mobile computing device is placed at a predetermined position relative to the object so that the camera on board the mobile computing device is focused on the object. The measurement application is opened, and a user operating the mobile computing device is requested to log in. Upon logging in, the user can either select to measure an object or retrieve previously uploaded images. If the user selects to measure an object, the user is provided with object measurement instructions based on the user selection. To be measured accurately, the object is imaged from different sides and different angles based on the instructions or directions provided by the measurement application, with the object maintained in the correct position based on the AR guides. Different sets of instructions can be provided for different objects.
In some embodiments, a depth sensor or a lidar sensor is used to determine the scale reference. In another embodiment, a measurement of a horizontal reference plane or a ground plane is initially obtained using an augmented reality software development kit (AR-SDK), or an equivalent library or subroutine on the mobile computing device. For example, an AR-SDK developed for the Apple iPhone and described at http://developer.apple.com/documentation/arkit may be used. Other AR-SDKs may also be used, such as the Android AR-SDK described at https://outsourceit.today/ar-sdk-ios-android-development/. A computer vision software development kit (CV-SDK), or an equivalent library or subroutine on the mobile computing device, is used to identify the object to be measured. For example, a CV-SDK developed for the Apple iPhone and described at https://developer.apple.com/documentation/vision may be used. Other AR-SDKs are further discussed below, in the context of
Based on the identified object, the measurement instructions initially generate a first position guide to capture a first image of the object in a first position. The first position guide can guide the user regarding precise placement of the camera and mobile computing device relative to the object in the first position for capturing the first image. Similarly, a series of images capturing the object from different sides and different angles are obtained by placing the mobile computing device in a series of positions relative to the object, which is generally maintained in the same fixed location. In an example, at least 2 (preferably at least 3) images of the object are captured from at least 2 (or at least 3) different directions and at least 2 (or at least 3) different positions. Based on the ground plane measurement (e.g., feature points) and the different positions, 3D dimensions can be reconstructed from the series of 2D images without the need for a reference object. More particularly, the distance information of at least 2 (or at least 3) feature points in the AR-SDK enables generation of a scale factor, which can be used to scale a 3D model. In an example, scale information can be obtained from the series of 2D images using geometric analysis running in program code. In another example, deep learning networks that are trained on images of similar objects with explicitly labelled scale data can be used to extract scale information from the 2D images and reconstruct a 3D model of the object.
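One way to apply the feature-point distances to an up-to-scale reconstruction is a least-squares fit of a single scale s, so that each pairwise distance in the model, multiplied by s, matches the corresponding real-world distance reported by the AR-SDK. Minimizing the sum of (d_metric - s * d_model)^2 over all pairs gives s = sum(d_metric * d_model) / sum(d_model^2). This is a sketch under that assumption, not the geometric analysis of the disclosure; `model_scale` and `scale_model` are hypothetical names.

```python
import itertools

import numpy as np


def model_scale(model_pts, metric_pts):
    """Least-squares scale s minimising sum (d_metric - s * d_model)^2 over point pairs."""
    model_pts = np.asarray(model_pts, dtype=float)
    metric_pts = np.asarray(metric_pts, dtype=float)
    if model_pts.shape != metric_pts.shape or len(model_pts) < 2:
        raise ValueError("need matching arrays of at least two points")
    num = den = 0.0
    for i, j in itertools.combinations(range(len(model_pts)), 2):
        d_model = np.linalg.norm(model_pts[i] - model_pts[j])
        d_metric = np.linalg.norm(metric_pts[i] - metric_pts[j])
        num += d_metric * d_model
        den += d_model * d_model
    if den == 0.0:
        raise ValueError("model points are coincident")
    return num / den


def scale_model(vertices, s):
    # Uniform scaling about the origin turns model units into real-world units.
    return np.asarray(vertices, dtype=float) * s
```

Because only pairwise distances enter the fit, the result does not depend on how the reconstruction is rotated or translated relative to the AR-SDK's world frame.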
There exists a distinct scale factor scaling pixel dimensions to real-world dimensions for each of the distinct captured 2D images. The terms scale reference and scale information refer to one or more such scale factors enabling the generation of a scaled 3D model.
In one embodiment, a computer-implemented method for obtaining measurements of an object is disclosed, the method executable by a processor, the method comprising: generating a plurality of image capture screens for display on a mobile computing device, each image capture screen providing instructions for placing the mobile computing device in a corresponding image capture position for measurement of the object, wherein each of the plurality of image capture positions is within a predetermined angular distance around the object; capturing a plurality of images of the object corresponding to the plurality of image capture screens; determining one or more scale factors for the plurality of images of the object; generating a 3D model of the object from at least two different images of the plurality of images and their corresponding scale factors, wherein a given scale factor scales pixel dimensions in a given image to real-world dimensions of the object; and generating one or more object measurements from the 3D model.
In one embodiment, determining one or more scale factors for the plurality of images of the object comprises detecting two or more feature points on a ground plane on at least two different images of the plurality of images, and determining a scale factor for each of the at least two different images from the two or more feature points.
In one embodiment, the one or more scale factors are determined using a lidar sensor.
In another embodiment, the one or more scale factors are determined using a depth sensor.
In yet another embodiment, the processor employs an augmented reality software development kit (AR-SDK) for detecting the ground plane.
In one embodiment, the one or more scale factors are calculated based on distances between at least three feature points of the ground plane.
In one embodiment, generating the 3D model of the object further comprises utilizing a 2D keypoint Deep Learning Network (DLN).
In another embodiment, generating the 3D model of the object further comprises utilizing a 3D keypoint Deep Learning Network (DLN).
In yet another embodiment, the 3D model is generated using a retopology process.
In one embodiment, one or more of the image capture screens enables a user employing the mobile computing device to set a relative position between the object and the mobile computing device into one of the image capture positions.
In one embodiment, the image capture screens for setting the image capture positions comprise a position guide for positioning the object.
In another embodiment, one or more of the plurality of image capture screens enable determining an angle of the mobile computing device relative to the object in each of the image capture positions.
In one embodiment, one or more of the plurality of image capture screens enable determining whether there is movement of the mobile computing device during an image capture operation.
In yet another embodiment, the mobile computing device generates feedback when the mobile computing device is in a correct imaging position.
In one embodiment, the object is a body part.
In another embodiment, the body part is a human limb.
In another embodiment, the body part is selected from the group comprising a human foot and a human hand.
In one embodiment, the computer-implemented method further comprises receiving a selection of an object type to be measured from a plurality of object types, wherein the image capture screens are generated based on the selected object type.
In various embodiments, a computer program product is disclosed. The computer program product may be used for obtaining a scale reference and measurements of a three-dimensional (3D) object from a series of 2D images of the 3D object, and may include a computer-readable storage medium having program instructions, or program code, embodied therewith, the program instructions executable by a processor to cause the processor to perform the aforementioned steps.
In various embodiments, a system is described, including a memory that stores computer-executable components, and a hardware processor, operably coupled to the memory, and that executes the computer-executable components stored in the memory, wherein the computer-executable components may include components communicatively coupled with the processor that execute the aforementioned steps.
In another embodiment, the present invention is a non-transitory, computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to perform a process for generating scale references of 3D objects, the instructions causing the processor to perform the aforementioned steps.
In another embodiment, the present invention is a system for generating scale references and measuring the size of 3D objects using a 2D phone camera, the system comprising: a user device having a 2D camera, a processor, a display, and a first memory; a server comprising a second memory and a data repository; a telecommunications link between said user device and said server; and a plurality of computer codes embodied on said first and second memory of said user device and said server, said plurality of computer codes which when executed cause said server and said user device to execute a process comprising the aforementioned steps.
In yet another embodiment, the present invention is a computerized server comprising at least one processor, memory, and a plurality of computer codes embodied on said memory, said plurality of computer codes which when executed cause said processor to execute a process comprising the aforementioned steps. Other aspects and embodiments of the present invention include the methods, processes, and algorithms comprising the steps described herein, and also include the processes and modes of operation of the systems and servers described herein.
Yet other aspects and embodiments of the present invention will become apparent from the detailed description of the invention when read in conjunction with the attached drawings.
Embodiments of the present invention described herein are exemplary, and not restrictive. Embodiments will now be described, by way of examples, with reference to the accompanying drawings, in which:
With reference to the figures provided, embodiments of the present invention are now described in detail.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these specific details. In other instances, structures, devices, activities, and methods are shown using schematics, use cases, and/or flow diagrams in order to avoid obscuring the invention. Although the following description contains many specifics for the purposes of illustration, anyone skilled in the art will appreciate that many variations and/or alterations to suggested details are within the scope of the present invention. Similarly, although many of the features of the present invention are described in terms of each other, or in conjunction with each other, one skilled in the art will appreciate that many of these features can be provided independently of other features. Accordingly, this description of the invention is set forth without any loss of generality to, and without imposing limitations upon, the invention.
This application is related to PCT application No. PCT/US20/70465, filed on 27 Aug. 2020, and entitled “METHODS AND SYSTEMS FOR PREDICTING PRESSURE MAPS OF 3D OBJECTS FROM 2D PHOTOS USING DEEP LEARNING,” the entire disclosure of which is hereby incorporated by reference in its entirety herein.
Obtaining a Scale Factor and Measurements of 3D Objects from 2D Images
In
In some embodiments, a depth sensor or a lidar sensor is used to determine the scale reference. In another embodiment, a ground plane on which the object is placed is initially scanned by the AR-guided application 104. Subsequent image capture screens assist the user in positioning the mobile computing device relative to the object in a plurality of imaging positions to capture a series of 2D images of the target object 106. In one embodiment, the ground plane is represented as three or more feature points in the captured 2D images 106. In one embodiment, the AR-guided application uses a software library (e.g., AR-SDK on iPhone or Android) to determine the real-world coordinates of two or more of the detected ground plane feature points, as discussed above. The AR-SDK provides both the image coordinates and the corresponding real-world coordinates to the processor. From the image coordinates and the corresponding real-world coordinates of at least two feature points on the ground plane, the processor calculates the corresponding scale factor for each image. The captured 2D images 106 are thus used to determine one or more scale factors 110 (i.e., a scale reference) scaling the pixel dimensions of the target object to its real-world dimensions. In one embodiment, a scale reference can be obtained from the series of 2D images using geometric analysis running in program code.
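The per-image calculation from two or more ground-plane feature points can be sketched as averaging, over every pair of points, the ratio of their real-world distance to their pixel distance. This assumes the AR-SDK has already supplied matched image coordinates and world coordinates in metres. A single ratio per image is only an approximation under perspective (exact when the points lie at a similar depth), and a homography fitted to four or more points would be needed to model a strongly tilted ground plane. `image_scale_factor` is a hypothetical name.

```python
import itertools
import math


def image_scale_factor(image_pts, world_pts):
    """Average real-world-units-per-pixel ratio over all pairs of feature points.

    image_pts: (u, v) pixel coordinates; world_pts: (x, y, z) coordinates in
    metres for the same ground-plane feature points, in the same order.
    """
    if len(image_pts) != len(world_pts) or len(image_pts) < 2:
        raise ValueError("need at least two matching feature points")
    ratios = []
    for i, j in itertools.combinations(range(len(image_pts)), 2):
        d_px = math.dist(image_pts[i], image_pts[j])
        if d_px > 0:  # skip points that coincide in the image
            ratios.append(math.dist(world_pts[i], world_pts[j]) / d_px)
    if not ratios:
        raise ValueError("feature points coincide in the image")
    return sum(ratios) / len(ratios)


# Hypothetical AR-SDK output for three ground-plane feature points.
image = [(100, 400), (500, 400), (100, 100)]
world = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.0, 0.0, -0.15)]
metres_per_px = image_scale_factor(image, world)
```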
A 3D model generation module 120 uses the one or more determined scale factors 110 and the 2D images of the target object 106 to build a scaled 3D model 130 of the target object in virtual 3D space. A scale factor 110 is required to scale from pixel dimensions in the 2D images to real-world dimensions. In particular, the scale factor 110 allows the scaling of the meshes within the 3D model generation module 120 to real-world dimensions. The scaled 3D model 130 is subsequently used to generate one or more measurements 140 of the target object.
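Which measurements 140 are extracted depends on the object type. As one hedged example, a length and a width can be read off a scaled model as its extents along its principal axes; the sketch below computes these with a singular value decomposition of the centred vertices. Treating the largest extent as, e.g., foot length is an illustrative assumption, not a measurement definition from this disclosure.

```python
import numpy as np


def principal_extents(vertices):
    """Extent of a vertex cloud along its principal axes, largest-variance axis first."""
    v = np.asarray(vertices, dtype=float)
    centered = v - v.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt.T
    return projected.max(axis=0) - projected.min(axis=0)
```

Because the extents are measured along the model's own axes, they do not depend on how the scaled model happens to be oriented in virtual 3D space.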
Importantly, scale reference 110 determination and 3D model 130 building may be implemented on the mobile computing device or in the cloud (e.g., on a remote server), as mentioned in the context of
In the embodiment of
In some embodiments, a depth sensor or a lidar sensor is used to determine the scale reference. In another embodiment, a ground plane on which the object is placed is initially scanned by the AR-guided application 204. Subsequent image capture screens assist the user in positioning the mobile computing device relative to the object in a plurality of imaging positions to capture a series of 2D images of the target object 206. In one embodiment, the ground plane is represented as three or more feature points in the captured 2D images 206. In one embodiment, the AR-guided application uses a software library to determine the real-world coordinates of two or more of the detected ground plane feature points, as discussed above. The captured 2D images 206 are thus used to determine one or more scale factors 210 (i.e., a scale reference) scaling the pixel dimensions of the target object to its real-world dimensions.
The 3D model generation module 220 uses the one or more determined scale factors 210 and the 2D images of the target object 206 to build a scaled and structured 3D model 230 of the target object in virtual 3D space. A scale factor 210 is required to scale from pixel dimensions in the 2D images to real-world dimensions. In particular, the scale factor 210 allows the scaling of the meshes within the 3D model generation module 220 to real-world dimensions.
Structured and unstructured meshes differ by their connectivity. An unstructured mesh has irregular connectivity between vertices, requiring the explicit listing of the way vertices make up individual mesh elements. Unstructured meshes therefore allow for irregular mesh elements but require the explicit storage of adjacent vertex relationships, leading to lower storage efficiency and lower resolution. A structured mesh, however, has regular connectivity between its vertices (i.e., mesh elements and vertex distances are predefined), leading to higher space and storage efficiency, and superior resolution. The 3D models 130, 230, 330 generated by the 3D model generation modules 120, 220, 320, are scaled and structured meshes.
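The difference in connectivity can be made concrete with a grid of quadrilaterals. In the structured case below, the vertex indices of any face follow from its row and column alone, so no connectivity needs to be stored; an unstructured mesh must instead keep an explicit face array such as `explicit_faces`. This is a toy illustration of the storage argument, and the function name is hypothetical.

```python
import numpy as np


def structured_quad(i, j, cols):
    """Vertex indices of quad (i, j) in a rows x cols vertex grid; nothing is stored."""
    v0 = i * cols + j
    return (v0, v0 + 1, v0 + cols + 1, v0 + cols)


rows, cols = 3, 4
# Unstructured representation: connectivity listed explicitly, one row per element.
explicit_faces = np.array([structured_quad(i, j, cols)
                           for i in range(rows - 1) for j in range(cols - 1)])
print(explicit_faces.shape)          # (6, 4)
print(structured_quad(1, 2, cols))   # (6, 7, 11, 10)
```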
In the embodiment of
Keypoint annotation is the process of annotating the scaled unstructured mesh 223 by detecting keypoints within the mesh representation of the 3D object (e.g., on the object surface). The annotation of the unstructured 3D mesh is required as an initial stage in the generation of the structured 3D model. Annotation is the generation of annotation keypoints indicating salient features of the target object. Mesh annotations may be carried out through one or more 3D keypoint DLN modules that have been trained on a specific object type (e.g., a specific body part).
The keypoint detection process falls under the broad category of landmark detection. Landmark detection is a category of computer vision applications in which DLNs are commonly used. Landmark detection denotes the identification of salient features in 2D or 3D imaging data and is widely used for purposes of localization, object recognition, etc. Various DLNs, such as PointNet, FeedForward Neural Networks (FFNNs), Faster Region-based Convolutional Neural Networks (Faster R-CNNs), and various other Convolutional Neural Networks (CNNs), can be used for landmark detection. The 3D keypoint DLN 224 can be based on any 3D landmark detection machine learning algorithm, such as PointNet.
PointNets are highly efficient DLNs that are applied in 3D semantic parsing, part segmentation, and classification. PointNets are designed to process point clouds directly, hence allowing effective 3D landmark detection. PointNets also avoid unnecessary transformations of the unstructured 3D mesh input. In one embodiment, the PointNet algorithm is implemented as described in Charles R. Qi, et al., “PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation,” CVPR 2017, Nov. 9, 2017, available at arXiv:1612.00593, which is hereby incorporated by reference in its entirety herein as if fully set forth herein. PointNets are only illustrative DLN algorithms that are within the scope of the present invention, and the present invention is not limited to the use of PointNets. Other DLN algorithms are also within the scope of the present invention. For example, in one embodiment of the present invention, a convolutional neural network (CNN) is utilized as a 3D keypoint DLN 224 to extract object keypoints and to annotate meshes.
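The property that lets PointNet consume point clouds directly is that the same small network is applied to every point and the per-point features are combined with a symmetric function (max pooling), so the result does not depend on point order. The NumPy sketch below shows only that core, with random, untrained weights; it omits the input and feature transform networks and the per-point head that a keypoint detector built on PointNet would add.

```python
import numpy as np


def shared_mlp(points, layers):
    """Apply the same dense + ReLU layers to every point independently."""
    h = points
    for w, b in layers:
        h = np.maximum(h @ w + b, 0.0)
    return h


def global_feature(points, layers):
    # Max over the point axis is symmetric, so point order does not matter.
    return shared_mlp(points, layers).max(axis=0)


rng = np.random.default_rng(0)
layers = [(rng.standard_normal((3, 64)), np.zeros(64)),
          (rng.standard_normal((64, 128)), np.zeros(128))]
cloud = rng.standard_normal((500, 3))  # stand-in for mesh vertices
f1 = global_feature(cloud, layers)
f2 = global_feature(cloud[rng.permutation(len(cloud))], layers)
print(f1.shape, np.allclose(f1, f2))  # (128,) True
```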
To carry out 3D keypoint annotation, the 3D keypoint DLN 224 must be trained beforehand using training data sets comprising object meshes and corresponding keypoint annotations. Keypoint annotation DLNs can be trained to detect keypoints for a specific type of object. The 3D keypoint annotation DLN produces an annotated unstructured 3D mesh 227.
The retopology process 228 uses the annotated unstructured 3D mesh 227 alongside an annotated structured base 3D mesh 229 to generate a scaled structured 3D model 230. Retopology 228 is a morphing process that deforms the shape of an existing structured and annotated base 3D mesh 229 of the object into a structured 3D model 230 of the target object so that its keypoints match the keypoints detected on the object by the 3D keypoint DLN 224 (and represented by the annotated unstructured 3D mesh 227). Retopology may also operate on the mesh surface or projected two-dimensional contour, as discussed in the context of
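Retopology implementations vary. As one hedged illustration, the base mesh can be deformed with radial basis function (RBF) interpolation of the keypoint displacements, which moves each base keypoint onto its detected counterpart and smoothly drags nearby vertices along while leaving the face list, and hence the structured connectivity, untouched. The Gaussian kernel, `sigma`, and the function names are assumptions; a practical system might instead use thin-plate splines or add an affine term, since a pure Gaussian RBF leaves vertices far from every keypoint unmoved.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    # Pairwise squared distances between rows of a and rows of b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def rbf_morph(base_vertices, base_keypoints, target_keypoints, sigma=0.5, reg=1e-9):
    """Deform base-mesh vertices so the base keypoints land on the target keypoints.

    Only vertex positions change; the base mesh's faces are reused as-is.
    """
    verts = np.asarray(base_vertices, dtype=float)
    src = np.asarray(base_keypoints, dtype=float)
    dst = np.asarray(target_keypoints, dtype=float)
    k = gaussian_kernel(src, src, sigma) + reg * np.eye(len(src))
    weights = np.linalg.solve(k, dst - src)  # one displacement weight row per keypoint
    return verts + gaussian_kernel(verts, src, sigma) @ weights
```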
The base 3D mesh 229 is a raw 3D mesh representation of the object that is stored on a server or within the device. The retopology process 228 can access a library of 3D base meshes containing at least one base 3D mesh 229 in the category of the target object. In one embodiment, the base 3D meshes in the library are structured and pre-annotated. The morphing of the base 3D mesh therefore produces a scaled and structured 3D mesh representation 230 of the object. Retopology is further discussed in more detail in the context of
The scaled and structured 3D model 230 generated by the 3D model generation module 220 is subsequently used to generate one or more measurements 240 of the target object.
In some embodiments, a depth sensor or a lidar sensor is used to determine the scale reference. In another embodiment, a ground plane on which the object is placed is initially scanned by the AR-guided application 304. Subsequent image capture screens assist the user in positioning the mobile computing device relative to the object in a plurality of imaging positions to capture a series of 2D images of the target object 306. In one embodiment, the ground plane is represented as three or more feature points in the captured 2D images 306. In one embodiment, the AR-guided application uses a software library to determine the real-world coordinates of two or more of the detected ground plane feature points, as discussed above. The captured 2D images 306 are thus used to determine one or more scale factors 310 (i.e., a scale reference) scaling the pixel dimensions of the target object to its real-world dimensions.
The 3D model generation module 320 uses the one or more determined scale factors 310 and the 2D images of the target object 306 to build a scaled and structured 3D model 330 of the target object in virtual 3D space. A scale factor 310 is required to scale from pixel dimensions in the 2D images to real-world dimensions. In particular, the scale factor 310 allows the scaling of the meshes within the 3D model generation module 320 to real-world dimensions.
In the embodiment of
In the embodiment of
Keypoint generation may be carried out through one or more 2D keypoint DLN modules that have been trained on a specific object type (e.g., a human foot). In some embodiments, for example, the segmentation of the object from the background and its annotation may be carried out by two separate DLNs. The 2D keypoint generation process also falls under the category of landmark detection, as discussed above. Various landmark DLNs, such as the Stacked Hourglass Convolutional Neural Network (CNN), HRNet, FeedForward Neural Network (FFNN), Faster Region-based Convolutional Neural Network (Faster R-CNN), and other CNNs, may be used to build a 2D keypoint DLN. An exemplary architecture of a Stacked Hourglass CNN is discussed in the context of
To carry out 2D keypoint annotation, the 2D keypoint DLN 324 must be trained beforehand using training data sets comprising object photos and corresponding keypoints. 2D keypoint DLNs can be trained to detect keypoints for a specific type of object. In some embodiments, segmentation (i.e., the separation of the object from its background) and annotation can be carried out through multiple DLN stages.
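Hourglass-style keypoint networks typically output one heatmap per keypoint, and a common decoding step takes the location of each heatmap's maximum as the keypoint. The sketch below assumes heatmaps shaped (K, H, W) and a hypothetical `stride` parameter that maps heatmap cells back to input pixels; it returns integer coordinates, whereas practical decoders often add sub-pixel refinement.

```python
import numpy as np


def heatmaps_to_keypoints(heatmaps, stride=1):
    """Decode (K, H, W) heatmaps into (K, 2) (x, y) pixel coords and (K,) peak scores.

    stride maps heatmap cells back to input-image pixels (e.g. 4 when a
    256x256 input yields 64x64 heatmaps).
    """
    hm = np.asarray(heatmaps, dtype=float)
    k, h, w = hm.shape
    flat = hm.reshape(k, -1)
    idx = flat.argmax(axis=1)                  # peak location per keypoint
    ys, xs = np.unravel_index(idx, (h, w))
    coords = np.stack([xs, ys], axis=1) * stride
    return coords, flat.max(axis=1)


hm = np.zeros((2, 64, 64))
hm[0, 10, 20] = 1.0
hm[1, 30, 5] = 0.8
coords, scores = heatmaps_to_keypoints(hm, stride=4)
print(coords.tolist(), scores.tolist())  # [[80, 40], [20, 120]] [1.0, 0.8]
```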
As is the case in the embodiment of
The scaled and structured 3D model 330 generated by the 3D model generation module 320 is subsequently used to generate one or more measurements 340 of the target object.
The DLN algorithms listed above for the various DLN applications disclosed herein (e.g., Stacked Hourglass) are only illustrative algorithms that are within the scope of the present invention, and the present invention is not limited to the use of the listed DLN algorithms. Other DLN algorithms are also within the scope of the present invention. Moreover, other machine learning (ML) methods may be used instead of or in combination with the various listed DLN algorithms. Other ML algorithms including, but not limited to, regressors, nearest neighbor algorithms, decision trees, support vector machines (SVM), AdaBoost, Bayesian networks, fuzzy logic models, evolutionary algorithms, and so forth, are hence within the scope of the present invention.
The measurement application 450 is configured to capture a set of images 420 of an object 430 from different directions. In an embodiment of the invention, the set of images 420 obtained by the measurement application 450 are uploaded to a server 460 which analyzes the images using deep learning networks (DLNs) for extracting the measurements which help build 3D models of the object 430. The measurement application 450 includes a reference plane detector 452, an object identifier 454, an image recorder 456 and an image uploader 458. The reference plane detector 452 is based on AR-SDK. AR-SDK includes a set of processor-executable instructions that enable building and running AR applications on various devices wherein digital data is to be used with real-world images to enable various functions. Different types of AR-SDKs are available to build different types of applications such as but not limited to marker applications which function based on identification of certain markers such as bar codes, etc., location-based applications which do not function on markers but instead use GPS data or other position/location data, etc.
Prior to beginning the measurement process, the user may be requested to identify the object 430 to be measured so that the measurement application 450 can retrieve the corresponding measurement instructions 448. In an example, the measurement instructions 448 can be retrieved from the server 460 upon receiving the user input regarding the object 430 to be measured. The measurement instructions 448 provide audio and/or video instructions via a series of image capture screens with corresponding position guides that direct the user regarding the number of images to be captured for the object 430 and the relative positions of the mobile computing device 410 and the object 430 for each of the images.
One important function of AR-SDKs is 3D image tracking, which requires the AR-SDKs to recognize and track 3D objects and to map the environment. Many AR-SDKs, such as but not limited to Vuforia, Kudan, ARKit from Apple®, ARCore from Google®, and ARToolKit (an open-source tool), are currently in use and offer different functionalities. Any of the available AR-SDKs can be employed for the ground plane detection. For example, ARCore works with Java/OpenGL, Unity, and Unreal and focuses on functions such as motion tracking, using a smartphone's camera to observe feature points in a room. ARCore can determine both the position and orientation of the phone as it moves so that virtual objects can be accurately placed. ARCore can also detect horizontal surfaces using the same feature points that it uses for motion tracking. As mentioned above, detecting planes is an important function of AR-SDKs, as AR experiences must be anchored to the detected planes. Some AR-SDKs can detect different kinds of planes, such as horizontal planes (e.g., floors, tables, ceilings), vertical planes (e.g., doors, walls), or planes of arbitrary orientation (e.g., ramps).
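AR-SDKs do not publish their plane-detection internals, so the following is only an illustrative stand-in: a RANSAC fit of a plane to 3D feature points, followed by a check that the plane's normal is close to the gravity direction. It assumes a gravity-aligned world frame with +y pointing up; the tolerance, iteration count, and tilt threshold are hypothetical values.

```python
import numpy as np


def ransac_plane(points, iters=200, tol=0.01, seed=0):
    """Fit a plane n.x + d = 0 to 3D points with RANSAC; returns ((n, d), inlier mask)."""
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    best_plane, best_inliers = None, np.zeros(len(pts), dtype=bool)
    for _ in range(iters):
        a, b, c = pts[rng.choice(len(pts), size=3, replace=False)]
        n = np.cross(b - a, c - a)
        norm = np.linalg.norm(n)
        if norm < 1e-12:
            continue  # collinear sample, no unique plane
        n = n / norm
        d = -float(n @ a)
        inliers = np.abs(pts @ n + d) < tol  # point-to-plane distance test
        if inliers.sum() > best_inliers.sum():
            best_plane, best_inliers = (n, d), inliers
    return best_plane, best_inliers


def is_horizontal(normal, up=(0.0, 1.0, 0.0), max_tilt_deg=10.0):
    """True if the plane normal is within max_tilt_deg of the up axis (either sign)."""
    n = np.asarray(normal, dtype=float)
    cos_tilt = abs(n @ np.asarray(up, dtype=float)) / np.linalg.norm(n)
    return bool(cos_tilt >= np.cos(np.radians(max_tilt_deg)))
```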
Once the horizontal reference plane 470 is detected, a marker may be produced on the horizontal reference plane 470 indicating to the user operating the mobile computing device 410 that the measurement application 450 is ready for the next step in the measurement process. The next step includes the object identifier 454 identifying the object 430 to be measured. In an example, custom machine learning (ML) based object recognition models can be trained and employed for the object identifier 454. For example, region-based convolutional neural networks (R-CNNs) or You Only Look Once (YOLO) models may be used for object recognition, provided they run at the speed required for real-time use by the measurement application 450.
Different ML models can be trained to identify the different objects that the measurement application 450 is configured to measure, so that when the user inputs information regarding the object 430 to be measured, the measurement instructions 448 along with the corresponding ML model for that object are retrieved. Object recognition as implemented by the measurement application 450 can include at least two computer vision tasks: object localization, which pertains to locating the object 430 in an image and optionally generating a bounding box around the object, and object detection, which includes locating the object within the bounding box. Upon localizing the object 430, the image recorder 456 activates the camera 404 to begin collecting the set of images 420. In an example, the image recorder 456 includes a position guide processor 4562 and a viewer 4564. The position guide processor 4562 receives a signal from the object identifier 454 regarding the location of the object 430 and generates a position guide (not shown) on the screen of the mobile computing device 410 for guiding the user regarding the optimum placement of the object 430 within the field of view for the image capture by the camera 404. The viewer 4564 includes a graphical user interface (GUI) for displaying the position guide and adjusting it to signal the correct position for capturing the image of the object 430 in a given position, either as a still photograph or as a video. A UI element may be activated for capturing the image. When an image of satisfactory quality (e.g., without blurring or occlusions) is captured or recorded, the user can be instructed to move to the next position to capture the next image from another direction and/or distance.
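The quality check mentioned above is not specified further. One widely used heuristic for detecting blur is the variance of the image's Laplacian, which drops when edges are smeared. The NumPy sketch below assumes a grayscale image array and a hypothetical threshold that would need tuning for a given camera; it does not detect occlusions.

```python
import numpy as np


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over the image interior."""
    g = np.asarray(gray, dtype=float)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def is_sharp(gray, threshold=100.0):
    # threshold is a placeholder; it depends on camera, resolution, and content.
    return laplacian_variance(gray) >= threshold
```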
The image uploader 458 can upload the set of images 420 thus captured to the server 460 for processing. The server 460 can include a scale information retriever 462 and a 3D model builder 464. The scale information retriever 462 measures the distances between at least three feature points in the set of images 420 to generate the scale reference, which the 3D model builder 464 uses to scale the 3D model. Computer vision tools such as, but not limited to, Meshroom and Regard3D can be used to generate the 3D models.
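The description does not specify how the measured distances become a scale reference. As one possible sketch, assume the photogrammetry tool returns an unscaled model in which the feature points have arbitrary-unit coordinates, and assume that real-world distances between pairs of those points are available (for example, from the device's AR tracking). A scale factor can then be estimated as the median ratio of real-world distance to model distance over all point pairs, and applied uniformly to every vertex. The function names, the choice of the median, and the input format are all illustrative assumptions.

```python
import math
from statistics import median
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def estimate_scale(model_points: Dict[str, Point],
                   measured: Dict[Tuple[str, str], float]) -> float:
    """Median ratio of real-world distance to model-space distance.

    model_points: feature-point name -> coordinates in the unscaled model.
    measured: (name_a, name_b) -> real-world distance between those points.
    """
    if len(model_points) < 3:
        raise ValueError("at least three feature points are required")
    ratios = []
    for (a, b), real in measured.items():
        model_dist = math.dist(model_points[a], model_points[b])
        if model_dist > 0:
            ratios.append(real / model_dist)
    if not ratios:
        raise ValueError("no usable point pairs")
    # The median is less sensitive than the mean to one badly matched pair.
    return median(ratios)


def apply_scale(vertices: List[Point], s: float) -> List[Point]:
    """Uniformly scale every model vertex about the origin."""
    return [(x * s, y * s, z * s) for x, y, z in vertices]


if __name__ == "__main__":
    pts = {"p1": (0.0, 0.0, 0.0), "p2": (1.0, 0.0, 0.0), "p3": (0.0, 2.0, 0.0)}
    real_mm = {("p1", "p2"): 50.0, ("p1", "p3"): 100.0,
               ("p2", "p3"): 50.0 * math.sqrt(5)}
    s = estimate_scale(pts, real_mm)
    print(s)  # 50.0
    print(apply_scale([(1.0, 2.0, 0.5)], s))  # [(50.0, 100.0, 25.0)]
```

Using three or more points gives several independent pairwise ratios, so a single mismeasured distance does not dominate the estimate.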
If at 604 the user response indicates that the user desires to create a new model, the process moves to 608, where a user input identifying the object 430 or object type 102 to be measured is received. For example, the user may identify a left leg, a right arm, or any other object that is to be measured. At 610, the measurement instructions 448 pertaining to the object 430 to be measured are retrieved, and a first image capture screen is produced at 612. In an example, the first image capture screen can be used for sensing a horizontal reference surface, and accordingly the user can be instructed to move the mobile computing device 410 so that the camera 404 senses the horizontal reference surface (i.e., the ground plane) on which the object 430 to be measured is placed. The first image capture screen and the accompanying instructions direct the user at 614 to move the mobile computing device 410 so that the horizontal reference surface 470 can be detected. Upon detecting the horizontal reference surface at 614, the user can be instructed, via a subsequent object detection screen generated at 616, to focus the camera 404 on the object 430. The object 430 is detected at 618, and the imaging process to obtain the set of images 420 is commenced at 620. The embodiment described in
Importantly, scale reference 110 determination and 3D model 130 building may be implemented in the cloud (e.g., on a remote server), as described in
Stacked Hourglass CNNs are landmark detection DLNs that are efficient in detecting patterns such as human pose. They are usually composed of multiple stacked hourglass modules, where each hourglass module has symmetric downsampling and upsampling layers. Consecutive hourglass modules have intermediate supervision, thus allowing for repeated inference between the downsampling and upsampling layers. In one embodiment, the Stacked Hourglass CNN algorithm is implemented as described in Alejandro Newell, et al., “Stacked Hourglass Networks for Human Pose Estimation,” ECCV 2016, Sep. 17, 2016, available at arXiv: 1603.06937, which is hereby incorporated by reference in its entirety herein as if fully set forth herein.
The High-Resolution Network (HRNet) is another landmark detection DLN that is a suitable DLN base architecture for the keypoint DLN 112. HRNets are used in human pose estimation, semantic segmentation, and facial landmark detection. HRNets are composed of connected parallel high-to-low resolution convolutions, allowing repeated fusions across parallel convolutions and leading to strong high-resolution representations. In one embodiment, the HRNet algorithm is implemented as described in Ke Sun, et al., “Deep High-Resolution Representation Learning for Human Pose Estimation,” CVPR 2019, Jan. 9, 2020, available at arXiv: 1902.09212, which is hereby incorporated by reference in its entirety herein as if fully set forth herein.
Stacked Hourglass CNNs and HRNets are only illustrative DLN algorithms that are within the scope of the present invention, and the present invention is not limited to the use of Stacked Hourglass CNNs or HRNets. Other DLN algorithms are also within the scope of the present invention. For example, in one embodiment of the present invention, a convolutional neural network (CNN) is utilized as a keypoint DLN 112 to extract object keypoints 114 from 2D input images or photos 106.
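Landmark detection DLNs such as Stacked Hourglass CNNs and HRNets typically output one heatmap per keypoint, and keypoint coordinates are commonly read off as the location of each heatmap's maximum, rescaled to the input image resolution. The sketch below shows only that decoding step, not the network itself. It assumes heatmaps as nested Python lists, a uniform output stride (4 is a common choice for these architectures but is not stated in the description), one of several possible cell-to-pixel conventions, and a hypothetical confidence cut-off below which a keypoint is reported as not visible.

```python
from typing import List, Optional, Tuple

Heatmap = List[List[float]]
Keypoint = Optional[Tuple[float, float, float]]  # (x, y, confidence) or None


def decode_keypoints(heatmaps: List[Heatmap], stride: int = 4,
                     min_confidence: float = 0.1) -> List[Keypoint]:
    """Return (x, y, confidence) in input-image pixels for each keypoint heatmap.

    A keypoint whose peak value is below min_confidence is reported as None.
    """
    keypoints: List[Keypoint] = []
    for hm in heatmaps:
        best_val, best_x, best_y = float("-inf"), 0, 0
        for y, row in enumerate(hm):
            for x, val in enumerate(row):
                if val > best_val:
                    best_val, best_x, best_y = val, x, y
        if best_val < min_confidence:
            keypoints.append(None)
        else:
            # Map the peak heatmap cell to the centre of its input-image footprint.
            keypoints.append(((best_x + 0.5) * stride,
                              (best_y + 0.5) * stride, best_val))
    return keypoints


if __name__ == "__main__":
    knee = [[0.0, 0.1, 0.0], [0.0, 0.2, 0.9], [0.0, 0.0, 0.1]]
    hidden = [[0.01, 0.02, 0.0], [0.0, 0.03, 0.0], [0.0, 0.0, 0.0]]
    print(decode_keypoints([knee, hidden]))  # [(10.0, 6.0, 0.9), None]
```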
Retopology 328 is therefore an adaptive base mesh adjustment process, as shown in
Other embodiments of the retopology process may use the input 2D images 306 directly. An alternative embodiment is provided in PCT application No. PCT/US20/70465, which is hereby incorporated by reference in its entirety herein as if fully set forth herein. Different retopology methods can be used repeatedly and iteratively, where the error function is computed over several iterations of the morphed base mesh until the error falls below a sufficiently low threshold.
According to one embodiment, the morphing of structured 3D base meshes through projection error minimization to generate structured 3D models improves on existing photogrammetry processes and, in some embodiments, allows an object's 3D model to be reconstructed from as few as 4-6 photos, compared with the 40-60 photos that typical photogrammetry processes might require.
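Neither the projection error function nor the morphing update is spelled out here. As a deliberately simplified illustration of the iterate-until-threshold idea, assume a pinhole camera with focal length f and principal point (cx, cy), base-mesh vertices whose depths are held fixed, and a target 2D position for each vertex (for example, a detected keypoint). Each iteration moves every vertex in x and y a fraction alpha of the way toward the position that would project exactly onto its target, and the loop stops once the mean squared reprojection error drops below a tolerance. The camera model, the fixed-depth simplification, the update rule, and all names and parameter values are assumptions, not the method of the referenced application.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project(v: Vertex, f: float, cx: float, cy: float) -> Pixel:
    """Pinhole projection of a camera-space vertex (z > 0) to pixel coordinates."""
    x, y, z = v
    return (f * x / z + cx, f * y / z + cy)


def projection_error(verts: List[Vertex], targets: List[Pixel],
                     f: float, cx: float, cy: float) -> float:
    """Mean squared pixel distance between projected vertices and their 2D targets."""
    total = 0.0
    for v, (tu, tv) in zip(verts, targets):
        u, w = project(v, f, cx, cy)
        total += (u - tu) ** 2 + (w - tv) ** 2
    return total / len(verts)


def morph_base_mesh(verts: List[Vertex], targets: List[Pixel], f: float,
                    cx: float, cy: float, alpha: float = 0.5, tol: float = 1e-6,
                    max_iter: int = 100) -> Tuple[List[Vertex], int]:
    """Iteratively morph vertices (x and y only, depth fixed) until error < tol.

    Each update shrinks every pixel residual by a factor of (1 - alpha).
    Returns the morphed vertices and the number of update steps performed.
    """
    for step in range(max_iter):
        if projection_error(verts, targets, f, cx, cy) < tol:
            return verts, step
        updated = []
        for (x, y, z), (tu, tv) in zip(verts, targets):
            u, w = project((x, y, z), f, cx, cy)
            updated.append((x - alpha * (u - tu) * z / f,
                            y - alpha * (w - tv) * z / f, z))
        verts = updated
    return verts, max_iter


if __name__ == "__main__":
    base = [(0.0, 0.0, 2.0), (0.1, 0.0, 2.0)]  # hypothetical base-mesh vertices
    observed = [(10.0, 0.0), (40.0, 0.0)]      # hypothetical 2D targets (pixels)
    print(projection_error(base, observed, 500.0, 0.0, 0.0))  # 162.5
    morphed, steps = morph_base_mesh(base, observed, 500.0, 0.0, 0.0)
    print(steps, projection_error(morphed, observed, 500.0, 0.0, 0.0))
```

A real implementation would also constrain how far neighbouring vertices may move relative to one another so that the base mesh keeps its structure; that constraint is omitted here for brevity.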
In addition to the prior images screen 1016,
Similarly, the image capture screens can indicate whether the mobile computing device 410 is stable enough to capture the set of images 420 without faults such as blurring. Changes in speed or any movement of the mobile computing device 410 can result in a change of colors (or another notification) on the image capture screens. Angular velocity is calculated from the average of the maximum values in the latest 15 frames. In an example, if the velocity as sensed by hardware on board the mobile computing device 410, such as the accelerometer or the gyroscope, falls below 0.5 units, then the mobile computing device 410 is determined to be stable enough to execute the image capturing process. This is conveyed to the user using a change in colors or another notification. Additionally, or alternatively, the measurement application 450 may provide vibration feedback when the imaging position is achieved, for example, when the angle, speed, and distance from the object 430 are within the thresholds for the imaging position. The term feedback may encompass any other form of notification, such as visual and sound notifications. Additionally, or alternatively, a sound may be played when the image is captured. Moreover, in one embodiment, a green circle 1502 (1216 in
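The stability rule above (average of the maximum values over the latest 15 frames, compared against 0.5) can be sketched as a small rolling-window monitor. The description leaves open what "maximum value" means per frame; this sketch assumes it is the largest absolute angular-velocity component across the three gyroscope axes, in whatever units the platform sensor reports. The class and method names are hypothetical, and requiring a full window before declaring stability is a design choice of the sketch, not of the description.

```python
from collections import deque
from typing import Deque, Tuple

Gyro = Tuple[float, float, float]  # angular velocity about the x, y, z axes


class StabilityMonitor:
    """Rolling stability check over the most recent gyroscope frames.

    Assumed interpretation: per frame, keep the maximum absolute angular velocity
    across the three axes; the device counts as stable when the average of those
    per-frame maxima over the last `window` frames is below `threshold`.
    """

    def __init__(self, window: int = 15, threshold: float = 0.5) -> None:
        self.window = window
        self.threshold = threshold
        self._maxima: Deque[float] = deque(maxlen=window)  # oldest frames drop off

    def add_frame(self, gyro: Gyro) -> None:
        self._maxima.append(max(abs(c) for c in gyro))

    def angular_velocity(self) -> float:
        if not self._maxima:
            return float("inf")
        return sum(self._maxima) / len(self._maxima)

    def is_stable(self) -> bool:
        # Require a full window so a couple of quiet frames cannot pass on their own.
        return (len(self._maxima) == self.window
                and self.angular_velocity() < self.threshold)


if __name__ == "__main__":
    monitor = StabilityMonitor()
    for _ in range(15):
        monitor.add_frame((0.9, -1.2, 0.3))    # user still moving the phone
    print(monitor.is_stable())  # False
    for _ in range(15):
        monitor.add_frame((0.05, -0.1, 0.02))  # phone held steady
    print(monitor.is_stable())  # True
```

In an app, `is_stable()` would be polled on each camera frame and its result mapped to the color change or vibration feedback described above.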
Photogrammetry is a process by which a 3D mesh is constructed from a set of 2D photographs or images. The resulting mesh is usually unstructured.
Structured and unstructured meshes differ by their connectivity. An unstructured mesh has irregular connectivity between vertices, requiring the explicit listing of the way vertices make up individual mesh elements. Unstructured meshes therefore allow for irregular mesh elements but require the explicit storage of adjacent vertex relationships, leading to lower storage efficiency and lower resolution. A structured mesh, however, has regular connectivity between its vertices (i.e., mesh elements and vertex distances are predefined), leading to higher space and storage efficiency, and superior resolution.
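The connectivity difference can be illustrated with two toy mesh classes. In the structured case, vertices sit on a rows x cols grid, so the quad at any cell and the neighbours of any vertex follow from index arithmetic and no face list is stored. In the unstructured case, every triangle must be listed explicitly and adjacency must be recovered from that list. The class names and the grid layout are illustrative assumptions; real structured base meshes need not be planar grids.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


class StructuredGridMesh:
    """Structured mesh: connectivity is implied by each vertex's grid index."""

    def __init__(self, rows: int, cols: int, vertices: List[Vec3]) -> None:
        if len(vertices) != rows * cols:
            raise ValueError("expected rows * cols vertices")
        self.rows, self.cols, self.vertices = rows, cols, vertices

    def index(self, i: int, j: int) -> int:
        return i * self.cols + j

    def quad(self, i: int, j: int) -> Tuple[int, int, int, int]:
        """Vertex indices of the quad whose top-left corner is grid vertex (i, j)."""
        return (self.index(i, j), self.index(i, j + 1),
                self.index(i + 1, j + 1), self.index(i + 1, j))

    def neighbours(self, i: int, j: int) -> List[int]:
        """Up/down/left/right neighbours, computed rather than looked up."""
        out = []
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < self.rows and 0 <= nj < self.cols:
                out.append(self.index(ni, nj))
        return out


class UnstructuredMesh:
    """Unstructured mesh: every triangle must be stored explicitly."""

    def __init__(self, vertices: List[Vec3],
                 triangles: List[Tuple[int, int, int]]) -> None:
        self.vertices, self.triangles = vertices, triangles

    def neighbours(self, v: int) -> List[int]:
        """Adjacency has to be recovered by scanning the stored face list."""
        adj = set()
        for tri in self.triangles:
            if v in tri:
                adj.update(t for t in tri if t != v)
        return sorted(adj)


if __name__ == "__main__":
    grid = StructuredGridMesh(3, 3, [(float(j), float(i), 0.0)
                                     for i in range(3) for j in range(3)])
    print(grid.quad(0, 0))        # (0, 1, 4, 3)
    print(grid.neighbours(1, 1))  # [1, 7, 3, 5]
    tri_mesh = UnstructuredMesh([(0.0, 0.0, 0.0)] * 4, [(0, 1, 2), (1, 3, 2)])
    print(tri_mesh.neighbours(1))  # [0, 2, 3]
```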
Various algorithms for constructing 3D meshes from the 2D photographs are within the scope of the present invention. One alternative embodiment is described in FIG. 4 of PCT application No. PCT/US20/70465, which is hereby incorporated by reference in its entirety herein as if fully set forth herein. Implementation of other algorithms may involve different steps or processes in the construction of the 3D mesh.
Examples of the 2D photographs and 3D meshes constructed from the 2D photographs by a photogrammetry process are described, in accordance with embodiments of the present invention, in FIG. 5 of PCT application No. PCT/US20/70465, which is hereby incorporated by reference in its entirety herein as if fully set forth herein.
As discussed, the data (e.g., photos, textual descriptions, and the like) described throughout the disclosure can include data that is stored on a database stored or hosted on a cloud computing platform. It is to be understood that although this disclosure includes a detailed description of cloud computing below, implementation of the teachings recited herein is not limited to a cloud computing environment. Rather, embodiments of the present invention are capable of being implemented in conjunction with any other type of computing environment now known or later developed.
Cloud computing can refer to a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service. This cloud model can include at least five characteristics, at least three service models, and at least four deployment models.
Characteristics may include one or more of the following. On-demand self-service: a cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with the service's provider. Broad network access: capabilities are available over a network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs). Resource pooling: the provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to demand. There is a sense of location independence in that the consumer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Rapid elasticity: capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time. Measured service: cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.
In another embodiment, service models may include one or more of the following. Software as a Service (SaaS): the capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based e-mail). The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
Platform as a Service (PaaS): the capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.
Infrastructure as a Service (IaaS): the capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
Deployment Models may include one or more of the following. Private cloud: the cloud infrastructure is operated solely for an organization. It can be managed by the organization or a third party and can exist on-premises or off-premises.
Community cloud: the cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It can be managed by the organizations or a third party and can exist on-premises or off-premises.
Public cloud: the cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
Hybrid cloud: the cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load-balancing between clouds).
A cloud computing environment is service oriented with a focus on statelessness, low coupling, modularity, and semantic interoperability. At the heart of cloud computing is an infrastructure that includes a network of interconnected nodes.
The cloud computing environment may include one or more cloud computing nodes with which local computing devices used by cloud consumers, such as, for example, personal digital assistant (PDA) or cellular telephone, desktop computer, laptop computer, and/or automobile computer system can communicate. Nodes can communicate with one another. They can be grouped physically or virtually, in one or more networks, such as Private, Community, Public, or Hybrid clouds as described hereinabove, or a combination thereof. This allows the cloud computing environment to offer infrastructure, platforms, and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device. It is understood that the types of computing devices are intended to be exemplary only and that the computing nodes and the cloud computing environment can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser).
The present invention may be implemented using server-based hardware and software.
The hardware of a user device also typically receives a number of inputs 1610 and outputs 1620 for communicating information externally. For interface with a user, the hardware may include one or more user input devices (e.g., a keyboard, a mouse, a scanner, a microphone, a web camera, etc.) and a display (e.g., a Liquid Crystal Display (LCD) panel). For additional storage, the hardware may also include one or more mass storage devices 1690, e.g., a floppy or other removable disk drive, a hard disk drive, a Direct Access Storage Device (DASD), an optical drive (e.g., a Compact Disk (CD) drive, a Digital Versatile Disk (DVD) drive, etc.), and/or a tape drive, among others. Furthermore, the hardware may include an interface to one or more external SQL databases 1630, as well as to one or more networks 1680 (e.g., a local area network (LAN), a wide area network (WAN), a wireless network, and/or the Internet, among others) to permit the communication of information with other computers coupled to the networks. It should be appreciated that the hardware typically includes suitable analog and/or digital interfaces through which its components communicate with each other.
The hardware operates under the control of an operating system 1670, and executes various computer software applications 1660, components, programs, codes, libraries, objects, modules, etc. indicated collectively by reference numerals to perform the methods, processes, and techniques described above.
The present invention may be implemented in a client server environment.
In some embodiments of the present invention, the entire system can be implemented and offered to the end-users and operators over the Internet, in a so-called cloud implementation. No local installation of software or hardware would be needed, and the end-users and operators would be allowed access to the systems of the present invention directly over the Internet, using either a web browser or similar software on a client, which client could be a desktop, laptop, mobile device, and so on. This eliminates any need for custom software installation on the client side and increases the flexibility of delivery of the service (software-as-a-service) and increases user satisfaction and ease of use. Various business models, revenue models, and delivery mechanisms for the present invention are envisioned, and are all to be considered within the scope of the present invention.
In general, the method executed to implement the embodiments of the invention may be implemented as part of an operating system or a specific application, component, program, object, module, or sequence of instructions referred to as “computer program(s)” or “computer code(s).” The computer programs typically comprise one or more instructions set at various times in various memory and storage devices in a computer that, when read and executed by one or more processors in a computer, cause the computer to perform operations necessary to execute elements involving the various aspects of the invention. Moreover, while the invention has been described in the context of fully functioning computers and computer systems, those skilled in the art will appreciate that the various embodiments of the invention are capable of being distributed as a program product in a variety of forms, and that the invention applies equally regardless of the particular type of machine or computer-readable media used to actually effect the distribution. Examples of computer-readable media include but are not limited to recordable type media such as volatile and non-volatile memory devices, floppy and other removable disks, hard disk drives, optical disks (e.g., Compact Disk Read-Only Memory (CD-ROMs), Digital Versatile Disks (DVDs), etc.), and digital and analog communication media.
One of ordinary skill in the art knows that the use cases, structures, schematics, and flow diagrams may be performed in other orders or combinations, but the inventive concept of the present invention remains without departing from the broader scope of the invention. Every embodiment may be unique, and methods/steps may be either shortened or lengthened, overlapped with the other activities, postponed, delayed, and continued after a time gap, such that every user is accommodated to practice the methods of the present invention.
Although the present invention has been described with reference to specific exemplary embodiments, it will be evident that various modifications and changes can be made to these embodiments without departing from the broader scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative sense rather than in a restrictive sense. It will also be apparent to the skilled artisan that the embodiments described above are specific examples of a single broader invention which may have greater scope than any of the singular descriptions taught. There may be many alterations made in the descriptions without departing from the scope of the present invention.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2021/014549 | 1/22/2021 | WO |

Number | Date | Country
---|---|---
62964562 | Jan 2020 | US