When a human looks at an object in a picture or a video sequence, s/he recognizes two pieces of information about the object: its identity and its spatial aspects. For example, when someone sees a picture of a car, s/he not only recognizes the car in the picture but also envisions the three-dimensional (3D) shape of the car, including the parts of the car that the picture does not show. In other words, recognizing and visualizing objects in pictures are two simultaneous processes that human brains perform with little effort, even when the objects are seen from different points-of-view or differ in their details or appearance.
In computer vision, some algorithms and techniques enable recognizing objects such as vehicles, buildings, animals, or humans, but no algorithm or technique, so far, enables recognizing and visualizing objects simultaneously. There is therefore a need for a universal solution that achieves simultaneous recognition and visualization of objects, similar to what the human brain does. Such a solution would open the door to numerous educational, gaming, medical, engineering and industrial applications.
The present invention introduces a method for recognizing and visualizing objects in images and video sequences. Accordingly, it becomes possible to partially view an object using a device camera and see the object's name presented on the device display with a 3D model for the object. The user can then rotate or walk through the 3D model on the device display to view the hidden parts of the object that are not seen from the user's point-of-view. Generally, the method of the present invention is used with pictures of printed materials such as books, newspapers and magazines. It can also be used with images presented on a display of a computer, tablet, mobile phone or the like. Furthermore, the method is used for identifying and visualizing buildings, vehicles, or other objects located indoors or outdoors while using one of the modern head mounted computer displays or glasses known commercially as wearable devices.
In one embodiment, the present invention discloses a method for identifying and visualizing a 3D model of an object presented in a picture. This method is comprised of four steps: first, creating a 3D model for the object according to a vector graphic format then rotating the 3D model horizontally and vertically in front of a virtual camera to store each image of the 3D model during each rotation; second, analyzing each image's parameters—including the number of two-dimensional (2D) shapes contained in the image, the identities of the 2D shapes and the attachment relationship between the 2D shapes—to create a list of unique images with unique parameters; third, detecting the edges of the object in the picture and analyzing the edge parameters including the number of 2D shapes contained in the edges, the identity of the 2D shapes and the attachment relationship between the 2D shapes; fourth, checking the edge parameters against the list of unique images to determine if the edge parameters match any of the images in the list and then displaying the object's name and its corresponding 3D model.
In another embodiment, the present invention discloses a method for determining a point-of-view of a camera relative to an object as it appears in a picture taken by said camera to present a 3D model for the object according to the point-of-view. This method is comprised of four steps: first, creating a 3D model for the object according to a vector graphic format then rotating the 3D model horizontally and vertically in front of a virtual camera to store each image of the 3D model associated with each unique camera position; second, analyzing each image's parameters, including the number of 2D shapes contained in the image, the identities of the 2D shapes and the attachment relationship between the 2D shapes, and creating a list of unique images with their corresponding unique parameters; third, detecting the edges of the object in the picture and analyzing the edge parameters including the number of 2D shapes contained in the edges, the identities of the 2D shapes and the attachment relationship between the 2D shapes; fourth, checking the edge parameters against the list of unique images to find the image(s) with parameters that match the edge parameters and then displaying the 3D model according to the camera position associated with the matching image.
Generally, an object in an image is identified as a cube if the object's edges in the image form one or more 2D shapes attached to each other according to one of the alternatives of the previously described database. However, if the object's image or picture is taken by a digital camera, then an edge detection program is utilized, as known in the art, to detect the edges of the 2D shapes that make up the object in the picture. After that, each 2D shape in the object's image is analyzed to determine its identity. Also, the attachment relationship between the 2D shapes is defined or described. At this point, the number of the 2D shapes, the identities of the 2D shapes and the attachment relationship between the 2D shapes are checked against a database that assigns an ID or name to each unique combination of a number of 2D shapes, identities of 2D shapes and attachment relationship between 2D shapes. As was described previously, the database can be automatically created by rotating a 3D model for the object in front of a virtual camera to capture the object's images from different points-of-view and create a list of all unique combinations of a number of 2D shapes, identities of 2D shapes and attachment relationships between the 2D shapes that appear in an image. Once the object is identified using the database, the object's name and the 3D model of the object, which is stored with the database content, are presented to the user.
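The database lookup described above can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the implementation of the present invention: the names `signature`, `build_database`, `identify` and `CUBE_VIEWS` are hypothetical, the shape identities are plain strings, and the attachment relationship is reduced to the pair of identities that touch.

```python
def signature(shapes, attachments):
    """Build a hashable key from the 2D shapes seen in one image.

    shapes: dict mapping an arbitrary label to a shape identity.
    attachments: iterable of (label, label) pairs for shapes that touch.
    The key holds the number of shapes, their identities, and the
    identities joined by each attachment (labels and order are ignored).
    """
    identities = tuple(sorted(shapes.values()))
    links = tuple(sorted(
        tuple(sorted((shapes[a], shapes[b]))) for a, b in attachments
    ))
    return (len(shapes), identities, links)


def build_database(views_by_object):
    """Map every unique signature to the name of the object that produced it."""
    database = {}
    for name, views in views_by_object.items():
        for shapes, attachments in views:
            database.setdefault(signature(shapes, attachments), name)
    return database


def identify(database, shapes, attachments):
    """Return the object name for the observed shapes, or None if unknown."""
    return database.get(signature(shapes, attachments))


# Hypothetical views of a cube: one, two, or three visible faces.
CUBE_VIEWS = [
    ({"a": "square"}, []),
    ({"a": "parallelogram", "b": "parallelogram"}, [("a", "b")]),
    ({"a": "parallelogram", "b": "parallelogram", "c": "parallelogram"},
     [("a", "b"), ("b", "c"), ("a", "c")]),
]
DATABASE = build_database({"cube": CUBE_VIEWS})
```

In this sketch, a picture whose edges yield two attached parallelograms would be looked up with `identify(DATABASE, {"p": "parallelogram", "q": "parallelogram"}, [("q", "p")])`, regardless of how the shapes are labeled.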
It is important to note that during the rotation of the virtual 3D model of the object in front of the virtual camera, the position of the camera relative to the virtual 3D model can be determined and stored. Accordingly, each unique combination of a number of 2D shapes, identities of the 2D shapes and attachment relationship between the 2D shapes is assigned a corresponding position for the virtual camera. This way, when a picture of an object is taken by a digital camera and the object's edges or 2D shapes are analyzed, the result of this analysis indicates the position of the digital camera relative to the object at the moment of taking the picture. Accordingly, the 3D model of the object is presented to the user on the camera display to match his/her position relative to the object. In this case, the user may interact with the 3D model on the camera display to rotate it vertically or horizontally, or to walk through the 3D model to see more interior details.
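As a rough sketch of how camera positions might be stored during rotation, the following assumes a unit cube, a distant (orthographic) virtual camera, and faces that can be told apart by label. These are simplifying assumptions made for illustration: the method described here keys on 2D shape combinations, whereas this sketch keys on the set of visible faces. All names (`FACE_NORMALS`, `visible_faces`, `build_pose_table`) are hypothetical.

```python
import math
from collections import defaultdict

# Outward normals of the six faces of an axis-aligned cube.
FACE_NORMALS = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}


def camera_direction(azimuth_deg, elevation_deg):
    """Unit vector pointing from the cube toward the virtual camera."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def visible_faces(azimuth_deg, elevation_deg, eps=1e-9):
    """A face is visible when its normal points toward the camera."""
    d = camera_direction(azimuth_deg, elevation_deg)
    return frozenset(
        name for name, n in FACE_NORMALS.items()
        if sum(a * b for a, b in zip(n, d)) > eps
    )


def build_pose_table(step_deg=45):
    """Rotate the camera around the cube and store the positions per view."""
    table = defaultdict(list)
    for az in range(0, 360, step_deg):
        for el in range(-90, 91, step_deg):
            table[visible_faces(az, el)].append((az, el))
    return table
```

With such a table, the faces found in a picture index directly into the list of stored (azimuth, elevation) positions from which that view arises.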
It is important to note that a slight difference in the virtual camera position relative to the virtual 3D object may not lead to a different combination of a number of 2D shapes, identities of 2D shapes and attachment relationships between the 2D shapes. However, the relative dimensions of the 2D shapes do vary with slight changes in the position of the virtual camera, so storing the dimensions of the 2D shapes of each image makes it possible to determine the exact position of the virtual camera. Accordingly, in this case, the list of unique images will include all images that have similar parameters but different 2D shape dimensions.
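To illustrate how stored dimensions could refine the camera position, consider a cube seen from the side by a distant camera at azimuth θ between 0° and 90°. Under that assumed orthographic setup, the two visible faces have apparent widths proportional to cos θ and sin θ, so their ratio changes with θ even though the shape combination does not. The sketch below stores normalized widths per azimuth and returns the closest stored entry; the function names and the 5° step are hypothetical choices.

```python
import math


def apparent_widths(azimuth_deg):
    """Apparent widths of the two visible side faces of a unit cube."""
    t = math.radians(azimuth_deg)
    return (math.cos(t), math.sin(t))


def normalize(dims):
    """Scale dimensions so they sum to 1, making them independent of zoom."""
    total = sum(dims)
    return tuple(d / total for d in dims)


def build_dimension_table(step_deg=5):
    """Store the relative dimensions seen from each sampled camera azimuth."""
    return [(az, normalize(apparent_widths(az)))
            for az in range(step_deg, 90, step_deg)]


def closest_azimuth(table, observed_widths):
    """Return the stored azimuth whose relative dimensions best match."""
    obs = normalize(observed_widths)

    def squared_error(entry):
        return sum((a - b) ** 2 for a, b in zip(entry[1], obs))

    return min(table, key=squared_error)[0]
```

For instance, measured face widths in pixels can be passed directly to `closest_azimuth`, since normalization removes the overall scale of the picture.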
Generally, the 2D shapes that result from analyzing the object's edges in the images or picture can be classified into individual 2D shapes and combined 2D shapes. The individual 2D shapes are the 2D shapes that have a simple form such as a circle, rectangle, triangle or parallelogram. The combined 2D shapes are the 2D shapes that are comprised of a plurality of individual 2D shapes attached to each other in a certain manner to form one entity. For example, the L-shape is a combined 2D shape comprised of two individual 2D shapes in the form of two rectangles attached to each other. Also the U-shape is a combined 2D shape comprised of three individual 2D shapes in the form of three rectangles attached to each other.
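The decomposition of an L-shape or U-shape into attached rectangles can be sketched on a small grid. This is only an illustrative assumption about how a combined 2D shape might be divided: the shape is given as rows of "X" (filled) and "." (empty) cells, rectangles are grown from identical runs in consecutive rows, and two rectangles count as attached when one sits directly on top of the other with overlapping columns. The function names are hypothetical.

```python
def split_into_rectangles(grid):
    """Split filled cells into rectangles [top, bottom, left, right]."""
    rects = []
    open_rects = {}  # (left, right) run -> index of rectangle still growing
    for r, row in enumerate(grid):
        # Collect the horizontal runs of filled cells in this row.
        runs, c = [], 0
        while c < len(row):
            if row[c] == "X":
                start = c
                while c < len(row) and row[c] == "X":
                    c += 1
                runs.append((start, c - 1))
            else:
                c += 1
        next_open = {}
        for run in runs:
            if run in open_rects:
                idx = open_rects[run]
                rects[idx][1] = r  # same run as the row above: extend down
            else:
                idx = len(rects)
                rects.append([r, r, run[0], run[1]])
            next_open[run] = idx
        open_rects = next_open
    return rects


def attachments(rects):
    """Pairs of rectangles stacked vertically with overlapping columns."""
    pairs = set()
    for i, (t1, b1, l1, r1) in enumerate(rects):
        for j, (t2, b2, l2, r2) in enumerate(rects):
            if i < j:
                stacked = b1 + 1 == t2 or b2 + 1 == t1
                cols_overlap = l1 <= r2 and l2 <= r1
                if stacked and cols_overlap:
                    pairs.add((i, j))
    return pairs


L_SHAPE = ["X.", "X.", "XX"]
U_SHAPE = ["X.X", "X.X", "XXX"]
```

On these two grids the sketch yields two attached rectangles for the L-shape and three rectangles with two attachments for the U-shape, matching the counts given above. A fuller signature would also need to record where along each rectangle the attachment occurs, since the counts alone cannot separate, say, an L-shape from a T-shape.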
To identify an individual 2D shape, five steps are processed. The first step is slicing the individual 2D shape with a plurality of rays creating a number of intersectional lines. The second step is determining the axis pattern that describes a path connecting the midpoints of the successive intersectional lines. The third step is determining the shape pattern that describes the intersectional lines. The fourth step is determining the length pattern that describes the length variations between the intersectional lines. The fifth step is checking the axis pattern, the shape pattern and the length pattern against a database that associates each unique combination of an axis pattern, shape pattern and length pattern with a unique ID identifying a 2D object.
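The five steps can be sketched for convex polygons as follows. This is a simplified illustration under stated assumptions rather than the actual procedure: the rays are horizontal and evenly spaced, the axis and length patterns are reduced to a coarse trend ("constant", "increasing", "decreasing", "mixed"), the shape pattern is fixed to "straight" because each ray cuts a convex polygon in one straight segment, and the contents of `PATTERN_DATABASE` are hypothetical.

```python
def slice_polygon(vertices, n_rays=9):
    """Step 1: cut a convex polygon with horizontal rays.

    Returns one (left_x, right_x) intersectional line per ray.
    """
    ys = [y for _, y in vertices]
    y_min, y_max = min(ys), max(ys)
    segments = []
    for k in range(1, n_rays + 1):
        y = y_min + (y_max - y_min) * k / (n_rays + 1)
        xs = []
        for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
            if (y1 <= y < y2) or (y2 <= y < y1):
                xs.append(x1 + (x2 - x1) * (y - y1) / (y2 - y1))
        segments.append((min(xs), max(xs)))
    return segments


def trend(values, tol=1e-6):
    """Describe how a sequence of values changes from ray to ray."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(abs(d) <= tol for d in diffs):
        return "constant"
    if all(d > -tol for d in diffs):
        return "increasing"
    if all(d < tol for d in diffs):
        return "decreasing"
    return "mixed"


# Hypothetical database: (axis, shape, length) pattern -> shape ID.
PATTERN_DATABASE = {
    ("constant", "straight", "constant"): "rectangle",
    ("constant", "straight", "increasing"): "triangle",
    ("constant", "straight", "decreasing"): "triangle",
    ("increasing", "straight", "constant"): "parallelogram",
    ("decreasing", "straight", "constant"): "parallelogram",
}


def identify_shape(vertices):
    segments = slice_polygon(vertices)
    axis = trend([(a + b) / 2 for a, b in segments])   # step 2
    shape = "straight"                                  # step 3 (assumed)
    length = trend([b - a for a, b in segments])        # step 4
    return PATTERN_DATABASE.get((axis, shape, length))  # step 5
```

In this sketch a rectangle gives a constant axis and constant lengths, an isosceles triangle gives a constant axis with steadily changing lengths, and a parallelogram gives a drifting axis with constant lengths.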
For example,
To identify a combined 2D shape, the combined 2D shape is divided into a plurality of individual 2D shapes, and each individual 2D shape is identified along with the attachment relationship between the individual 2D shapes. The identities of the individual 2D shapes and their attachment relationship are then compared against a database that associates a unique ID with each unique combination of 2D shapes, identities and attachment relationships, which enables identifying the combined 2D shape. For example,
Finally, the 3D models described in the previous examples are represented according to a vector graphics format. However, where a 3D model is represented by a set of points using the point cloud technique, the set of points is converted into a plurality of triangles represented according to the vector graphics format, as known in the art, so that the method of the present invention can be utilized with the triangles. Also, if the 3D model is represented according to a raster graphics format, then an edge detection program is utilized, as known in the art, to detect the edges of the 3D model and convert them into lines, where each two lines that meet at one point are converted into a triangle. Accordingly, the 3D model can be represented by a plurality of triangles according to a vector graphics format, so that the method of the present invention can be utilized with these triangles.
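The conversion of detected lines into triangles can be sketched as below. It is a minimal illustration under assumptions: the lines are 2D segments given as pairs of endpoint tuples, two lines "meet at one point" when they share exactly one endpoint, and collinear pairs are skipped because they would give a triangle with no area. The function names are hypothetical, and the edge detection step itself is not shown.

```python
from itertools import combinations


def is_degenerate(p, q, r):
    """True when the three points lie on one straight line."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]) == 0


def lines_to_triangles(lines):
    """Turn every pair of lines sharing one endpoint into a triangle."""
    triangles = set()
    for first, second in combinations(lines, 2):
        shared = set(first) & set(second)
        corners = set(first) | set(second)
        if len(shared) == 1 and len(corners) == 3:
            triangle = tuple(sorted(corners))
            if not is_degenerate(*triangle):
                triangles.add(triangle)
    return sorted(triangles)


# Four edges of a unit square, as an edge detector might report them.
SQUARE_EDGES = [
    ((0, 0), (1, 0)),
    ((1, 0), (1, 1)),
    ((1, 1), (0, 1)),
    ((0, 1), (0, 0)),
]
```

Applied to `SQUARE_EDGES`, the sketch produces one triangle per corner of the square, so the resulting triangles overlap; how overlapping triangles are handled afterwards is outside what this sketch covers.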
This application is a Continuation-in-Part of co-pending U.S. patent application Ser. No. 12/462,715, filed Aug. 7, 2009, titled “Converting a drawing into multiple matrices”, and Ser. No. 16/271,892, filed Jul. 10, 2013, titled “Object recognition for 3D models and 2D drawings”.