The present disclosure relates generally to object detection technologies for finding object locations in images, and more specifically, to conducting object detection by integrating 2D image recognition and 3D scene reconstruction.
Object detection for two dimensional (2D) still images has been implemented in the related art, including pixel-based template matching methods, specific pattern detection methods such as methods using the Hough transformation, and methods using machine learning. For object detection using machine learning, face detection technology based on Haar-like features and cascaded classifiers has been implemented in the related art. In addition, methods using deep neural networks, which facilitate simultaneous learning of image features and object localization, have led to high-accuracy multi-class object detection.
Recently, with the development of depth sensors and related computer vision technologies, the use of three dimensional (3D) data has become easier, and object detection methods for 3D data have been proposed in the related art, such as deep neural networks that take 3D point clouds as input. Other related art implementations handling voxel data and mesh data have also been utilized.
In the related art, there are techniques for 3D analysis of a scene including detection, segmentation and registration of objects within the scene. The analysis results may be used to implement augmented reality operations including removal and insertion of objects and the generation of blueprints. An example related art method may include receiving 3D image frames of the scene, each frame associated with a pose of a depth camera, and creating a 3D reconstruction of the scene based on depth pixels that are projected and accumulated into a global coordinate system. The related art method may also include detecting objects, and associated locations within the scene, based on the 3D reconstruction, the camera pose and the image frames. The related art method may further include segmenting the detected objects into points of the 3D reconstruction corresponding to contours of the object and registering the segmented objects to 3D models of the objects to determine their alignment.
Example implementations described herein address the above problems in object detection for a real-world scene. The example implementations described herein are directed to the counting of industrial parts such as stacked pipes, but can be extended to other situations in accordance with the desired implementation.
Aspects of the present disclosure can include a method, involving conducting raycasting on a plurality of images to generate a point cloud; executing two dimensional (2D) object detection on the plurality of images; for the 2D object detection recognizing an object, determining a location of the object in three dimensional (3D) space from the point cloud; for the location not overlapping another marker, classifying the object from the 2D object detection; and placing a marker in the 3D space to represent the object based on the classifying.
Aspects of the present disclosure can include a non-transitory computer readable medium, having instructions involving conducting raycasting on a plurality of images to generate a point cloud; executing two dimensional (2D) object detection on the plurality of images; for the 2D object detection recognizing an object, determining a location of the object in three dimensional (3D) space from the point cloud; for the location not overlapping another marker, classifying the object from the 2D object detection; and placing a marker in the 3D space to represent the object based on the classifying.
Aspects of the present disclosure can include a system, involving means for conducting raycasting on a plurality of images to generate a point cloud; means for executing two dimensional (2D) object detection on the plurality of images; for the 2D object detection recognizing an object, means for determining a location of the object in three dimensional (3D) space from the point cloud; for the location not overlapping another marker, means for classifying the object from the 2D object detection; and means for placing a marker in the 3D space to represent the object based on the classifying.
Aspects of the present disclosure can include an apparatus, involving a processor, configured to conduct raycasting on a plurality of images to generate a point cloud; execute two dimensional (2D) object detection on the plurality of images; for the 2D object detection recognizing an object, determine a location of the object in three dimensional (3D) space from the point cloud; for the location not overlapping another marker, classify the object from the 2D object detection; and place a marker in the 3D space to represent the object based on the classification.
The following detailed description provides further details of the figures and example implementations of the present application. Reference numerals and descriptions of redundant elements between figures are omitted for clarity. Terms used throughout the description are provided as examples and are not intended to be limiting. For example, the use of the term “automatic” may involve fully automatic or semi-automatic implementations involving user or administrator control over certain aspects of the implementation, depending on the desired implementation of one of ordinary skill in the art practicing implementations of the present application. Selection can be conducted by a user through a user interface or other input means, or can be implemented through a desired algorithm. Example implementations as described herein can be utilized either singularly or in combination and the functionality of the example implementations can be implemented through any means according to the desired implementations.
In the related art, there are problems that occur in single-view image recognition as illustrated in
In example implementations described herein, object detection is performed on a 2D still image, wherein the detection results are projected onto a 3D reconstructed space, and 3D positions of objects are specified. Markers corresponding to each object are placed at corresponding positions in the 3D space. The above processing is performed on images from multiple viewpoints. For a newly detected object, collision detection with existing markers is performed, and if a marker already exists, a new marker is not placed.
In example implementations described herein, object detection is performed on images from multiple viewpoints, whereby object detection becomes robust to occlusion and to objects falling partially outside the frame. Since the 3D data is not directly used for object detection but is used for estimating and managing the positions of detected objects in the 3D space, the method can be applied to a sparse point cloud with less information (e.g., without color or texture). Double counting can be prevented when integrating the detection results of multiple viewpoints by managing the object positions in the 3D space and performing collision detection.
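For illustration only, the following Python sketch shows one way the per-viewpoint processing and duplicate suppression described above could be organized; the names (Marker, Scene, raycast_to_point_cloud, detect_objects_2d, locate_in_3d) and the default collision radius are hypothetical placeholders rather than part of the disclosure.

```python
# Illustrative sketch of the per-viewpoint pipeline; all helper names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Marker:
    position: tuple      # (x, y, z) in the global 3D coordinate system
    label: str           # object class assigned by 2D detection
    radius: float = 0.1  # collision radius used to suppress duplicates

@dataclass
class Scene:
    markers: list = field(default_factory=list)

    def overlaps_existing(self, position, radius):
        # A new detection is ignored if it collides with a marker already placed.
        return any(
            sum((a - b) ** 2 for a, b in zip(position, m.position)) ** 0.5 < radius + m.radius
            for m in self.markers
        )

def process_images(images, scene, raycast_to_point_cloud, detect_objects_2d, locate_in_3d):
    """One pass over a plurality of images captured from different viewpoints."""
    for image in images:
        point_cloud = raycast_to_point_cloud(image)           # build/extend the point cloud
        for detection in detect_objects_2d(image):            # 2D object detection
            position = locate_in_3d(detection, point_cloud)   # 3D location from the cloud
            if position is None:
                continue                                      # projection failed (sparse cloud)
            if not scene.overlaps_existing(position, radius=0.1):
                scene.markers.append(Marker(tuple(position), detection["label"]))
    return scene
```

Because each new detection is checked against the markers already placed, an object that is seen from several viewpoints is counted only once.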
In example implementations, there is a 2D image acquisition unit and a 3D data acquisition unit. The 2D image acquisition unit can include any general camera capable of capturing still images in accordance with the desired implementation. The 3D data acquisition unit may use a depth sensor, may calculate 3D information from still images through computer vision technologies, or may use any other implementation in accordance with the desired implementation. In addition, to improve the accuracy of scene reconstruction and self-positioning, information from an acceleration sensor (e.g., accelerometer) may be used. The 2D image acquisition unit and 3D data acquisition unit are calibrated so that correspondence between their position data can be obtained.
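As a minimal sketch of what such a calibration correspondence could look like, assuming a pre-computed rigid transform from the depth sensor frame to the camera frame and pinhole intrinsics (all numeric values below are illustrative, not measured):

```python
import numpy as np

# Hypothetical calibration between the 3D data acquisition unit (depth sensor)
# and the 2D image acquisition unit (camera): a rigid transform plus pinhole intrinsics.
R_depth_to_cam = np.eye(3)                    # rotation from extrinsic calibration
t_depth_to_cam = np.array([0.05, 0.0, 0.0])   # translation in meters (example value)
K = np.array([[600.0, 0.0, 320.0],            # fx,  0, cx
              [0.0, 600.0, 240.0],            #  0, fy, cy
              [0.0,   0.0,   1.0]])           # pinhole intrinsic matrix (example values)

def depth_point_to_pixel(p_depth):
    """Map a 3D point measured by the depth sensor to 2D pixel coordinates."""
    p_cam = R_depth_to_cam @ p_depth + t_depth_to_cam  # into the camera frame
    u, v, w = K @ p_cam                                 # perspective projection
    return np.array([u / w, v / w])                     # pixel coordinates (u, v)

# Example: a point 2 m in front of the depth sensor
print(depth_point_to_pixel(np.array([0.0, 0.0, 2.0])))
```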
In this system, object detection is first performed on a 2D image. As a result, coordinates of objects in the image can be obtained.
Next, the system calculates the 3D coordinates of the detected object. In this system, by projecting from the 2D coordinates of a detected object onto the 3D data (e.g., point cloud, voxel, mesh), the position in the 3D space is estimated as shown in
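One plausible way to perform this projection is to back-project the center of a 2D detection into a ray and snap it to the nearest point of the cloud; the sketch below assumes hypothetical camera intrinsics K and a camera pose (R, t), and is only one of several possible projection schemes.

```python
import numpy as np

def estimate_3d_position(pixel_uv, K, R_cam_to_world, t_cam_to_world, point_cloud,
                         max_ray_distance=0.05):
    """Back-project a detected pixel into a ray and snap it to the nearest cloud point.

    pixel_uv:     (u, v) center of the 2D detection
    point_cloud:  (N, 3) array of points in world coordinates
    Returns the estimated 3D position, or None if no point lies close to the ray.
    """
    if len(point_cloud) == 0:
        return None
    # Ray direction in camera coordinates, rotated into world coordinates.
    uv1 = np.array([pixel_uv[0], pixel_uv[1], 1.0])
    d = R_cam_to_world @ (np.linalg.inv(K) @ uv1)
    d /= np.linalg.norm(d)
    o = t_cam_to_world                                    # ray origin = camera center

    # Perpendicular distance of every cloud point from the ray.
    v = point_cloud - o
    along = v @ d                                         # signed distance along the ray
    perp = np.linalg.norm(v - np.outer(along, d), axis=1)
    perp[along < 0] = np.inf                              # ignore points behind the camera

    i = int(np.argmin(perp))
    if perp[i] > max_ray_distance:
        return None                                       # cloud too sparse near the ray
    return point_cloud[i]
```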
Then, the system places one or more markers on the obtained 3D coordinates as shown in
After processing from one viewpoint, the user moves the data acquisition unit and obtains an image from another viewpoint.
Depending on the desired implementation, image decode unit 701, image recognition unit 703, object management unit 705, 3D reconstruction unit 702, 3D data storage 704 and display control unit 706 can reside on an external server configured to conduct background processing on the images received from data capturing device 700. 3D reconstruction unit 702 can be configured to facilitate the functions as illustrated in
Example functions that can be implemented from the system to facilitate the example implementations described herein can involve the following. To facilitate example implementations, a function may be needed to correct for the starting point of the projection. For example, object detection may take some time to conduct, and the time lag between capturing the image from the data capturing device 700 and projecting detection results from the display control unit 706 may cause a misalignment of the 3D position. Thus, in example implementations described herein, the device position at which the 2D image is captured is also stored by the system. The stored position can then be used to project detection results in accordance with the desired implementation, and as shown at
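A sketch of this latency correction, assuming hypothetical structures that pair each captured frame with the device pose at capture time so that detection results arriving later are projected from the stored pose rather than the device's current pose:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CapturedFrame:
    image_id: int
    image: object
    R_cam_to_world: np.ndarray   # device rotation at the moment of capture
    t_cam_to_world: np.ndarray   # device position at the moment of capture

pending_frames = {}  # image_id -> CapturedFrame, kept until detection finishes

def on_capture(frame: CapturedFrame):
    # Store the pose together with the image before sending it off for detection.
    pending_frames[frame.image_id] = frame

def on_detection_result(image_id, detections, project_fn):
    # Detection may arrive with a delay; use the pose stored at capture time,
    # not the device's current pose, to avoid misaligned 3D positions.
    frame = pending_frames.pop(image_id)
    return [project_fn(d, frame.R_cam_to_world, frame.t_cam_to_world) for d in detections]
```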
In another example function, object management unit 705 can be configured to re-project old detection results after the 3D point clouds become dense enough for projection, if projection of the detection results fails due to the 3D point clouds being sparse. In such example implementations, the projection of results can be delayed until a sufficient density of 3D point clouds is obtained.
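A sketch of such deferred re-projection, assuming a hypothetical local_density_fn that measures how many cloud points lie near a detection and an arbitrary density threshold:

```python
# Detections whose projection failed because the point cloud was still too sparse.
deferred_detections = []

def project_or_defer(detection, point_cloud, project_fn, local_density_fn, min_density=500):
    """Project a 2D detection now, or store it for later if the cloud is too sparse."""
    if local_density_fn(point_cloud, detection) < min_density:
        deferred_detections.append(detection)
        return None
    return project_fn(detection, point_cloud)

def retry_deferred(point_cloud, project_fn, local_density_fn, min_density=500):
    """Called whenever the point cloud has grown; re-projects old detection results."""
    placed, still_deferred = [], []
    for detection in deferred_detections:
        if local_density_fn(point_cloud, detection) >= min_density:
            placed.append(project_fn(detection, point_cloud))
        else:
            still_deferred.append(detection)
    deferred_detections[:] = still_deferred
    return placed
```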
In another example function, object management unit 705 can be configured to search for vacant space in the 3D space and to determine appropriate camera angles from which such spaces can be seen. In such an example implementation, the display control unit 706 can thereby provide a recommendation to the user of the data capturing device 700 to move the device to an appropriate position and angle to capture more areas so that object recognition can become more accurate. In such manner, the appropriate viewpoints as shown in
Functions can also be provided to extend example implementations to moving objects. In such an example implementation, upon placing a marker as shown in
Functions can also be provided to facilitate capture from multiple devices, an example of which is provided in
Functions can also be provided to improve the projection when the point clouds are insufficiently dense. In such an example implementation, structures (e.g., mesh, plane) can also be generated as illustrated in
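One way to generate such a structure is to fit a plane to the available points and intersect detection rays with it; the sketch below uses a least-squares plane fit and is only an assumption about how a generated plane might be used for projection.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through a set of 3D points.

    Returns (centroid, normal); the plane is {x : normal . (x - centroid) = 0}.
    """
    centroid = points.mean(axis=0)
    # The plane normal is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[-1]

def intersect_ray_with_plane(origin, direction, centroid, normal):
    """Intersect a detection ray with the fitted plane instead of a sparse cloud."""
    denom = direction @ normal
    if abs(denom) < 1e-9:
        return None                       # ray is parallel to the plane
    s = ((centroid - origin) @ normal) / denom
    return origin + s * direction if s > 0 else None
```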
Example implementations can also adjust the size of the marker depending on the size of the object. Such example implementations can address potential failure to detect collisions due to the marker size being too big or too small. Thus, depending on the type of objects to be detected, the marker size can be adjusted according to the desired implementation and/or the range of collision can be similarly adjusted to compensate for the sizes of different types of objects. Example implementations can also improve collision detection during the 2D object detection by considering the type of object, to address complicated scenarios in which there are multiple different types of objects or densely aligned objects. In such example implementations, a reliability score can also be assigned to detected 2D objects to provide an assessment of the confidence of the detection. Such a reliability score can be computed through any desired implementation.
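A sketch of type-dependent marker sizing combined with a reliability threshold, reusing the hypothetical Marker objects from the earlier sketch; the per-class radii, the score field, and the threshold value are illustrative examples only.

```python
# Hypothetical per-class marker radii (in meters) and a minimum reliability score.
MARKER_RADIUS_BY_TYPE = {"pipe_small": 0.05, "pipe_large": 0.15, "default": 0.10}
MIN_RELIABILITY = 0.5

def should_place_marker(detection, position, existing_markers):
    """Place a marker only for confident detections that do not collide with an existing one."""
    if detection["score"] < MIN_RELIABILITY:
        return False                                   # low-confidence detection is skipped
    radius = MARKER_RADIUS_BY_TYPE.get(detection["label"], MARKER_RADIUS_BY_TYPE["default"])
    for marker in existing_markers:
        distance = sum((a - b) ** 2 for a, b in zip(position, marker.position)) ** 0.5
        if distance < radius + marker.radius:          # collision range depends on both types
            return False
    return True
```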
Example implementations can also provide an interactive interface, as there may be errors in object detection through 2D recognition. In such an example implementation, an interface is provided as shown at
In an example implementation, processor 800 can be configured to conduct raycasting on a plurality of images to generate a point cloud as illustrated in
In example implementations, the plurality of images as illustrated in
In example implementations, processor 800 can be further configured to, for the point cloud not meeting a sufficient density, project additional points from a database of previously raycast point clouds based on one or more of the position and acceleration of the device. In such example implementations, 3D data storage 704 may manage point clouds as previously raycast from the device or from other devices. Such points can be provided to the data capturing device 700 to fill in the point clouds and speed up the 2D to 3D processing. The density can be in accordance with a threshold as set to the desired implementation.
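A sketch of how the local cloud might be filled in from previously raycast points; the stored cloud, the point threshold, and the distance radius are hypothetical stand-ins for whatever 3D data storage 704 actually manages.

```python
import numpy as np

def fill_sparse_cloud(local_cloud, stored_cloud, device_position,
                      min_points=1000, radius=3.0):
    """Augment a sparse local cloud with previously raycast points near the device.

    local_cloud / stored_cloud: (N, 3) arrays in the global coordinate system;
    stored_cloud plays the role of the point clouds kept in 3D data storage 704.
    """
    if len(local_cloud) >= min_points:
        return local_cloud                              # already dense enough
    # Take only stored points within `radius` meters of the device's current position.
    near = stored_cloud[np.linalg.norm(stored_cloud - device_position, axis=1) < radius]
    return np.vstack([local_cloud, near]) if len(near) else local_cloud
```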
In example implementations, processor 800 can be configured to search the 3D space for one or more vacant areas; and generate a recommendation for the device involving a position and angle to conduct image capture based on the one or more vacant areas. In an example implementation, the searching of the 3D space can be conducted based on the point cloud density of a particular area as stored in 3D data storage 704 and as illustrated in
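A sketch of one possible vacant-area search, using a coarse voxel occupancy count over the point cloud and an arbitrary heuristic for the suggested camera position; none of the parameter values are prescribed by the disclosure.

```python
import numpy as np
from collections import Counter

def recommend_viewpoints(point_cloud, bounds_min, bounds_max, voxel=0.5, min_points=5):
    """Find sparsely observed voxels and suggest camera positions that look at them.

    Returns a list of (target_center, suggested_camera_position) pairs; the camera is
    simply offset back and above each vacant voxel center (an arbitrary heuristic).
    """
    occupancy = Counter(map(tuple, np.floor((point_cloud - bounds_min) / voxel).astype(int)))
    shape = np.ceil((bounds_max - bounds_min) / voxel).astype(int)

    recommendations = []
    for index in np.ndindex(*shape):
        if occupancy.get(index, 0) < min_points:              # vacant or sparsely seen voxel
            center = bounds_min + (np.array(index) + 0.5) * voxel
            camera = center + np.array([0.0, -1.5, 1.0])      # step back and up (heuristic)
            recommendations.append((center, camera))
    return recommendations
```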
Processor 800 can also be further configured to provide an interface to the device configured to add or remove one or more objects detected in the 2D object detection from the plurality of images as illustrated in
Processor 800 can also be configured to classify the object from 2D detection by determining a type of the object and determining a size of the marker from the type of the object as described with respect to
The interface panel shown on
Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations within a computer. These algorithmic descriptions and symbolic representations are the means used by those skilled in the data processing arts to convey the essence of their innovations to others skilled in the art. An algorithm is a series of defined steps leading to a desired end state or result. In example implementations, the steps carried out require physical manipulations of tangible quantities for achieving a tangible result.
Unless specifically stated otherwise, as apparent from the discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” “displaying,” or the like, can include the actions and processes of a computer system or other information processing device that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's memories or registers or other information storage, transmission or display devices.
Example implementations may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may include one or more general-purpose computers selectively activated or reconfigured by one or more computer programs. Such computer programs may be stored in a computer readable medium, such as a computer-readable storage medium or a computer-readable signal medium. A computer-readable storage medium may involve tangible media such as, but not limited to, optical disks, magnetic disks, read-only memories, random access memories, solid state devices and drives, or any other types of tangible or non-transitory media suitable for storing electronic information. A computer-readable signal medium may include media such as carrier waves. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Computer programs can involve pure software implementations that involve instructions that perform the operations of the desired implementation.
Various general-purpose systems may be used with programs and modules in accordance with the examples herein, or it may prove convenient to construct a more specialized apparatus to perform desired method steps. In addition, the example implementations are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the example implementations as described herein. The instructions of the programming language(s) may be executed by one or more processing devices, e.g., central processing units (CPUs), processors, or controllers.
As is known in the art, the operations described above can be performed by hardware, software, or some combination of software and hardware. Various aspects of the example implementations may be implemented using circuits and logic devices (hardware), while other aspects may be implemented using instructions stored on a machine-readable medium (software), which if executed by a processor, would cause the processor to perform a method to carry out implementations of the present application. Further, some example implementations of the present application may be performed solely in hardware, whereas other example implementations may be performed solely in software. Moreover, the various functions described can be performed in a single unit, or can be spread across a number of components in any number of ways. When performed by software, the methods may be executed by a processor, such as a general purpose computer, based on instructions stored on a computer-readable medium. If desired, the instructions can be stored on the medium in a compressed and/or encrypted format.
Moreover, other implementations of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the teachings of the present application. Various aspects and/or components of the described example implementations may be used singly or in any combination. It is intended that the specification and example implementations be considered as examples only, with the true scope and spirit of the present application being indicated by the following claims.