The subject matter herein generally relates to image processing.
With advances in science and technology, automobiles and navigation play an increasingly large role in daily life. Obstacle detection and measurement by vision technology also have important applications in fields such as vehicle-assisted driving and robot navigation.
At present, common visual measurement technologies comprise monocular vision measurement, binocular vision measurement, and structured light vision measurement. A binocular vision measurement structure is more complex, and its measurement takes longer. Calibration of a structured light vision measurement system is more difficult, and such a system is expensive in intelligent applications such as vehicle-assisted driving and robot navigation.
Implementations of the present disclosure will now be described, by way of embodiments, with reference to the attached figures.
It will be appreciated that for simplicity and clarity of illustration, where appropriate, reference numerals have been repeated among the different figures to indicate corresponding or analogous elements. In addition, numerous specific details are set forth in order to provide a thorough understanding of the embodiments described herein. However, it will be understood by those of ordinary skill in the art that the embodiments described herein can be practiced without these specific details. In other instances, methods, procedures, and components have not been described in detail so as not to obscure the related relevant feature being described. Also, the description is not to be considered as limiting the scope of the embodiments described herein. The drawings are not necessarily to scale and the proportions of certain parts may be exaggerated to better illustrate details and features of the present disclosure. It should be noted that references to “an” or “one” embodiment in this disclosure are not necessarily to the same embodiment, and such references mean “at least one”.
Several definitions that apply throughout this disclosure will now be presented.
The term "coupled" is defined as connected, whether directly or indirectly through intervening components, and is not necessarily limited to physical connections. The connection can be such that the objects are permanently connected or releasably connected. The term "comprising," when utilized, means "including, but not necessarily limited to"; it specifically indicates open-ended inclusion or membership in the so-described combination, group, series, and the like.
In block S100, an image containing an object is obtained through a monocular camera.
In one embodiment, a monocular camera installed on a sweeping robot is taken as an example. An image containing an object is obtained by the monocular camera, such as an RGB camera or an infrared (IR) camera. The installation height and installation angle of the monocular camera on the sweeping robot are fixed; in other words, the spatial characteristics of the monocular camera installed on the sweeping robot are fixed. The spatial characteristics of the monocular camera may comprise a height and an angle of the monocular camera relative to the ground.
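A minimal sketch of block S100 follows, grabbing one frame from a monocular RGB camera with OpenCV. The device index 0 and the mounting parameters are illustrative assumptions, not values from the disclosure.

```python
import cv2

CAMERA_HEIGHT_M = 0.10   # assumed fixed installation height above the ground
CAMERA_PITCH_DEG = 30.0  # assumed fixed downward installation angle

cap = cv2.VideoCapture(0)        # device index 0: the robot's monocular camera
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("failed to read a frame from the monocular camera")
print("captured frame of size", frame.shape[1], "x", frame.shape[0])
```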
Because the monocular camera has the advantages of convenient installation, small volume, and low cost, it has broad application prospects in the field of obstacle detection. Therefore, in one embodiment, information as to the environment around the sweeping robot is obtained by a monocular camera. A binocular camera can also be used in place of the monocular camera.
In block S200, a pixel coordinate of the object in the image is determined.
With further reference to the accompanying figures, in one embodiment, the pixel coordinate of the object in the image is determined by a YOLOv3 object detection algorithm. An advantage of the YOLOv3 algorithm is its use of the Darknet53 network as the backbone. The Darknet53 network uses extensive residual skip connections, like the ResNet network, to deepen the network. Its feature extraction can capture higher-level features and reduce the negative effect on gradients caused by pooling. The robustness and generalization ability of the algorithm are good, and the algorithm can effectively obtain the object coordinates, object class, and other information.
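The sketch below shows one common way to run a pretrained YOLOv3 network and extract pixel coordinates of detected objects, using OpenCV's DNN module. The file names yolov3.cfg / yolov3.weights, the input image name, and the 0.5 confidence threshold are assumptions for illustration.

```python
import cv2
import numpy as np

# Load a pretrained YOLOv3 model (assumed config/weight files).
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()

image = cv2.imread("frame.jpg")
h, w = image.shape[:2]

# YOLOv3 expects a square, normalized input blob; 416x416 is the common size.
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(layer_names)

for output in outputs:
    for detection in output:
        scores = detection[5:]              # per-class scores follow box + objectness
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:
            # Box center is relative to image size; convert to pixel coordinates.
            px, py = int(detection[0] * w), int(detection[1] * h)
            print(f"class {class_id}: pixel coordinate ({px}, {py}), conf {confidence:.2f}")
```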
In block S300, a spatial position of the object in the image is determined, based on the pixel coordinate of the object in the image and a preset transformation relationship or a preset depth prediction model.
In one embodiment, establishing the preset coordinate transformation relationship comprises the following blocks. The pixel coordinate of the object in the image, with the monocular camera as the center, is converted into an actual coordinate of a global coordinate system through the internal parameters of the monocular camera and the pinhole imaging principle, to establish the preset coordinate transformation relationship.
Specifically, the monocular camera is calibrated by a camera calibration algorithm to obtain the internal parameters of the monocular camera. The pixel coordinates of the object in the image are converted into the actual coordinates of the global coordinate system through the internal parameters of the monocular camera to establish the preset coordinate transformation relationship. The internal parameters of the monocular camera can comprise a focal length and the pixel coordinate of the principal point, that is, the projection of the optical center onto the image captured by the monocular camera.
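As a sketch of such a calibration step, the standard OpenCV checkerboard procedure below recovers the focal lengths and principal point. The 9x6 pattern size, 25 mm square size, and calibration image file names are illustrative assumptions.

```python
import glob

import cv2
import numpy as np

PATTERN = (9, 6)     # assumed inner-corner count of the checkerboard
SQUARE_M = 0.025     # assumed checkerboard square size in meters

# 3D corner positions on the board plane (Z = 0), reused for every view.
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_M

obj_points, img_points = [], []
for path in glob.glob("calib_*.jpg"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

if not obj_points:
    raise RuntimeError("no checkerboard corners found in the calibration images")

# K holds fx, fy and the principal point (cx, cy) described above.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("intrinsic matrix:\n", K)
```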
In one embodiment, the spatial position of the object in the image is determined, based on the pixel coordinate of the object in the image and the preset coordinate transformation relationship, in the following blocks. The pixel coordinate of the object in the image is converted into the actual coordinate of the world coordinate system through the preset coordinate transformation relationship. The spatial position of the object in the image is obtained according to the actual coordinate of the world coordinate system.
With further reference to the accompanying figures, in one embodiment, after the two coordinate points (X1, Y1) and (X2, Y2) are obtained, each point is converted into an actual coordinate of the world coordinate system through the preset coordinate transformation relationship, and the spatial position of the object in the image is obtained accordingly.
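A minimal sketch of this conversion follows, intersecting the back-projected pixel ray with the ground plane using the pinhole model, the intrinsics, and the fixed mounting height and pitch. The world frame convention and the example intrinsic values are assumptions for illustration.

```python
import numpy as np

def pixel_to_ground(u, v, fx, fy, cx, cy, height, pitch):
    """Intersect the back-projected ray of pixel (u, v) with the ground Z = 0.

    World frame: origin on the ground below the camera, X forward, Y left,
    Z up; the camera looks forward, pitched down by `pitch` radians.
    """
    a = (u - cx) / fx                              # normalized image x
    b = (v - cy) / fy                              # normalized image y
    denom = b * np.cos(pitch) + np.sin(pitch)
    if denom <= 0:
        raise ValueError("pixel is at or above the horizon; no ground hit")
    t = height / denom                             # ray scale reaching Z = 0
    X = t * (np.cos(pitch) - b * np.sin(pitch))    # forward ground distance
    Y = -t * a                                     # lateral offset, left positive
    return X, Y

# Example with assumed intrinsics and mounting: fx = fy = 500 px, principal
# point (320, 240), height 0.10 m, downward pitch 30 degrees.
print(pixel_to_ground(412, 305, 500, 500, 320, 240, 0.10, np.deg2rad(30)))
```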
In addition to applying the above pinhole imaging principle, the preset depth prediction model for the monocular camera can also be generated by statistical learning methods, such as machine learning.
In one embodiment, a method of establishing the preset depth prediction model comprises the following blocks. Depth marking points are set at different positions on the ground, and a three-dimensional coordinate of each depth marking point is obtained. A training data set is generated according to the depth information of the depth marking points in the marking process and the pixel coordinates corresponding to the depth marking points in the image captured by the monocular camera. The preset depth prediction model is then generated through feature engineering of the training data set and a preset modeling algorithm.
Specifically, with further reference to the accompanying figures, in one embodiment, a preset depth prediction model for depth D is generated by using deep learning to perform end-to-end modeling with the two pixel coordinates x and y.
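The sketch below fits such an end-to-end model, a small neural-network regressor mapping a pixel coordinate (x, y) to a depth value D. The synthetic training set is a stand-in generated from an assumed camera geometry (height 0.10 m, pitch 30 degrees, fy = 500 px, principal point (320, 240)); in practice the (x, y, D) triples would come from the depth marking points described above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the marking process under the assumed geometry.
rng = np.random.default_rng(0)
x = rng.uniform(0, 640, 500)                   # pixel column of a ground mark
y = rng.uniform(260, 480, 500)                 # pixel row, below the horizon
depth = 0.10 / np.tan(np.deg2rad(30) + np.arctan((y - 240) / 500.0))

# End-to-end model: (x, y) pixel coordinate -> depth D.
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
model.fit(np.column_stack([x, y]), depth)
print("training R^2:", model.score(np.column_stack([x, y]), depth))
```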
In one embodiment, determining the spatial position of the object in the image based on the pixel coordinate of the object in the image and the preset depth prediction model can be done in several ways. In addition to a checkerboard, the depth marking points can be set on the ground by other means, such as a depth camera, laser projection, and the like.
In one embodiment, the spatial position of the object in the image is determined based on the pixel coordinate of the object in the image and the preset depth prediction model as follows. The pixel coordinate of the object is input into the preset depth prediction model to obtain a depth value of the object, and the spatial position of the object in the image is determined from the depth value.
Specifically, after the preset depth prediction model is trained, the coordinates x and y of a ground point in the image obtained by the monocular camera are input into the preset depth prediction model to obtain the corresponding depth value.
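Continuing the training sketch above, a detected pixel coordinate (hypothetical values) can be fed to the fitted model to obtain its predicted depth:

```python
# Query the trained preset depth prediction model with an object's pixel
# coordinate, e.g. the box center returned by the YOLOv3 detector.
query = np.array([[412.0, 305.0]])
print("predicted depth (m):", model.predict(query)[0])
```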
The monocular camera can be fixed at different heights or field-of-view angles on the sweeping robot to change the area of ground covered by the monocular camera's vision. Through the above method, a new preset coordinate transformation relationship or preset depth prediction model is established, and the computable or predictable spatial range is thereby changed to suit different places and needs, improving the applicability of the sweeping robot.
In the above embodiment, the object detection method acquires the image containing the object through the monocular camera installed on the sweeping robot and determines the pixel coordinate of the object in the image. Then, according to the pixel coordinate of the object in the image and a preset transformation relationship or a preset depth prediction model, a spatial position of the object in the image is determined. The object detection method can effectively solve the problems of high cost, complex structure, poor real-time performance, and low accuracy in visual ranging, detecting the object efficiently and accurately, so that the sweeping robot can avoid obstacles that should be avoided, such as feces, biological objects, socks, and the like. At the same time, the sweeping robot will not contact obstacles that may spread dirt further on the ground, cause biological injury or shock, or create problems in vacuuming. Therefore, there are significant economic benefits in expanding the intelligent obstacle avoidance ability or other positioning requirements of the sweeping robot.
With further reference to the accompanying figures, in one embodiment, an object detection device 10 comprises an acquisition module 11, a conversion module 12, and a positioning module 13. The acquisition module 11 obtains an image containing an object through a monocular camera, the conversion module 12 determines a pixel coordinate of the object in the image, and the positioning module 13 determines a spatial position of the object in the image, based on the pixel coordinate of the object in the image and a preset transformation relationship or a preset depth prediction model.
As shown in the accompanying figures, in one embodiment, a non-transitory storage medium recording computer instructions is disclosed. When the recorded computer instructions are executed by a processor of an electronic device 20, the electronic device 20 can perform the object detection method described above.
The embodiments shown and described above are only examples. Many details known in the field are neither shown nor described. Even though numerous characteristics and advantages of the present technology have been set forth in the foregoing description, together with details of the structure and function of the present disclosure, the disclosure is illustrative only, and changes may be made in the detail, including in matters of shape, size, and arrangement of the parts within the principles of the present disclosure, up to and including the full extent established by the broad general meaning of the terms used in the claims. It will therefore be appreciated that the embodiments described above may be modified within the scope of the claims.
Number | Date | Country | Kind
---|---|---|---
202210556465X | May 2022 | CN | national