The following relates to the technical field of intelligent transportation and image processing, and more particularly to a monocular vision ranging method, a storage medium, and a monocular camera.
At present, vehicles having autonomous driving (AD) functions or advanced driver assistance systems (ADAS) have been introduced to the market, which has greatly promoted the development of intelligent transportation.
In the existing technology, the sensors supporting AD/ADAS mainly include radar, visual camera systems, lidar, ultrasonic sensors, and the like. Among them, the visual camera system is the most widely used because it can obtain the same two-dimensional image information as human vision. Its typical applications include lane detection, object detection, vehicle detection, pedestrian detection, cyclist detection, and the detection of other designated targets.
The existing visual camera systems for object recognition/detection mainly include monocular cameras and stereo cameras, each with its own characteristics. Monocular cameras are compact, simple, and easy to install, and require less computation than stereo cameras. Owing to these advantages, monocular cameras are increasingly used in the market. However, monocular cameras have a critical disadvantage: their distance estimation accuracy is insufficient (lower than that of stereo cameras). Improving the distance estimation accuracy of monocular cameras has therefore long been desired.
An aspect relates to providing a monocular vision ranging method that addresses the poor distance estimation accuracy of existing monocular cameras.
To achieve the above aspect, the present application adopts the following technical solutions:
A monocular vision ranging method comprises: calculating a first distance between a target and a monocular camera based on a geometric relationship between the monocular camera and the target; calculating a second distance between the target and the monocular camera based on a size ratio of the target to a reference target in a corresponding reference image, in which the size ratio comprises a height ratio or a width ratio; evaluating credibilities of the first distance and the second distance, and determining weight values assigned to the first distance and the second distance, respectively, in which a higher credibility corresponds to a higher weight value; and calculating a final distance between the target and the monocular camera based on the first distance, the second distance, and the weight values respectively corresponding to the first distance and the second distance.
In one aspect of the monocular vision ranging method, calculating the first distance between the target and the monocular camera comprises: setting an imaginary window at a distance within a range of interest of the monocular camera, in which the imaginary window has a predetermined physical size, comprises all or part of the target, and has a bottom touching the real ground surface; and defining the distance from the monocular camera to the imaginary window as the first distance d1.
Furthermore, calculating the first distance between the target and the monocular camera may comprise: setting a plurality of imaginary windows within a range of interest of the monocular camera, in which each imaginary window is at a different distance from the monocular camera, has the same physical size, and comprises all or part of the target; evaluating, for each window, a height ratio between the target height and the distance from the target bottom to the window bottom; and scoring each imaginary window according to the evaluation result, selecting the imaginary window having the highest score, and taking the distance between that imaginary window and the monocular camera as the first distance d1.
Furthermore, the evaluating a height ratio between a target height and a distance from a target bottom to each window bottom comprises: calculating the height ratio between the target height and the distance from the target bottom to each window bottom according to the physical size of each imaginary window, and evaluating the height ratio between the target height and the distance from the target bottom to each window bottom according to a calculation result.
Furthermore, evaluating the credibility of the first distance comprises: acquiring a height h1 of the target and a distance h2 from the target bottom to the bottom of the window used to determine the first distance; and calculating the ratio h2/h1, in which the closer h2/h1 is to 0, the higher the credibility of the first distance.
Furthermore, calculating the second distance between the target and the monocular camera comprises the following steps: acquiring a size parameter s1 of the target, in which s1 comprises a height h1 or a width w1 of the target; acquiring a size parameter s_ref of the reference target in the reference image corresponding to s1, in which the reference image has a reference distance d_ref from the monocular camera, and s_ref comprises a height h_ref or a width w_ref of the reference target; and acquiring the second distance d2 according to the following formula:
$$d_2 = d_{ref} \times \frac{s_{ref}}{s_1}$$
Furthermore, evaluating the credibility of the second distance comprises: determining the ratio s_ref/s1, in which the closer s_ref/s1 is to 1, the higher the credibility of the corresponding second distance.
Furthermore, calculating the final distance between the target and the monocular camera, based on the first distance, the second distance, and the weight values respectively corresponding to the first distance and the second distance, comprises:
calculating the final distance by adopting the following formula:
in which range_w_r represents the final distance, d1 represents the first distance, point_win represents the weight value corresponding to the first distance, d2 represents the second distance, and point_ref represents the weight value corresponding to the second distance.
Furthermore, when multiple imaginary windows are adopted, the final distance range_w_r having the highest score, determined from the first weight value point_win and the second weight value point_ref, is selected as the final distance, in which the score is the sum of point_win and point_ref, or a function of point_win and point_ref.
Compared with the existing technology, the monocular vision ranging method according to embodiments of the present application combines two ranging schemes, namely the first distance based on the geometric relationship and the second distance based on image matching, to jointly determine the final distance, which significantly improves the ranging reliability of the monocular camera. Moreover, unlike an approach relying on a single distance, if one ranging scheme is unavailable, the other ranging scheme can still be used in some cases. Therefore, the monocular vision ranging method according to embodiments of the present application can achieve better, more stable, and wider-ranging target detection.
It is another aspect of the present application to provide a machine-readable storage medium and a processor that address the poor distance estimation accuracy of existing monocular cameras.
To achieve the above aspect, the present application adopts the following technical solutions:
A machine-readable storage medium stores instructions configured to cause a machine to execute the above monocular vision ranging method.
A processor is configured to run a program, and when the program is run, the above monocular vision ranging method is executed.
The machine-readable storage medium and the processor have the same advantages over the existing technology as the above monocular vision ranging method, which will not be repeated here.
It is another aspect of the present application to provide a monocular camera that addresses the poor distance estimation accuracy of existing monocular cameras.
To achieve the above aspect, the present application adopts the following technical solutions:
A monocular camera comprises: one or more processors; and a memory for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the above monocular vision ranging method.
The monocular camera has the same advantages over the existing technology as the above monocular vision ranging method, which will not be repeated here.
Other features and advantages of the present application will be described in detail in the detailed description hereinbelow.
Some of the embodiments will be described in detail with reference to the accompanying figures, wherein like designations denote like members.
It should be noted that the embodiments of the present application and the features of the embodiments may be combined with each other where no conflict arises.
The present application will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
In step S100, a first distance between a target and a monocular camera is calculated based on a geometric relationship between the monocular camera and the target.
The target refers to an object expected to be captured by the monocular camera, for example, other vehicles or obstacles such as road cones in front of the vehicle.
In step S200, a second distance between the target and the monocular camera is calculated based on a size ratio of the target to a reference target in a corresponding reference image.
The reference target refers to a target in an image captured in advance by the same camera at a reference distance, or a target calculated according to the reference distance. In an optional embodiment, calculating the second distance between the target and the monocular camera may comprise the following steps: acquiring a size parameter s1 of the target, in which s1 comprises a height h1 or a width w1 of the target; acquiring a size parameter s_ref of the reference target in the reference image corresponding to s1, in which the reference image has a reference distance d_ref from the monocular camera, and s_ref comprises a height h_ref or a width w_ref of the reference target; and acquiring the second distance d2 according to the following formula:
$$d_2 = d_{ref} \times \frac{s_{ref}}{s_1}$$
Taking the height as an example: a height h1 of the actual target B is acquired; the height h_ref of the reference target in the reference image, which has the reference distance d_ref from the monocular camera A, is acquired; and the second distance d2 is calculated as:
$$d_2 = d_{ref} \times \frac{h_{ref}}{h_1}$$
The method for calculation of the second distance d2 by adopting the width ratio w_ref/w1 is similar to the method by adopting the height ratio, which will not be repeated herein.
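The size-ratio step can be sketched in Python. This is a minimal sketch assuming an ideal pinhole camera, under which the image size of an object is inversely proportional to its distance, so that s1 / s_ref = d_ref / d2. The function name `second_distance` is hypothetical, and s1 and s_ref must be the same kind of measurement (both heights or both widths, in pixels).

```python
def second_distance(s1: float, s_ref: float, d_ref: float) -> float:
    """Estimate the second distance d2 from a size ratio (hypothetical helper).

    s1    -- size of the target in the current image (pixels): h1 or w1
    s_ref -- size of the reference target in the reference image (pixels):
             h_ref or w_ref, measured the same way as s1
    d_ref -- distance at which the reference image was taken (metres)

    Assumes a pinhole camera: image size scales as 1 / distance,
    so s1 / s_ref = d_ref / d2.
    """
    if s1 <= 0 or s_ref <= 0 or d_ref <= 0:
        raise ValueError("sizes and reference distance must be positive")
    return d_ref * s_ref / s1


# A target twice as tall as the 100 m reference appears at half the distance.
print(second_distance(s1=40.0, s_ref=20.0, d_ref=100.0))  # 50.0
```

The same call covers the width variant: pass w1 and w_ref instead of h1 and h_ref.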
The ranging method in step S100 is denoted as measurement method_1, and the ranging method in step S200 is denoted as measurement method_2. Each method has its own advantages and disadvantages, which are listed in Table 1.
In an embodiment of the present application, measurement method_1, which is based on the geometric relationship, can be implemented by a range window algorithm (RWA). The RWA may be implemented in a single-window manner or a multi-window manner, which are introduced below:
1) Single Window
An imaginary window is set at a distance within a range of interest of the monocular camera. The imaginary window has a predetermined physical size, comprises all or part of the target, and its bottom touches the real ground surface. In this case, the distance from the monocular camera to the imaginary window is defined as the first distance.
2) Multi-Window
A plurality of imaginary windows are set within a range of interest of the monocular camera, in which each imaginary window is at a different distance from the monocular camera, has the same physical size, and comprises all or part of the target.
Then, in the image corresponding to each imaginary window, a height ratio between the target height and the distance from the target bottom to the window bottom is evaluated. Each imaginary window is scored according to the evaluation result, the imaginary window having the highest score is selected, and the distance between that imaginary window and the monocular camera is taken as the first distance.
In a preferred embodiment, evaluating the height ratio between the target height and the distance from the target bottom to each window bottom may comprise: calculating this height ratio according to the physical size of each imaginary window, and evaluating it according to the calculation result. For example, if the imaginary window has a physical size of 4 m × 2 m, the height ratio can be calculated from this window size. It should be noted that, in other embodiments, the evaluated items may include other metrics in addition to the height, such as the width, side length, and inner/outer texture of the actual target. In the existing technology, a target is generally represented by pixels, with one target spanning multiple pixels; in embodiments of the present application, the target is instead represented by pixel-based metrics, such as the height and width used herein.
Specifically, the multi-window measurement may comprise the following steps.
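To make the window geometry concrete, the following Python sketch projects one imaginary window into the image and then evaluates h1 and h2 in metres using that window's scale. It relies on assumptions the text does not state: a pinhole camera with zero pitch and roll, flat ground, a known camera mounting height, a focal length in pixels, a known horizon row, and the 4 m × 2 m example read as width × height. All names are hypothetical, and the target's top and bottom rows are taken as already detected.

```python
from dataclasses import dataclass


@dataclass
class WindowProjection:
    top_row: float     # image row of the window top
    bottom_row: float  # image row of the window bottom (ground contact line)
    height_px: float
    width_px: float


def project_window(distance_m: float, win_width_m: float, win_height_m: float,
                   cam_height_m: float, focal_px: float,
                   horizon_row: float) -> WindowProjection:
    """Project an imaginary window whose bottom touches flat ground.

    Assumed pinhole model with a horizontal optical axis: a point at height y
    above the ground and distance Z appears at row
    horizon_row + focal_px * (cam_height_m - y) / Z (rows grow downwards).
    """
    if distance_m <= 0:
        raise ValueError("window distance must be positive")
    bottom = horizon_row + focal_px * cam_height_m / distance_m
    top = horizon_row + focal_px * (cam_height_m - win_height_m) / distance_m
    return WindowProjection(top, bottom, bottom - top,
                            focal_px * win_width_m / distance_m)


def evaluate_window(win: WindowProjection, win_height_m: float,
                    target_top_row: float, target_bottom_row: float):
    """Return (h1, h2, h2 / h1) in metres, using the window's metres-per-pixel scale."""
    if target_bottom_row <= target_top_row:
        raise ValueError("target bottom must lie below its top")
    m_per_px = win_height_m / win.height_px
    h1 = (target_bottom_row - target_top_row) * m_per_px     # target height
    h2 = abs(win.bottom_row - target_bottom_row) * m_per_px  # target bottom to window bottom
    return h1, h2, h2 / h1


# A target whose bottom sits at row 420 fits the 20 m window better than the 30 m one.
for d in (20.0, 30.0):
    w = project_window(d, 4.0, 2.0, cam_height_m=1.2, focal_px=1000.0, horizon_row=360.0)
    print(d, evaluate_window(w, 2.0, target_top_row=345.0, target_bottom_row=420.0))
```

Because every window shares the same physical size, converting pixels to metres with each window's own scale makes the h2/h1 values of different windows directly comparable.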
In step S110, a monochrome image is obtained from the monocular camera.
In step S120, a ternary image corresponding to the monochrome image is obtained through horizontal differential processing and thresholding.
Line segments in the ternary image are created by connecting edge points.
In step S130, a binary image corresponding to the monochrome image is obtained through thresholding. For example, the maximum between-class variance method (Otsu's method) can be used to create the binary image. The binary image is used to separate the target from the background, which aids in creating targets from the line segments; it can also suppress ghost targets between two targets.
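Step S120 and the segment construction can be illustrated with a small pure-Python sketch. The forward difference, the symmetric threshold, and the rule of linking edge points only within the same column are simplifying assumptions (the exact differencing kernel and linking rule are not specified), and the function names are hypothetical.

```python
from typing import List, Tuple

Gray = List[List[int]]               # monochrome image, rows of 0-255 values
Ternary = List[List[int]]            # values in {-1, 0, +1}
Segment = Tuple[int, int, int, int]  # (column, first_row, last_row, sign)


def ternary_image(gray: Gray, threshold: int) -> Ternary:
    """Horizontal differencing followed by a symmetric threshold."""
    out: Ternary = []
    for row in gray:
        t_row = [0] * len(row)
        for c in range(len(row) - 1):
            diff = row[c + 1] - row[c]  # forward difference along the row
            if diff > threshold:
                t_row[c] = 1            # dark-to-bright edge point
            elif diff < -threshold:
                t_row[c] = -1           # bright-to-dark edge point
        out.append(t_row)
    return out


def vertical_segments(tern: Ternary, min_len: int = 2) -> List[Segment]:
    """Connect vertically adjacent edge points of the same sign into segments."""
    segments: List[Segment] = []
    if not tern:
        return segments
    height, width = len(tern), len(tern[0])
    for c in range(width):
        start = None  # first row of the current run, if any
        for r in range(height + 1):
            val = tern[r][c] if r < height else 0  # sentinel closes the last run
            if start is not None and val != tern[start][c]:
                if r - start >= min_len:
                    segments.append((c, start, r - 1, tern[start][c]))
                start = None
            if start is None and val != 0:
                start = r
    return segments


gray = [[10, 10, 200, 200]] * 3  # a vertical dark-to-bright boundary
tern = ternary_image(gray, threshold=50)
print(tern[0])                  # [0, 1, 0, 0]
print(vertical_segments(tern))  # [(1, 0, 2, 1)]
```

Keeping the sign of each edge point lets later stages pair a rising edge with a falling edge, which is one plausible way the left and right sides of a target could be matched.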
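Otsu's method picks the threshold that maximises the between-class variance of the grey-level histogram. A minimal pure-Python sketch for 8-bit images follows; treating pixels above the threshold as foreground is an assumption, since which side belongs to the target depends on the scene, and the function names are hypothetical.

```python
from typing import List

Gray = List[List[int]]  # 8-bit monochrome image


def otsu_threshold(gray: Gray) -> int:
    """Return t maximising w0 * w1 * (mu0 - mu1)^2, class 0 being values <= t.

    w0 * w1 * (mu0 - mu1)^2 is proportional to the between-class variance.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    if total == 0:
        raise ValueError("empty image")
    sum_all = sum(i * h for i, h in enumerate(hist))

    w0 = 0    # pixel count of class 0
    sum0 = 0  # sum of grey levels in class 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:  # strict '>' keeps the lowest optimal t
            best_var, best_t = between, t
    return best_t


def binarize(gray: Gray, t: int) -> List[List[int]]:
    """1 for pixels brighter than t, else 0."""
    return [[1 if v > t else 0 for v in row] for row in gray]


img = [[10, 10, 200, 200], [12, 11, 198, 205]]
t = otsu_threshold(img)
print(t, binarize(img, t))  # 12 [[0, 0, 1, 1], [0, 0, 1, 1]]
```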
In step S140, targets are created using the line segments of the ternary image and the binary image, and scores are given to the targets in a single frame.
In step S150, the optimal target of the window is selected according to the score.
In step S160, the above process is repeated for each window at a different distance, and the optimal target of each window is determined.
In step S170, the optimal distance is selected, namely the distance of the window whose optimal target has the highest score.
In step S180, the optimal distance is taken as the first distance.
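Steps S150 to S180 reduce to two nested selections. In the sketch below the per-target scores are assumed to come from step S140 and are passed in directly; the dictionary layout and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

# window distance (m) -> list of (target id, single-frame score from step S140)
Candidates = Dict[float, List[Tuple[str, float]]]


def first_distance_from_windows(candidates: Candidates) -> Tuple[float, str]:
    """Return (first distance d1, id of the optimal target) over all windows."""
    best_per_window: Dict[float, Tuple[str, float]] = {}
    for distance, targets in candidates.items():  # S160: every window
        if targets:
            # S150: optimal target of this window by score
            best_per_window[distance] = max(targets, key=lambda t: t[1])
    if not best_per_window:
        raise ValueError("no target found in any window")
    # S170: window whose optimal target has the highest score
    best_distance = max(best_per_window, key=lambda d: best_per_window[d][1])
    # S180: that window's distance is the first distance
    return best_distance, best_per_window[best_distance][0]


windows = {20.0: [("a", 0.4), ("b", 0.7)], 30.0: [("c", 0.9)], 40.0: []}
print(first_distance_from_windows(windows))  # (30.0, 'c')
```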
In an embodiment of the present application, the measurement method_2 may be used to support the measurement method_1, and a technical solution for combining the two methods to achieve accurate ranging will be further described below.
In step S300, credibilities of the first distance and the second distance are evaluated, and weight values assigned to the first distance and the second distance are determined, respectively.
The higher the credibilities are, the higher the weight values are.
Regarding the first distance, taking the above RWA as an example, h2/h1 reflects its credibility. Theoretically, the actual target contacts the ground surface, so h2 should be 0 and h1 should be close to the real height; hence h2/h1 should be close to 0. The closer h2/h1 is to 0, the higher the corresponding credibility.
Regarding the second distance, evaluating its credibility comprises determining the ratio s_ref/s1, in which the closer s_ref/s1 is to 1, the higher the credibility of the corresponding second distance. Taking Table 2 as an example, h_ref/h1 reflects the credibility. h_ref is the height of the reference target in the reference image at the reference distance from the monocular camera, and can be determined by calculation or by actual image measurement. For example, h_ref can be obtained by placing the target at a distance of 100 m and photographing it, in which case 100 m is the reference distance d_ref. Theoretically, h_ref should be consistent with h1, so h_ref/h1 should be close to 1. The closer h_ref/h1 is to 1, the higher the corresponding credibility.
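Only the direction of this relationship is fixed (credibility rises as h2/h1 approaches 0), not a formula for the weight. The Python sketch below uses a linear fall-off as one possible choice; both the mapping and the name `weight_first` are assumptions.

```python
def weight_first(h1: float, h2: float) -> float:
    """Hypothetical weight point_win for the first distance d1.

    h1 -- target height; h2 -- distance from the target bottom to the
    window bottom (same units). Returns 1.0 when h2 == 0 and falls linearly
    to 0.0 once |h2| reaches h1; the linear shape is an assumption.
    """
    if h1 <= 0:
        raise ValueError("target height must be positive")
    return max(0.0, 1.0 - abs(h2) / h1)


print(weight_first(1.5, 0.0), weight_first(2.0, 0.5), weight_first(1.0, 2.0))
# 1.0 0.75 0.0
```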
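For the second distance, only the target value of the ratio (s_ref/s1 close to 1) is given, not a weight formula. One symmetric choice, assumed here, is min(r, 1/r) with r = s_ref/s1: it equals 1 at r = 1 and penalises a target that looks twice as large the same as one that looks half as large. The name `weight_second` is hypothetical.

```python
def weight_second(s1: float, s_ref: float) -> float:
    """Hypothetical weight point_ref for the second distance d2.

    r = s_ref / s1; returns min(r, 1 / r), which is 1.0 when the target
    matches the reference size and decreases as the ratio departs from 1.
    """
    if s1 <= 0 or s_ref <= 0:
        raise ValueError("sizes must be positive")
    r = s_ref / s1
    return min(r, 1.0 / r)


print(weight_second(20.0, 20.0), weight_second(40.0, 20.0), weight_second(10.0, 20.0))
# 1.0 0.5 0.5
```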
In step S400, a final distance between the target and the monocular camera is calculated, based on the first distance, the second distance, and the weight values respectively corresponding to the first distance and the second distance.
In particular, the final distance is calculated by adopting the following formula:
in which range_w_r represents the final distance, d1 represents the first distance, point_win represents the weight value corresponding to the first distance, d2 represents the second distance, and point_ref represents the weight value corresponding to the second distance. That is, the above formula yields a weighted sum of the ranging results obtained by measurement method_1 and measurement method_2, thereby improving the ranging accuracy.
In an optional embodiment, when multiple imaginary windows are adopted, the final distance range_w_r having the highest score, determined from the first weight value point_win and the second weight value point_ref, is selected as the final distance, in which the score is the sum of point_win and point_ref, or a function of point_win and point_ref.
The effects obtainable by the method of the embodiment of the present application are described below by way of an example.
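The combination can be sketched as follows, assuming the weighted sum is normalised by the total weight, range_w_r = (d1 · point_win + d2 · point_ref) / (point_win + point_ref), so that the result remains a distance. The normalisation and the function name are assumptions. The second call shows how a zero weight lets one scheme stand in when the other is unavailable.

```python
def fuse_distances(d1: float, point_win: float, d2: float, point_ref: float) -> float:
    """Assumed normalised weighted combination of d1 and d2 (range_w_r)."""
    if point_win < 0 or point_ref < 0:
        raise ValueError("weights must be non-negative")
    total = point_win + point_ref
    if total == 0:
        raise ValueError("both distance estimates have zero weight")
    return (d1 * point_win + d2 * point_ref) / total


print(fuse_distances(40.0, 3.0, 80.0, 1.0))  # 50.0, pulled toward the more credible d1
print(fuse_distances(40.0, 1.0, 80.0, 0.0))  # 40.0, d2 unavailable
```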
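For the multi-window case, the selection can be sketched as below, given for each window i a candidate range_w_r(i) together with its two weights. The default score is the sum of the weights; any other function of point_win and point_ref can be passed in. The tuple layout and names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Candidate = Tuple[float, float, float]  # (range_w_r, point_win, point_ref)


def select_final_distance(
    candidates: Sequence[Candidate],
    score: Callable[[float, float], float] = lambda pw, pr: pw + pr,
) -> Tuple[int, float]:
    """Return (index i of the optimal window, range_w_r(i)) by highest score."""
    if not candidates:
        raise ValueError("no window candidates")
    best_i = max(range(len(candidates)),
                 key=lambda i: score(candidates[i][1], candidates[i][2]))
    return best_i, candidates[best_i][0]


cands = [(48.0, 0.0, 1.3), (51.0, 0.6, 0.6), (60.0, 0.9, 0.2)]
print(select_final_distance(cands))                                # (0, 48.0)
print(select_final_distance(cands, score=lambda pw, pr: pw * pr))  # (1, 51.0)
```

The choice of score matters: with the sum, a strong point_ref can outweigh a zero point_win, whereas a product requires both weights to be non-zero.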
For this example, based on the selection of weight value as shown in
That is, the window having the highest weight value is selected as the optimal window to determine range_w_r(i), where i is the index of the optimal window.
Here, range_new represents the finally calculated distance of the actual target relative to the monocular camera. Corresponding to the data in
To sum up, the monocular vision ranging method according to embodiments of the present application combines two ranging schemes, namely the first distance based on the geometric relationship and the second distance based on image matching, to jointly determine the final distance, which significantly improves the ranging reliability of the monocular camera. Moreover, unlike an approach relying on a single distance, if one ranging scheme is unavailable, the other ranging scheme can still be used in some cases. Therefore, the monocular vision ranging method according to embodiments of the present application can achieve better, more stable, and wider-ranging target detection.
Another embodiment of the present application further provides a machine-readable storage medium. The machine-readable storage medium stores instructions configured to cause a machine to execute the above-mentioned monocular vision ranging method. The machine-readable storage medium includes, but is not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tapes, magnetic tape or disk storage or other magnetic storage devices, and various other media capable of storing program code.
Another embodiment of the present application also provides a monocular camera. The monocular camera includes: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the above-mentioned monocular vision ranging method.
An embodiment of the present application further provides a processor configured to run a program. When the program is run, the above-mentioned monocular vision ranging method is executed.
The memory may include non-persistent memory in computer readable media, random access memory (RAM) and/or non-volatile memory, such as read only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. The processor may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), multiple microprocessors, one or more microprocessors associated with a DSP core, a controller, a microcontroller, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) circuit, any other type of integrated circuit (IC), state machine, etc.
The present application also provides a computer program product (a non-transitory computer-readable storage medium having instructions which, when executed by a processor, perform actions) which, when executed on a vehicle, is adapted to execute a program initialized with the steps of the above-described monocular vision ranging method.
As will be appreciated by those skilled in the art, the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented in one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes therein.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It will be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction apparatus. The instruction apparatus implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded on a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable data processing device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memories.
Although the invention has been illustrated and described in greater detail with reference to the preferred exemplary embodiment, the invention is not limited to the examples disclosed, and further variations can be inferred by a person skilled in the art, without departing from the scope of protection of the invention.
For the sake of clarity, it is to be understood that the use of “a” or “an” throughout this application does not exclude a plurality, and “comprising” does not exclude other steps or elements.
Number | Date | Country | Kind |
---|---|---|---|
201910772025.6 | Aug 2019 | CN | national |
This application claims priority to PCT Application No. PCT/CN2020/110572, having a filing date of Aug. 21, 2020, which claims priority to Chinese Application No. 201910772025.6, having a filing date of Aug. 21, 2019, the entire contents of both of which are incorporated herein by reference.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2020/110572 | 8/21/2020 | WO |