The invention relates to a depth estimation device and method thereof, and more particularly, to an adaptive and efficient depth estimation device and method thereof capable of estimating depth information of a target object without an additional expensive depth camera or lidar equipment.
The depth of an object is the distance from the object in the scene to the camera. However, a common vehicle-mounted RGB camera cannot capture the object's depth information, such that the image information captured by the common RGB camera is not sufficient to estimate the depth of the object in the scene or the relative position between the object and the vehicle, thus limiting the development of higher-level vehicle services, such as Advanced Driver Assistance Systems (ADAS). Furthermore, current vehicle-mounted devices usually estimate the depth of the object in the scene or the relative position between the object and the vehicle by using a depth camera or a light detection and ranging (lidar) device, thereby resulting in high production cost and increasing the complexity and time of manufacturing and assembly. Thus, there is a need for improvement.
It is therefore an objective of the present invention to provide an adaptive and efficient depth estimation device and method thereof to solve the abovementioned problem.
An embodiment of the present invention discloses an adaptive and efficient depth estimation device, applied for a vehicle, comprising: a speed detection circuit, disposed on the vehicle and configured to detect a speed of the vehicle; an image capturing module, disposed on the vehicle and configured to capture a first image at a first time point and a second image at a second time point; an object detection circuit, coupled to the image capturing module and configured to determine a first position of a target object in the first image and a second position of the target object in the second image; and an image projection processing circuit, coupled to the object detection circuit and the speed detection circuit, and configured to convert the first position and the second position to a first projected position and a second projected position by using a transform matrix, and configured to determine a first relative position of the target object at the first time point and a second relative position of the target object at the second time point in a world coordinate according to the speed of the vehicle, the first projected position and the second projected position.
An embodiment of the present invention discloses an adaptive and efficient depth estimation method, applied for a vehicle, comprising: detecting a speed of the vehicle; capturing a first image at a first time point and a second image at a second time point; determining a first position of a target object in the first image and a second position of the target object in the second image; converting the first position and the second position to a first projected position and a second projected position by using a transform matrix; and determining a first relative position of the target object at the first time point and a second relative position of the target object at the second time point in a world coordinate according to the speed of the vehicle, the first projected position and the second projected position.
These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
Certain terms are used throughout the description and following claims to refer to particular components. As one skilled in the art will appreciate, hardware manufacturers may refer to a component by different names. This document does not intend to distinguish between components that differ in name but not function. In the following description and in the claims, the terms “include” and “comprise” are utilized in an open-ended fashion, and thus should be interpreted to mean “include, but not limited to . . . ”. Also, the term “couple” is intended to mean either an indirect or direct electrical connection. Accordingly, if one device is coupled to another device, that connection may be through a direct electrical connection, or through an indirect electrical connection via other devices and connections.
Please refer to
For an illustration of the operations of the depth estimation device 10, please refer to
The procedure 2 includes the following steps:
Step S200: Start.
Step S202: Detect speed of vehicle and capture first image at first time point and second image at second time point.
Step S204: Determine first position of target object in first image and second position of target object in second image.
Step S206: Convert first position and second position to first projected position and second projected position by using transform matrix.
Step S208: Determine first relative position of target object at first time point and second relative position of target object at second time point in world coordinate according to speed of vehicle, first projected position and second projected position.
Step S210: End.
According to the procedure 2, in Step S202, the speed detection circuit 104 is configured to detect the speed of the vehicle. The image capturing module 102 is configured to continuously capture image frames at a constant sampling rate. For example, the image capturing module 102 may capture a first image at a first time point and capture a second image at a second time point after the first time point. The speed information of the vehicle detected by the speed detection circuit 104 and the images captured by the image capturing module 102 may be stored in the storage device 114.
In Step S204, the object detection circuit 106 receives the first image and the second image captured by the image capturing module 102, and identifies the target object in the first image and the second image. Further, the object detection circuit 106 determines a first position of the target object in the first image and a second position of the target object in the second image. The target object may be any object seen by the image capturing module 102. In an embodiment, the target object may be an object related to traffic rules and/or driving behavior, such as traffic lights (e.g., stoplights), traffic signs (e.g., stop signs), surrounding buildings, pedestrians, or other vehicles. The object detection circuit 106 may detect the absolute position of the object of interest in each image frame. For example, the object detection circuit 106 may detect the target object in the image frames by using a deep neural network model and determine movement information of the target object in the image frames by using an optical flow algorithm. For example, the object detection circuit 106 may determine a position p1 of the target object in a first image captured at a time point t1 and determine a position p2 of the target object in a second image captured at a time point t2. For each image, the object detection circuit 106 may represent each target object of interest by a label, a bounding box and a classification score, and may refine the bounding box position and the tracking ID assignment by using a multiple object tracking algorithm. For each image, the object detection circuit 106 may output list information of the tracked objects. The list information of a tracked object may include a set of attributes associated with the tracked object, such as the label, bounding box, detection score and tracking ID of the tracked object.
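As a minimal sketch (illustrative only, not the claimed implementation), the per-image list information of tracked objects could be organized as follows; the field and function names are assumptions introduced here for illustration:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TrackedObject:
    """One entry of the per-image list information of tracked objects (illustrative fields)."""
    label: str                        # e.g., "traffic_light", "stop_sign", "pedestrian"
    bbox: Tuple[int, int, int, int]   # bounding box (x_min, y_min, x_max, y_max) in pixels
    score: float                      # detection / classification score
    track_id: int                     # tracking ID assigned by the multiple object tracking algorithm

def object_position(obj: TrackedObject) -> Tuple[float, float]:
    """Pixel position of the tracked object, taken here as the bounding-box center;
    this is one possible choice for the positions p1 and p2 used in the later steps."""
    x_min, y_min, x_max, y_max = obj.bbox
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
```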
In Step S206, the image projection processing circuit 108 is configured to convert the first position into a first projected position and convert the second position into a second projected position by using a transform matrix. The transform matrix may be a homography transform matrix. The transform matrix represents a transformation which maps an image plane to another plane. The other plane may be a ground plane, a sky plane or any other horizontal reference plane. For example, in Step S204, the object detection circuit 106 determines the position p1 of the target object in the first image captured at the time point t1 and determines the position p2 of the target object in the second image captured at the time point t2. Further, in Step S206, the image projection processing circuit 108 may convert the position p1 of the target object in the first image captured at the time point t1 from the image plane to the horizontal reference plane by using the transform matrix so as to generate a projected position q1. The image projection processing circuit 108 may convert the position p2 of the target object in the second image captured at the time point t2 from the image plane to the horizontal reference plane by using the transform matrix so as to generate a projected position q2. For example, suppose a unit sky plane is located one meter above the image capturing module 102 and parallel to the ground, and the image capturing module 102 is mounted at a camera height above the ground. In an embodiment, homogeneous coordinates of the positions p1 and p2 in an image plane may be expressed as follows:
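For example, if the target object lies at pixel coordinates (u1, v1) in the first image and (u2, v2) in the second image, a conventional homogeneous representation (assumed here for illustration) is:

\[
p_1 = \begin{bmatrix} u_1 \\ v_1 \\ 1 \end{bmatrix}, \qquad
p_2 = \begin{bmatrix} u_2 \\ v_2 \\ 1 \end{bmatrix}
\]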
Projected coordinates of the projected positions q1 and q2 in a unit sky plane may be expressed as follows:
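A corresponding representation on the unit sky plane (again assumed for illustration) is:

\[
q_1 = \begin{bmatrix} x_{q1} \\ y_{q1} \\ 1 \end{bmatrix}, \qquad
q_2 = \begin{bmatrix} x_{q2} \\ y_{q2} \\ 1 \end{bmatrix}
\]

where x_{q1}, y_{q1} and x_{q2}, y_{q2} are the coordinate values of the projected positions along the first and second coordinate axes of the unit sky plane.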
In an embodiment, the projected positions q1 and q2 may be calculated by the image projection processing circuit 108 according to the following equations:
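Under the unit sky plane assumption above, one plausible form of these equations (an assumed formulation in which the homography is applied to the homogeneous coordinates and the result is normalized by its third component) is:

\[
q_1 \cong H_{p \to w} \, p_1, \qquad
q_2 \cong H_{p \to w} \, p_2
\]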
where q1 represents the first projected position; q2 represents the second projected position; Hp→w represents the transform matrix mapping positions from the image plane to the horizontal reference plane; and p1 and p2 represent the first position and the second position in homogeneous coordinates.
On the other hand, in Step S206, the image correction circuit 110 may optimize the transform matrix according to at least one of a fish-eye undistortion parameter and a camera un-tilt parameter. The image correction circuit 110 may correct image distortion that is caused by the optical component design by using the fish-eye undistortion parameter for optimizing the transform matrix. The image correction circuit 110 may correct image distortion that is caused by component installation error and misalignment of the image capturing module 102 by using the camera un-tilt parameter for optimizing the transform matrix.
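As one possible sketch of this optimization (an assumption rather than the exact procedure of the embodiment), the camera un-tilt parameter can be modeled as a pure camera rotation and folded into the transform matrix; the fish-eye undistortion is nonlinear and would be applied to the pixel coordinates beforehand:

```python
import numpy as np

def untilt_homography(K: np.ndarray, tilt_deg: float) -> np.ndarray:
    """Homography that removes a small camera tilt about the x-axis,
    modeled as a pure rotation: H_untilt = K @ R^-1 @ K^-1 (assumed model).
    K is the 3x3 camera intrinsic matrix."""
    t = np.deg2rad(tilt_deg)
    R = np.array([[1.0, 0.0,        0.0],
                  [0.0, np.cos(t), -np.sin(t)],
                  [0.0, np.sin(t),  np.cos(t)]])
    return K @ np.linalg.inv(R) @ np.linalg.inv(K)

def optimize_transform(H_plane: np.ndarray, K: np.ndarray, tilt_deg: float) -> np.ndarray:
    """Fold the un-tilt correction into the image-plane-to-reference-plane homography:
    tilted pixel coordinates are first un-tilted, then projected onto the reference plane.
    Fish-eye undistortion is assumed to have been applied to the pixel coordinates already."""
    return H_plane @ untilt_homography(K, tilt_deg)
```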
In Step S208, the image projection processing circuit 108 is configured to determine a first relative position of the target object at the first time point and a second relative position of the target object at the second time point in a world coordinate according to the speed of the vehicle detected in Step S202, the first projected position and the second projected position determined in Step S206. For example, the image projection processing circuit 108 may calculate a scalar coefficient according to the speed of the vehicle, a time difference between the first time point and the second time point and a difference value between a coordinate value of the first projected position along a first coordinate axis (e.g., along x-axis) and a coordinate value of the second projected position along the first coordinate axis, and accordingly calculate the first relative position of the target object at the first time point and the second relative position of the target object at the second time point in the world coordinate. In an embodiment, the scalar coefficient may be calculated by the image projection processing circuit 108 according to the following equations:
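One plausible instance of such an equation, assuming the vehicle travels along the first coordinate axis and the target object is stationary between the two time points, is:

\[
k = \frac{v \cdot (t_2 - t_1)}{x_{q1} - x_{q2}}
\]

where k is the scalar coefficient, v is the detected speed of the vehicle, t2 − t1 is the time difference between the two time points, and x_{q1} and x_{q2} are the coordinate values of the first and second projected positions along the first coordinate axis.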
Moreover, in Step S208, the image projection processing circuit 108 may calculate a first relative position of the target object at the first time point t1 and a second relative position of the target object at the second time point t2 in the world coordinate according to the scalar coefficient k, the coordinate values of the projected positions q1 and q2 along the first coordinate axis (e.g., x-axis) and the second coordinate axis (e.g., y-axis), and a camera height of the image capturing module 102. The camera height of the image capturing module 102 may be the vertical distance from the installation position of the image capturing module 102 to the reference plane (e.g., the ground plane). As such, in embodiments of the present invention, the reprojected position may be scaled according to the ratio of the vehicle speed to the trajectory length, and the relative position may be calculated based on the reference plane with the same height. The first relative position of the target object at the first time point in the world coordinate may be the relative position of the target object with respect to the image capturing module 102 at the first time point. The second relative position of the target object at the second time point in the world coordinate may be the relative position of the target object with respect to the image capturing module 102 at the second time point. In an embodiment, the first relative position of the target object at the first time point and the second relative position of the target object at the second time point in the world coordinate may be calculated by the image projection processing circuit 108 according to the following equations:
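Under the same unit sky plane assumption, one plausible form of these equations, with h denoting the camera height of the image capturing module 102, is:

\[
P_1 = \bigl(k \cdot x_{q1},\; k \cdot y_{q1},\; k + h\bigr), \qquad
P_2 = \bigl(k \cdot x_{q2},\; k \cdot y_{q2},\; k + h\bigr)
\]

where P1 and P2 are the first and second relative positions of the target object in the world coordinate, the first two components are the horizontal offsets of the target object with respect to the image capturing module 102, and the third component is the height of the target object above the ground (k being its height above the camera, since the unit sky plane lies one meter above the camera).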
Moreover, in Step S208, the object positioning circuit 112 may calculate geographic location information of the target object according to the positioning information and bearing of the GPS, and the relative position information of the target object in the world coordinate calculated by the image projection processing circuit 108.
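A minimal sketch of this geolocation step, assuming the GPS provides the vehicle's latitude, longitude and bearing and using a small-offset equirectangular approximation (the function and parameter names are illustrative), might look as follows:

```python
import math

def target_geolocation(veh_lat_deg: float, veh_lon_deg: float, bearing_deg: float,
                       forward_m: float, lateral_m: float):
    """Approximate geographic location of the target object (illustrative sketch).

    veh_lat_deg / veh_lon_deg : GPS position of the vehicle (degrees)
    bearing_deg               : vehicle heading, clockwise from true north (degrees)
    forward_m / lateral_m     : relative position of the target in the world coordinate
                                (meters ahead of / to the right of the camera)

    Uses a small-offset equirectangular approximation, adequate for distances of
    tens of meters; not a full geodesic solution.
    """
    earth_radius_m = 6_371_000.0
    b = math.radians(bearing_deg)
    # Rotate the camera-relative offset into east/north components.
    east_m = forward_m * math.sin(b) + lateral_m * math.cos(b)
    north_m = forward_m * math.cos(b) - lateral_m * math.sin(b)
    dlat = math.degrees(north_m / earth_radius_m)
    dlon = math.degrees(east_m / (earth_radius_m * math.cos(math.radians(veh_lat_deg))))
    return veh_lat_deg + dlat, veh_lon_deg + dlon
```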
In the embodiments of the present invention, the speed detection circuit 104, the image capturing module 102, the object detection circuit 106 and the image projection processing circuit 108 may be utilized to determine the relative position of the target object during an online execution phase. The image correction circuit 110 may be utilized to generate the fish-eye undistortion parameter and the camera un-tilt parameter and optimize the transform matrix according to the fish-eye undistortion parameter and the camera un-tilt parameter during an offline execution phase. The speed information of the vehicle detected by the speed detection circuit 104 and the images captured by the image capturing module 102 may be stored in the storage device 114. The image correction circuit 110 may read the speed information and the images stored in the storage device 114 to calculate the fish-eye undistortion parameter and the camera un-tilt parameter. In addition, the image correction circuit 110 and the storage device 114 may be integrated into a cloud server.
Those skilled in the art should readily make combinations, modifications and/or alterations on the abovementioned description and examples. The abovementioned description, steps, procedures and/or processes including suggested steps can be realized by means of hardware, software, firmware (known as a combination of a hardware device and computer instructions and data that reside as read-only software on the hardware device), an electronic system, or a combination thereof. Examples of hardware can include analog, digital and/or mixed circuits known as microcircuits, microchips, or silicon chips. Examples of the electronic system may include a system on chip (SoC), a system in package (SiP), a computer on module (COM), and the depth estimation device 10. Any of the above-mentioned procedures and examples may be compiled into program codes or instructions that are stored in a storage device. The storage device may include a computer-readable storage medium. The storage device may include read-only memory (ROM), flash memory, random access memory (RAM), subscriber identity module (SIM), hard disk, floppy diskette, or CD-ROM/DVD-ROM/BD-ROM, but is not limited thereto. The processing circuit may read and execute the program codes or the instructions stored in the storage device for realizing the above-mentioned functions. Each of the object detection circuit 106, the image projection processing circuit 108, the image correction circuit 110, the object positioning circuit 112 and the processing circuit may be a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a programmable controller, a graphics processing unit (GPU), a programmable logic device (PLD), an electronic control unit (ECU) or other similar devices or a combination of these devices, but is not limited thereto.
In summary, the embodiments of the present invention may quickly and accurately calculate the relative position of the target object in the world coordinates based on the real-time speed of the vehicle and the movement information of the pixel coordinate position of the target object after conversion and correction, thus realizing depth estimation of the three-dimensional position of the target object with respect to the image capturing module, without requiring an additional expensive depth camera or lidar equipment.
Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.