The present application is filed pursuant to 35 U.S.C. 371 as a U.S. National Phase application of International Patent Application No. PCT/FR2012/053066, which was filed Dec. 21, 2012, claiming the benefit of priority to French Patent Application No. 1162137, which was filed on Dec. 21, 2011. The entire text of each of the aforementioned applications is incorporated herein by reference.
The present invention relates to methods for estimating optical flow in imaging techniques.
Optical flow is an approximation of the motion in a sequence of images varying over time. The first work on optical flow was done by engineers in the field of television and by persons interested in the modeling of biological vision. Since then, these techniques have found their place in a wide variety of disciplines, including computer vision and robot navigation. In particular, they are used to perform motion detection, object segmentation, time-to-collision calculations, motion-compensated encoding, etc.
Optical flow is a visual measurement notoriously affected by noise. It is commonly expressed as a velocity map within the image sequence. But estimating such a map requires solving an ill-posed problem, i.e. one comprising too many unknowns relative to the number of equations. As a consequence, to estimate flow vectors, additional hypotheses and constraints must be applied. However, these hypotheses and constraints are not always valid. Furthermore, the inevitable presence of stochastic noise in unfiltered natural image sequences gives rise to various difficulties when optical flow is used in the control loop of a mobile robot.
Optical flow techniques can be divided into four categories (cf. J. L. Barron, et al., “Performance of Optical Flow Techniques”, International Journal of Computer Vision, Vol. 12, No. 1, 1994, pp. 43-77): energy-based methods, phase-based methods, correlation-based methods and differential (gradient-based) methods.
The majority of work done on the design, comparison and application of optical flow techniques concentrates on correlation- or gradient-based approaches. However, all these methods suffer intrinsically from slowness of execution, making them poorly suited to the real-time execution constraints that exist in a number of applications.
Another motion detection solution relies on a visual sensor known as an EMD (Elementary Motion Detector). EMDs are based on motion detection models reproducing the presumed vision mechanisms of insects. Two adjacent photoreceptors supply image signals that are fed to a bank of temporal high-pass and low-pass filters. The high-pass filters remove the continuous component of the illumination, which carries no motion information. The signal is then subdivided between two channels, only one of which includes a low-pass filter. The delay applied by the low-pass filter is used to supply a delayed image signal, which is then correlated with that of the adjacent non-delayed channel. Finally, a subtraction between the two channels yields a response sensitive to the direction of motion, which can therefore be used to measure visual motion. Motion detection by an EMD is sensitive to image contrast, the amplitude of the detected motion being larger when the contrast is high. This degrades the precision of visual motion measurements. Because of this lack of precision, EMDs are not suitable for general navigation applications, especially for tasks requiring fine motion control.
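For illustration only, the sketch below implements a basic Reichardt-type EMD along the lines described above, in Python; the sampling period and the filter time constants (dt, tau_hp, tau_lp) are arbitrary assumptions, not values taken from a particular sensor or from the literature.

```python
import numpy as np

def emd_response(left, right, dt=1e-3, tau_hp=0.05, tau_lp=0.02):
    """Minimal Reichardt-type EMD: high-pass both photoreceptor signals,
    delay each channel with a first-order low-pass filter, correlate each
    delayed channel with the adjacent non-delayed one, then subtract."""
    a_hp = tau_hp / (tau_hp + dt)   # first-order high-pass coefficient
    a_lp = dt / (tau_lp + dt)       # first-order low-pass coefficient
    hp_l = hp_r = lp_l = lp_r = 0.0
    prev_l, prev_r = left[0], right[0]
    out = np.zeros(len(left))
    for i in range(1, len(left)):
        # high-pass removes the continuous component of the illumination
        hp_l = a_hp * (hp_l + left[i] - prev_l)
        hp_r = a_hp * (hp_r + right[i] - prev_r)
        prev_l, prev_r = left[i], right[i]
        # low-pass acts as the delay element of each channel
        lp_l += a_lp * (hp_l - lp_l)
        lp_r += a_lp * (hp_r - lp_r)
        # correlate delayed and non-delayed channels, then subtract
        out[i] = lp_l * hp_r - lp_r * hp_l
    return out  # the sign of the response encodes the direction of motion
```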
Unlike conventional cameras that record successive images at predefined instants of sampling, biological retinas only transmit a small amount of redundant information about the scene to be visualized, and do so in an asynchronous way. Event-based asynchronous vision sensors deliver compressed digital data in the form of events. A general presentation of such sensors can be consulted in “Activity-Driven, Event-Based Vision Sensors”, T. Delbrück, et al., Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 2426-2429. Event-based vision sensors have the advantage of removing redundancy, reducing latency time and increasing the dynamic range compared to conventional cameras.
The output of such a vision sensor can consist, for each pixel address, of a sequence of asynchronous events representing changes in the reflectance of the scene at the moment they occur. Each pixel of the sensor is independent and detects changes in intensity exceeding a threshold since the emission of its last event (for example a contrast of 15% on the logarithm of the intensity). When the intensity change exceeds the fixed threshold, an ON or OFF event is generated by the pixel according to whether the intensity is increasing or decreasing. Since the sensor is not sampled on a clock like a conventional camera, it can register the sequencing of the events with very high temporal precision (for example in the order of 1 μs). If such a sensor is used to reconstruct an image sequence, an image rate of several kilohertz can be attained, as opposed to a few tens of hertz for conventional cameras.
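As a purely illustrative sketch of this acquisition principle, the following Python function simulates a single DVS-like pixel emitting ON/OFF events whenever the logarithm of the intensity has drifted by more than a threshold since the last emitted event; the function name and the sampled form of the input are assumptions made for the example, and only the 15% contrast threshold is taken from the text above.

```python
import math

def pixel_events(intensities, times, threshold=0.15):
    """Emit (time, polarity) pairs for one pixel when the log intensity has
    changed by more than `threshold` since the last emitted event
    (polarity +1 for ON / increasing, -1 for OFF / decreasing)."""
    events = []
    ref = math.log(intensities[0])          # log intensity at the last event
    for intensity, t in zip(intensities[1:], times[1:]):
        delta = math.log(intensity) - ref
        while abs(delta) >= threshold:      # several events if the change is large
            polarity = 1 if delta > 0 else -1
            events.append((t, polarity))
            ref += polarity * threshold     # new reference level after the event
            delta = math.log(intensity) - ref
    return events
```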
While event-based vision sensors have promising prospects, to this day no practical method exists that is well adapted to determining optical flow on the basis of the signals delivered by such sensors. In “Frame-free dynamic digital vision”, Proceedings of the International Conference on Secure-Life Electronics, Advanced Electronics for Quality Life and Society, University of Tokyo, 6-7 Mar. 2008, pp. 21-26, T. Delbrück suggests the use of “labelers” to give additional significance to the detected events, such as contour orientations or directions of motion, without however supplying information that would make it possible to estimate an optical flow.
In the article “Asynchronous frameless event-based optical flow”, published in March 2012 in the journal Neural Networks, Vol. 27, pp. 32-37, R. Benosman et al. describe the estimation of optical flows on the basis of events detected by an asynchronous sensor. The algorithm used is gradient-based and relies on solving a system of equations in which the spatial gradients at a given pixel of coordinates (x, y) are estimated from the difference between the events having arisen at this pixel (x, y) and those having arisen at the pixels of coordinates (x−1, y) and (x, y−1) at the same instant.
A need exists for a method for estimating optical flow that makes it possible to obtain estimations faster than known techniques based on conventional cameras allow. There is also a need for a method for estimating optical flow on the basis of signals output by an event-based vision sensor, so as to be able to use the various techniques and applications that have been developed around optical flow.
A method for estimating optical flow is proposed, comprising: receiving asynchronous information originating from a light sensor having a pixel matrix arranged opposite a scene, the asynchronous information comprising, for each pixel of the matrix, successive events originating from this pixel and depending on variations of light in the scene; for a place of estimation in the pixel matrix and an estimation time, selecting a set of events originating from pixels lying in a spatial neighborhood of the place of estimation and having arisen in a time interval defined with respect to the estimation time; and quantizing the variations in the times of occurrence of the events of the selected set as a function of the positions, in the matrix, of the pixels from which these events originate.
The method makes use of the fact that the time of occurrence of the latest event at a given pixel position is an increasing monotonic function of time. This function defines, for the various pixel positions, a surface whose local variations provide information on the velocity field in the scene as seen by the sensor. The quantization of these variations can be carried out very quickly, with a delay in the order of the response time of the pixels of the sensor, for example less than or in the order of a millisecond.
In an embodiment of the method, the quantization of the variations in the times of occurrence of the events of the selected set as a function of the positions of the pixels from which these events originate comprises estimating a slope that the events of the selected set exhibit in a space-time representation around the place of estimation. The selected set of events can previously have been subjected to a smoothing operation in space-time representation, in order to attenuate certain noise effects.
When the pixel matrix is two-dimensional, one way of quantizing the variations in the times of occurrence of the events of the selected set as a function of the positions of the pixels comprises determining a plane exhibiting a minimum distance with respect to the events of the selected set in the space-time representation. In the case where the pixel matrix is one-dimensional, it is a straight line and not a plane that is to be determined in space-time representation.
To enrich the spatio-temporal analysis of the events detected by the pixels of the matrix, one possibility is to include in the quantization of the variations an estimation of second-order derivatives that the events of the selected set exhibit in space-time representation around the place of estimation.
In an embodiment, for each pixel of the matrix, the time of occurrence of the most recent event originating from this pixel is memorized. The selection of the set of events for a place of estimation and an estimation time can then comprise including in this set each event whose memorized time of occurrence, for a pixel of the spatial neighborhood of the place of estimation, lies in the time interval defined with respect to the estimation time. This allows a simple and fast implementation of the method.
In an embodiment, the steps of selecting a set of events and of quantizing the variations in the times of occurrence are executed by taking as place of estimation the position of a pixel where an event is detected and as estimation time an instant of detection of this event. It is thus possible to reach a very high speed of estimation of the optical flow in the relevant regions of the pixel matrix where activity is observed.
In particular, the selection of the set of events can be carried out in response to detection of an event originating from a pixel of the matrix at an instant of detection. In order to filter out events that in all likelihood represent only noise, it is then possible to proceed with the quantization of the variations in the times of occurrence to update motion information on condition that the set of events contains a number of events above a threshold.
Another aspect of the present invention relates to a device for estimating optical flow, comprising: a light sensor having a pixel matrix to be arranged opposite a scene and adapted to deliver asynchronous information comprising, for each pixel of the matrix, successive events originating from this pixel and depending on variations of light in the scene; and a computer configured to carry out a method as defined above on the basis of this asynchronous information.
Other features and advantages of the present invention will become apparent in the description below of a non-limiting exemplary embodiment, with reference to the appended drawings, wherein:
The device for estimating optical flow represented in
A computer 20 processes the asynchronous information f output by the sensor 10, i.e. the sequences of events received asynchronously from the various pixels, to extract therefrom the information V on the optical flow observed in the scene. The computer 20 operates on digital signals. It can be implemented by programming an appropriate processor. A hardware implementation of the computer 20 using specialized logic circuits (ASIC, FPGA etc.) is also possible.
For each pixel of the matrix, the sensor 10 generates an event-based asynchronous signal sequence on the basis of the light variations sensed by the pixel in the scene appearing in the field of vision of the sensor. Such an asynchronous photosensitive sensor makes it possible in certain cases to approach the physiological response of a retina. It is then known by the acronym DVS (Dynamic Vision Sensor).
The principle of acquisition by this asynchronous sensor is illustrated by
The activation threshold Q can be fixed, as in the case of
By way of example, the DVS 10 can be of the kind described in “A 128×128 120 dB 15 μs Latency Asynchronous Temporal Contrast Vision Sensor”, P. Lichtsteiner, et al., IEEE Journal of Solid-State Circuits, Vol. 43, No. 2, February 2008, pp. 566-576, or in the patent application US 2008/0135731 A1. The dynamics of a retina (minimum duration between action potentials), in the order of a few milliseconds, can be suitably reproduced with a DVS of this type. The dynamic performance is in any case far superior to that which can be achieved with a conventional video camera operating at a realistic sampling frequency.
It should be noted that the shape of the asynchronous signal delivered for a pixel by the DVS 10, which constitutes the input signal of the computer 20, can be different from a succession of Dirac peaks, the represented events being able to have any temporal width, amplitude or waveform in this event-based asynchronous signal.
Note that the method proposed here is applicable to other types of DVS, and also to light sensors, the output signals of which are generated according to an address-event representation without necessarily seeking to reproduce the behavior of the retina.
In practice, the ON or OFF events sent by the pixels of the sensor 10 do not have the temporal regularity represented in the idealized schema in
The slope can also be estimated quickly by convolution of the times of occurrence of the events of the set Sp,t with a spatial differentiation kernel. In such a case, it can be desirable to attenuate the effects of noise by applying a smoothing operation to the set of events Sp,t in the space-time representation before applying the convolution kernel. The smoothing can notably be carried out by applying a median filter.
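By way of illustration, and under the assumption that the times of occurrence of the most recent events along a one-dimensional pixel row are held in a numpy array, this variant could be sketched as follows; the filter and kernel sizes are arbitrary choices for the example.

```python
import numpy as np
from scipy.ndimage import median_filter

def slope_by_convolution(last_times, pixel_pitch=1.0):
    """Estimate the local slope of the time-of-occurrence profile of a 1-D
    pixel row: median smoothing followed by a central-difference kernel."""
    smoothed = median_filter(last_times, size=3)            # attenuate noise
    diff_kernel = np.array([0.5, 0.0, -0.5]) / pixel_pitch  # d(time)/d(pixel)
    slope = np.convolve(smoothed, diff_kernel, mode="same") # seconds per pixel
    return slope
```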
In imaging applications, the pixel matrix of the sensor 10 is more often two-dimensional than one-dimensional. The space-time representation in which the ON or OFF events originating from the pixels can be placed is then a representation in three dimensions such as that presented in
by the motion of a rotating bar with a constant angular velocity as schematized in the box A. The majority of these points are distributed in proximity to a surface of general helicoidal shape. However, the points are not exactly aligned on this surface given the chaotic behavior mentioned above. Furthermore, the figure shows a certain number of events, away from the helicoidal surface, that are measured without corresponding to the actual motion of the bar. These events are acquisition noise.
More generally, in the presence of motion of one or more objects in the field of vision of the sensor 10, events appear in three-dimensional representation (x, y, t), and we look for the optical flow corresponding to this motion.
An event arising at time t at a pixel situated at the position p = (x, y)ᵀ is denoted e(p, t). The value of e(p, t) is +1 or −1 according to the ON (positive change in contrast) or OFF (negative change in contrast) polarity of the event. Once again, a spatial neighborhood πp of the pixel p, πp = {p′ : ∥p′−p∥ ≤ R}, and a time interval θ = [t−T, t] can be defined, and one may consider the events originating from pixels in the neighborhood πp during the time interval θ, retaining for each pixel only the most recent event (if there is an event during the interval θ). In this way a set of events Sp,t is constructed that can be seen as a portion, lying within the volume πp×θ, of a surface Σe in the space-time representation of the events.
For each pixel p of the matrix, the time of occurrence of the last observed event is memorized. It is then possible to define the function that allocates to each position p the time of occurrence Σe(p) of the most recent event at this position. The surface Σe is a representation of this function in three-dimensional space. It is a surface that ascends as a function of time. The points of this surface Σe whose projections in the plane of the pixel matrix are outside the spatial neighborhood πp, and those whose projection on the time axis is outside the interval θ, are eliminated when selecting the set of events Sp,t.
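A minimal sketch of this selection step is given below, assuming the memorized times of occurrence are stored in a two-dimensional numpy array indexed by pixel position; the helper name and default parameter values are illustrative, not part of the described embodiments.

```python
import numpy as np

def select_events(last_times, p, t, R=3, T=2e-3):
    """Build the set S_{p,t}: for each pixel within radius R of p, keep the
    memorized time of its most recent event if it lies in [t - T, t].
    `last_times` holds, per pixel, the time of the latest event (-inf if none)."""
    x0, y0 = p
    xs, ys, ts = [], [], []
    for x in range(max(x0 - R, 0), min(x0 + R + 1, last_times.shape[0])):
        for y in range(max(y0 - R, 0), min(y0 + R + 1, last_times.shape[1])):
            if (x - x0) ** 2 + (y - y0) ** 2 <= R ** 2:   # spatial neighborhood pi_p
                t_last = last_times[x, y]
                if t - T <= t_last <= t:                  # time interval theta
                    xs.append(x); ys.append(y); ts.append(t_last)
    return np.array(xs), np.array(ys), np.array(ts)
```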
The computer 20 then estimates the partial derivatives of the surface Σe at the point e(p, t) with respect to the two spatial parameters x and y, namely ∂Σe/∂x and ∂Σe/∂y.
These partial derivatives indicate the slope that the events of Sp,t exhibit around the place of estimation. Around e(p, t), i.e. in the portion of surface that Sp,t represents, Σe can be written to first order as:

Σe(p + Δp) = Σe(p) + ∇Σeᵀ·Δp + o(∥Δp∥)
The partial derivatives of Σe are functions of a single variable, x or y respectively. Time being a strictly increasing function, Σe is a surface of non-zero derivatives at each point. The inverse function theorem can then be used to write, around a position p = (x0, y0)ᵀ:

∂Σe/∂x(x, y0) = dΣe|y=y0(x)/dx = 1/vx(x, y0)

∂Σe/∂y(x0, y) = dΣe|x=x0(y)/dy = 1/vy(x0, y)

where Σe|x=x0 and Σe|y=y0 denote the restrictions of Σe to x = x0 and to y = y0, respectively, and vx and vy are the components, along x and y, of the velocity of the events around the place of estimation. The slope of the surface Σe around the point e(p, t) thus gives the inverses of the pixel velocities of the events.
In a manner analogous to the preceding case, the quantization of the variations of the times of occurrence of the events can include determining a plane having a minimum distance, in the sense of least squares, with respect to the events of the set Sp,t in space-time representation. The components of the gradient ∇Σe can also be estimated by convolution with differentiation kernels over x and y, where applicable after a smoothing operation.
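A possible least-squares plane fit over such a set of events is sketched below, reusing the event selection helper sketched earlier; the slopes of the fitted plane play the role of the components of ∇Σe. This is only an illustration under those assumptions, not a description of a particular implementation.

```python
import numpy as np

def fit_plane(xs, ys, ts):
    """Fit t = a*x + b*y + c to the events of S_{p,t} in the least-squares
    sense; (a, b) estimates the spatial gradient of the surface Sigma_e.
    Assumes at least three non-aligned events."""
    A = np.column_stack([xs, ys, np.ones_like(xs, dtype=float)])
    (a, b, c), *_ = np.linalg.lstsq(A, ts, rcond=None)
    return a, b   # slopes in seconds per pixel along x and y
```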
To complete the information on the optical flow, the computer 20 can further conduct an estimation of the second-order derivatives of Σe around the point e(p, t). This information on second-order derivatives accounts for the accelerations observable in the field of vision of the sensor 10. The second-order derivatives, such as ∂²Σe/∂x², ∂²Σe/∂x∂y and ∂²Σe/∂y²,
represent the local curvature of the surface Σe which supplies a measurement of the apparent frequency of an event. If the curvature is zero, the event occurs at a fixed rate in the focal plane. Increases or decreases in the curvature relate to the accelerations of the edges generating events in the scene.
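One way to obtain such second-order information, given here only as a sketch and not necessarily the computation used in the described embodiments, is to fit a quadratic surface to the events of Sp,t and read the curvature off its coefficients.

```python
import numpy as np

def fit_quadratic(xs, ys, ts):
    """Fit t = a*x^2 + b*x*y + c*y^2 + d*x + e*y + f over S_{p,t};
    the second-order derivatives of Sigma_e are then approximately
    d2t/dx2 = 2a, d2t/dxdy = b, d2t/dy2 = 2c. Assumes at least six events."""
    A = np.column_stack([xs**2, xs*ys, ys**2, xs, ys,
                         np.ones_like(xs, dtype=float)])
    coeffs, *_ = np.linalg.lstsq(A, ts, rcond=None)
    a, b, c = coeffs[:3]
    return 2 * a, b, 2 * c
```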
In the preceding example the set Sp,t of events selected for a given place of estimation p and estimation time t is composed of events of any polarity (ON or OFF). The time of occurrence of the most recent event, ON or OFF, originating from each pixel is memorized in relation to the position of this pixel, which makes it possible to always include in the sets Sp,t only the latest events seen by the pixels of the matrix. Cases can occur where Sp,t includes events of different polarities. These cases give rise to a few relatively rare errors in the estimated slopes, which do not significantly disturb the measurements.
To reduce the number of these cases, one possibility is to include in the sets of events Sp,t only events of the same polarity. In particular, it is possible to memorize two tables for the various pixels of the matrix, one containing the times of occurrence of the most recent ON events, the other containing the times of occurrence of the most recent OFF events. In such an embodiment, the reception at a time t of an event having a given polarity, ON or OFF, at a pixel of position p causes the construction of a set Sp,t composed of each event having this polarity whose time of occurrence lies in the interval θ = [t−T, t] and is memorized for a pixel of the neighborhood πp. The computer can then conduct an estimation of the first- and/or second-order derivatives in the set Sp,t thus formed.
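A minimal sketch of this two-table bookkeeping, with illustrative names only, could look as follows.

```python
import numpy as np

def make_tables(height, width):
    """One table of latest event times per polarity (+1 for ON, -1 for OFF),
    so that S_{p,t} can be built from events of a single polarity only."""
    return {+1: np.full((height, width), -np.inf),
            -1: np.full((height, width), -np.inf)}

def on_event(tables, p, t, polarity):
    tables[polarity][p] = t   # memorize the most recent ON or OFF event at p
```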
The moments at which the computer 20 performs quantization of the variations of the times of occurrence of events around a given pixel position can be chosen as a function of the arrival of the events on this pixel.
For example, on receiving an event e(p, t) originating from a pixel of position p in the matrix at an instant of detection t, the computer 20 updates the table in which the times of occurrence of the most recent events are stored (by replacing the preceding value at position p with t), then determines whether the set Sp,t includes enough recent events to allow a meaningful estimation of the motion. To do this, the computer 20 can count the times of occurrence, memorized in the table for the pixel positions of the spatial neighborhood πp, that fall within the time interval θ. If the number of times of occurrence thus determined is below a predefined threshold α, no quantization of the variations in the times of occurrence is carried out, the event that has just been detected being considered as noise. If, on the other hand, the threshold α is exceeded, it is estimated that the surface Σe contains enough points close to e(p, t) to perform a slope estimation in order to quantize the variations in the times of occurrence. The threshold α is typically defined as a proportion (for example 10 to 30%) of the number of pixels that the sensor includes in the spatial neighborhood πp.
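Putting the preceding pieces together, an event-driven update of this kind could be sketched as follows; it reuses the select_events and fit_plane helpers sketched above, and the default values of R, T and α are merely illustrative.

```python
def on_dvs_event(last_times, p, t, R=3, T=2e-3, alpha=0.2):
    """Event-driven update, as a sketch: memorize the new time of occurrence,
    count the recent events in the neighborhood, and only estimate a slope
    when more than a fraction `alpha` of the neighborhood pixels have fired
    within [t - T, t]; otherwise the event is treated as noise."""
    last_times[p] = t
    # select_events and fit_plane are the helpers sketched earlier
    xs, ys, ts = select_events(last_times, p, t, R, T)
    n_pixels = (2 * R + 1) ** 2          # rough size of the neighborhood pi_p
    if len(ts) < alpha * n_pixels:
        return None                      # too few recent events: no update
    return fit_plane(xs, ys, ts)         # (dt/dx, dt/dy) slopes around e(p, t)
```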
In the preceding exemplary embodiments, two parameters R and T are to be adjusted to conduct optical flow estimations. The choice of these parameters depends on the physical characteristics of the sensor (spacing between pixels, response time) and on the order of magnitude of the dimensions and velocities, in the image plane, of the objects whose motion one wishes to detect. By way of example, and without this being in any way limiting, the spatial neighborhood can be dimensioned with a radius R in the order of 3 to 5 pixels in the plane of the matrix, and the duration T of the time interval θ can be in the order of 500 μs to 2 ms. This choice depends on the hardware used and the application.
The exemplary embodiments described above are illustrations of the present invention. Various modifications can be made to them without departing from the scope of the invention as defined by the appended claims.
Number | Date | Country | Kind
---|---|---|---
11 62137 | Dec. 2011 | FR | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/FR2012/053066 | 12/21/2012 | WO | 00

Publishing Document | Publishing Date | Country | Kind
---|---|---|---
WO2013/093378 | 6/27/2013 | WO | A
Other Publications

Benosman et al., “Asynchronous frameless event-based optical flow,” Neural Networks, 27:32-37 (2011), XP055035741.

Bartolozzi et al., “Attentive motion sensor for mobile robotic applications,” Circuits and Systems (ISCAS), pp. 2813-2816 (2011), XP032109722.

Delbruck, “Frame-free dynamic digital vision,” Proc. Int. Symp. on Secure-Life Electronics, Advanced Electronics for Quality Life and Society, Univ. of Tokyo, Mar. 6-7, pp. 21-26 (2008), XP055035734.

Lichtsteiner et al., “A 128×128 120 dB 15 μs Latency Asynchronous Temporal Contrast Vision Sensor,” IEEE J. Solid-State Circuits, 43(2):566-576 (2008), XP011200748.

Benosman et al., “Asynchronous Event-Based Hebbian Epipolar Geometry,” IEEE Transactions on Neural Networks, 22(11):1723-1734 (2011), XP011411486.

Barron et al., “Performance of Optical Flow Techniques,” IJCV, 12(1):43-77 (1994).

Delbruck et al., “Activity-Driven, Event-Based Vision Sensors,” Proc. 2010 IEEE Int. Symp. Circuits and Systems (ISCAS), pp. 2426-2429 (2010).
Number | Date | Country
---|---|---
20140363049 A1 | Dec. 2014 | US