The present disclosure generally relates to the field of image sensing and processing. More specifically, and without limitation, the disclosure relates to computer-implemented systems and methods for three-dimensional imaging and sensing. The disclosure additionally relates to three-dimensional image sensing using event-based image sensors. The image sensors and techniques disclosed herein may be used in various applications and vision systems, such as security systems, autonomous vehicles, and other systems that benefit from rapid and efficient three-dimensional sensing and detection.
Extant three-dimensional image sensing systems include those that produce depth maps of scenes. Such sensing systems have drawbacks, including low spatial and/or temporal resolution. Such three-dimensional image sensing systems also suffer from other drawbacks, including being too computationally expensive and/or having other processing limitations.
For example, time-of-flight camera systems generally measure depth directly. In such cameras, a modulated signal is emitted using a laser projector, and the distance is estimated by measuring the time shift between the emitted signal and its reflection from objects in the observed scene. Depending on the implementation, time-of-flight systems usually generate up to 60 depth images per second. However, most time-of-flight cameras have low spatial resolutions (e.g., 100,000 pixels or lower). Moreover, the use of a laser projector prevents time-of-flight cameras from being used in low-power applications while retaining both a high range and a high spatial resolution.
Stereo cameras are based on the idea that it is possible to match points from one view to points in another view. Using the relative position of the two cameras, stereo cameras estimate the three-dimensional position of points in space. However, stereo cameras typically have limited image density, as only detected points from textured environments can be measured. Moreover, stereo cameras are computationally expensive, therefore suffering from low temporal resolution as well as being limited in use for low-power applications.
Structured light cameras function similarly to stereo cameras but use a pattern projector in lieu of a second camera. By defining the projected pattern, a structured light camera may perform triangulation without using a second camera. Structured light solutions usually have higher spatial resolutions (e.g., up to 300,000 pixels). However, structured light cameras are computationally expensive and/or generally suffer from low temporal resolution (e.g., around 30 fps). The temporal resolution may be increased but at the expense of spatial resolution. Similar to time-of-flight cameras, structured light cameras are limited in use (e.g., limited in range and spatial resolution) for low-power applications.
Active stereo image sensors combine passive stereo and structured light techniques. In particular, a projector projects a pattern, which may be recognized by two cameras. Matching the pattern in both images allows estimation of depth at matching points by triangulation. Active stereo can revert to passive stereo in situations where the pattern cannot be decoded easily, such as an outdoor environment, in a long-range mode, or the like. As a result, active stereo, like structured light techniques and stereo techniques, suffers from low temporal resolution and is limited in use for low-power applications.
Some structured light systems integrating an event-based camera have been developed. In these systems, a laser beam projects a single blinking dot at a given frequency. Cameras may then detect the change of contrast caused by the blinking dot, and event-based cameras can detect such changes with a very high temporal accuracy. Detecting the changes of contrast at the given frequency of the laser allows the system to discriminate events produced by the blinking dot from other events in the scene. In some implementations, the projected dot is detected by two cameras, and the depth at the point corresponding to the blinking dot is reconstructed using triangulation. In other systems developed by the applicant, Prophesee, a projector may encode patterns or symbols in dot pulses projected into the scene. An event-based image sensor may then detect the same pattern or symbol reflected from the scene and triangulate using the location from which the pattern was projected and the location at which the pattern was detected to determine a depth at a corresponding point in the scene.
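The frequency-based discrimination described above can be illustrated with a short sketch. This is a hypothetical example, not the disclosed implementation: it assumes events arrive as sorted timestamps and keeps only those whose spacing matches the laser's blink period within a tolerance, discarding ambient events.

```python
# Hypothetical sketch: discriminate blinking-dot events from ambient events
# by keeping only timestamps that continue a chain spaced ~one laser period
# apart. `period` and `tol` are illustrative parameters, not from the disclosure.
def filter_by_frequency(timestamps, period, tol=0.1):
    """timestamps: sorted event times; returns those consistent with `period`."""
    kept = []
    for t in timestamps:
        if not kept:
            kept.append(t)                      # seed the chain with the first event
        elif abs((t - kept[-1]) - period) <= tol * period:
            kept.append(t)                      # spacing matches the blink period
        # otherwise: off-period event, treated as ambient and ignored
    return kept
```

For instance, with a 1-second period, an event at 1.5 s between on-period events at 1.0 s and 2.0 s would be rejected.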
When only projecting one dot at a time at a random position in the image, the temporal resolution decreases directly with the number of dot locations used. Moreover, even if a system was implemented to project a plurality of dots simultaneously, it may be necessary for the scene to be stable until the entire temporal code has been decoded. Therefore, this approach may not be able to reconstruct dynamic scenes.
Embodiments of the present disclosure provide computer-implemented systems and methods that address the aforementioned drawbacks. In this disclosure, systems and methods for three-dimensional image sensing are provided that have advantages such as being computationally efficient as well as compatible with dynamic scenes. With the present embodiments, the generated data may include depth information, allowing for three-dimensional reconstruction of a scene, e.g., as a point cloud. Additionally, embodiments of the present disclosure may be used in low-power applications, such as augmented reality, robotics, or the like, while still providing data of comparable, or even higher, quality than other higher-power solutions.
Embodiments of the present disclosure may project lines comprising patterns of electromagnetic pulses and receive reflections of those patterns at an image sensor. In some embodiments, a projector (e.g., a laser projector) may deform the projected line into a curve. Accordingly, as used throughout, a “line” may refer to a geometric line or to a curved line. Moreover, the line may comprise a plurality of dots with varying intensity, such that the line may comprise a dotted line or the like. The patterns may be indexed to spatial coordinates of the projector, and the image sensor may index the received reflections by location(s) of the pixel(s) receiving the reflections. Accordingly, embodiments of the present disclosure may triangulate depths based on the spatial coordinates of the projector and the pixel(s).
By using lines, embodiments of the present disclosure may be faster and increase density compared with dot-based approaches. Moreover, lines may require fewer control signals for a projector as compared with dots, reducing power consumption.
To account for dynamic scenes, embodiments of the present disclosure may use state machines to identify a reflected curve corresponding to a projected line. Additionally, in some embodiments, the state machines may further track received patterns temporally as they move across pixels of the image sensor. Thus, a depth may be calculated even if different pixels receive different portions of a pattern. Accordingly, embodiments of the present disclosure may solve technical problems presented by extant technologies, as explained above.
Embodiments of the present disclosure may also provide for higher temporal resolution. For example, latency is kept low by using triangulation of known patterns (e.g., stored patterns and/or patterns provided from a projector of the patterns to a processor performing the triangulation) rather than matching points in captured images. Moreover, the use of state machines can improve accuracy without sacrificing latency. As compared with a brute-force laser line sweep, embodiments of the present disclosure may reduce latency and sensitivity to jitter. Moreover, embodiments of the present disclosure may increase accuracy in distinguishing between environmental light and reflections from the projected lines.
In some embodiments, the temporal resolution may be further increased by using an event-based image sensor. Such a sensor may capture events in a scene based on changes in illuminations at pixels exceeding a threshold. Asynchronous sensors can detect patterns projected into the scene while reducing the amount of data generated. Accordingly, the temporal resolution may be increased.
Moreover, in some embodiments, the reduction in data due to the use of event-based image sensors may allow for increasing the rate of light sampling at each pixel, e.g., from 30 times per second or 60 times per second (i.e., frame rates of typical CMOS image sensors) to higher rates such as 1,000 times per second, 10,000 times per second, or more. The higher rate of light sampling increases the accuracy of the pattern detection compared to extant techniques.
In one embodiment, a system for detecting three-dimensional images may comprise a projector configured to project a plurality of lines comprising electromagnetic pulses onto a scene; an image sensor comprising a plurality of pixels and configured to detect reflections in the scene caused by the projected plurality of lines; and at least one processor. The at least one processor may be configured to: detect one or more first events from the image sensor based on the detected reflections and corresponding to one or more first pixels of the image sensor; detect one or more second events from the image sensor based on the detected reflections and corresponding to one or more second pixels of the image sensor; and identify a projected line corresponding to the one or more second events and the one or more first events. Further, in some embodiments, the at least one processor may be configured to calculate three-dimensional image points based on the identified line. Still further, the at least one processor may be configured to calculate three-dimensional rays for the one or more first pixels and the one or more second pixels based on the identified line and calculate the three-dimensional image points based on the three-dimensional rays and a plane equation associated with the identified line. Additionally, or alternatively, the three-dimensional image points may be calculated using a quadratic surface equation.
In such embodiments, the at least one processor may further be configured to determine a plurality of patterns associated with the plurality of lines. Further, the one or more first events may correspond to a start of the plurality of patterns associated with the plurality of lines. Moreover, the one or more second events may correspond to an end of the plurality of patterns associated with the plurality of lines.
In any of these embodiments, the projector may be configured to project one or more dots of each line simultaneously. Alternatively, the projector may be configured to project one or more dots of each line sequentially.
In any of these embodiments, the plurality of patterns may comprise at least two different pulse lengths separated by a length in time. Additionally, or alternatively, the plurality of patterns may comprise a plurality of pulses separated by different lengths of time. Additionally, or alternatively, the plurality of patterns may comprise pulses having at least one of selected frequencies, phase shifts, or duty cycles used to encode symbols.
In any of these embodiments, the projector may be configured to project the plurality of lines to a plurality of spatial locations in the scene. Moreover, at least one of the spatial locations may correspond to a first pattern, and at least one other of the spatial locations may correspond to a second pattern.
In any of these embodiments, the projector may be configured to project one or more dots of the plurality of lines at a plurality of different projection times. Moreover, at least one of the projection times may correspond to at least one of the one or more first events, and at least one other of the projection times may correspond to at least one of the one or more second events.
In any of these embodiments, each pixel of the image sensor may comprise a detector that is electrically connected to at least one first photosensitive element and configured to generate a trigger signal when an analog signal that is a function of brightness of light impinging on the at least one first photosensitive element matches a condition. In some embodiments, at least one second photosensitive element may be provided that is configured to output a signal that is a function of brightness of light impinging on the at least one second photosensitive element in response to the trigger signal. Still further, the at least one first photosensitive element may comprise the at least one second photosensitive element. In any of these embodiments, the at least one processor may receive one or more first signals from at least one of the first photosensitive element and the second photosensitive element, wherein the one or more first signals may have positive polarity when the condition is an increasing condition and negative polarity when the condition is a decreasing condition. Accordingly, the at least one processor may be further configured to decode polarities of the one or more first signals to obtain the one or more first events or the one or more second events. Additionally, or alternatively, the at least one processor may be further configured to discard any of the one or more first signals that are separated by an amount of time larger than a threshold and/or to discard any of the one or more first signals associated with an optical bandwidth not within a predetermined range.
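The trigger-and-polarity behavior described above can be sketched in a few lines. This is a minimal illustrative model, assuming a log-intensity change condition and hypothetical names; the disclosure does not mandate this particular condition.

```python
import math

# Illustrative sketch of the described pixel behavior: a condition detector
# fires when the (log) brightness change at the photosensitive element
# matches a condition, and the resulting signal has positive polarity for
# an increasing condition and negative polarity for a decreasing one.
class EventPixel:
    def __init__(self, threshold=0.15):
        self.threshold = threshold      # log-intensity change needed to trigger
        self.reference = None           # brightness level at the last trigger

    def sample(self, brightness, timestamp):
        """Return (timestamp, polarity) if the condition is matched, else None."""
        log_b = math.log(brightness)
        if self.reference is None:
            self.reference = log_b
            return None
        delta = log_b - self.reference
        if abs(delta) >= self.threshold:
            self.reference = log_b      # reset the reference at each trigger
            polarity = +1 if delta > 0 else -1
            return (timestamp, polarity)
        return None                     # no event: output data volume stays low
```

Because unchanged pixels produce no output, a processor receiving these signals can decode polarities, and can discard signals separated by more than a time threshold, as described above.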
In any of these embodiments, the at least one first photosensitive element may comprise the at least one second photosensitive element. Consistent with some embodiments, an exposure measurement circuit may be removed such that only events from a condition detector are output by the image sensor. Accordingly, the first and second photosensitive elements may comprise a single element used only by a condition detector.
Alternatively, the at least one first photosensitive element and the at least one second photosensitive element may be, at least in part, distinct elements.
In any of these embodiments, the system may further comprise an optical filter configured to block any reflections associated with a wavelength not within a predetermined range.
In any of these embodiments, the plurality of patterns may comprise a set of unique symbols encoded in electromagnetic pulses. Alternatively, the plurality of patterns may comprise a set of quasi-unique symbols encoded in electromagnetic pulses. For example, the symbols may be unique within a geometrically defined space. In such embodiments, the geometrically defined space may comprise one of the plurality of lines.
In any of these embodiments, the at least one processor may be configured to determine the plane equation based on which pattern of the plurality of patterns is represented by the one or more first events and the one or more second events. Additionally, or alternatively, the at least one processor may be configured to determine a plurality of plane equations associated with the plurality of lines and select the line associated with the one or more first events and the one or more second events to determine the associated plane equation of the plurality of plane equations.
In any of these embodiments, the at least one processor may be configured to calculate the three-dimensional image points based on an intersection of the plurality of rays and the associated plane equation. In such embodiments, the plurality of rays may originate from the sensor and represent a set of three-dimensional points in the scene that correspond to the one or more first pixels and the one or more second pixels.
For example, the projection of a straight line into three-dimensional (3D) space corresponds to a 3D plane, whose corresponding plane equation may comprise a′X+b′Y+c′Z+d′=0 (equation 1), where X, Y, and Z are coordinates of points lying on the plane in 3D space, and a′, b′, c′, and d′ are constants defining the plane. The origin is the camera optical center at position (0, 0, 0). For a pixel (i, j) on the sensor, located in the i'th pixel row and j'th pixel column, the pixel position in 3D space can be identified using sensor calibration parameters as (x, y, f), where f is the focal length according to a pin-hole camera model. All 3D points projecting to (i, j) on the sensor are on the 3D ray which passes through (x, y, f) and the optical center (0, 0, 0). For all 3D points on the ray, there exists a scalar constant λ as defined by the following (equation 2):

(X, Y, Z)=λ(x, y, f)
To triangulate the 3D point at the intersection of the 3D plane from the projector and the 3D ray from the camera, equation 2 can be injected into equation 1 as:

a′λx+b′λy+c′λf+d′=0

which yields

λ=-d′/(a′x+b′y+c′f)

The triangulated 3D point is then (λx, λy, λf).
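The plane-ray triangulation above can be written as a short function. This is a minimal sketch under the pin-hole model: the plane a′X+b′Y+c′Z+d′=0 comes from the projected line, the pixel (x, y) and focal length f define the ray (X, Y, Z)=λ(x, y, f), and substitution gives λ=-d′/(a′x+b′y+c′f). The function name and tolerance are illustrative.

```python
# Minimal sketch of plane-ray triangulation under a pin-hole camera model.
# plane = (a, b, c, d) are the constants of a'X + b'Y + c'Z + d' = 0;
# (x, y) is the calibrated pixel position and f the focal length.
def triangulate_plane_ray(plane, x, y, f):
    """Return the 3D point (X, Y, Z) on the plane, or None if the ray is parallel."""
    a, b, c, d = plane
    denom = a * x + b * y + c * f
    if abs(denom) < 1e-12:
        return None                      # ray parallel to the plane: no intersection
    lam = -d / denom                     # lambda = -d' / (a'x + b'y + c'f)
    return (lam * x, lam * y, lam * f)   # point (lambda*x, lambda*y, lambda*f)
```

For example, for the plane Z=2 (i.e., (0, 0, 1, -2)) and the central pixel (0, 0), the triangulated point is (0, 0, 2).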
In some embodiments, the projection is a curved line into 3D space. This is no longer a plane, but a curved surface. Therefore, another triangulation operation may be used as opposed to one based on the above-described plane equation. For example, a quadratic surface model may be used, of the general equation:

XᵀQX+PX+R=0

where X=(X, Y, Z)ᵀ, Q is a 3×3 matrix, P is a three-dimensional row vector, and R is a scalar constant. Triangulating a 3D point at the intersection of a 3D ray from the camera and the 3D surface is possible by injecting equation 2 into the quadratic surface equation and solving for λ.
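Injecting the ray (X, Y, Z)=λ(x, y, f) into the quadratic surface equation reduces it to a scalar quadratic in λ: with v=(x, y, f), the equation becomes λ²(vᵀQv)+λ(Pv)+R=0. The sketch below solves this quadratic; it is an illustrative implementation with assumed conventions (nearest positive-depth root kept), not the disclosed algorithm.

```python
import math

# Sketch: intersect the camera ray (X,Y,Z) = lambda*(x,y,f) with the
# quadratic surface X^T Q X + P X + R = 0 by solving
#   lambda^2 (v^T Q v) + lambda (P . v) + R = 0,   v = (x, y, f).
def triangulate_quadric_ray(Q, P, R, x, y, f):
    """Q: 3x3 nested list, P: length-3 list, R: scalar.
    Returns the nearest intersection in front of the camera, or None."""
    v = (x, y, f)
    A = sum(v[i] * Q[i][j] * v[j] for i in range(3) for j in range(3))
    B = sum(P[i] * v[i] for i in range(3))
    if abs(A) < 1e-12:                      # degenerates to the linear (plane-like) case
        if abs(B) < 1e-12:
            return None
        lams = [-R / B]
    else:
        disc = B * B - 4 * A * R
        if disc < 0:
            return None                     # ray misses the surface
        s = math.sqrt(disc)
        lams = [(-B - s) / (2 * A), (-B + s) / (2 * A)]
    lams = [lam for lam in lams if lam > 0] # keep points in front of the camera
    if not lams:
        return None
    lam = min(lams)                         # nearest intersection
    return (lam * x, lam * y, lam * f)
```

As a check, for the sphere X²+Y²+Z²-4=0 (Q the identity, P=0, R=-4) and the central ray, the nearest intersection lies at depth 2.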
In any of these embodiments, the at least one processor may be configured to initialize one or more state machines based on the one or more first events. Still further, the at least one processor may be configured to store, in a memory or storage device, finalized state machines comprising the one or more initialized state machines and candidates for connecting the one or more first events to the one or more second events. Accordingly, the at least one processor may be further configured to use the stored state machines in determining candidates for subsequent events.
In any of these embodiments, determining candidates for connecting the one or more second events to the one or more first events may use the plurality of patterns and the one or more stored state machines. Additionally, or alternatively, the one or more second events may be timestamped after the one or more first events such that the candidates connect the one or more first events to the one or more second events temporally.
In any of these embodiments, detecting the one or more first events may comprise receiving one or more first signals from the image sensor and detecting the one or more first events based on the one or more first signals. Additionally, or alternatively, detecting the one or more first events may comprise receiving one or more first signals from the image sensor, wherein the one or more first signals encode the one or more first events.
In one embodiment, an imaging system may comprise a plurality of pixels and at least one processor. Each pixel may comprise a first photosensitive element, a detector that is electrically connected to the first photosensitive element and configured to generate a trigger signal when an analog signal that is a function of brightness of light impinging on the first photosensitive element matches a condition. Optionally, one or more second photosensitive elements may also be provided that are configured to output a signal that is a function of brightness of light impinging on the one or more second photosensitive elements. In some embodiments, the at least one processor may be configured to detect one or more first events from the one or more second photosensitive elements based on detected reflections from a scene and in response to trigger signals from the detector and corresponding to one or more first pixels of the plurality of pixels; initialize one or more state machines based on the one or more first events; detect one or more second events from the one or more second photosensitive elements based on detected reflections from the scene and in response to trigger signals from the detector and corresponding to one or more second pixels of the plurality of pixels; determine one or more candidates for connecting the one or more second events to the one or more first events; and using the one or more candidates, identify a projected line corresponding to the one or more second events and the one or more first events. Further, in some embodiments, the at least one processor may be configured to calculate three-dimensional rays for the one or more first pixels and the one or more second pixels based on the identified line; and calculate three-dimensional image points for the one or more first pixels and the one or more second pixels based on the three-dimensional rays.
In some embodiments, the three-dimensional image points may be additionally calculated based on a plane equation associated with a line projected onto the scene corresponding to the identified line. In other embodiments, a triangulation operation that is based on a curved line and the aforementioned quadratic surface equation may be utilized.
In such embodiments, the at least one processor may be further configured to determine a plurality of patterns associated with a plurality of lines comprising electromagnetic pulses projected onto a scene, wherein determining the plurality of patterns may comprise receiving digital signals defining amplitudes separated by time intervals. For example, the digital signals defining amplitudes separated by time intervals may be received from a controller associated with a projector configured to project a plurality of electromagnetic pulses according to the plurality of patterns. Additionally, or alternatively, the digital signals defining amplitudes separated by time intervals may be retrieved from at least one non-transitory memory storing patterns.
In any of the embodiments described above, the first photosensitive element may comprise the one or more second photosensitive elements. Further, in some embodiments, there are no second photosensitive elements.
In one embodiment, a method for detecting three-dimensional images may comprise determining a plurality of patterns corresponding to a plurality of lines comprising electromagnetic pulses emitted by a projector onto a scene; detecting, from an image sensor, one or more first events based on reflections caused by the plurality of electromagnetic pulses and corresponding to one or more first pixels of the image sensor; initializing one or more state machines based on the one or more first events; detecting, from the image sensor, one or more second events based on the reflections and corresponding to one or more second pixels of the image sensor; determining one or more candidates for connecting the one or more second events to the one or more first events; using the one or more candidates, identifying a projected line corresponding to the one or more second events and the one or more first events; calculating three-dimensional rays for the one or more first pixels and the one or more second pixels based on the identified line; and calculating three-dimensional image points for the one or more first pixels and the one or more second pixels based on the three-dimensional rays and a plane equation associated with one of the lines corresponding to the identified line.
In one embodiment, a system for detecting three-dimensional images may comprise a projector configured to project a plurality of lines comprising electromagnetic pulses onto a scene; an image sensor comprising a plurality of pixels and configured to detect reflections in the scene caused by the projected plurality of lines; and at least one processor. The at least one processor may be configured to: encode a plurality of symbols into a plurality of patterns associated with the plurality of lines, the plurality of symbols relating to at least one spatial property of the plurality of lines; command the projector to project the plurality of patterns onto the scene; detect one or more first events from the image sensor based on the detected reflections and corresponding to one or more first pixels of the image sensor; initialize one or more state machines based on the one or more first events; detect one or more second events from the image sensor based on the detected reflections and corresponding to one or more second pixels of the image sensor; determine one or more candidates for connecting the one or more second events to the one or more first events; using the one or more candidates and the one or more state machines, decode the one or more first events and the one or more second events to obtain the at least one spatial property; and calculate three-dimensional image points for the one or more first pixels and the one or more second pixels based on locations of the one or more first events and the one or more second events on the sensor and the at least one spatial property.
Additional objects and advantages of the present disclosure will be set forth in part in the following detailed description, and in part will be obvious from the description, or may be learned by practice of the present disclosure. The objects and advantages of the present disclosure will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.
It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only, and are not restrictive of the disclosed embodiments.
The accompanying drawings, which comprise a part of this specification, illustrate various embodiments and, together with the description, serve to explain the principles and features of the disclosed embodiments. In the drawings:
The disclosed embodiments relate to systems and methods for capturing three-dimensional images by sensing reflections of projected patterns of light, such as one or more line patterns. The disclosed embodiments also relate to techniques for using image sensors, such as synchronous or asynchronous image sensors, for three-dimensional imaging. Advantageously, the exemplary embodiments can provide fast and efficient three-dimensional image sensing. Embodiments of the present disclosure may be implemented and used in various applications and vision systems, such as autonomous vehicles, robotics, augmented reality, and other systems that benefit from rapid and efficient three-dimensional image detection.
Embodiments of the present disclosure may be implemented through any suitable combination of hardware, software, and/or firmware. Components and features of the present disclosure may be implemented with programmable instructions implemented by a hardware processor. In some embodiments, a non-transitory computer-readable storage medium including instructions is also provided, and the instructions may be executed by at least one processor for performing the operations and methods disclosed herein. Common forms of non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid-state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM or any other flash memory, NVRAM, a cache, a register, any other memory chip or cartridge, and networked versions of the same. In some embodiments, systems consistent with the present disclosure may include one or more processors (CPUs), an input/output interface, a network interface, and/or a memory. In networked arrangements, one or more servers and/or databases may be provided that are in communication with the system.
Although embodiments of the present disclosure are described herein with general reference to an imaging sensor, it will be appreciated that such a system may be part of a camera, a LIDAR, or another imaging system. Moreover, although some embodiments are described in combination with a projector (such as a laser projector), it will be appreciated that such components may be separate from the image sensors and/or processors described herein.
Embodiments of the present disclosure may use state machines to connect reflections along a curve that corresponds to a line projected into a scene. Additionally, or alternatively, embodiments of the present disclosure may use state machines to track reflections across one or more pixels of an image sensor. Accordingly, state machines may describe the transformation of projected lines of light patterns into the tracked reflections and thus allow for recreation of any dynamic portions of a scene as well as static portions. State machines consistent with the present disclosure may be implemented through any suitable combination of hardware, software, and/or firmware.
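A state machine of the kind described above can be sketched as follows. This is an illustrative model with hypothetical names and thresholds: a tracker is initialized by a first event and extended by later events that plausibly continue the same reflected curve, both spatially and temporally.

```python
# Illustrative sketch (names and thresholds are hypothetical, not from the
# disclosure): a per-curve state machine initialized by a first event and
# extended by candidate events that continue the same reflected line.
# Events are (timestamp, pixel_row, pixel_col) tuples.
class CurveTracker:
    def __init__(self, first_event, max_dt=0.005, max_px=2):
        self.events = [first_event]
        self.max_dt = max_dt        # max time gap to remain a candidate
        self.max_px = max_px        # max pixel distance between neighbors

    def is_candidate(self, event):
        """True if `event` plausibly continues the tracked curve."""
        t, r, c = event
        t0, r0, c0 = self.events[-1]
        return (0 < t - t0 <= self.max_dt
                and abs(r - r0) <= self.max_px
                and abs(c - c0) <= self.max_px)

    def extend(self, event):
        """Accept a candidate event into the state machine, or reject it."""
        if self.is_candidate(event):
            self.events.append(event)
            return True
        return False
```

A pool of such trackers, one per candidate curve, would let later events be matched against stored state machines, as described above for determining candidates for subsequent events.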
As used herein, a “pattern” may refer to any combination of light pulses having one or more characteristics. For example, a pattern may comprise at least two different amplitudes separated by a length of time, at least two different wavelengths separated by a length of time, at least two different pulse lengths separated by a length of time, a plurality of pulses separated by different lengths of time, or the like. Moreover, a pattern may have at least one of frequencies, phase shifts, or duty cycles used to encode symbols (e.g., as explained below with respect to an example embodiment depicted in the accompanying drawings).
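A toy encoding along these lines can be sketched as follows. The symbol table and (pulse, gap) durations are invented for illustration, assuming a simple scheme in which each symbol maps to a sequence of pulse lengths separated by lengths of time; the disclosure does not specify this particular mapping.

```python
# Hypothetical symbol table: each symbol maps to a list of
# (pulse_length_us, gap_length_us) pairs. Values are illustrative only.
SYMBOL_TABLE = {
    "A": [(10, 5), (10, 5)],      # two short pulses
    "B": [(20, 5)],               # one long pulse
    "C": [(10, 5), (20, 5)],      # short pulse, then long pulse
}

def encode(symbols):
    """Flatten a symbol sequence into an (on_us, off_us) pulse train."""
    train = []
    for s in symbols:
        train.extend(SYMBOL_TABLE[s])
    return train
```

A projector driven by such a train would emit distinguishable patterns per line, and a decoder on the sensor side could invert the mapping from detected pulse lengths and gaps back to symbols.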
State machines, such as those depicted in
As used herein, a “pixel” refers to the smallest element of an image sensor that outputs data based on light impinging on the pixel. In some embodiments, a pixel may be larger or include more components than a conventional pixel because it may include two or more photosensitive elements, other circuitry, or the like, e.g., as depicted in the accompanying drawings.
Although the present disclosure refers to a reflection caused by a projected pattern as being received at a single pixel, the projected pattern may include a sufficient number of photons in order to cover and be received by a plurality of pixels. Accordingly, the triangulation described herein may be based on an average location of the plurality of pixels and/or comprise a plurality of triangulations, including the locations of each pixel in the plurality.
As depicted in
As further depicted in
Although not depicted in
As depicted in
In the example of
In some embodiments, exposure measurement circuit 257 may include an analog-to-digital converter. Examples of such embodiments are disclosed in U.S. Provisional Patent Application No. 62/690,948, filed on Jun. 27, 2018, and titled “Image Sensor with a Plurality of Super-Pixels”; and U.S. Provisional Patent Application No. 62/780,913, filed on Dec. 17, 2018, and titled “Image Sensor with a Plurality of Super-Pixels.” The disclosures of these applications are fully incorporated herein by reference. In such embodiments, exposure measurement circuit 257 may reset condition detector 255 (e.g., using a “clear” signal not shown in the figures).
In some embodiments, exposure measurement circuit 257 may output the measurement asynchronously to a readout and control system 259. This may be performed using, e.g., an asynchronous event readout (AER) communications protocol or other suitable protocol. In other embodiments, readout from exposure measurement circuit 257 may be clocked using external control signals (e.g., labeled “control” in the figures).
Examples of pixel 250 depicted in
Although depicted as different photosensitive elements, in some embodiments, photosensitive elements 251 and 253 may comprise a single element shared between condition detector 255 and exposure measurement circuit 257. Examples of such embodiments are disclosed in European Patent Application No. 18170201.0, filed on Apr. 30, 2018, and titled “Systems and Methods for Asynchronous, Time-Based Image Sensing.” The disclosure of this application is incorporated herein by reference.
Moreover, although depicted with one condition detector and one exposure measurement circuit, some embodiments may include a plurality of exposure measurement circuits sharing a condition detector, such that a trigger signal causes a plurality of measurements to be captured. Examples of such embodiments are disclosed in U.S. Provisional Patent Application No. 62/690,948, filed on Jun. 27, 2018, and titled “Image Sensor with a Plurality of Super-Pixels”; and U.S. Provisional Patent Application No. 62/780,913, filed on Dec. 17, 2018, and titled “Image Sensor with a Plurality of Super-Pixels.” The disclosures of these applications are incorporated herein by reference.
In other embodiments, the exposure measurement circuit may be removed such that only events from the condition detector are output by the image sensor. Accordingly, photosensitive elements 251 and 253 may comprise a single element used only by condition detector 255.
Although not depicted in
Projector 301 may comprise one or more laser generators or any other device configured to project lines of electromagnetic pulses according to one or more patterns. In some embodiments, projector 301 may be a dot projector. Accordingly, projector 301 may be configured to sweep along the lines while projecting dots in order to project the lines into 3-D scene 305. Alternatively, projector 301 may comprise a laser projector configured to project light forming the lines simultaneously along some or all portions of the lines.
Additionally, or alternatively, projector 301 may include a screen or other filter configured to filter light from projector 301 into the lines. Although not depicted in
In some embodiments, projector 301 may be configured to project the plurality of lines to a plurality of spatial locations in scene 305. The spatial locations may correspond to different pixels (or groups of pixels) of an image sensor 309, further described below. Additionally, or alternatively, projector 301 may be configured to project the plurality of lines at a plurality of different projection times.
In some embodiments, projector 301 may be configured to project a plurality of frequencies, e.g., in order to increase variety within patterns. In other embodiments, projector 301 may be configured to use a single frequency (or range of frequencies), e.g., in order to distinguish reflections caused by the patterns from noise in scene 305. By way of example, the frequencies may be between 50 Hz and a few kHz (e.g., 1 kHz, 2 kHz, 3 kHz, or the like).
The projected lines or other patterns may cause reflections from scene 305. In the example of
The reflections may be captured by an image sensor 309. In some embodiments, image sensor 309 may be an event-based sensor. As explained above, image sensor 309 may comprise an array of pixels 200 of
Reflections 307a, 307b, and 307c may form curves on pixels of image sensor 309 even if patterns 303a, 303b, and 303c are arranged along straight lines (as shown in
As explained above with respect to
For example, the projection of a straight line into three-dimensional (3D) space corresponds to a 3D plane, whose corresponding plane equation may comprise a′X+b′Y+c′Z+d′=0 (equation 1), where X, Y, and Z are coordinates of points lying on the plane in 3D space, and a′, b′, c′, and d′ are constants defining the plane. The origin is the camera optical center, at position (0, 0, 0). For a pixel (i, j) on the sensor, located in the i'th pixel row and j'th pixel column, the pixel position in 3D space can be identified using sensor calibration parameters as (x, y, f), where f is the focal length according to a pin-hole camera model. All 3D points projecting to (i, j) on the sensor are on the 3D ray that passes through (x, y, f) and the optical center (0, 0, 0). For all 3D points on the ray, there exists a scalar constant λ as defined by the following (equation 2): (X, Y, Z)=λ(x, y, f).
To triangulate the 3D point at the intersection of the 3D plane from the projector and the 3D ray from the camera, equation 2 can be injected into equation 1 as:
a′λx+b′λy+c′λf+d′=0
which yields λ = −d′/(a′x + b′y + c′f).
In some embodiments, the projection is a curved line into 3D space. In such a case, the projection no longer corresponds to a plane but to a curved surface. Therefore, another triangulation operation may be used instead of one based on the above-referenced plane equation. For example, a quadratic surface model may be used, of the general equation vᵀQv + Pv + R = 0 with v = (X, Y, Z)ᵀ:
where Q is a 3×3 matrix, P is a three-dimensional row vector, and R is a scalar constant. Triangulating a 3D point at the intersection of a 3D ray from the camera and the 3D surface is possible by injecting equation 2 into the quadratic surface equation and solving for λ.
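For illustration only, the plane-ray triangulation above can be sketched in code. The function name and argument layout below are hypothetical, not part of the disclosure; the math follows equations 1 and 2 under the pin-hole model:

```python
def triangulate_plane_ray(plane, pixel, focal_length):
    """Intersect the 3D plane of a projected line with the 3D ray
    through a calibrated pixel position (x, y, f), per equations 1-2."""
    a, b, c, d = plane          # plane: a*X + b*Y + c*Z + d = 0
    x, y = pixel                # calibrated sensor coordinates
    f = focal_length
    denom = a * x + b * y + c * f
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the plane")
    lam = -d / denom            # solve a*lam*x + b*lam*y + c*lam*f + d = 0
    return (lam * x, lam * y, lam * f)   # 3D point = lam * (x, y, f)
```

For instance, with the plane Z = 5 (i.e., (a′, b′, c′, d′) = (0, 0, 1, −5)) and a pixel at calibrated coordinates (0.1, 0.2) with f = 1, the triangulated point is (0.5, 1.0, 5.0).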
Consistent with some embodiments, the processor may further select the ray intersecting with plane equation 311 (ray 315 in the example of
Although not shown in
In another example,
At step 501, the at least one processor may determine a plurality of patterns associated with a plurality of lines comprising electromagnetic pulses emitted by a projector (e.g., projector 301 of
In some embodiments, the at least one processor may also send commands to the projector configured to project a plurality of electromagnetic pulses onto a scene such that the projector transmits the plurality of electromagnetic pulses according to the patterns. For example, the at least one processor may use an on-chip bus, a wire or other off-chip bus, at least one transmitter configured to communicate over at least one bus, wire, or network, or any combination thereof to send commands to the projector.
As further explained above, the patterns may comprise any series of pulses of electromagnetic radiation over a period of time. For example, a pattern may define one or more pulses by amplitude and/or length of time along the period of time of the pattern. Accordingly, the plurality of patterns may comprise at least two different amplitudes separated by a length of time, at least two different wavelengths separated by a length of time, at least two different pulse lengths separated by a length of time, a plurality of pulses separated by different lengths of time, or the like. Moreover, as described above, the pattern may have at least one of selected frequencies, phase shifts, or duty cycles used to encode symbols (see, e.g., the explanation below with respect to
In some embodiments, the at least one processor may encode a plurality of symbols into the plurality of patterns. As explained above, the plurality of patterns may be associated with the plurality of lines. The symbols may comprise letters, numbers, or any other communicative content encoded into electromagnetic patterns. In some embodiments, the plurality of symbols relating to at least one spatial property of the plurality of lines. For example, the plurality of symbols may encode an expected frequency or brightness of the electromagnetic pulses, a spatial location associated with the electromagnetic pulses (such as a spatial coordinate of the projector projecting the pulses), or the like.
Referring again to
At step 505, the at least one processor may detect one or more first events corresponding to one or more first pixels of the image sensor based on the received first signals. For example, an event may be detected based on a polarity change between two signals of the one or more first signals, changes in amplitude between two signals of the one or more first signals having magnitudes greater than one or more thresholds, or the like. As used herein, a “polarity change” may refer to a change in amplitude, either increasing or decreasing, detected in the one or more first signals. In embodiments using an event-based image sensor such as image sensor 250 of
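As an illustrative sketch of such event detection (the function name and threshold handling are assumptions, not the disclosure's implementation), a condition detector can be modeled as emitting an event whenever the sampled signal departs from a stored reference by more than a threshold:

```python
def detect_events(samples, threshold):
    """Emit (index, polarity) events whenever the sampled intensity
    departs from a stored reference level by more than `threshold`,
    loosely mimicking an event-based pixel's condition detector."""
    events = []
    reference = samples[0]
    for i, s in enumerate(samples[1:], start=1):
        delta = s - reference
        if abs(delta) > threshold:
            events.append((i, 1 if delta > 0 else -1))
            reference = s       # reset the reference after each trigger
    return events
```

For example, the sample series [0, 0.5, 1.2, 1.1, 0.1] with threshold 1.0 yields an increasing-polarity event at index 2 and a decreasing-polarity event at index 4.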
In some embodiments, the at least one processor may associate the one or more first events with the one or more first pixels based on addresses encoded with the one or more first signals by the image sensor. For example, the image sensor (or a readout system in communication with the image sensor) may encode an address of the pixel(s) from which the one or more first signals originated. Accordingly, the at least one processor may associate the one or more first events with the one or more first pixels based on addresses encoded with the one or more first signals. In such embodiments, the at least one processor is adapted to decode and obtain the address from the one or more first signals.
At step 507, the at least one processor may initialize one or more state machines based on the one or more first events. For example, the at least one processor may initialize a state machine for the one or more first pixels. Additionally, in some embodiments, the at least one processor may initialize a state machine for neighboring pixels. As explained below, with respect to
At step 509, the at least one processor may receive, using the image sensor, one or more second signals based on the reflections. For example, the at least one processor may receive the one or more second signals from image sensor 200 of
At step 511, the at least one processor may detect one or more second events corresponding to one or more second pixels of the image sensor based on the received second signals. For example, the at least one processor may detect the one or more second events based on a polarity change between two signals of the one or more second signals, changes in amplitude between two signals of the one or more second signals having magnitudes greater than one or more thresholds, or the like. In embodiments using an event-based image sensor such as image sensor 250 of
At step 513, the at least one processor may determine candidates for connecting the one or more second events to the one or more first events. For example, as explained below with respect to
As depicted in
Referring again to the example of
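One possible way to determine such candidates is a simple spatiotemporal neighbor search. The helper below is illustrative only (its name, event layout, and thresholds are assumptions), keeping recent events that are close to the new event in both time and image space:

```python
def candidate_events(new_event, recent_events, max_dt, max_dist):
    """Return the recent events that may connect to `new_event`.
    Events are (t, row, col) tuples; candidates must be close in
    both time and image space (e.g., a neighboring pixel)."""
    t, r, c = new_event
    return [
        (t0, r0, c0)
        for (t0, r0, c0) in recent_events
        if 0 <= t - t0 <= max_dt
        and abs(r - r0) <= max_dist
        and abs(c - c0) <= max_dist
    ]
```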
At step 515, the at least one processor may use the candidates to identify a curve formed by the one or more second events and the one or more first events. For example, as explained above with respect to
Step 515 may further include calculating three-dimensional rays for the one or more first pixels and the one or more second pixels based on the identified curve. For example, as depicted in
As part of step 515, the at least one processor may also calculate three-dimensional image points for the one or more first pixels and the one or more second pixels based on the three-dimensional rays and a plane equation associated with one of the lines corresponding to the identified curve. For example, as depicted in
For example, if a pixel generated a series of signals whose events map to a pattern of the plurality of patterns (e.g., through a fully-known state machine), then the three-dimensional ray from that pixel may be projected to a plane equation determined using the pattern. In some embodiments, the pattern may encode one or more symbols indexed to or otherwise indicating the plane equation associated with the pattern. The at least one processor may thus obtain the plane equation and extract the location of the pixel (e.g., originating the three-dimensional ray) that received the reflection based on the address encoded in the signals from the image sensor.
In some embodiments, the pattern may be identified or predicted at every event reception, thereby increasing temporal density while limiting the latency associated with decoding the full code. This identification could be carried from one transmission of the code to the next if the codes are looped or associated, which could enable prediction of the code being decoded while it is received (i.e., the code may be predicted to be the same as previously obtained as long as the received bits are coherent with it).
If a pattern of the plurality of patterns caused reflections that spread across a plurality of pixels (e.g., due to dynamic motion in the scene), then the three-dimensional point at the final pixel (e.g., the pixel generating a final signal corresponding to an end of a pattern of the plurality of patterns) may be determined using a three-dimensional ray originating from the final pixel and based on the plane equation associated with the pattern. The at least one processor may then proceed backward (in time) from the final signal to finalize state machines for other pixels in the plurality of pixels receiving the reflections. For example, the image sensor may encode a timestamp on each measurement from pixels such that the at least one processor has past timestamps for previous pixels as well as timestamps for recent pixels. Thus, the three-dimensional points at these other pixels may be determined using three-dimensional rays originating from the other pixels and based on the plane equation associated with the pattern, and these points may be associated with the past timestamps.
In addition to or in lieu of step 515, method 500 may include using the candidates and the one or more state machines to decode the one or more first events and the one or more second events to obtain at least one spatial property. For example, the at least one spatial property may comprise a plane equation associated with the pattern such that the at least one processor may use the decoded plane equation to determine three-dimensional points. Additionally, or alternatively, the at least one spatial property may comprise a frequency, a brightness, or the like such that the at least one processor may use the decoded at least one spatial property in mapping the one or more first events and the one or more second events to a corresponding pattern.
At step 551, the at least one processor may detect one or more first events corresponding to one or more first pixels of the image sensor based on reflections. As disclosed herein, the reflections may be caused by a plurality of electromagnetic pulses emitted by a projector (e.g., projector 301 of
In some embodiments, the at least one processor may associate the one or more first events with the one or more first pixels based on addresses encoded with one or more first signals by the image sensor. For example, the image sensor (or a readout system in communication with the image sensor) may encode an address of the pixel(s) from which one or more first signals originated. Accordingly, the at least one processor may associate the one or more first events with the one or more first pixels based on addresses encoded with the one or more first signals. In such embodiments, the at least one processor is adapted to decode and obtain the address from the one or more first signals.
The reflections may be caused by a plurality of electromagnetic pulses emitted by a projector (e.g., projector 301 of
At step 553, the at least one processor may initialize one or more state machines based on the one or more first events. For example, the at least one processor may initialize a state machine for the one or more first pixels. Additionally, in some embodiments, the at least one processor may initialize a state machine for neighboring pixels. As explained below, with respect to
At step 555, the at least one processor may detect one or more second events corresponding to one or more second pixels of the image sensor based on reflections. For example, the at least one processor may detect the one or more second events based on a polarity change between two signals of one or more second signals, changes in amplitude between two signals of one or more second signals having magnitudes greater than one or more thresholds, or the like. In embodiments using an event-based image sensor such as image sensor 250 of
At step 557, the at least one processor may determine one or more candidates for connecting the one or more second events to the one or more first events. For example, as explained below with respect to
As depicted in
Referring again to the example of
At step 559, the at least one processor may use the one or more candidates to identify a projected line corresponding to the one or more second events and the one or more first events. For example, as explained above with respect to
At step 561, the at least one processor may calculate three-dimensional rays for the one or more first pixels and the one or more second pixels based on the identified line. For example, as depicted in
At step 563, the at least one processor may calculate three-dimensional image points for the one or more first pixels and the one or more second pixels based on the three-dimensional rays and a plane equation associated with one of the lines corresponding to the identified line. For example, as depicted in
For example, if a pixel generated a series of signals whose events map to a pattern of the plurality of patterns (e.g., through a fully-known state machine), then the three-dimensional ray from that pixel may be projected to a plane equation determined using the pattern. In some embodiments, the pattern may encode one or more symbols indexed to or otherwise indicating the plane equation associated with the pattern. The at least one processor may thus obtain the plane equation and extract the location of the pixel (e.g., originating the three-dimensional ray) that received the reflection based on the address encoded in the signals from the image sensor.
If a pattern of the plurality of patterns caused reflections that spread across a plurality of pixels (e.g., due to dynamic motion in the scene), then the three-dimensional point at the final pixel (e.g., the pixel generating a final signal corresponding to an end of a pattern of the plurality of patterns) may be determined using a three-dimensional ray originating from the final pixel and based on the plane equation associated with the pattern. The at least one processor may then proceed backward (in time) from the final signal to finalize state machines for other pixels in the plurality of pixels receiving the reflections. For example, the image sensor may encode a timestamp on each measurement from pixels such that the at least one processor has past timestamps for previous pixels as well as timestamps for recent pixels. Thus, the three-dimensional points at these other pixels may be determined using three-dimensional rays originating from the other pixels and based on the plane equation associated with the pattern, and these points may be associated with the past timestamps.
In addition to or in lieu of step 559, method 550 may include using the candidates and the one or more state machines to decode the one or more first events and the one or more second events to obtain at least one spatial property. For example, the at least one spatial property may comprise a plane equation associated with the pattern such that the at least one processor may use the decoded plane equation to determine three-dimensional points. Additionally, or alternatively, the at least one spatial property may comprise a frequency, a brightness, or the like such that the at least one processor may use the decoded at least one spatial property in mapping the one or more first events and the one or more second events to a corresponding pattern.
Consistent with the present disclosure, the projected patterns (e.g., from projector 301 of
In cases of a dynamic scene, one or more subsequent events, e.g., depicted as encoding a “0” symbol in step 630, may be received at a different pixel than the first pixel, as would be expected from the state machine. Accordingly, as shown in
At step 640, one or more subsequent events, e.g., depicted as encoding a “1” symbol, may be received at a different pixel than in step 630, as would be expected from the state machine. Accordingly, as shown in
Consistent with the present disclosure, when one or more events are detected corresponding to an end of one or more of the plurality of patterns (e.g., encoding a symbol that ends the sequence of symbols indexed to the location from which the corresponding pattern was projected), the at least one processor may complete the state machine for the current pixel and then proceed backward in time to complete the state machines of pixels for the previous event(s). Additionally, or alternatively, the at least one processor may complete the state machine when a sufficient number of events (e.g., first events, second events, and the like) have been received such that the at least one processor may distinguish between the plurality of projected patterns.
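A minimal sketch of such a per-pixel state machine follows. The class, method names, and pattern dictionary are hypothetical illustrations: the machine accumulates received symbols and narrows the set of projected patterns still consistent with them, completing once exactly one pattern matches the full code:

```python
class PatternStateMachine:
    """Track a partially received symbol sequence and report which of
    the known projected patterns remain consistent with it."""

    def __init__(self, patterns):
        self.patterns = patterns    # e.g., {"line_0": "0110", ...}
        self.received = ""

    def feed(self, symbol):
        """Append a decoded symbol; return the still-matching patterns."""
        self.received += symbol
        n = len(self.received)
        return [name for name, code in self.patterns.items()
                if code[:n] == self.received]

    def complete(self):
        """Return the uniquely matching pattern once fully received."""
        matches = [name for name, code in self.patterns.items()
                   if code == self.received]
        return matches[0] if len(matches) == 1 else None
```

For example, with patterns {"A": "01", "B": "00"}, feeding "0" keeps both candidates alive, feeding "1" narrows the match to "A", and `complete()` then returns "A".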
Additionally, or alternatively, to the decoding process of
In some embodiments, one or more error corrections may be encoded in the symbols. For example, one or more additional symbols at the end of the pattern may comprise error correction symbols, such as a checksum (like a check bit, parity bit, or the like) or other block correction code. Additionally, or alternatively, one or more additional symbols may be added amongst the pattern to form a convolutional correction code or other continuous correction code. In addition to or in lieu of such error corrections, the projector may also be configured to project the patterns in a temporal loop such that the system expects to receive the same patterns over and over. Accordingly, one lost pattern will result in one lost depth calculation but will not impact the overall series of three-dimensional images except for a single frame loss. Moreover, this lost frame may be recovered using extrapolation from neighboring frames.
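By way of illustration, a single parity symbol appended to a pattern lets the receiver detect one flipped symbol. The sketch below assumes binary symbols and hypothetical helper names; it is not taken from the disclosure:

```python
def add_parity(symbols):
    """Append an even-parity bit so a single flipped symbol in the
    transmitted pattern can be detected by the receiver."""
    return symbols + [sum(symbols) % 2]

def check_parity(symbols):
    """Return True if the received pattern (including its parity bit)
    has even parity, i.e., no single-symbol error was detected."""
    return sum(symbols) % 2 == 0
```

For example, the pattern [1, 0, 1] is transmitted as [1, 0, 1, 0]; if any one bit flips in transit, the parity check fails and the pattern can be discarded (and recovered on the next temporal loop).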
Although depicted using “0” and “1,” any number of symbols may be used based on a dictionary of symbols corresponding to characteristics of electromagnetic pulses (e.g., storing characteristics of pulses in association with particular symbols). Having a larger dictionary may allow for generating a set of unique patterns that are shorter in length.
Moreover, although described using a simple neighbor search, the state machine search may be conducted along an epipolar line or any other appropriate area of pixels for searching. For example, as explained with respect to
At step 701, the at least one processor may receive an event from an image sensor (e.g., image sensor 200 of
At step 703, the at least one processor may connect the received event to a most recent event if at least one connectivity criterion is met. For example, the at least one processor may determine a temporal distance between the received event and the most recent event and connect them if the temporal distance satisfies a threshold. Additionally, or alternatively, the at least one processor may determine a spatial distance between the received event and the most recent event and connect them if the spatial distance satisfies a threshold. Accordingly, the at least one connectivity criterion may comprise a temporal threshold, a spatial threshold, or any combination thereof. In one combinatory example, the spatial threshold may be adjusted based on which of a plurality of temporal thresholds are satisfied. In such an example, events closer in time may be expected to be closer in space. In another combinatory example, the temporal threshold may be adjusted based on which of a plurality of spatial thresholds are satisfied. In such an example, events closer in space may be expected to be closer in time.
At step 705, the at least one processor may determine whether the at least one connectivity criterion is satisfied for other recent events. For example, the at least one processor may use the at least one connectivity criterion to find all other recent events related to the received event.
At step 707, the at least one processor may merge cluster identifiers associated with all recent events for which the at least one connectivity criterion is satisfied. Accordingly, all recent events from steps 703 and 705 that satisfy the at least one connectivity criterion will be assigned the same cluster identifier as that of the event received at step 701.
At step 709, the at least one processor may output the cluster as a set of related events. For example, all events having the same cluster identifier may be output.
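Steps 701 through 709 can be sketched as a simple clustering pass. This is illustrative only: the connectivity criterion below combines a temporal and a spatial threshold, and the event layout and function name are assumptions:

```python
def cluster_events(events, max_dt, max_dist):
    """Assign a cluster id to each event (t, row, col), merging the
    clusters of all recent events that satisfy the connectivity
    criterion (close in time AND space), as in steps 703-707."""
    cluster_of = {}     # event index -> cluster id
    next_id = 0
    for i, (t, r, c) in enumerate(events):
        # find all recent events satisfying the connectivity criterion
        linked = [
            j for j in range(i)
            if t - events[j][0] <= max_dt
            and abs(r - events[j][1]) <= max_dist
            and abs(c - events[j][2]) <= max_dist
        ]
        if not linked:
            cluster_of[i] = next_id     # start a new cluster
            next_id += 1
            continue
        # merge: relabel every linked cluster to a single id (step 707)
        target = cluster_of[linked[0]]
        for j in linked:
            old = cluster_of[j]
            for k, cid in cluster_of.items():
                if cid == old:
                    cluster_of[k] = target
        cluster_of[i] = target
    return cluster_of
```

For example, two events adjacent in time and space share a cluster id, while a third event far from both starts a new cluster; events sharing an id would then be output together at step 709.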
Exemplary embodiments and features that may be used for method 700 are described in European Patent Application No. 19154401.4, filed on Jan. 30, 2019, and titled “Method of Processing Information from an Event-Based Sensor.” The disclosure of this application is incorporated herein by reference.
The cluster algorithm of method 700 may be used to perform the search of
Additionally, or alternatively, method 700 may be used to cluster raw events received from the image sensor such that each cluster is then decoded, and decoded symbols of that cluster are connected via state machines. Accordingly, rather than decoding each symbol and connecting the symbols sequentially, the decoding and connecting may be performed after clustering to reduce noise.
In example 850 of
Other techniques for matching (not depicted in
In another example, frequency of light on image sensor 200 of
Although not depicted in
Similarly, the at least one processor performing the three-dimensional imaging may additionally or alternatively discard any of the digital signals associated with a bandwidth not within a predetermined threshold range. For example, a projector emitting the plurality of patterns onto the scene may be configured to project electromagnetic pulses within a particular frequency (and thus bandwidth) range. Accordingly, the system may use a bandwidth filter (in hardware and/or in software) to filter noise and only capture frequencies corresponding to those emitted by the projector. Additionally, or alternatively, the system may use a bandwidth filter (in hardware and/or in software) to filter high-frequency and/or low-frequency light in order to reduce noise.
In addition to or in lieu of the software and/or hardware bandpass and/or frequency filters described above, the system may include one or more optical filters used to filter light from the scene impinging on the image sensor. For example, with respect to
In some embodiments, rather than using single events as depicted in example 800 or timings between single events as depicted in example 850, embodiments of the present disclosure may encode symbols using event bursts. For example,
At step 901, the at least one processor may receive an event from an image sensor (e.g., image sensor 200 of
At step 903, the at least one processor may verify the polarity of the event. For example, the at least one processor may determine whether the polarity matches the polarity expected for the event, e.g., the same as a previous event if a plurality of increases or decreases is expected, or different from the previous event if a polarity change is expected. For instance, the projected patterns may be configured to generate a plurality (such as 2, 3, or the like) of events in order to signal an increasing signal or a decreasing signal. Such a plurality may allow for filtering of noise at step 903. If the polarity is not valid, the at least one processor may discard the event and start over at step 901 with a new event, as depicted in
At step 905, the at least one processor may discard the received event if too remote in time from a previous event (e.g., if a difference in time exceeds a threshold). Accordingly, the at least one processor may avoid connecting events too remote in time to form part of a single burst. If the event is too remote, the at least one processor may discard the event and start over at step 901 with a new event, as depicted in
At step 907, the at least one processor may increment an event counter of an associated pixel. For example, the associated pixel may comprise the pixel from which the event of step 901 was received. The event counter may comprise an integer counting events received at recursive executions of step 901 that qualify, under steps 903 and 905, as within the same burst.
At step 909, the at least one processor may extract a burst when the event counter exceeds an event threshold. For example, the event threshold may comprise between 2 and 10 events. In other embodiments, a greater event threshold may be used. If the burst is extracted, the at least one processor may reset the event counter. If the event counter does not exceed the event threshold, the at least one processor may return to step 901 without resetting the event counter. Accordingly, additional events that qualify, under steps 903 and 905, as within the same burst, may be detected and added to the event counter at step 907.
In some embodiments, method 900 may further include discarding the received event if too remote in time from a first event of a current burst. Accordingly, method 900 may prevent noise from causing a burst to be inadvertently extended beyond a threshold.
Additionally, or alternatively, method 900 may track a number of events by region such that bursts are detected only within regions rather than across a single pixel or the whole image sensor. Accordingly, method 900 may allow for detection of concurrent bursts on different portions of an image sensor.
Whenever an event is discarded, the at least one processor may reset the event counter. Alternatively, in some embodiments, the at least one processor may store the corresponding event counter even when an event is discarded. Some embodiments may use a combination of saving and discarding. For example, the event counter may be saved if an event is discarded at step 903 but may be reset if an event is discarded at step 905.
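Steps 901 through 909 can be sketched as follows. This is an illustrative simplification under stated assumptions: events are (timestamp, polarity) tuples for a single pixel, and the counter resets on either a polarity mismatch or a too-remote timestamp, as described above:

```python
def extract_bursts(events, max_dt, burst_size):
    """Count successive same-polarity events close in time and emit a
    burst timestamp once `burst_size` events accumulate (steps 901-909).
    Events are (timestamp, polarity) tuples from a single pixel."""
    bursts = []
    count = 0
    last_t = None
    last_p = None
    for t, p in events:
        if last_p is not None and p != last_p:
            count = 0               # polarity check failed (step 903)
        elif last_t is not None and t - last_t > max_dt:
            count = 0               # too remote in time (step 905)
        count += 1                  # increment event counter (step 907)
        last_t, last_p = t, p
        if count >= burst_size:
            bursts.append(t)        # extract burst, reset counter (step 909)
            count = 0
    return bursts
```

For example, three same-polarity events at timestamps 0, 1, and 2 with a burst threshold of 3 produce one burst at timestamp 2, while a fourth event much later in time starts a fresh count rather than extending the burst.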
A detailed description of exemplary embodiments of method 900 is described in International Patent Application No. PCT/EP2019/051919, filed on Jan. 30, 2019, and titled “Method and Apparatus of Processing a Signal from an Event-Based Sensor.” The disclosure of this application is incorporated herein by reference.
Extracted bursts from method 900 may comprise a symbol (e.g., used as part of an encoded pattern). For example, by using a burst to encode a symbol rather than a single event, the system may increase accuracy and reduce noise. Additionally, or alternatively, extracted bursts from method 900 may comprise a set of symbols forming the encoded pattern. For example, by using a burst to encode the pattern, the system may distinguish between distinct patterns in time with greater accuracy and reduced noise.
Although described using the architectures of
While certain embodiments have been described with reference to calculating the three-dimensional rays and three-dimensional image points, systems consistent with the present disclosure may perform other operations and/or be used in other applications. For example, in some embodiments, positions of the pixels where the reflections are extracted may be used to reconstruct a three-dimensional scene or detect a three-dimensional object (such as a person or another object). In such embodiments, the pixel positions may correspond to the three-dimensional positions as a result of the calibration of the system.
Embodiments of the present disclosure may compute three-dimensional points without having to perform triangulation operations by, for example, using a look-up table or machine learning. In some embodiments, a stored look-up table may be used by at least one processor to determine a three-dimensional point from an identified line on a specific pixel position i, j. Additionally, or alternatively, machine learning may be used to determine three-dimensional points from pixel positions for a calibrated system.
In still further embodiments, pixel differences may be used for analysis purposes. For example, assume a disparity (“d”) refers to a pixel difference between where a projected line is observed on the sensor (“x”) and where it is emitted from as an equivalent pixel on the projector (“x_L”), expressed as d=x−x_L. In some embodiments, positions “x” might even be used directly, without knowledge of “x_L,” in applications where it could, for instance, be extracted through machine learning. In such applications, the three-dimensional points may be computed from the “x” pixel coordinates and the associated disparity to segment background from foreground. For example, the at least one processor may threshold disparity measurements without reconstructing depth (e.g., d≤threshold would be background while d>threshold would be foreground). In automotive or surveillance applications, for example, it may be desirable to remove points from the ground versus points on objects. As further examples, face, object, and/or gesture recognition could be performed directly on the disparities.
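The disparity thresholding described above can be sketched as follows (a hypothetical helper; the pixel/disparity layout and threshold convention are assumptions for illustration):

```python
def segment_foreground(disparities, threshold):
    """Split pixels into background and foreground directly from
    disparity d = x - x_L, without reconstructing depth: small
    disparities (far points) are background, large ones foreground."""
    background = [(p, d) for p, d in disparities if d <= threshold]
    foreground = [(p, d) for p, d in disparities if d > threshold]
    return background, foreground
```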
Estimating the depth of an object, or within a region of interest (ROI) of the sensor, may be performed after integrating (e.g., averaging) the disparities inside an object bounding box or the ROI. Further, in some embodiments, simultaneous localization and mapping (SLAM) applications using inverse depth models could use disparity as a proportional replacement for inverse depth.
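A minimal sketch of the ROI averaging described above follows. The conversion uses the standard inverse-depth relation z = f·b/d from rectified projector-camera geometry, which is an assumption of this example rather than a formula stated in the disclosure, and the calibration values `focal_px` and `baseline_m` are illustrative placeholders.

```python
import numpy as np

def roi_depth_from_disparity(disparity_map, roi, focal_px=700.0, baseline_m=0.05):
    """Estimate a single depth for a region of interest.

    Averages the disparities inside `roi` = (row0, row1, col0, col1),
    then converts the mean disparity to depth with z = f * b / d.
    """
    r0, r1, c0, c1 = roi
    mean_d = disparity_map[r0:r1, c0:c1].mean()
    return focal_px * baseline_m / mean_d

# Synthetic map with a uniform 35-pixel disparity.
disparities = np.full((10, 10), 35.0)
print(roi_depth_from_disparity(disparities, (2, 6, 2, 6)))  # -> 1.0
```

Since depth is inversely proportional to disparity for a fixed f and b, an inverse-depth SLAM formulation can carry the disparity itself as its state, deferring the f·b scale factor, which is one reading of the proportional-replacement remark above.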
The foregoing description has been presented for purposes of illustration. It is not exhaustive and is not limited to precise forms or embodiments disclosed. Modifications and adaptations of the embodiments will be apparent from consideration of the specification and practice of the disclosed embodiments. For example, the described implementations include hardware, but systems and methods consistent with the present disclosure can be implemented with hardware and software. In addition, while certain components have been described as being coupled to one another, such components may be integrated with one another or distributed in any suitable fashion.
Moreover, while illustrative embodiments have been described herein, the scope includes any and all embodiments having equivalent elements, modifications, omissions, combinations (e.g., of aspects across various embodiments), adaptations and/or alterations based on the present disclosure. The elements in the claims are to be interpreted broadly based on the language employed in the claims and not limited to examples described in the present specification or during the prosecution of the application, which examples are to be construed as nonexclusive. Further, the steps of the disclosed methods can be modified in any manner, including reordering steps and/or inserting or deleting steps.
In addition to the above-referenced patents and applications, the entirety of each of the following applications is hereby incorporated by reference herein: U.S. Application No. 62/809,557, filed Feb. 22, 2019, titled Systems and Methods for Three-Dimensional Imaging and Sensing; U.S. Application No. 62/810,926, filed Feb. 26, 2019, titled Systems and Methods for Three-Dimensional Imaging and Sensing; and U.S. Application No. 62/965,149, filed Jan. 23, 2020, titled Systems and Methods for Three-Dimensional Imaging and Sensing.
The features and advantages of the disclosure are apparent from the detailed specification, and thus, it is intended that the appended claims cover all systems and methods falling within the true spirit and scope of the disclosure. As used herein, the indefinite articles “a” and “an” mean “one or more.” Similarly, the use of a plural term does not necessarily denote a plurality unless it is unambiguous in the given context. Words such as “and” or “or” mean “and/or” unless specifically directed otherwise. Further, since numerous modifications and variations will readily occur from studying the present disclosure, it is not desired to limit the disclosure to the exact construction and operation illustrated and described, and accordingly, all suitable modifications and equivalents may be resorted to, falling within the scope of the disclosure.
Other embodiments will be apparent from consideration of the specification and practice of the embodiments disclosed herein. It is intended that the specification and examples be considered as examples only, with a true scope and spirit of the disclosed embodiments being indicated by the following claims.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/EP2020/054685 | 2/21/2020 | WO | 00 |
| Number | Date | Country |
|---|---|---|
| 62809557 | Feb 2019 | US |
| 62810926 | Feb 2019 | US |
| 62965149 | Jan 2020 | US |