One or more portions of the disclosure, alone and/or in combination, of this patent document contain material which is subject to (copyright or mask work) protection. The (copyright or mask work) owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all (copyright or mask work) rights whatsoever.
The present disclosure relates to systems, components, and methodologies for image processing. In particular, the present disclosure relates to systems, components, and methodologies that perform image processing and sparsity guarantee using digital NeuroMorphic (NM) vision techniques.
According to the present disclosure, systems, components, and methodologies are provided for NM-based image data generation, image data processing and subsequent use to detect and/or identify objects and object movement in such image data for assistance, automation, control and/or documentation.
In accordance with disclosed embodiments, structure and software are provided for simulation of conventional analog NM system functionality using a digital NM vision system that incorporates at least one detector that includes one or more NM sensors, a digital retina implemented using, for example, CMOS technology that enables generation of digital NM data for image data processing by a digital NM engine that facilitates improved object detection, classification, and tracking. As such, exemplary embodiments are directed to structure and software that may simulate analog NM system functionality.
In accordance with at least one embodiment, the digital NM engine may include a combination of one or more detectors and one or more processors running software on back-end to generate digital NM output.
In accordance with at least one embodiment, the digital NM vision system, its components and utilized methodologies may be used to compress high framerate video data by performing feature extraction close to an imaging sensor to generate an encoded version of image data that includes differences and surrounding spatio-temporal regions for subsequent image processing. Thus, in accordance with at least one embodiment, the hardware and methodologies may be utilized as an effective method for compressing high framerate video, e.g., by analyzing image data to compress the data by capturing differences between a current frame and one or more previous frames and applying a transformation.
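By way of illustration only, the difference-capturing step described above may be sketched as follows. The function names `encode_differences` and `decode_differences`, the `threshold` parameter, and the list-of-lists frame representation are assumptions made for this example and are not taken from the disclosure; the sketch omits the surrounding spatio-temporal regions and the subsequent transformation.

```python
from typing import List, Tuple

Frame = List[List[int]]  # hypothetical frame layout: rows of pixel intensities


def encode_differences(prev: Frame, curr: Frame,
                       threshold: int = 0) -> List[Tuple[int, int, int]]:
    """Keep only (row, col, delta) for pixels whose change exceeds threshold."""
    changes = []
    for r, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for c, (p, q) in enumerate(zip(prev_row, curr_row)):
            delta = q - p
            if abs(delta) > threshold:
                changes.append((r, c, delta))
    return changes


def decode_differences(prev: Frame,
                       changes: List[Tuple[int, int, int]]) -> Frame:
    """Rebuild the current frame from the previous frame plus the changes."""
    curr = [row[:] for row in prev]
    for r, c, delta in changes:
        curr[r][c] += delta
    return curr
```

With a threshold of zero the encoding is lossless, and a mostly static scene yields a short list of changes rather than a full frame.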
In accordance with at least one embodiment, the digital NM vision system and/or at least a subset of its components may be incorporated in a stereo neuromorphic pair. In accordance with at least one implementation, components of the digital NM vision system may be incorporated in a compound camera. In such an implementation, the computational element of each imaging sensor may be coupled to other computational elements of other imaging sensors, e.g., adjacent sensors or other types of sensors, to collaborate with other computational elements to provide functionality. For example, in accordance with at least one implementation, the digital NM vision system components may be incorporated in an event-based camera.
Additional features of the present disclosure will become apparent to those skilled in the art upon consideration of illustrative embodiments exemplifying the best mode of carrying out the disclosure as presently perceived.
The detailed description particularly refers to the accompanying figures in which:
The figures and descriptions provided herein may have been simplified to illustrate aspects that are relevant for a clear understanding of the herein described devices, systems, and methods, while eliminating, for the purpose of clarity, other aspects that may be found in typical devices, systems, and methods. Those of ordinary skill may recognize that other elements and/or operations may be desirable and/or necessary to implement the devices, systems, and methods described herein. Because such elements and operations are well known in the art, and because they do not facilitate a better understanding of the present disclosure, a discussion of such elements and operations may not be provided herein. However, the present disclosure is deemed to inherently include all such elements, variations, and modifications to the described aspects that would be known to those of ordinary skill in the art.
Within NeuroMorphic (NM) data, associating roots enables tracking of an object within collected image data. This is because the roots are part of the same object included in the image data. Therefore, by associating the roots across time, one is able to determine a velocity for a point on an object. More specifically, velocity may be determined by performing analysis of image data to identify associated roots along a single orientation over time f(t). Since the roots are part of the same object, associating them across time may result in the determination of a velocity for a point on an object as a function of time. This root association may be performed effectively, even using one-dimensional monocular root association of data. However, to effectively perform such root association, one must determine the required sparsity guarantee. The sparsity guarantee is a measure of the probability of correctly assigning each detected motion signal to the corresponding object generating that motion signal. Achieving the sparsity guarantee may be difficult or impossible for cases where the motion signal is not consistent across time and/or with lower frame rates of image collection where detected motion smears between moving objects.
More specifically, processors and software described herein can reduce the amount of data necessary to track objects in image data, with associated reductions in computational cost and processor requirements and increased processing speed. These improvements enable real-time or near-real-time sensing, detection, identification, and tracking of objects.
In illustrative embodiments, an example of which being illustrated in
Sensor 120 may output the image data 125 into one or more sensor processors 130, e.g., one or more digital retinas, that convert that image data into shapelet data that may include intensity data and data derived or derivable from such intensity data, including “spikes,” “roots,” “blobs” and associated data using image processing and data processing techniques explained herein. More specifically, in at least one embodiment, the sensor processor 130 includes digital circuitry that generates spike data indicative of a spike in association with a particular photoreceptor within the sensor 120 whenever the intensity value measured by that photoreceptor exceeds a threshold.
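By way of illustration only, the thresholding behavior described above may be sketched in software as follows. The disclosure describes digital circuitry; the function name `detect_spikes` and the grid representation of photoreceptor intensities are assumptions made for this example.

```python
from typing import List, Tuple


def detect_spikes(intensities: List[List[float]],
                  threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) of each photoreceptor whose intensity exceeds threshold."""
    return [
        (r, c)
        for r, row in enumerate(intensities)
        for c, value in enumerate(row)
        if value > threshold
    ]
```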
As shown in
Shapelet data is provided by the sensor processor 130 to the object signature detector 140 for subsequent analysis to formulate one or more object signatures 115. That object signature data and/or shapelet data may also be output to a machine learning engine 145 that may or may not be located in the same location as the other components illustrated in
Referring again to
In accordance with disclosed embodiments, one dimensional root association may be performed, which requires generation of shapelet data 135 that may include blobs, roots and spikes along an orientation and associating the roots. In the illustrative embodiments, shapelet data 135 is generally described with reference to roots as location points of the image data 125 (but as previously mentioned, shapelet data may include a variety of economized image data). As opposed to spikes (light intensity amplitudes), roots tend to be consistent across space (multiple cameras) and time (multiple frames). Roots can be linked or associated unambiguously with each other to enable extraction of contours, or edges, related to the image data and preferably related to the object 115. The extracted contours can be used to discern object motion within the field of view.
Returning to the operations performed by the sensor processor 130, the processor generates shapelet data that enables digital NM vision including spike (sparse) data, 5D (x, y, t, Vx, Vy) velocity data and other digital data. Each spike specifies its spatial location within the input image (x, y), its temporal coordinate or timestamp (t), and its optical velocity (Vx, Vy). This shapelet data enables image data processing for improved object detection, classification, and tracking, including machine and deep learning.
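By way of illustration only, a spike carrying the 5D (x, y, t, Vx, Vy) data described above may be represented as a simple record. The class name `Spike`, the field types, and the constant-velocity helper `predicted_location` are assumptions made for this example and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Spike:
    x: float   # spatial column within the input image
    y: float   # spatial row within the input image
    t: float   # temporal coordinate or timestamp
    vx: float  # horizontal optical velocity (assumed pixels per time unit)
    vy: float  # vertical optical velocity (assumed pixels per time unit)

    def predicted_location(self, dt: float) -> Tuple[float, float]:
        """Extrapolate the location after dt, assuming constant velocity."""
        return (self.x + self.vx * dt, self.y + self.vy * dt)
```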
As such, in accordance with at least one embodiment, the digital NM detector 110 may include one or more processors running software to generate digital NM output data for analysis and subsequent control of components within the environment imaged by the detector 110. Velocity data may include “velocity vectors” which are a mathematical representation of optical flow of pixels (or photoreceptors) in image data. Velocity vector data may be used to characterize or represent a velocity space, which may be thought of as the spatial and temporal representation of video data including a plurality of frames depicting movement of an object in an environment. More specifically, in velocity space, pixels having the same velocity vector may be aggregated and associated with one another to perform velocity segmentation, which makes it possible to identify and differentiate objects within the image data based on their relative motion over frames of image data. Thus, velocity vector data may be used to indicate basic features (e.g., edges) of objects included in the image data, by identifying boundaries between the edges of the objects in the image data. This data may, therefore, be used to define one or more boundaries between foreground objects and background, thus creating velocity silhouettes, or blobs. In this way, velocity silhouettes, or blobs, may define edges at the boundary between a foreground object and a background object.
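By way of illustration only, the aggregation of pixels having the same velocity vector may be sketched as follows. The function name `segment_by_velocity` and the dictionary-based input are assumptions made for this example; a practical implementation would likely tolerate small differences between velocity vectors rather than require exact equality.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]         # (x, y) location
Velocity = Tuple[float, float]  # (vx, vy) velocity vector


def segment_by_velocity(
        velocities: Dict[Pixel, Velocity]) -> Dict[Velocity, List[Pixel]]:
    """Group pixel locations that share an identical velocity vector."""
    groups: Dict[Velocity, List[Pixel]] = defaultdict(list)
    for pixel, velocity in velocities.items():
        groups[velocity].append(pixel)
    return dict(groups)
```

Each resulting group is a candidate velocity silhouette: pixels moving together are treated as belonging to one object, and stationary pixels fall into the zero-velocity group.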
A methodology for performing one dimensional root association is illustrated in
Therefore, determining roots is key to enabling the sparsity guarantee. First, unlike spikes, roots are consistent across frames. Second, unlike spikes, which are two-dimensional quantities that represent the area of the receptive field of a pixel, roots are dimensionless points that represent an exact place on the image. Third, similar to spikes, roots can be decluttered based on polarity. However, unlike spikes, roots can be projected into multidimensional space where each dimension corresponds to an orientation. Finally, roots spread out the points along each dimension and create dead zones that establish a guaranteed minimum spacing between adjacent roots, known as a sparsity guarantee. These characteristics of roots enable movement of objects in captured image data to be determined to a high degree of accuracy. Determined roots in the image frames will have a guaranteed minimum dead zone in all directions, or dimensions. Once a root has been identified, it can be known that no root can exist within one pixel unit of that root in the dead zone. These dead zones create known minimum isolation spacing between roots that reduces confusion and noise, thereby improving the ability to associate identified isolated roots across successive frames in time.
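By way of illustration only, the minimum spacing implied by the dead zones may be checked along a single orientation as follows. The function name `dead_zone_violations` and the `min_gap` parameter (expressed in pixel units) are assumptions made for this example and are not taken from the disclosure.

```python
from typing import List, Tuple


def dead_zone_violations(positions: List[float],
                         min_gap: float = 1.0) -> List[Tuple[float, float]]:
    """Return adjacent root pairs that are closer together than min_gap."""
    ordered = sorted(positions)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < min_gap]
```

An empty result indicates that every root along the orientation is isolated by at least the minimum spacing.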
In accordance with disclosed embodiments, an image filter 320 may be used on input image data 315 to generate shapelet data including blob image data 325 as shown in
In accordance with some embodiments, the center-surround filter window size may range from as small as a 3×3 matrix up to and including a 64×64 matrix, dependent on the pixel resolution of the incoming image data. The filter window size is selected so that the input image resolution will equal the output blob image resolution. As a result, root identification may occur with sub-pixel accuracy. More specifically, root identification may occur to ⅛ pixel accuracy. In other words, roots are spread out 8× more by maintaining the image resolution during image filtering to obtain the blob image.
In some embodiments, the filter 320 is a difference of Gaussian (“DOG”) filter. In some embodiments, the filter 320 is a Laplacian of Gaussian filter which may be applied to approximate the DOG filter.
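By way of illustration only, a one-dimensional difference of Gaussian filter that preserves the input resolution may be sketched as follows. The function names, the standard deviations `sigma_center` and `sigma_surround`, the kernel `radius`, and the edge-clamping boundary treatment are all assumptions made for this example; the disclosure does not specify these values, and the filter 320 operates on two-dimensional image data.

```python
import math
from typing import List


def gaussian_kernel(sigma: float, radius: int) -> List[float]:
    """Normalized, truncated Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_same(signal: List[float], kernel: List[float]) -> List[float]:
    """Filter with edge clamping so the output length equals the input length."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp at the borders
            acc += w * signal[j]
        out.append(acc)
    return out


def dog_filter(signal: List[float], sigma_center: float = 1.0,
               sigma_surround: float = 2.0, radius: int = 6) -> List[float]:
    """Narrow (center) blur minus wide (surround) blur of the same signal."""
    center = convolve_same(signal, gaussian_kernel(sigma_center, radius))
    surround = convolve_same(signal, gaussian_kernel(sigma_surround, radius))
    return [c - s for c, s in zip(center, surround)]
```

Applied to a step edge, the output of this sketch is negative on the dark side and positive on the bright side, so the edge appears as a zero-crossing while flat regions remain near zero.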
Similarly, the blob image intensity profile 525 rises before the negative edge and then dips after the negative edge, thus creating a zero-crossing 546 that corresponds to the negative edge 642 of the input image. This zero-crossing 546 along a negative slope in the intensity profile is referred to as a negative root. Mathematically, no neighboring roots may occur where the blob image rises or dips adjacent to the root as defined by the zero crossings 544, 546. These regions are referred to as dead zones 548. It should be noted, in particular, that dead zones 548 are present within the intensity profile of generated blob image data 525 such that no roots (zero crossings 544, 546) are located within the dead zones 548. Each root is separated from any other root in the blob image by a dead zone of at least one pixel.
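By way of illustration only, locating zero-crossings in a blob image intensity profile and labeling them by polarity may be sketched as follows. The function name `find_roots`, the use of linear interpolation between samples, and the quantization to `subunits` per pixel are assumptions made for this example. The sketch ignores samples that are exactly zero and does not itself enforce the dead zones, which arise from the properties of the filter.

```python
from typing import List, Tuple


def find_roots(profile: List[float],
               subunits: int = 8) -> List[Tuple[float, int]]:
    """Return (position, polarity) for each sign change in the profile.

    Polarity is +1 for a crossing along a positive slope and -1 for a
    crossing along a negative slope. Positions are linearly interpolated
    and quantized to 1/subunits of a pixel.
    """
    roots = []
    for i in range(len(profile) - 1):
        a, b = profile[i], profile[i + 1]
        if a < 0.0 < b:
            polarity = 1
        elif a > 0.0 > b:
            polarity = -1
        else:
            continue  # no strict sign change between these two samples
        position = i + a / (a - b)  # where the connecting line reaches zero
        roots.append((round(position * subunits) / subunits, polarity))
    return roots
```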
As seen in
As illustrated in
This particular image filtering and root filtering greatly reduces confusion in associating roots over successive image frames of data, by reducing the amount of data by a factor of four in frame-to-frame analysis. Root association requires there be roots in each frame, and therefore, their associated dead zones must also be in each frame. These required dead zones create a relatively large spacing between roots along an orientation and thereby make it easier to identify and associate the same root along multiple frames. Further processing to associate the roots includes first separating the roots based on whether they correspond to the horizontal orientation 0 or vertical orientation 2 and selecting an orientation for association. Next, roots in that orientation, already separated by dead zones of 8 pixel subunits, are separated into positive and negative roots. As exemplified in
1D root association across multiple successive image frames of scene data in time along orientation zero may result in a determination of horizontal velocity of that root as vx=2. Similarly, in orientation 2, a 1D root association may be applied across multiple frames and the vertical velocity of the object may be determined as vy=1.
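By way of illustration only, associating roots of one polarity along one orientation between successive frames, and deriving a velocity from the associated positions, may be sketched as follows. The greedy nearest-neighbor rule, the `max_shift` limit, and the function names `associate_1d` and `velocity_1d` are assumptions made for this example and are not the association method of the disclosure.

```python
from typing import List, Tuple


def associate_1d(prev_roots: List[float], curr_roots: List[float],
                 max_shift: float) -> List[Tuple[float, float]]:
    """Pair each previous root with its nearest unused current root."""
    pairs = []
    used = set()
    for p in prev_roots:
        best = None
        for idx, q in enumerate(curr_roots):
            if idx in used or abs(q - p) > max_shift:
                continue
            if best is None or abs(q - p) < abs(curr_roots[best] - p):
                best = idx
        if best is not None:
            used.add(best)
            pairs.append((p, curr_roots[best]))
    return pairs


def velocity_1d(track: List[float], dt: float = 1.0) -> float:
    """Average velocity of one root from its positions in successive frames."""
    return (track[-1] - track[0]) / (dt * (len(track) - 1))
```

For example, a root found at positions 3, 5, and 7 along orientation 0 in three successive frames yields a velocity of 2 pixels per frame, consistent with vx=2 above.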
The final velocity may be computed by combining the velocities of the space-time skews and computed velocities. For example, the 1D velocity for the vertical space-time skew (vx=0, vy=1) may be combined with the 1D velocity associated for orientation 0 (vx=2, vy=0) to give a final 2D velocity of (vx=2, vy=1).
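By way of illustration only, the combination in the example above may be expressed as a component-wise sum of the two 1D velocities. The function name `combine_velocities` is an assumption made for this example, and other ways of combining the velocities may be used.

```python
from typing import Tuple

Velocity = Tuple[float, float]  # (vx, vy)


def combine_velocities(skew: Velocity, orientation: Velocity) -> Velocity:
    """Add the space-time skew velocity and the 1D orientation velocity."""
    return (skew[0] + orientation[0], skew[1] + orientation[1])


# Vertical space-time skew (vx=0, vy=1) combined with orientation 0 (vx=2, vy=0).
final_velocity = combine_velocities((0, 1), (2, 0))
```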
Additionally, 1D and 2D linking of roots may be achieved through various filters and rules to form edges of moving objects in the scene, as described in detail in U.S. Ser. No. 15/619,992, entitled SYSTEM AND METHOD FOR ROOT ASSOCIATION IN IMAGE DATA, filed Jun. 12, 2017, incorporated by reference in its entirety.
This application is a continuation-in-part of, and claims priority to and the benefit of, the prior filed non-provisional U.S. patent application Ser. No. 15/386,220, filed Dec. 21, 2016, the contents of which are incorporated herein by reference in their entirety, and at least including those portions directed to neuromorphic image data collection and use.
Number | Name | Date | Kind |
---|---|---|---|
5790690 | Doi | Aug 1998 | A |
6020953 | Barrows | Feb 2000 | A |
6023521 | Sarpeshkar et al. | Feb 2000 | A |
6212289 | Sarpeshkar et al. | Apr 2001 | B1 |
6384905 | Barrows | May 2002 | B1 |
6665439 | Takahashi | Dec 2003 | B1 |
6718062 | Zhang | Apr 2004 | B1 |
7388988 | Luo | Jun 2008 | B2 |
7659967 | Barrows et al. | Feb 2010 | B2 |
7925051 | Gensolen et al. | Apr 2011 | B2 |
8098886 | Koch et al. | Jan 2012 | B2 |
8116581 | Sun | Feb 2012 | B2 |
8332340 | Snider | Dec 2012 | B2 |
8396297 | Panda | Mar 2013 | B2 |
8401297 | Apostolos et al. | Mar 2013 | B1 |
8694449 | Weng et al. | Apr 2014 | B2 |
8780240 | Posch et al. | Jul 2014 | B2 |
8930291 | Srinivasa et al. | Jan 2015 | B1 |
8959040 | Cruz-Albrecht et al. | Feb 2015 | B1 |
9014416 | Fisher et al. | Apr 2015 | B1 |
9047568 | Fisher et al. | Jun 2015 | B1 |
9070039 | Richert | Jun 2015 | B2 |
9098811 | Petre et al. | Aug 2015 | B2 |
9111215 | Piekniewski | Aug 2015 | B2 |
9111226 | Richert | Aug 2015 | B2 |
9123127 | Richert | Sep 2015 | B2 |
9129221 | Piekniewski et al. | Sep 2015 | B2 |
9152915 | Gabardos et al. | Oct 2015 | B1 |
9183493 | Richert et al. | Nov 2015 | B2 |
9186793 | Meier | Nov 2015 | B1 |
9195903 | Andreopoulos et al. | Nov 2015 | B2 |
9195934 | Hunt et al. | Nov 2015 | B1 |
10133944 | Zink et al. | Nov 2018 | B2 |
10229341 | Zink et al. | Mar 2019 | B2 |
10235565 | Zink et al. | Mar 2019 | B2 |
20030118245 | Yaroslavsky | Jun 2003 | A1 |
20040096106 | Demi | May 2004 | A1 |
20090262247 | Huang et al. | Oct 2009 | A1 |
20110222779 | Karanam | Sep 2011 | A1 |
20110286671 | Xu | Nov 2011 | A1 |
20130251209 | Kim | Sep 2013 | A1 |
20140064609 | Petre et al. | Mar 2014 | A1 |
20140229411 | Richert et al. | Aug 2014 | A1 |
20140258195 | Weng et al. | Sep 2014 | A1 |
20150161505 | Sugioka | Jun 2015 | A1 |
20180173934 | Zink | Jun 2018 | A1 |
20180173954 | Zink | Jun 2018 | A1 |
20180173982 | Zink | Jun 2018 | A1 |
20180173983 | Zink | Jun 2018 | A1 |
20180173992 | Zink | Jun 2018 | A1 |
20180207423 | Benosman | Jul 2018 | A1 |
20180249093 | Saeki et al. | Aug 2018 | A1 |
20190017811 | Watanabe et al. | Jan 2019 | A1 |
20190116322 | Holzer et al. | Apr 2019 | A1 |
20190213388 | Makeev et al. | Jul 2019 | A1 |
Entry |
---|
Sabzmeydani, P. et al., “Detecting Pedestrians by Learning Shapelet Features”, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition, p. 1-8. (Year: 2007). |
Kovesi, P. (2005). Shapelets correlated with surface normals produce surfaces. Proceedings of the IEEE International Conference on Computer Vision. 2. 994-1001 vol. 2. 10.1109/ICCV.2005.224. (Year: 2005). |
Garcia et al.; pyDVS: An Extensible, Real-time Dynamic Vision Sensor Emulator using Off-the-Shelf Hardware; 2016 IEEE Symposium Series on Computational Intelligence (SSCI); Dec. 6, 2016; pp. 1-7. |
Kuhn; Fast MPEG-4 Motion Estimation: Processor Based and Flexible VLSI Implementations; Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology; Oct. 1, 1999; vol. 23, No. 1; pp. 67-92. |
Mueggler et al.; Lifetime Estimation of Events from Dynamic Vision Sensors; 2015 IEEE International Conference on Robotics and Automation (ICRA); IEEE; May 26, 2015; pp. 4874-4881. |
Rueckauer et al.; Evaluation of Event-Based Algorithms for Optical Flow with Ground-Truth from Inertial Measurement Sensor; Frontiers in Neuroscience; Apr. 25, 2016; vol. 10, Article 176. |
Tschechne et al.; Bio-Inspired Optic Flow from Event-Based Neuromorphic Sensor Input; ECCV 2016 Conference; Oct. 6, 2014; pp. 171-182. |
Search Report and Written Opinion for International Patent Application No. PCT/EP2017/083411; dated Mar. 21, 2018. |
Floreano et al.; Miniature curved artificial compound eyes; PNAS; Jun. 4, 2013; pp. 9267-9272; vol. 110, No. 23. |
Tsang et al.; Neuromorphic Implementation of Active Gaze and Vergence Control; Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology; 2008; pp. 1076-1079. |
Brändli; Event-Based Machine Vision; Doctoral Thesis; 2015. |
Number | Date | Country | |
---|---|---|---|
20180173982 A1 | Jun 2018 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 15386220 | Dec 2016 | US |
Child | 15674990 | | US |