(1) Field of Invention
The present invention relates to a method and system for visual object detection and response and, more specifically, to a multi-stage method for generic object detection using cognitive swarms and a system for automated response to the detected objects.
(2) Description of Related Art
Current methods of object detection focus on direct detection of the desired class of objects by searching the entire image window. Such methods substantially fail in detecting highly specific objects, such as a narrow specific-class within a more general-class of objects. A traditional search for a highly specific object, for example, a person in a pose with their arms held vertically upward, as a football referee would when signaling a touchdown, would require a feature detection algorithm specifically tailored to detecting a person in such a pose. Object detection algorithms for detecting a human form currently exist. Tailoring these algorithms to detect a human form in a specific pose would likely be very complex, thereby increasing processing time without a guarantee of accurate results. Such a method is also impractical if the object detection task changes, since each change would require creation of a different highly specific object detection algorithm. As an alternative to creating a highly specific detection algorithm, one could search for the simple double vertical signatures created by the arm position of the object. However, such a method will likely yield a high false alarm rate since it will detect all similar vertical signatures in the scene, including signatures from trees, buildings, and other irrelevant objects.
Thus, a continuing need exists for a method for fast and accurate detection of highly specific objects in large volumes of video imagery.
The present invention relates to a method and system for visual object detection and response and, more specifically, to a multi-stage method for generic object detection using cognitive swarms and a system for automated response to detected entities. A first stage comprises searching for members of a predetermined general-class of objects in an image using a cognitive swarm, detecting members of the general-class of objects in the image, and selecting regions of the image containing detected members of the general-class of objects. A second stage comprises searching for members of a predetermined specific-class of objects within the selected regions using a cognitive swarm, and detecting members of the specific-class of objects within the selected regions of the image, whereby members of the specific-class of objects are located in the image.
In another aspect, the method further comprises an act of cueing an operator with the locations of detected members of the specific-class of objects.
In another aspect, the method further comprises an act of cueing an automatic response system with the locations of detected members of the specific-class of objects.
In yet another aspect, the general-class of objects is humans and the specific-class of objects is humans in a specific pose.
As can be appreciated by one skilled in the art, the present invention also comprises a data processing system having a memory and a processor, the data processing system including computer-readable instructions for causing the data processing system to search for members of a predetermined general-class of objects in an image using a cognitive swarm, detect members of the general-class of objects in the image, select regions of the image containing detected members of the general-class of objects, search for members of a predetermined specific-class of objects within the selected regions using a cognitive swarm, and detect members of the specific-class of objects within the selected regions of the image.
In another aspect, the data processing system is further configured to output locations of detected members of the specific-class to an operator display unit.
In yet another aspect, the data processing system is further configured to output locations of detected members of the specific-class to an automatic response system.
The present invention further comprises a complete system for object detection comprising at least one optical sensor for imaging a scene, a data processing sub-system as described above further configured to receive at least one image from the at least one optical sensor, and an operator display sub-system configured to receive locations of detected members of the specific-class of objects from the data processing sub-system and alert an operator to the locations of the detected members of the specific-class of objects.
In yet another aspect of the system, the operator display sub-system comprises a display means selected from a group consisting of illuminating the object, displaying the object location through personal head-worn displays, and displaying the object location on a flat panel display.
In another aspect of the system, the at least one optical sensor comprises a plurality of optical sensors, the data processing sub-system comprises a network of data processors configured to process the images from the plurality of optical sensors in parallel, and the network of data processors is connected with a master processor for coordinating results from the network of data processors.
In yet another aspect, the system further comprises an automatic response sub-system configured to receive locations of detected members of the specific-class of objects from the data processing sub-system.
As can be appreciated by one skilled in the art, the present invention further comprises a computer program product having computer readable instructions encoded thereon for causing a data processing system to perform the acts of the method of the present invention as previously described.
The objects, features and advantages of the present invention will be apparent from the following detailed descriptions of the various aspects of the invention in conjunction with reference to the following drawings, where:
The present invention relates to a method and system for visual object detection and response and, more specifically, to a multi-stage method for generic object detection using cognitive swarms and a system for automated response to the detected objects. The following description is presented to enable one of ordinary skill in the art to make and use the invention and to incorporate it in the context of particular applications. Various modifications, as well as a variety of uses in different applications will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to a wide range of embodiments. Thus, the present invention is not intended to be limited to the embodiments presented, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
In the following detailed description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without necessarily being limited to these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.
The reader's attention is directed to all papers and documents which are filed concurrently with this specification and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference. All the features disclosed in this specification (including any accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent, or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is only one example of a generic series of equivalent or similar features.
Furthermore, any element in a claim that does not explicitly state “means for” performing a specified function, or “step for” performing a specific function, is not to be interpreted as a “means” or “step” clause as specified in 35 U.S.C. Section 112, Paragraph 6. In particular, the use of “step of” or “act of” in the claims herein is not intended to invoke the provisions of 35 U.S.C. Section 112, Paragraph 6.
Further, if used, the labels left, right, front, back, top, bottom, forward, reverse, clockwise and counter clockwise have been used for convenience purposes only and are not intended to imply any particular fixed direction. Instead, they are used to reflect relative locations and/or directions between various portions of an object.
(1) Introduction
The present invention generally relates to a method and system for visual object detection and response. More specifically, the present invention is directed to a multi-stage method for generic object detection using cognitive swarms and a system for automated response to the detected objects. For ease of understanding, the following description is divided into two major sections. The first major section, labeled “(2.0) Visual Object Detection of Highly-Specific Objects,” describes the two-stage search process by which highly-specific objects can be detected in a visual image. The subsequent major section, labeled “(3.0) System for Automated Response to Detected Objects,” describes the surrounding system that relays the results of the detection search to operators in a manner that would allow them to take appropriate responsive action and, optionally, to an automatic response system designed to respond automatically.
(2.0) Visual Object Detection of Highly-Specific Objects
As noted above, the present invention includes a multi-stage method of object detection. In particular, the method was designed as an aid in video surveillance for detecting humans engaged in specific activities. However, the methods and processes of the present invention can be applied to generic object detection of any desired class of objects. Therefore, the embodiments described herein for detecting human activities are non-limiting examples that are provided for illustrative purposes only.
Compared to conventional methods, the two-stage object detection method provides much faster object detection capabilities, as well as the ability to detect an object based on the context of its surroundings. For example, the system is capable of detecting the pose of the person holding a small object when the object itself is too small to be detected directly. The conventional processing flow for recognition of objects in images or video using computer vision consists of three steps, as shown in
The search or window positioning stage 110 of the specific object detection system can be implemented using cognitive swarms, which are based on Particle Swarm Optimization (PSO). PSO is known in the art and was described by Kennedy, J., Eberhart, R. C., and Shi, Y. in “Swarm Intelligence,” San Francisco: Morgan Kaufmann Publishers, 2001. PSO was also described by R. C. Eberhart and Y. Shi in “Particle Swarm Optimization: Developments, Applications, and Resources,” 2001. Cognitive swarms are a new variation and extension of PSO. Cognitive swarms search for and recognize objects by combining PSO with an objective function that is based on the recognition confidence. A previous patent application has been filed on cognitive swarms for single-stage detection. The details of the cognitive swarm framework are disclosed in U.S. patent application Ser. No. 11/367,755, titled “Cognitive Swarm Vision Framework with Attention Mechanisms,” which is incorporated by reference as though fully set forth herein.
PSO is a relatively simple optimization method that has its roots in artificial life in general, and in bird flocking and swarming theory in particular. Conceptually, it includes aspects of genetic algorithms and evolutionary programming. A population of potential solutions is maintained as the positions of a set of particles in a solution space where each dimension represents one solution component. Each particle is assigned a velocity vector, and the particles then cooperatively explore the solution space in search of the objective function optima. Each particle keeps track of the coordinates in multi-dimensional space that are associated with the best solution (p) it has observed so far. A global best parameter (pg) is used to store the best location among all particles. The velocity of each particle is then changed towards p and pg in a probabilistic way according to:
$$v_i(t+1) = w\,v_i(t) + c_1\phi_1\,[p_i(t) - x_i(t)] + c_2\phi_2\,[p_g(t) - x_i(t)],$$
$$x_i(t+1) = x_i(t) + \chi\,v_i(t+1)$$
where xi(t) and vi(t) are the position and velocity vectors at time t of the i-th particle and c1 and c2 are parameters that weight the influence of their respective terms in the velocity update equation, w is a decay constant which allows the swarm to converge to a solution more quickly, φ1 and φ2 are random numbers between 0 and 1 that introduce a degree of random exploration, and χ is a parameter that controls the convergence properties of the swarm.
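As a non-limiting illustration, one application of the two update equations to a single particle may be sketched in Python as follows. The function name `pso_step`, the default values of `w`, `c1`, `c2`, and `chi`, and the choice to draw one φ1 and one φ2 per update (rather than per dimension) are assumptions made for the example and are not taken from the present description.

```python
import random


def pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5, chi=1.0, rng=random):
    """Apply one velocity and position update to a single particle.

    x, v, p_best and g_best are equal-length lists of floats. The default
    parameter values are illustrative assumptions, not values from the text.
    """
    phi1 = rng.random()  # random weight on the particle's own best position p
    phi2 = rng.random()  # random weight on the swarm's global best position pg
    new_v = [
        w * vi + c1 * phi1 * (pi - xi) + c2 * phi2 * (gi - xi)
        for xi, vi, pi, gi in zip(x, v, p_best, g_best)
    ]
    # The position moves along the updated velocity, scaled by chi.
    new_x = [xi + chi * nvi for xi, nvi in zip(x, new_v)]
    return new_x, new_v
```

A particle that already sits at both its own best position and the global best position, with zero velocity, is left unchanged by this step, which is consistent with the swarm settling once the optimum is found.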
The above PSO dynamics reflect a socio-psychological model where individual particles change their beliefs in accordance with a combination of their own experience and the best experience of the group. This is in contrast to other models of cognition where an individual changes his beliefs to become more consistent with his own experience only. The random element introduces a source of noise which enables an initial random search of the solution space. The search then becomes more directed after a few iterations as the swarm starts to concentrate on more favorable regions. This type of search is much more efficient than exhaustive or gradient based search methods. PSO relies on the fact that in most practical problems the optimum solution usually has better than average solutions residing in a volume around it. These good solutions tend to attract the particles to the region where the optimum lies. The swarm becomes more and more concentrated until the optimum is found (e.g., pg no longer changes). In cognitive swarms, the PSO objective function is the confidence level of an object classifier. The cognitive swarm locates objects of interest in the scene by maximizing the classifier confidence.
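As a non-limiting illustration of a swarm that maximizes a classifier confidence, the following Python sketch searches a two-dimensional window position. The function `toy_confidence` is a hypothetical stand-in for a trained object classifier, and the parameter values, the clamping of particles to the image bounds, and the `patience` stopping rule (stop when pg has not improved for a number of iterations) are assumptions made for the example.

```python
import math
import random


def toy_confidence(pos, target=(120.0, 80.0), width=40.0):
    """Hypothetical stand-in for a classifier confidence in [0, 1]; peaks at target."""
    d2 = sum((a - b) ** 2 for a, b in zip(pos, target))
    return math.exp(-d2 / (2.0 * width ** 2))


def swarm_search(confidence, bounds, n_particles=20, max_iters=100,
                 w=0.7, c1=1.5, c2=1.5, chi=1.0, patience=15, seed=0):
    """Maximize confidence over a box; returns (best_position, best_score)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vs = [[0.0] * len(bounds) for _ in range(n_particles)]
    p_best = [list(x) for x in xs]
    p_score = [confidence(x) for x in xs]
    g = max(range(n_particles), key=lambda i: p_score[i])
    g_best, g_score = list(p_best[g]), p_score[g]
    stale = 0
    for _ in range(max_iters):
        improved = False
        for i in range(n_particles):
            phi1, phi2 = rng.random(), rng.random()
            for d, (lo, hi) in enumerate(bounds):
                vs[i][d] = (w * vs[i][d]
                            + c1 * phi1 * (p_best[i][d] - xs[i][d])
                            + c2 * phi2 * (g_best[d] - xs[i][d]))
                # Clamp so that particles stay inside the image window.
                xs[i][d] = min(hi, max(lo, xs[i][d] + chi * vs[i][d]))
            score = confidence(xs[i])
            if score > p_score[i]:
                p_best[i], p_score[i] = list(xs[i]), score
            if score > g_score:
                g_best, g_score = list(xs[i]), score
                improved = True
        stale = 0 if improved else stale + 1
        if stale >= patience:  # pg has stopped changing
            break
    return g_best, g_score
```

For example, `swarm_search(toy_confidence, [(0, 320), (0, 240)])` searches a 320 by 240 window. Because the global best is only ever replaced by a higher-scoring position, the returned confidence never falls below that of the best initial particle.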
The feature extraction and feature value calculation stage 112 can be implemented using various types of features known in the art. As a non-limiting example.
The parameter tval controls how the continuously-valued version of the Gabor wavelet is converted into a thresholded version that assumes values of −1, 0, or 1 only. The Threshold Gabor Wavelet has computational advantages because multiplication is not required to calculate the feature values. All of the adjustable parameters in the above equation are optimized for high recognition rate during the classifier development process.
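As a non-limiting illustration of why a thresholded wavelet avoids multiplication, the following Python sketch builds a ternary kernel and evaluates a feature value using additions and subtractions only. The continuous kernel `gabor_value` (a Gaussian envelope times a cosine), its `sigma` and `wavelength` parameters, and the thresholding rule (+1 at or above tval, −1 at or below −tval, 0 otherwise) are assumptions made for the example and are not the specific wavelet equation of the present description.

```python
import math


def gabor_value(x, y, sigma=2.0, wavelength=4.0):
    """Continuous 2-D Gabor-like kernel (illustrative form, an assumption)."""
    envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
    return envelope * math.cos(2.0 * math.pi * x / wavelength)


def threshold_kernel(size, tval, **kw):
    """Build a ternary kernel: +1 at or above tval, -1 at or below -tval, else 0."""
    half = size // 2
    kernel = []
    for r in range(size):
        row = []
        for c in range(size):
            g = gabor_value(c - half, r - half, **kw)
            row.append(1 if g >= tval else -1 if g <= -tval else 0)
        kernel.append(row)
    return kernel


def feature_value(patch, kernel):
    """Sum the patch under the ternary kernel using only additions and subtractions."""
    total = 0
    for patch_row, kernel_row in zip(patch, kernel):
        for pixel, k in zip(patch_row, kernel_row):
            if k == 1:
                total += pixel
            elif k == -1:
                total -= pixel
    return total
```

Raising `tval` in this sketch sets more kernel entries to zero, so fewer pixels contribute to the feature value.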
A third possible feature set is Fuzzy Edge Symmetry Features, which is known in the art and shown in
The feature sets used can be selected by sorting against their importance for a classification task using any of a number of techniques known in the art including, but not limited to, using metrics such as mutual information or latent feature discovery models. The feature sets used for training the first-stage classifiers are different than those used to train the second-stage classifiers. For example, wavelet feature sets for a human/non-human classification stage are shown in
Once an object has been identified as a member of the human class, it is sent to a second-stage classifier to identify the object as a member of a predetermined specific-class. In this case, the predetermined specific-class is a human holding his arms vertically upward, as would a football referee signaling a touchdown.
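As a non-limiting illustration of the two-stage flow, the following Python sketch runs a general-class search over the whole image and then runs a specific-class search only inside regions selected around the general-class detections. For brevity, a coarse grid scan (`grid_search`) is used here as a stand-in for the cognitive swarm search, and the toy confidence functions, the detection coordinates, and the `half_window`, `step`, and `threshold` parameters are hypothetical values chosen for the example.

```python
def grid_search(confidence, region, step, threshold):
    """Stand-in for a cognitive swarm search: scan region on a coarse grid and
    return the points whose confidence exceeds threshold."""
    x0, y0, x1, y1 = region
    hits = []
    for y in range(y0, y1 + 1, step):
        for x in range(x0, x1 + 1, step):
            if confidence(x, y) > threshold:
                hits.append((x, y))
    return hits


def two_stage_detect(image_region, general_conf, specific_conf,
                     half_window=20, step=10, threshold=0.5):
    """Stage 1 finds general-class members; stage 2 searches only around them."""
    x0, y0, x1, y1 = image_region
    detections = []
    for cx, cy in grid_search(general_conf, image_region, step, threshold):
        # Select a region around each general-class detection, clipped to the image.
        roi = (max(x0, cx - half_window), max(y0, cy - half_window),
               min(x1, cx + half_window), min(y1, cy + half_window))
        for hit in grid_search(specific_conf, roi, step, threshold):
            if hit not in detections:
                detections.append(hit)
    return detections


# Hypothetical scene: two humans, one of whom has raised arms, plus a tree whose
# double vertical signature also triggers the specific-class classifier.
HUMANS = {(100, 100), (200, 60)}
RAISED_ARMS_LIKE = {(200, 60), (300, 200)}


def general_conf(x, y):
    return 1.0 if (x, y) in HUMANS else 0.0


def specific_conf(x, y):
    return 1.0 if (x, y) in RAISED_ARMS_LIKE else 0.0
```

In this toy scene, applying `specific_conf` to the whole image would also report the tree-like signature at (300, 200), whereas `two_stage_detect((0, 0, 320, 240), general_conf, specific_conf)` never evaluates that location because it lies outside every region selected in the first stage.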
In order to achieve both high accuracy and speed, a classifier cascade as exemplified in
Plots of an experimental detection rate versus false alarm rate for the two-stage classification method of the present invention are shown in
The detailed algorithm flow for the two-stage detection system is shown in
(3.0) System for Automated Response to Detected Objects
The present invention further incorporates the methods described above into a system for display of, and automated response to, detected objects.
The automated response subsystem 1108 is an optional component of this system. There may be situations in which a response to identified objects is needed faster than human operators will be able to react to any alert. Non-limiting examples of an automated response subsystem can be automatically re-directing cameras to the location of a recent play in a sporting event, automated contact with law enforcement upon detection of unauthorized personnel, or automated locking/unlocking of doors in a building.
Another system component, as shown in
One display option is to illuminate the object with a spotlight. Another display option is to use Augmented Reality technologies to display the object location through personal head-worn displays 1200 worn by the operators, as shown in
A third and more conventional display approach is to display the object information in a head-down display or on a small flat panel display, such as a personal digital assistant (PDA). Possible implementations include two-dimensional (2-D) and three dimensional (3-D) versions of such a display, as shown in
The data processing system 1104 in
Finally and as illustrated in
This application is a Continuation-in-Part application of U.S. application Ser. No. 11/800,264, filed on May 3, 2007, entitled, “Behavior recognition using cognitive swarms and fuzzy graphs,”
This invention was made with Government support under Contract No. FA9453-05-C-0252, awarded by the Defense Advanced Research Projects Agency. The Government has certain rights in the invention.
Number | Name | Date | Kind |
---|---|---|---|
6178141 | Duckworth et al. | Jan 2001 | B1 |
6621764 | Smith | Sep 2003 | B1 |
7046187 | Fullerton et al. | May 2006 | B2 |
7066427 | Chang | Jun 2006 | B2 |
7104496 | Chang | Sep 2006 | B2 |
7110569 | Brodsky et al. | Sep 2006 | B2 |
7135992 | Karlsson et al. | Nov 2006 | B2 |
7139222 | Baxter et al. | Nov 2006 | B1 |
7151478 | Adams et al. | Dec 2006 | B1 |
7190633 | Brinn et al. | Mar 2007 | B2 |
7266045 | Baxter et al. | Sep 2007 | B2 |
7292501 | Barger | Nov 2007 | B2 |
7359285 | Barger et al. | Apr 2008 | B2 |
7408840 | Barger et al. | Aug 2008 | B2 |
7586812 | Baxter et al. | Sep 2009 | B2 |
7599252 | Showen et al. | Oct 2009 | B2 |
7599894 | Owechko et al. | Oct 2009 | B2 |
7636700 | Owechko et al. | Dec 2009 | B2 |
8098888 | Mummareddy et al. | Jan 2012 | B1 |
8285655 | Medasani et al. | Oct 2012 | B1 |
20050100192 | Fujimura et al. | May 2005 | A1 |
20050182518 | Karlsson | Aug 2005 | A1 |
20050196047 | Owechko et al. | Sep 2005 | A1 |
20050238200 | Gupta et al. | Oct 2005 | A1 |
20070019865 | Owechko et al. | Jan 2007 | A1 |
20070090973 | Karlsson et al. | Apr 2007 | A1 |
20070183669 | Owechko et al. | Aug 2007 | A1 |
20070183670 | Owechko et al. | Aug 2007 | A1 |
20070263900 | Medasani et al. | Nov 2007 | A1 |
20080033645 | Levinson et al. | Feb 2008 | A1 |
20090290019 | McNeilis et al. | Nov 2009 | A1 |
Entry |
---|
S. A. Holmes, G. Klein and D. W. Murray, “An O(N^2) Square Root Unscented Kalman filter for Visual Simultaneous Localization and Mapping”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Jul. 2009. |
Thomas Lemaire, Cyrille Berger, Il-Kyun Jung and Simon Lacroix, “Vision-Based SLAM: Stereo and Monocular Approaches”, International Journal of Computer Vision, Sep. 2007. |
M. Kaess and F. Dellaert, “Probabilistic Structure Matching for Visual SLAM with a Multi-Camera Rig”, Computer Vision and Image Understanding, vol. 114, Feb. 2010, pp. 286-296. |
Georg Klein and David Murray, “Parallel Tracking and Mapping for Small AR Workspaces”, In Proc. International Symposium on Mixed and Augmented Reality 2007. |
WeaponWatch by Radiance Technologies http://www.radiancetech.com/products/weaponwatch.htm. |
Robot Enhanced Detection Outpost with Lasers (REDOWL) by iRobot Corporation http://www.irobot.com/sp.cfm?pageid=86&id=170. |
Kennedy, J., Eberhart, R.C., and Shi, Y., “Swarm Intelligence,” San Francisco: Morgan Kaufmann Publishers, 2001, chapter 7, pp. 287-326. |
Kennedy, J., Eberhart, R.C., and Shi, Y., “Swarm Intelligence,” San Francisco: Morgan Kaufmann Publishers, 2001, chapter 8, pp. 327-369. |
R.C. Eberhart and Y. Shi, “Particle Swarm Optimization: Developments, Applications, and Resources,” IEEE congress on evolutionary computation, CEC 2001, Seoul, Korea. |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 11800264 | May 2007 | US |
Child | 12456558 | | US |