The technology herein relates to object detection and classification, and to devices that can autonomously detect and classify human and non-human targets. More particularly, the technology herein relates to horizontal velocity profiling and other methods and apparatus that may be used to detect and classify moving targets, including but not limited to a person, an animal, or a vehicle, or any other object that lends itself to characterization.
Much work has been done in the past to automatically detect and classify objects. For example, automatically detecting intruders and distinguishing a human intruder from a large animal such as a deer or dog has long been a goal of many security and defense systems. Hunters have long sought systems that can distinguish between an acceptable target such as a deer or a bear and an unacceptable target. Known solutions are able to detect the presence of moving targets, but few are able to classify and confirm the moving target autonomously, i.e., without human intervention to confirm that the classification is correct. Some known technologies track warm targets using infrared sensors within a limited space, but such solutions typically assume that the warm target is a human and not a large animal. Conventional human detection technologies also face other challenges: high power consumption, complex detection algorithms, high cost, and dependence on specific sensors.
The technology herein relates to techniques, methods and systems that may be used to detect and classify moving targets, including but not limited to a person, an animal, or a vehicle, or any other object that lends itself to characterization. Such techniques, methods and systems may be implemented with an autonomous stand-alone device, for example, as an unattended ground sensor, or may form part of a sensor system. In either case, an exemplary illustrative non-limiting implementation allows the device to be fixed to a location while detecting and classifying moving targets. In another exemplary illustrative non-limiting implementation, the device may be placed on a moving or rotating platform and used to detect stationary objects.
In one exemplary illustrative non-limiting implementation providing a method and device wherein the device is located on a stationary platform at a fixed location, the device includes but is not limited to a detector, an optical component, a microcontroller, a memory component, a search engine, a power source, and a method or component to send data to a main processing unit or center for further analysis. An exemplary illustrative device's operation may be described as follows. Initially, the detector operates at a low sample rate. As the moving target enters the detector's field of view, which is limited by the optical component, the detector senses a change and increases its sample rate to record data useful for classification. Once the desired data is recorded, the microcontroller reduces the detector's sample rate. The search engine compares the recorded data with the sample target data stored in the memory component, and either finds a match and identifies it with a known type of target or dismisses it as an unknown target. If the target is classified, the information relating to the type of target can be transmitted to a central processing unit or stored on the device for later processing. A multiplicity of such devices may be distributed in an area to monitor the type and corresponding occurrence of previously specified moving targets. The resulting data may be tailored to or used by any desired application.
Exemplary, non-limiting advantageous features and advantages provided by illustrative exemplary non-limiting implementations include:
These and other features and advantages will be better and more completely understood by referring to the following detailed description of exemplary non-limiting implementations in conjunction with the drawings. The file of this patent contains at least one drawing executed in color. Copies of this patent with color drawing(s) will be provided by the Patent and Trademark Office upon request and payment of the necessary fee. The drawings are briefly described as follows:
The exemplary illustrative non-limiting device may be used by itself or as part of a group distributed throughout an area. A non-limiting example of the device by itself and as part of a group of similar or dissimilar devices is shown in
As shown in
The detector 106 could be a focal plane array, i.e., a two-dimensional detector array, or a linear array. Depending on the target type and on whether the device should operate both at night and during the day, the detector may be sensitive in the infrared, in the visible range, in a combination of the two, or in a different range of the spectrum.
The optical component 108 may be added to limit the detector's field of view. Optics 108 may, for example, comprise a single lens or a combination of optical devices, such as a slit or a grating in addition to one or more lenses, as known to those familiar with the art. The memory device 110 can be a combination of volatile and non-volatile memory, located on a dedicated memory board or partially integrated with one or several microprocessors 116 to form the search engine. Several commercially available types of memory satisfy the requirements of this device, including but not limited to SD cards and flash memory. The memory 110 stores the data as well as the library of identifiable targets. An additional benefit of non-volatile memory is that it allows a potential user to remove the data without powering up or operating the device, or communicating with it from a main processing unit.
The power component 112 may consist of batteries or renewable energy sources with a processing circuit that produces the required voltages and currents and provides sufficiently clean signals for proper operation of the device's components. The transmitter 114 enables the device to send its data via a wireless or wired connection. The particular type of transmitter used in this device may be specific to the application; it depends on the size, speed, and frequency of data transmissions, and suitable types are known to those familiar with the art.
The microcontroller 116 both holds and executes the software 104. The exemplary illustrative non-limiting implementation includes at least one dedicated microcontroller for the detector 106 and a combination of microcontrollers 116 to support the use of the memory components 110. The software 104 provides the methods of detection and classification 118 and other processes 120, for example, the communications between the various hardware components and determining when certain components are operated at low-power levels.
The following section describes an exemplary illustrative non-limiting method for detection and classification of a target. The algorithm, a block diagram of which is shown in
During initialization 150, the detector samples the field of view at a low frequency (to reduce power consumption), looking for a change in the background. A non-limiting exemplary change could also occur when one or several detector elements exceed their thresholds. Since the detector is only looking at one vertical line and the device is stationary, a sudden change in some detector elements indicates the appearance of a moving object.
Once a potential target has been detected (block 152), the device switches to the data-collection mode (block 154) by increasing its sample frequency and recording the image until no change in the background is detected (consuming more power). As part of the noise removal phase (block 156), the device removes noise generated by moving leaves and branches, power lines, etc., and normalizes the image for distance and velocity. It then compares the image to a library of identifiable targets and classifies the object upon a match (block 158). Subsequently, the target type and number are stored and transmitted to a central processing unit (block 160), after which the whole process repeats itself (block 150).
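The detection loop of blocks 150 through 160 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the device's firmware: `read_column` is a hypothetical callback that returns one detector column at the requested sample rate, a change is declared when any element departs from a fixed baseline by more than a threshold, and the default rates of 1 Hz and 100 Hz are placeholder values. Noise removal and classification (blocks 156 to 160) are left to the caller, which receives each recorded event.

```python
from typing import Callable, Iterator, List, Sequence

Column = List[float]


def changed(column: Sequence[float], baseline: Sequence[float], threshold: float) -> bool:
    """True if any detector element departs from the baseline by more than threshold."""
    return any(abs(v - b) > threshold for v, b in zip(column, baseline))


def capture_events(read_column: Callable[[float], Column],
                   baseline: Column,
                   threshold: float,
                   low_hz: float = 1.0,
                   high_hz: float = 100.0,
                   max_reads: int = 1000) -> Iterator[List[Column]]:
    """Yield one recorded event (a list of columns) per detected passage."""
    rate = low_hz                     # block 150: low-rate, low-power sampling
    event: List[Column] = []
    for _ in range(max_reads):
        column = read_column(rate)
        if changed(column, baseline, threshold):
            rate = high_hz            # blocks 152/154: change seen, record at a high rate
            event.append(column)
        elif event:
            yield event               # background restored: pass on for blocks 156-160
            event = []
            rate = low_hz             # return to low-power sampling (block 150)
    if event:
        yield event                   # capture still open when max_reads ran out
```

Because the rate is raised only while a change is present, the high-power data-collection mode is confined to the short interval in which a target is actually crossing the field of view.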
A particularly advantageous and salient feature of this method of an exemplary illustrative non-limiting implementation is the recognition that the velocity components measured by the device are uniform across most human movements and that the library of such profiles for a given target is relatively small. Additional advantageous features of an exemplary illustrative non-limiting implementation include reduced power consumption as a consequence of (1) reducing the sample rate of the detector and (2) sampling only one line of the detector or using a linear array as a detector.
The black conglomeration of pixels in
In the following sections, some of the hardware components, the device's performance and their effects on the ability of the device to operate in different environments and applications are discussed in more detail.
The detector 106 may consist of one or a combination of infrared or visible imagers. Non-limiting illustrative examples of visible imagers are CMOS and CCD cameras. Non-limiting examples of infrared imagers are pyroelectric arrays and microbolometers. There are many characteristics of the detector that govern its performance, but one that is common to the entire energy spectrum is its resolution. For the autonomous implementation of the method, an exemplary illustrative non-limiting detector 106 may have for example:
To optimize the detector 106's performance with optics 108, it may be helpful to understand the parameters of a lens.
Both focal points of the lens are drawn. Which one applies depends on the side on which the object is located and on the type of lens. The distance s_o of the object from the lens is equal to the sum of x_o and f; similarly, the distance s_i of the image from the lens is the sum of x_i and f. By relating the parameters through similar triangles, the following relationships are derived:
M_T is the transverse magnification factor and M_L is the longitudinal magnification factor. Applying these formulas to meet the specifications above, with a 2 m tall human placed 100 m from the lens and a CMOS camera attached, we obtain the results listed in Table 1 below. For reference, cell phone camera lenses typically have a focal length of around 3.5 mm.
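The relationships themselves are not reproduced in this text. Assuming they are the standard Newtonian thin-lens relations x_o x_i = f^2, M_T = -f/x_o = -x_i/f, and M_L = -M_T^2, the example above can be worked numerically. The sketch below (function name hypothetical) computes the image distance, both magnifications, and the image height for a 2 m human 100 m from a 3.5 mm lens.

```python
from typing import Tuple


def newtonian_image(f_m: float, s_o_m: float,
                    object_height_m: float) -> Tuple[float, float, float, float]:
    """Return (s_i, M_T, M_L, image height) for a thin lens, all lengths in meters."""
    x_o = s_o_m - f_m            # object distance from the front focal point (s_o = x_o + f)
    x_i = f_m ** 2 / x_o         # Newton's form of the lens equation: x_o * x_i = f^2
    s_i = x_i + f_m              # image distance from the lens (s_i = x_i + f)
    m_t = -f_m / x_o             # transverse magnification (negative: image is inverted)
    m_l = -m_t ** 2              # longitudinal magnification
    return s_i, m_t, m_l, abs(m_t) * object_height_m


s_i, m_t, m_l, h_i = newtonian_image(f_m=3.5e-3, s_o_m=100.0, object_height_m=2.0)
print(f"image distance {s_i * 1e3:.4f} mm, image height {h_i * 1e6:.1f} um")
```

For these inputs the image of the person is about 70 µm tall and forms essentially at the focal point, about 3.5 mm behind the lens. How many pixels that covers depends on the sensor's pixel pitch, which the text does not give; at an assumed 2 µm pitch it would be roughly 35 pixels.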
Another figure of merit often encountered when selecting a camera is the “f-number” f/# or focal ratio. It is defined as the ratio of the focal length to the diameter of the aperture in front of the lens. For example, f/2 means that f/# = 2, i.e., the focal length is twice the aperture diameter. A smaller f-number admits more light to the image plane. For a given scene, the required exposure time grows with the square of the f-number, i.e., a larger f-number lengthens the exposure time.
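The f-number relations can be checked with a short calculation. This sketch assumes the usual definitions: aperture diameter D = f/N, and image-plane irradiance proportional to 1/N^2, so that the exposure time needed for equal brightness scales as N^2. The function names are illustrative.

```python
def aperture_diameter(focal_length_mm: float, f_number: float) -> float:
    """Aperture diameter D = f / N, in the same units as the focal length."""
    return focal_length_mm / f_number


def relative_exposure_time(f_number: float, reference_f_number: float) -> float:
    """Exposure time at f_number relative to reference_f_number for equal image brightness.

    Image-plane irradiance is proportional to 1/N^2, so the exposure time needed
    to collect the same energy is proportional to N^2.
    """
    return (f_number / reference_f_number) ** 2


print(aperture_diameter(3.5, 2.0))       # 3.5 mm lens at f/2
print(relative_exposure_time(4.0, 2.0))  # stopping down from f/2 to f/4
```

For example, a 3.5 mm lens at f/2 has a 1.75 mm aperture, and stopping down from f/2 to f/4 requires four times the exposure time.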
The example implementation of device 100 senses high-frequency changes against the ambient lighting conditions and may therefore periodically adjust its ambient reference values for changes in sun position (which move the shadows) and in weather. These phenomena occur relatively slowly (generally over more than 10 seconds in the near field), as opposed to the passage of animals and vehicles, which generally lasts less than 10 seconds. The exemplary illustrative non-limiting system may average its ambient levels over time so that the following conditions are met:
1. An average light balance is maintained. The sensor's shutter speed or lens opening can be adjustable by the processor to maintain the proper exposure levels for current conditions.
2. Low frequency radiation changes can be filtered when no targets are present. Reasonable targets should preferably not be filtered from the analysis stream in the exemplary illustrative non-limiting implementation.
3. High frequency noise should be filtered where possible (moving trees for example). One possible solution is to adjust the view frame to exclude scene portions that cannot have targets (the sky, tree tops, flags, roads, etc.) or are likely noise sources.
Typically, these detectors are positioned perpendicular to likely target paths, with an initial setup that establishes an ambient baseline. This baseline can be corrected continuously (or at frequent intervals) over the sensor device's lifetime.
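One simple way to maintain the ambient baseline while honoring conditions 1 and 2 is an exponential moving average that is frozen whenever a target is present. The averaging scheme, the function name, and the smoothing factor `alpha` are assumptions; the text only requires that slow ambient changes be tracked and that targets not be absorbed into the baseline.

```python
from typing import List, Sequence


def update_baseline(baseline: Sequence[float], frame: Sequence[float],
                    target_present: bool, alpha: float = 0.05) -> List[float]:
    """Exponential moving average of the ambient level for each detector element.

    The average responds over roughly 1/alpha samples (about 20 s for alpha = 0.05
    at an assumed 1 Hz idle rate), which is slower than a typical target passage.
    While a target is present the baseline is frozen, so the target is never
    absorbed into the ambient estimate (condition 2).
    """
    if target_present:
        return list(baseline)
    return [(1.0 - alpha) * b + alpha * v for b, v in zip(baseline, frame)]
```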
1. Sensor motion. The wind (and the experimenter) can cause the display to which the camera is mounted to oscillate somewhat, resulting in horizontal noise, particularly in areas with high-frequency color changes, and less so in low-frequency areas such as a blue sky.
2. Wind. Even a low-velocity wind (~5 mph in the following example) generates significant noise as shown (see
a. Filter the accumulated image (perhaps on the fly)
b. Locate any potential objects (large blobs)
c. Subtract the filtered image except near the object
d. Reacquire the object
An alternative approach establishes an average and standard deviation for each pixel. The image is generated wherever the distance between the pixel value and the mean exceeds some multiple of the standard deviation.
3. Shimmer. This can be a particular problem in hot areas where high-contrast ground components cause differential heating. Objects walking across blacktop highways, or objects near the detector in the field of view, can generate random sparkle. The problem is exacerbated by objects that saturate the camera's white range, because the CMOS sensors become more sensitive with more light. The system can tune itself in the presence of temperature extremes. In
4. Dust. Dust trailing an object easily becomes part of its profile, depending upon its contrast with the background. This has proven to be less of a problem with humans, whose ability to create dust is more limited than that of large machines. In
5. Detector Noise. This comes in various forms and, for the most part, can be dealt with by careful electronic design. Much of the remainder is white noise, which is easily removed by setting the contrast-change threshold above the noise level.
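The per-pixel mean and standard deviation approach described under item 2 (Wind), combined with the above-the-noise threshold of item 5, can be maintained incrementally. This sketch uses Welford's online algorithm, which is our choice rather than something the text specifies; the class name, the multiplier `k`, and `noise_floor` (an assumed detector noise level in grey levels) are hypothetical.

```python
import math
from typing import List, Sequence


class PixelStats:
    """Running per-pixel mean and variance using Welford's online algorithm."""

    def __init__(self, n_pixels: int) -> None:
        self.count = 0
        self.mean = [0.0] * n_pixels
        self.m2 = [0.0] * n_pixels   # running sum of squared deviations from the mean

    def update(self, frame: Sequence[float]) -> None:
        """Fold one background frame into the statistics."""
        self.count += 1
        for i, x in enumerate(frame):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def std(self, i: int) -> float:
        """Population standard deviation of pixel i."""
        return math.sqrt(self.m2[i] / self.count) if self.count else 0.0

    def changed(self, frame: Sequence[float], k: float = 3.0,
                noise_floor: float = 1.0) -> List[int]:
        """1 where a pixel is more than k standard deviations from its mean.

        noise_floor (an assumed detector noise level) keeps the threshold above
        the white-noise level even for pixels whose background never varied.
        """
        return [1 if abs(x - self.mean[i]) > k * max(self.std(i), noise_floor) else 0
                for i, x in enumerate(frame)]
```

Welford's update avoids storing the frame history and is numerically better behaved than accumulating raw sums of squares.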
A potential target is signaled in the exemplary illustrative non-limiting implementation by the following non-limiting criteria:
1. Sufficient neighboring cells that have a value change from ambient energy (visible, infrared)
2. The changes last for a sufficient time
3. The changes do not last too long (a change that persists too long signals an approaching object or a change in ambient levels).
The target acquisition is complete when these non-limiting criteria are no longer perceived by the detector.
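Criteria 1 to 3 can be applied to a sequence of per-frame change columns as below. In this sketch, "sufficient neighboring cells" is interpreted as a contiguous run of at least `min_cells` changed elements in a column, and time is counted in frames; these interpretations and all parameter names are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def longest_run(bits: Sequence[int]) -> int:
    """Length of the longest run of adjacent changed cells in one column."""
    best = run = 0
    for b in bits:
        run = run + 1 if b else 0
        best = max(best, run)
    return best


def find_targets(change_columns: Sequence[Sequence[int]], min_cells: int,
                 min_frames: int, max_frames: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame spans, end exclusive, that satisfy criteria 1-3."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for t, col in enumerate(list(change_columns) + [[]]):  # empty sentinel closes a final span
        active = longest_run(col) >= min_cells              # criterion 1: enough neighboring cells
        if active and start is None:
            start = t
        elif not active and start is not None:
            if min_frames <= t - start <= max_frames:        # criteria 2 and 3: duration window
                spans.append((start, t))
            start = None
    return spans
```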
Another non-limiting illustrative implementation of the method in the form of a state machine is shown in
In some of the experiments performed, each object was positioned in the field of view and the entire sequence of data (usually between 50 and 200 frames' worth) was processed to isolate the target.
Once the target's time frame has been extracted, the data is also restricted to the target's vertical limits in space, and any spots occurring earlier than the target's time frame that might not have triggered the capture are added back. The following is an illustrative non-limiting example algorithm shown in
1. Build a list of all 8 connected areas in the block identified previously (
2. Sort these blocks into order from largest to smallest (or in some other order) (
3. Build a super block by starting with the first (largest) of these blocks and computing its minimum distance to each of the remaining blocks (an O(n^2 m^2) operation, where n is the number of blocks and m the average size of each block); blocks that are close enough are merged into the super block. This can be repeated until no more blocks can be added. The naïve algorithm can certainly be improved. (
4. If additional sufficiently large blocks remain, form them into super blocks as well (block 256). In this fashion, multiple targets can be acquired from a single sequence and all compared against the library. Insufficiently large blocks are removed from consideration as probable noise. Small areas of change not near enough to a large block are also discarded.
5. Compute the minimum bounding box and reconstruct a bit array of this size (block 258).
6. Resample the bit array to a standard size and keep track of the resample sizes (block 260).
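Steps 1 to 6 can be sketched in plain Python as follows. The merge rule (absorb any blob within `max_gap` pixels, measured as Chebyshev distance), the size cutoff `min_size`, and nearest-neighbor resampling to a 32-wide by 64-tall grid (matching the 32 and 64 index ranges used later for catalog comparison) are assumptions for illustration; the distance search is the naïve one the text mentions.

```python
from collections import deque
from typing import List, Sequence, Set, Tuple

Pixel = Tuple[int, int]   # (row, column) in the captured bit image


def blobs_8(image: Sequence[Sequence[int]]) -> List[Set[Pixel]]:
    """Step 1: 8-connected regions of ON pixels, found by breadth-first search."""
    rows, cols = len(image), len(image[0])
    seen: Set[Pixel] = set()
    blobs: List[Set[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or (r, c) in seen:
                continue
            blob: Set[Pixel] = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                blob.add((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols and image[ny][nx]
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def gap(a: Set[Pixel], b: Set[Pixel]) -> int:
    """Minimum Chebyshev distance between two blobs (naive pairwise scan)."""
    return min(max(abs(p[0] - q[0]), abs(p[1] - q[1])) for p in a for q in b)


def super_blocks(image: Sequence[Sequence[int]], max_gap: int,
                 min_size: int) -> List[Set[Pixel]]:
    """Steps 2-4: grow super blocks from the largest blobs; small leftovers are noise."""
    remaining = sorted(blobs_8(image), key=len, reverse=True)   # step 2
    result: List[Set[Pixel]] = []
    while remaining and len(remaining[0]) >= min_size:
        block = set(remaining.pop(0))
        grew = True
        while grew:                                              # step 3: absorb nearby blobs
            grew = False
            for other in list(remaining):
                if gap(block, other) <= max_gap:
                    block |= other
                    remaining.remove(other)
                    grew = True
        result.append(block)                                     # step 4: next large blob
    return result


def normalize(block: Set[Pixel], out_h: int = 64,
              out_w: int = 32) -> Tuple[List[List[int]], Tuple[float, float]]:
    """Steps 5-6: crop to the bounding box, then nearest-neighbor resample."""
    rows = [p[0] for p in block]
    cols = [p[1] for p in block]
    r0, c0 = min(rows), min(cols)
    h, w = max(rows) - r0 + 1, max(cols) - c0 + 1
    grid = [[1 if (r0 + (i * h) // out_h, c0 + (j * w) // out_w) in block else 0
             for j in range(out_w)]
            for i in range(out_h)]
    return grid, (h / out_h, w / out_w)   # scale factors are kept for later estimates
```

Note that two distinct 8-connected blobs are always at least 2 pixels apart in Chebyshev distance, so `max_gap` must be 2 or more for any merging to occur.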
For example,
Problems with Objects
Any approach separating objects from the environment can have trouble with the following:
1. Occlusion. An object can be hidden by, or merged with, another object in front of it. For example,
2. Noise. Trees, wires, birds, butterflies and other random objects generate noise that may cause objects to be attached to each other though they are separate.
3. Shadows. At certain sun angles, shadows may connect the objects into a single one particularly if they are moving close together.
The catalog of known human forms may consume considerable space. However, this catalog will be considerably smaller than might be expected if we match static pictures. Analysis shows that basic human motions involving horizontal movement are stereotypical—only a relatively few HVPI's (horizontal velocity profile images) characterize crawling, walking, and running.
The HVPI catalog can be created in a laboratory setting and images stretched to some basic size related to the target object's typical aspect ratio. For example, humans are (usually) taller than wide, and a two or three to one aspect ratio seems to effectively capture walking at reasonable speeds.
There are two target types in the library: those that cause an alarm and those that don't. The characteristics of good catalog entries include the following:
1. Sufficient detail can be stored to minimize false alarms and rejections.
2. Search time increases linearly with the number of images.
3. It is best to include non-alarm targets as well—a positive identification of such reduces the false alarm rate and reduces the reliance on a fixed target match threshold.
4. Targets that do not change their shape with time need only a few images (e.g., a few different kinds of cars, trucks, vans, SUVs, motorcycles, front loaders, etc.).
5. The search time is limited by the alarm target's maximum velocity at the nearest possible position.
Some exemplary illustrative non-limiting implementations can use, among other things, a very simple comparison. For a target T of size n×m, index-mapping functions 0≤X(i,n,m)<32 and 0≤Y(j,n,m)<64, and a catalog entry L_{i,j}, we compute a match 0≤V<1 by:
We count the number of pixels that are identical and normalize by the image size. While this algorithm appears to give reasonable results, it may also have a number of drawbacks, such as:
1. It forces all images into a fixed aspect ratio not justified by the real object's appearance.
2. For slow, large, or far-away objects, the X and Y functions tend to remove essential detail. Using inverse functions X′ and Y′ instead (mapping the catalog entry up to the target's size) would preserve that detail but increase the search time for large, slow, or far-away objects.
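Reading the comparison as V = (1/(nm)) Σ_i Σ_j [T_{i,j} = L_{X(i,n,m), Y(j,n,m)}], i.e., counting identical pixels and normalizing by the image size, a direct implementation is short. That formula is our reconstruction; the nearest-neighbor mappings X(i,n,m) = ⌊32i/n⌋ and Y(j,n,m) = ⌊64j/m⌋ (taking n as the target width and m as its height) are assumptions, and the catalog size is read from the entry itself.

```python
from typing import Sequence

Image = Sequence[Sequence[int]]   # image[row][column], 1 = ON


def match_score(target: Image, entry: Image) -> float:
    """V: fraction of target pixels equal to the catalog pixel they map onto."""
    m, n = len(target), len(target[0])         # target height m, width n
    rows, cols = len(entry), len(entry[0])      # catalog size, e.g. 64 x 32
    same = 0
    for j in range(m):
        y = (j * rows) // m                      # assumed Y(j, n, m)
        for i in range(n):
            x = (i * cols) // n                  # assumed X(i, n, m)
            same += target[j][i] == entry[y][x]
    return same / (n * m)
```

Because the target is always mapped onto the catalog grid, this version exhibits both drawbacks listed above: the fixed aspect ratio and, for large targets, many target pixels landing on the same catalog pixel.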
It is possible to include some additional characteristics that can be used to improve the search speed. To minimize power consumption and speed processing, these should preferably be relatively simple. Illustrative non-limiting examples include:
1. Aspect ratio. Humans can only walk so slowly and run so fast. A low aspect ratio HVPI indicates either a human approaching the sensor or a very large object. Setting an aspect-ratio range gate when comparing two HVPIs may reduce the likelihood of generating a match between a large object and a slow-moving human. Humans approaching a sensor will be ignored; if a human target approaches one sensor, an alternative sensor nearby in the network will see the target in profile and report.
2. HVPI (horizontal velocity profile image) Density. Pixels representing a detection or a change are ON and pixels that do not represent a detection or a change may be considered OFF. Assuming the horizontal velocity profile image is normalized to the size of the library image, the ratio of HVPI pixels ON to pixels OFF may be a simple initial matching mechanism that reduces search time.
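Both cheap features can serve as a gate in front of the full pixel comparison. In this sketch the density is the fraction of ON pixels (which carries the same information as the ON:OFF ratio without risking division by zero), the aspect ratio is taken before resampling, and the gate width and density tolerance are placeholder values, not figures from the text.

```python
from typing import List, Sequence, Tuple


def density(image: Sequence[Sequence[int]]) -> float:
    """Fraction of pixels that are ON."""
    total = sum(len(row) for row in image)
    return sum(sum(row) for row in image) / total


def candidates(aspect: float, dens: float,
               library: Sequence[Tuple[str, float, float]],
               aspect_gate: Tuple[float, float] = (0.5, 2.0),
               density_tol: float = 0.1) -> List[str]:
    """Names of library entries that pass both gates and merit a full comparison.

    aspect: height / width of the captured HVPI before resampling.
    library: (name, aspect, density) for each catalog entry.
    """
    lo, hi = aspect_gate
    keep = []
    for name, lib_aspect, lib_density in library:
        ratio = aspect / lib_aspect
        if lo <= ratio <= hi and abs(dens - lib_density) <= density_tol:
            keep.append(name)
    return keep
```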
The image catalog can be generated from videos taken in a laboratory or other setting. We can use high quality video capture of subjects in a high contrast setting. There will be minimal background noise and subjects will be selected from those most common:
1. Humans walking, running, and crawling. Depending upon the target audience, they may also be carrying backpacks or weapons. Obese targets may also be recorded if their movements are sufficiently different from those of average-sized humans.
2. Quadrupeds walking and running. Likely subjects include dogs and horses as they are sufficiently trainable to walk in front of the recording device. We believe that horses are sufficiently similar to cows and deer as to provide positive identification.
3. Machinery. We can record cars, trucks, vans, SUVs, bicycles, motorcycles, and other common equipment. Though there are many such devices, they rarely exhibit geometric changes during passage (bicycles being the exception) and therefore require a minimal number of images in the catalog. Positive identification of these objects generally implies human presence.
The library of velocity profiles can be relatively small yet a wide variety of moving targets can be correctly classified. Several approaches are possible:
1. Store only human profiles. We compare a captured velocity profile against all of these and if a threshold is reached the target is human.
2. Store most likely profiles. If we also include cars, trucks, and various quadrupeds these will more likely be matched and the threshold for identification can be lowered.
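Either strategy reduces to a best-match search with a threshold; the only differences are which labeled entries the library holds and where the threshold is set. The sketch below assumes entries already resampled to a common size and uses a simple pixel-agreement score; the names and threshold values are illustrative. A result of `None` corresponds to dismissing the target as unknown.

```python
from typing import List, Optional, Sequence, Tuple

Image = Sequence[Sequence[int]]


def agreement(a: Image, b: Image) -> float:
    """Fraction of equal pixels between two images of the same size."""
    cells = [x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(cells) / len(cells)


def classify(hvpi: Image, library: List[Tuple[str, Image]],
             threshold: float) -> Tuple[Optional[str], float]:
    """Best-matching label, or None (unknown target) if no score reaches threshold."""
    best_label: Optional[str] = None
    best_score = 0.0
    for label, entry in library:
        score = agreement(hvpi, entry)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score
```

With a human-only library (approach 1) every entry is an alarm target and the threshold must be strict; adding cars, trucks, and quadrupeds (approach 2) lets a non-human target win the search outright, which is why the threshold can then be lowered.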
To help understand this problem, we compared two subjects by their foot phases during a walking sample. In
These results indicate that there are two or perhaps three poses characteristic of human walking. Examining movies constructed from multiple HVPIs confirms this.
When we run the same comparison between a dog and a human, the numbers are much lower than comparison between humans. Very few frames match sufficiently to trigger a match—this graph does not indicate that a better match could be found against a human (
Experiments can be run to determine how many images should preferably be stored for positive human identification.
Most humans are about 5′10″ (about 1.8 m) tall and travel at between 3.5 km/h (slow walk) and 8 km/h (fast walk), at about 1 km/h when crawling, and at about 15 km/h when running. This allows us to estimate distance and velocity from the target's size. When we locate a target, we resample it to the standard catalog size and retain the scale factor required to do so.
Consider the following example (
Distance can be calculated in a similar fashion. Assuming that all targeted humans are about 1.8 meters tall, and knowing the distance used to create the catalog human, allows a simple, if inaccurate, calculation of distance. For example, two samples of different experimenters (of about the same height) walking were taken at 10 and 100 meters, respectively (
The average height (in pixels) of the subject 10 meters from the camera is 192 pixels (at fast walking speeds, this reduces to about 184 pixels). The subject 100 meters away from the camera averages about 21 pixels. Assuming that the image sensor is about 5 mm tall, and simple projections, we compute the distance as about 91 meters, an error of approximately 10%.
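The distance estimate above is a similar-triangles calculation: for subjects of roughly equal height viewed through the same optics, apparent height is inversely proportional to distance. A minimal sketch (function name hypothetical), using the 10 m measurement as the calibration point:

```python
def estimate_distance(height_px: float, ref_height_px: float, ref_distance_m: float) -> float:
    """Distance from apparent height, calibrated by one reference measurement.

    Assumes the subject and the reference are about the same physical height
    and were imaged with the same lens and sensor.
    """
    return ref_distance_m * ref_height_px / height_px


d = estimate_distance(height_px=21, ref_height_px=192, ref_distance_m=10.0)
print(f"{d:.1f} m")
```

With the numbers above, 10 × 192 / 21 gives about 91.4 m for a subject actually 100 m away, consistent with the error quoted in the text.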
We recorded various subjects with video cameras that could record digital images. These included Photron high-speed black-and-white and color video cameras and the webcam on a Macintosh laptop. Targets ranged from 30′ to 130′ away and included walking and running individuals, dogs, bicycle riders, and cars. Backgrounds included an off-white painted warehouse, a suburban street, and a city park. Specifically:
The targets generally moved perpendicular to the camera while a number of frames were collected and stored. These frames were converted to 128-level greyscale Microsoft BMP files for analysis. The color maps varied somewhat from frame to frame and can be normalized by the exemplary non-limiting analysis program. The analysis proceeded as follows:
1. Select one column.
2. Compute the column's mean and standard deviation grey value (0→1) for the first few frames to establish a background ambient value.
3. For all subsequent frames, compute a bit vector with a 1 where the corresponding value differs by more than some multiple of the standard deviation of the background values.
4. Display the bit vectors in two dimensions.
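The four analysis steps can be sketched as follows, assuming frames are given as nested lists of grey values in the range 0 to 1 and that the background statistics are computed per pixel of the selected column. The number of background frames, the multiplier `k`, and the minimum standard deviation `floor` are placeholder values, and `render` simply prints `#` for changed pixels.

```python
import statistics
from typing import List, Sequence

Frame = Sequence[Sequence[float]]   # grey values in [0, 1], indexed [row][column]


def build_hvpi(frames: Sequence[Frame], column: int, n_background: int = 5,
               k: float = 3.0, floor: float = 0.02) -> List[List[int]]:
    """Steps 1-3: one bit vector per frame for the selected column."""
    n_rows = len(frames[0])
    # Step 2: per-pixel background mean and standard deviation from the first frames.
    history = [[f[r][column] for f in frames[:n_background]] for r in range(n_rows)]
    means = [statistics.fmean(h) for h in history]
    stds = [max(statistics.pstdev(h), floor) for h in history]
    # Step 3: mark pixels that depart from the background by more than k standard deviations.
    return [[1 if abs(f[r][column] - means[r]) > k * stds[r] else 0 for r in range(n_rows)]
            for f in frames[n_background:]]


def render(hvpi: List[List[int]]) -> str:
    """Step 4: draw the bit vectors side by side, with time running left to right."""
    n_rows = len(hvpi[0])
    return "\n".join("".join("#" if vec[r] else "." for vec in hvpi) for r in range(n_rows))
```

Because each frame contributes one column of the output, a faster target crossing the same line produces fewer columns, which is the horizontal compression described for the bicyclist below.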
For example, one frame of a slow walker is shown in
When the bicyclist speeds up, the picture is compressed along the horizontal axis. At the same location, going about twice as fast, the result in
Similarly, we can approximate the distance from size, much as our eyes do. In
We are also able to tell some gross physical characteristics such as whether or not the subject is wearing a back pack (
While the technology herein has been described in connection with exemplary illustrative non-limiting implementations, the invention is not to be limited by the disclosure. For example, while the exemplary illustrative implementation may be applied to many different moving target types, it has been useful to consider the human as an example moving target. Likewise, the exemplary device described herein classifies a human target by a velocity profile as measured by the detector (a linear sensor array, or one column or row of a two-dimensional sensor array), but other sensors are also possible. The device may also be referred to as the Horizontal Velocity Profile Sensor (“HVPS”) and the analysis it performs as “Horizontal Velocity Profiling” (“HVP”). Thus, the invention(s) is/are intended to be defined by the claims and to cover all corresponding and equivalent arrangements whether or not specifically disclosed herein.
This application claims the benefit of provisional application No. 61/180,348 filed May 21, 2009, the contents of which are incorporated herein by reference.
Number | Date | Country
---|---|---
61180348 | May 2009 | US