This invention relates generally to object identification, and in particular, to a system for performing object identification that combines pose determination, Electro-Optical/Infrared (EO/IR) sensor data, and novel computer graphics rendering techniques.
Many automated processes require the ability to detect, track, and classify objects, including applications in factory automation, perimeter security, and military target acquisition. For example, a primary mission of U.S. military air assets is to detect and destroy enemy ground targets. In order to accomplish this mission, it is essential to detect, track, and classify contacts to determine which are valid targets. Traditional combat identification has been performed using all-weather sensors and processing algorithms designed specifically for such sensor data. EO/IR sensors produce a very different type of data that does not lend itself to the traditional combat identification algorithms.
This invention is directed to a system for performing object identification that combines pose determination, EO/IR sensor data, and novel computer graphics rendering techniques. The system is well suited to military target cueing, but is also extendable to detection and classification of other objects, including machined parts, robot guidance, assembly line automation, perimeter security, anomaly detection, etc.
The system serves as the foundation of an automatic classifier that uses a model-based image processing system and provides multiple capabilities for the overall object identification process, including tools for ground truthing data (among them a chip extraction tool) and for performing target identification.
The system comprises two main modules. The first is a module that is intended to extract the orientation and distance of a target in a truth chip (generated using the Chip Extraction Application), given that the target type is known. The second is a module that attempts to identify the vehicle within a truth chip, given the known distance and elevation angle from camera to target.
The system is capable of operating in the presence of noisy data or degraded information. Image matching is based on comparing a synthetic image with the truth chip image, where the synthetic image is rotated and moved through a three-dimensional space. To limit the search space, it is assumed that the object is positioned on relatively flat ground and that the camera roll angle stays near zero. This leaves three dimensions of motion (distance, heading, and pitch angle) to define the space in which the synthetic target is moved. Synthetic imagery generated using a simulation library can be used to help train the system.
Next, the rendered synthetic image and the truth chip are processed to make them more comparable: both images are thresholded, and the biggest blob is extracted from the truth chip. The system iterates within this 3D search space, matching the synthetic image against the truth image to find the best score.
The process of target recognition is very similar to that used for the distance/orientation determination. The only difference is the search space. Instead of varying the target distance, heading, and pitch, the search varies the target type and the heading.
A graphical user interface (GUI) front end allows the user to manually adjust the orientation of the target within the synthetic images. The system also includes the generation of shadows and allows the user to manipulate the sun angle to approximate the lighting conditions of the test range in the provided video. Manipulation of the test sun angle is a tedious process that could also be automated in much the same way as the distance/orientation determination. The application of shadows and sun angle to the process greatly improves overall target identification in outdoor conditions.
Although this invention has numerous other applications as mentioned in the Summary, this disclosure resides primarily in new algorithms for performing the combat identification stage of the target cueing process by leveraging our existing CSMV pose determination system. The goal in this embodiment is to identify vehicle targets on the ground given the following data points:
We evaluated the use of a model-based vision system to match wire-frame models from the library of known entities against the object in the sub-image, given the above target location parameters. The system tests the model at many discrete points in a six-degree-of-freedom (6 DoF) space to find the best match. Because this search space is very large, the search requires significant processing power and takes a long time, so we investigated the following methods to limit it:
1. Cull Based on Target Position Information
The target position parameters provided constrain the position space significantly. In order to determine how much, we need to know dR (error in distance measure) and dIJ (error in target position within the image).
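A minimal sketch of this position culling, assuming the range error dR is applied symmetrically as a fraction of Rt and the image error dIJ as a symmetric pixel window around (it, jt). The sampling steps `range_step_m` and `pixel_step` are hypothetical parameters that the text does not specify, so any counts produced are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class PositionWindow:
    """Bounds on target range (m) and image position (pixels)."""
    r_min: float
    r_max: float
    i_min: float
    i_max: float
    j_min: float
    j_max: float


def position_window(rt, dr, it, jt, dij):
    """Bound the target position from range Rt (fractional error dR)
    and image position (it, jt) (pixel error dIJ)."""
    return PositionWindow(rt * (1 - dr), rt * (1 + dr),
                          it - dij, it + dij, jt - dij, jt + dij)


def _samples(span, step):
    # Closed interval of width `span` sampled every `step`; epsilon guards float round-off.
    return int(math.floor(span / step + 1e-9)) + 1


def count_position_samples(w, range_step_m, pixel_step):
    """Number of discrete (range, i, j) hypotheses inside the window.
    range_step_m and pixel_step are hypothetical sampling resolutions."""
    return (_samples(w.r_max - w.r_min, range_step_m)
            * _samples(w.i_max - w.i_min, pixel_step)
            * _samples(w.j_max - w.j_min, pixel_step))


if __name__ == "__main__":
    win = position_window(rt=1000.0, dr=0.05, it=320, jt=240, dij=4)
    print(count_position_samples(win, range_step_m=10.0, pixel_step=1))
```

The search cost grows linearly with each of the three sample counts, which is why tighter dR and dIJ pay off directly.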
2. Extract Ground Orientation to Cull Target Orientation
Because the targets are ground vehicles, we may be able to assume that they are resting on the ground with their wheels or tracks down (i.e., not turned over on their side or upside down). This significantly constrains the orientation space. If we can determine the orientation of the ground with respect to the camera platform, then we may be able to assume that the vehicle's yaw axis is aligned with the ground normal. If so, then two of the orientation DoFs (pitch/azimuth and roll) are constrained. Let us denote the accuracies of the ground orientation angles for pitch and roll by dGP and dGR, respectively.
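The effect of the ground constraint on the orientation space can be illustrated by counting grid samples. This sketch assumes a yaw/pitch/roll parameterization and a uniform angular step (a hypothetical parameter). Without a ground estimate, pitch spans ±90° and roll a full circle; with one, pitch and roll are searched only within ±dGP and ±dGR. Yaw (heading) is always searched over the full circle.

```python
import math


def _closed(span_deg, step_deg):
    # Samples on a closed interval of width span_deg, endpoints included.
    return int(math.floor(span_deg / step_deg + 1e-9)) + 1


def _periodic(step_deg):
    # Samples around a full 360-degree circle (0 and 360 are the same angle).
    return int(round(360.0 / step_deg))


def orientation_samples(step_deg, dgp_deg=None, dgr_deg=None):
    """Count (yaw, pitch, roll) hypotheses on a uniform angular grid.

    If dGP and dGR are given, the vehicle's yaw axis is assumed to lie along
    the ground normal, so pitch and roll only vary within +/-dGP and +/-dGR.
    """
    yaw = _periodic(step_deg)
    if dgp_deg is None or dgr_deg is None:
        pitch, roll = _closed(180.0, step_deg), _periodic(step_deg)
    else:
        pitch, roll = _closed(2 * dgp_deg, step_deg), _closed(2 * dgr_deg, step_deg)
    return yaw * pitch * roll


if __name__ == "__main__":
    print(orientation_samples(5.0))             # unconstrained orientation
    print(orientation_samples(5.0, 5.0, 5.0))   # ground-constrained orientation
```

With an illustrative 5° step and dGP = dGR = 5°, the count drops from 191,808 to 648 orientation hypotheses per position.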
Another way to constrain the search is to eliminate candidate vehicle models early in the process. This approach attempts to extract the length and width of the target from the image in order to eliminate the majority of models before any pose search is performed.
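A minimal sketch of the length/width test, assuming a symmetric tolerance of ±dLWE·Rt on both dimensions (dLWE is the length/width accuracy expressed as a fraction of range, as defined later in this section). `VehicleModel`, `cull_by_size`, and the database entries in the example are hypothetical; the dimensions are made up for illustration and do not describe real vehicles. The optional height check is omitted.

```python
from typing import List, NamedTuple


class VehicleModel(NamedTuple):
    name: str
    length_m: float
    width_m: float


def cull_by_size(models: List[VehicleModel], est_length_m: float,
                 est_width_m: float, range_m: float, dlwe: float) -> List[VehicleModel]:
    """Keep only models whose length and width both lie within
    +/- dLWE * range of the dimensions estimated from the image."""
    tol = dlwe * range_m  # estimation error grows with distance to the target
    return [m for m in models
            if abs(m.length_m - est_length_m) <= tol
            and abs(m.width_m - est_width_m) <= tol]


if __name__ == "__main__":
    db = [VehicleModel("model_a", 9.0, 3.5),   # hypothetical entries
          VehicleModel("model_b", 6.7, 2.4),
          VehicleModel("model_c", 4.9, 2.7)]
    survivors = cull_by_size(db, est_length_m=6.5, est_width_m=2.5,
                             range_m=1000.0, dlwe=0.0005)
    print([m.name for m in survivors])
```

Only the survivors go on to the expensive position/orientation search.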
We performed a preliminary survey of a number of foreign tanks, tracked vehicles, and wheeled vehicles, as shown in
Number of vehicles=67
Number of 0.25 m×0.25 m cells that contain vehicles=50
Density in vehicles per 0.25 m×0.25 m cell=67/50=1.34
Density in vehicles per square meter=1.34*4*4=21.44
Density in fraction of vehicles per square meter=21.44/67=0.32
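The density arithmetic above can be reproduced directly. This sketch keeps the document's factor of 4 × 4 cells per square meter as a parameter; the function name is hypothetical.

```python
def size_density(n_vehicles, n_occupied_cells, cells_per_m2=4 * 4):
    """Reproduce the vehicle size-density estimate.

    Returns (vehicles per occupied cell, vehicles per square meter,
    fraction of the surveyed vehicles per square meter)."""
    per_cell = n_vehicles / n_occupied_cells
    per_m2 = per_cell * cells_per_m2
    fraction_per_m2 = per_m2 / n_vehicles
    return per_cell, per_m2, fraction_per_m2


if __name__ == "__main__":
    per_cell, per_m2, frac = size_density(67, 50)
    print(f"{per_cell:.2f} {per_m2:.2f} {frac:.2f}")
```

The final figure simplifies to cells_per_m2 / n_occupied_cells (16/50 = 0.32), so it does not depend on the number of vehicles surveyed.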
The distance to the target necessarily affects the estimation of length and width from the image. Therefore, we represent the length and width estimation error as a fraction of the distance and call this constant dLWE. If we also extract height information from the source video, the culling may be more effective. Variation in height is not as dramatic as in length or even width, but it can still factor into the culling process (see
Based on the above calculations/assumptions we now evaluate the search space. A summary of the variables used is as follows:
FPS=Update rate of the camera in frames per second.
I=Resolution of camera
FoV=Field of View
it,jt=Target position in camera image space
dIJ=Accuracy of target position
Rt=Target range in meters
dR=Accuracy of target range (fraction of range distance)
dLWE=Accuracy of length and width estimations (fraction of range distance)
dGP, dGR=Accuracy of ground orientation angles (pitch and roll)
We start by predicting a baseline for these values and then calculating the search space from that. Prediction of the baseline is simply an estimate on our part, but we believe that these values are reasonable.
We will also use the rough estimate of the vehicle size distribution derived above: 0.32, expressed as a fraction of vehicles per square meter. Furthermore, we assume a vehicle database size of 1000 vehicles.
From the information listed in the above table, we can now calculate the search space we must cover, both in terms of the length/width envelope of possible candidate vehicles and in terms of the position/orientation space that must be explored for each candidate vehicle that passes the length/width test.
Following through with the calculations, the total number of wireframe-to-image comparisons would be 105,630. Performance tests showed that the wireframe matching software can perform on the order of 10,000 wireframe comparisons per second on a 3.0 GHz PC. This means that a database search of 1,000 vehicles, assuming all of the above parameters are correct, will take about 10 seconds (105,630 / 10,000 ≈ 10.6 s).
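The search-time estimate can be sketched as follows. The baseline parameter table is not reproduced here, so this sketch does not re-derive the 105,630 figure; it only shows how the pieces are assumed to combine. In particular, `candidates_after_size_cull` assumes the length/width envelope is a square of side 2·dLWE·Rt in length-width space, with the 0.32 density spread uniformly inside it. All function names are hypothetical.

```python
def candidates_after_size_cull(db_size, fraction_per_m2, range_m, dlwe):
    """Expected number of models passing the length/width test, assuming a
    square envelope of side 2 * dLWE * Rt and a uniform size density."""
    side_m = 2.0 * dlwe * range_m
    return db_size * fraction_per_m2 * side_m * side_m


def total_comparisons(n_candidates, n_positions, n_orientations):
    # Every surviving candidate is tested at every position/orientation hypothesis.
    return n_candidates * n_positions * n_orientations


def search_seconds(n_comparisons, comparisons_per_second=10_000):
    """Wall-clock estimate at the measured ~10,000 comparisons per second."""
    return n_comparisons / comparisons_per_second


if __name__ == "__main__":
    print(search_seconds(105_630))  # about 10.6 s
```

Because the total is a product, halving any one factor (candidates, positions, or orientations) halves the search time.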
Two modules were constructed to demonstrate our approach. The first was a module intended to extract the orientation and distance of a target in a truth chip (generated using the Chip Extraction Application), given that the target type is known. The second was a module that attempts to identify the vehicle within a truth chip, given the known distance and elevation angle from camera to target.
To enhance performance, we assumed that some information about the target is known. Specifically, we assumed that the distance to the target would be known to within a reasonable error (we assumed 5 percent). Furthermore, the camera's location relative to the target should be known. We extracted this information from the image chips themselves with a code module that uses an image-matching algorithm to search a position and orientation space for the best camera-to-target distance and orientation.
Image matching is based on comparing a synthetic image with the truth chip image, where the synthetic image is rotated and moved through a three-dimensional space. To limit the search space, we assumed that the vehicle was positioned on relatively flat ground and that the camera roll angle stayed near zero. This left three dimensions of motion (distance, heading, and pitch angle) to define the space in which the synthetic target is moved.
Synthetic imagery was generated using Cybernet's cnsFoundation simulation library. This library reads object models in an Alias-Wavefront-derived format called OBJ. The models were converted from 3D Studio Max files purchased from TurboSquid, a company that maintains a large repository of 3D models. cnsFoundation reads these files and renders them using the OpenGL API, which takes advantage of hardware graphics acceleration.
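cnsFoundation's own model loader is not described here. As an illustration of what an OBJ file carries into the renderer, the following is a minimal, hypothetical reader for vertex (`v`) and face (`f`) records only; it is not cnsFoundation's API, and it ignores materials, normals, and texture coordinates.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, ...]


def parse_obj(text: str) -> Tuple[List[Vertex], List[Face]]:
    """Read vertex ('v') and face ('f') records from Wavefront OBJ text.

    Face indices are converted to 0-based. Texture/normal references in
    'v/vt/vn' tokens are ignored, as are all other record types."""
    vertices: List[Vertex] = []
    faces: List[Face] = []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue
        if parts[0] == "v":
            x, y, z = (float(p) for p in parts[1:4])
            vertices.append((x, y, z))
        elif parts[0] == "f":
            idx = []
            for token in parts[1:]:
                i = int(token.split("/")[0])
                # OBJ indices are 1-based; negative indices count back from the last vertex read.
                idx.append(i - 1 if i > 0 else len(vertices) + i)
            faces.append(tuple(idx))
    return vertices, faces
```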
Once the vehicle has been rendered in a given orientation using cnsFoundation, the image is extracted and piped into Cybernet's image processing suite CSCImage, which is based on and extends the functionality of the OpenCV image processing software written by Intel. Using CSCImage, we process the rendered synthetic image and the truth chip to make them more comparable. We found that a simple thresholding of the truth and synthetic images, followed by extracting the biggest blob from the truth chip, yielded the best results.
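A minimal sketch of this preprocessing step, written in plain Python so that it runs without CSCImage. The threshold value is a hypothetical parameter and 4-connectivity is an assumption; in OpenCV the same operations are commonly done with `cv2.threshold` and `cv2.connectedComponentsWithStats`. The helper names are hypothetical.

```python
from collections import deque
from typing import List

Image = List[List[int]]


def threshold(img: Image, t: int) -> Image:
    """Binary mask: 1 where the pixel value exceeds t, else 0."""
    return [[1 if p > t else 0 for p in row] for row in img]


def largest_blob(mask: Image) -> Image:
    """Keep only the largest 4-connected foreground component of a binary mask."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    best: List[tuple] = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill of the component containing (r, c).
            comp, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    out = [[0] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = 1
    return out
```

Following the text, both images would pass through `threshold`, and only the truth chip through `largest_blob`.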
We considered using edge images to perform the comparison; this yielded about the same results as the thresholded images. We also looked into extracting the significant edges within these edge images in order to significantly reduce the search space of the automatic target recognition (ATR) algorithm. As seen in
By iterating within this 3D search space and matching each synthetic image against the truth image, we find the pose with the best score. We were able to find the correct orientation/distance for the target vehicle approximately 50% of the time. One of the biggest problems we encountered was the presence of shadows, which distorted the size of the target profiles in the truth image chips.
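The 3D search can be sketched as an exhaustive grid search. The match score is not specified in the text; this sketch assumes intersection-over-union (IoU) of the two binary silhouettes. `render` is a hypothetical callback standing in for rendering plus preprocessing, and the distance, heading, and pitch sample grids are supplied by the caller.

```python
import itertools
from typing import Callable, Iterable, Optional, Sequence, Tuple

Mask = Sequence[Sequence[int]]


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two same-size binary masks (assumed match score)."""
    pairs = [(pa, pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    inter = sum(1 for pa, pb in pairs if pa and pb)
    union = sum(1 for pa, pb in pairs if pa or pb)
    return inter / union if union else 0.0


def pose_search(truth: Mask,
                render: Callable[[float, float, float], Mask],
                distances: Iterable[float],
                headings: Iterable[float],
                pitches: Iterable[float]) -> Tuple[float, Optional[Tuple[float, float, float]]]:
    """Exhaustive search over (distance, heading, pitch).

    `render` returns the preprocessed binary silhouette of the synthetic
    target at that pose. Returns (best score, best pose)."""
    best_score, best_pose = -1.0, None
    for pose in itertools.product(distances, headings, pitches):
        s = iou(truth, render(*pose))
        if s > best_score:
            best_score, best_pose = s, pose
    return best_score, best_pose
```

An exhaustive grid is the simplest choice; its cost is the product of the three grid sizes, which is why the earlier culling steps matter.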
The process of target recognition is very similar to that used for the distance/orientation determination; the only difference is the search space. Instead of varying the target distance, heading, and pitch, the search varied the target type and the heading. For this demonstration, five target types were used (the M10A2 howitzer, M35 truck, M60 tank, M113 APC, and ZSU23 anti-aircraft vehicle). At the end of the search/image-matching process, the vehicle type and orientation with the best score are reported as the target identification, which may be correct or incorrect.
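Target recognition reuses the same idea with a different search space. In this sketch, `render(vehicle_type, heading)` and `score(a, b)` are hypothetical callbacks, with the known distance and elevation angle assumed to be fixed inside `render`. The function returns the best type, its best heading, and the per-type best scores, which are useful for seeing near misses.

```python
from typing import Any, Callable, Dict, Hashable, Sequence, Tuple


def identify(truth: Any,
             render: Callable[[Hashable, float], Any],
             vehicle_types: Sequence[Hashable],
             headings: Sequence[float],
             score: Callable[[Any, Any], float]):
    """Search over (vehicle type, heading) pairs.

    Returns (best_type, best_heading, per_type), where per_type maps each
    type to its (best score, heading)."""
    per_type: Dict[Hashable, Tuple[float, float]] = {}
    for vt in vehicle_types:
        # Best heading for this type; ties on score resolve to the larger heading.
        per_type[vt] = max((score(truth, render(vt, h)), h) for h in headings)
    best_type = max(per_type, key=lambda vt: per_type[vt][0])
    return best_type, per_type[best_type][1], per_type
```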
For those truth chips where the distance and orientation were incorrect (correctness was evaluated by manual inspection), the algorithm, as expected, did only slightly better than would a random selection of the target ID (i.e. 1 in 5). In those cases where the distance and orientation were correct, however, the ATR performed much better. The recognition rate was about 80 percent.
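Recognition rates like those above, and the confusion statistics mentioned below, can be tallied from per-chip results. This sketch assumes each trial is recorded as a (true type, predicted type, pose correct) tuple, with pose correctness judged manually as in the text; the function name and record layout are hypothetical, and no experimental data are included.

```python
from collections import Counter


def recognition_stats(results):
    """Summarize ATR trials.

    results: iterable of (true_type, predicted_type, pose_correct) tuples.
    Returns (overall rate, {pose_correct: rate}, Counter keyed by (true, predicted)).
    """
    confusion = Counter()
    hits, totals = Counter(), Counter()
    for true_t, pred_t, pose_ok in results:
        confusion[(true_t, pred_t)] += 1
        totals[pose_ok] += 1
        hits[pose_ok] += int(true_t == pred_t)

    def rate(key):
        return hits[key] / totals[key] if totals[key] else float("nan")

    n = sum(totals.values())
    overall = sum(hits.values()) / n if n else float("nan")
    return overall, {True: rate(True), False: rate(False)}, confusion
```

Comparing the `False` rate against the chance level of 1/5 for five vehicle types gives the kind of comparison made above.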
The results of this experiment provided information about when and why identification failed. This information could be gleaned from the input and intermediate images saved during execution of the ATR and from the statistical data showing which vehicles are commonly mistaken for others (see
A graphical user interface (GUI) front end to the system allows the user to manually adjust the orientation of the target within the synthetic images. Shadow generation allows the user to manipulate the sun angle to approximate the lighting conditions of the test range in the provided video. Manipulating the sun angle is a very manual process that could also be automated in much the same way as the distance/orientation determination.
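Automating the sun-angle adjustment could follow the same search pattern. This sketch assumes flat ground, a compass azimuth convention (clockwise from north), and a hypothetical `score_fn(az, el)` callback that renders the synthetic target with shadows at that sun angle and returns its match score against the truth chip. The shadow geometry itself is standard: the shadow length is height / tan(elevation), pointing away from the sun.

```python
import math
from typing import Callable, Iterable, Tuple


def shadow_offset(height_m: float, sun_azimuth_deg: float,
                  sun_elevation_deg: float) -> Tuple[float, float]:
    """(east, north) ground offset of the shadow cast by a point height_m above flat ground.

    Azimuth is measured clockwise from north; the shadow points away from the sun."""
    if sun_elevation_deg <= 0:
        raise ValueError("sun must be above the horizon")
    length = height_m / math.tan(math.radians(sun_elevation_deg))
    az = math.radians(sun_azimuth_deg)
    # (sin az, cos az) points toward the sun in (east, north); negate for the shadow.
    return -length * math.sin(az), -length * math.cos(az)


def best_sun_angle(score_fn: Callable[[float, float], float],
                   azimuths: Iterable[float],
                   elevations: Iterable[float]) -> Tuple[float, float, float]:
    """Grid search replacing manual sun-angle adjustment; returns (score, azimuth, elevation)."""
    elevations = list(elevations)  # reused for every azimuth
    return max((score_fn(a, e), a, e) for a in azimuths for e in elevations)
```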
With shadows enabled, we were able to achieve a recognition rate of better than 90% (see
This application is a continuation of U.S. patent application Ser. No. 14/242,560, filed Apr. 1, 2014, which is a continuation of U.S. patent application Ser. No. 13/438,397, filed Apr. 3, 2012, now U.S. Pat. No. 8,687,849, which is a continuation of U.S. patent application Ser. No. 11/938,484, filed Nov. 12, 2007, now U.S. Pat. No. 8,150,101, which claims priority from U.S. Provisional Patent Application Ser. No. 60/865,521, filed Nov. 13, 2006, the entire content of each of which is incorporated herein by reference.
This invention was made with Government support under Contract No. N68335-06-C-0065 awarded by the United States Navy. The Government has certain rights in the invention.
Publication Number | Date | Country |
---|---|---|
20160093097 A1 | Mar 2016 | US |

Provisional Application Number | Date | Country |
---|---|---|
60865521 | Nov 2006 | US |

Relation | Application Number | Date | Country |
---|---|---|---|
Parent | 14242560 | Apr 2014 | US |
Child | 14799124 | | US |
Parent | 13438397 | Apr 2012 | US |
Child | 14242560 | | US |
Parent | 11938484 | Nov 2007 | US |
Child | 13438397 | | US |