(1) Field of Invention
The present invention relates to searching visual imagery and, more specifically, to a system for intelligent goal-directed search in large-volume visual imagery using a cognitive-neural subsystem.
(2) Description of Related Art
The present invention relates to video image analysis. Previous methods of video image analysis fall into two categories: human vision and computer vision. Human vision methods locate regions of interest by manually scanning the fovea (a narrow field of view) over the image area, either systematically or randomly. In the case of a magnifying optical system such as binoculars, the human manually scans the optical system over a wider field of view (FOV). For an expert in surveillance and reconnaissance, this can take over five minutes for a 120-degree FOV region. Furthermore, this process is limited in range by the effective optical magnification of the system. Humans are also more likely to make errors during prolonged or difficult tasks due to fatigue.
Recently, the field has seen the emergence of neural or “brain-in-the-loop” image analysis methods which analyze static, previously acquired imagery using electroencephalography (EEG). These neural methods are limited to the sequential presentation of pre-selected image chips followed by manual inspection, and are also limited by human fatigue during long presentation sessions.
Computer vision methods, on the other hand, have been developed to automatically detect objects of interest based on large numbers of sample training data. These computer vision methods are prone to error and typically useful only in previously known conditions and for previously determined small numbers of objects of interest.
A small number of previous methods have used human neural methods based on the Rapid Serial Visual Presentation (RSVP) paradigm. RSVP is often referred to as an oddball task, since the brain elicits a particular response to a novel stimulus. This response is called the P300 response and occurs approximately 300 milliseconds after stimulus onset. In an RSVP method, images of a specified target object of interest and background distractor images not containing the target are both captured and stored. The target objects typically must comprise no more than 25% of the image area, be large enough to be visible, be placed near the center of fixation, and be interspersed at a pre-specified rate. Interspersing at a pre-specified rate typically requires far fewer target images than distractor images (e.g., 2 per 100, called the target probability), as well as proper spacing between successive targets, called the target-to-target interval. RSVP then presents images to the operator at a rate of 5 to 10 images per second. Single-trial EEG recordings at specific scalp locations are made and processed using simple linear pattern classifiers or multiple classifier systems. These classifiers must be trained on large amounts of representative data similar to the data to be analyzed, and over varying time window intervals (e.g., 2,500 sample images, with 50 target images and 2,450 non-target images). In practical situations, RSVP approaches require the user to focus on flashing images for 20 to 30 seconds or more, depending on how many regions of interest exist in the image sequence. These existing RSVP approaches are not capable of inspecting live imagery, and do not support the use of cognitive algorithms in the system architecture.
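For concreteness, the sequencing constraints above can be sketched in code. The following is a minimal, illustrative Python sketch, assuming non-empty lists of target and distractor chips; the function name, default target probability, and minimum target-to-target interval are assumptions chosen for illustration, not parameters prescribed by the prior-art methods.

```python
import random

def build_rsvp_sequence(targets, distractors, target_prob=0.02, min_tti=10):
    """Assemble an RSVP presentation sequence with a fixed target
    probability (e.g., 2 targets per 100 frames) and a minimum
    target-to-target interval, measured in frames.

    targets, distractors: lists of image chips (any objects).
    This sketch assumes the spacing constraint is satisfiable and the
    distractor list is non-empty; a production sequencer would check.
    """
    n_total = int(len(targets) / target_prob)  # e.g., 50 targets -> 2500 frames
    sequence = [None] * n_total
    placed = []
    for chip in targets:
        # Draw random slots until one respects the target-to-target interval.
        while True:
            slot = random.randrange(n_total)
            if sequence[slot] is None and all(abs(slot - p) >= min_tti for p in placed):
                placed.append(slot)
                sequence[slot] = chip
                break
    # Fill the remaining slots with distractors, cycling if needed.
    pool = iter(distractors * (n_total // max(len(distractors), 1) + 1))
    for i in range(n_total):
        if sequence[i] is None:
            sequence[i] = next(pool)
    return sequence
```

With these defaults, 50 target chips produce a 2,500-frame sequence (50 targets, 2,450 distractors), matching the proportions in the example above; at 5 to 10 images per second, each frame is displayed for 100 to 200 milliseconds.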
Thus, a continuing need exists for a cognitive-neural system for large volume image analysis which incorporates both human and computer components and is capable of analyzing live and stored imagery.
The present invention relates to searching visual imagery and, more specifically, to a system for intelligent goal-directed search in large-volume visual imagery using a cognitive-neural subsystem. One aspect of the system comprises an imager, a display, a display processor, a cognitive-neural subsystem, a system controller, and operator controls. The imager is configured to produce an image of a scene. The display is configured for displaying the image to an operator. The display processor is configured to assemble the image and control an appearance of the image as seen by the operator on the display. The cognitive-neural subsystem is configured to locate regions of interest in the image. The cognitive-neural subsystem comprises a cognitive module and a neural module. The cognitive module is configured to extract a set of regions of interest from the image using a cognitive algorithm. The neural module is configured to refine the set of regions of interest using a neural processing algorithm. Furthermore, the operator controls are configured to allow the operator to select from a plurality of operating modes and to navigate the displayed image. Finally, the system controller is configured to control system states and power usage, and to manage interfaces between the imager, display, display processor, cognitive-neural subsystem, and operator controls.
In another aspect, the cognitive neural subsystem further comprises an adaptation module configured to bias the cognitive algorithm with information gained from the neural processing algorithm.
In yet another aspect, the system is mounted in a pair of binoculars.
In a further aspect, the system is configured to operate in a batch mode. In batch mode, portions of an image containing a potential region of interest are pre-selected using the cognitive algorithm. The pre-selected portions of the image are then displayed to the operator using Rapid Serial Visual Presentation (RSVP). An operator response to the pre-selected portion of the image is measured using electroencephalography (EEG). When the pre-selected portion of the image yields a high EEG response, the portion is presented to the operator for operator-controlled visual inspection and validation.
In another aspect of the present invention, the system is configured to operate in a semi-interactive mode. In semi-interactive mode, a portion of an image containing a potential region of interest is pre-selected using the cognitive algorithm. The operator is then presented with a reduced-resolution image in which the pre-selected portion of the image is highlighted. The operator's gaze is tracked using eye-tracking, such that when the operator's gaze crosses the pre-selected portion of the image, the portion is displayed in a full-resolution view for visual inspection by the operator. An operator response to the pre-selected portion of the image is measured using electroencephalography (EEG). The pre-selected portion of the image is validated if it triggers a high EEG response.
In yet another aspect, the system is configured to operate in a real-time mode. In real-time mode, the operator is presented with an image for visual inspection. The operator's gaze is tracked using eye-tracking. A portion of the image is extracted based on the location of the operator's gaze. The extracted portion of the image is processed using the cognitive algorithm to identify whether the portion contains a potential region of interest. The extracted portion of the image containing potential regions of interest is then presented to the operator for visual inspection. An operator response to the extracted portion of the image is measured using electroencephalography (EEG). When the extracted portion of the image triggers a high EEG response, the portion is marked and stored for later validation. The stored portion of the image can then be presented to the operator for operator-controlled visual inspection and validation.
In a further aspect, the system is configured to operate in a roaming mode. In roaming mode, the operator is presented with an image for visual inspection. The operator's gaze is tracked using eye-tracking. A portion of the image is extracted based on the location of the operator's gaze. An operator response to the extracted portion of the image is measured using electroencephalography (EEG). When the portion of the image triggers a high EEG response, the portion is marked and stored for later validation. The stored portion of the image can then be presented to the operator for operator-controlled visual inspection and validation.
The objects, features and advantages of the present invention will be apparent from the following detailed descriptions of the various aspects of the invention in conjunction with reference to the following drawings, where:
The present invention relates to searching visual imagery and, more specifically, to a system for intelligent goal-directed search in large-volume visual imagery using a cognitive-neural subsystem. The following description is presented to enable one of ordinary skill in the art to make and use the invention and to incorporate it in the context of particular applications. Various modifications, as well as a variety of uses in different applications, will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to a wide range of embodiments. Thus, the present invention is not intended to be limited to the embodiments presented, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
In the following detailed description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without necessarily being limited to these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.
The reader's attention is directed to all papers and documents which are filed concurrently with this specification and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference. All the features disclosed in this specification (including any accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent, or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is only one example of a generic series of equivalent or similar features.
Furthermore, any element in a claim that does not explicitly state “means for” performing a specified function, or “step for” performing a specific function, is not to be interpreted as a “means” or “step” clause as specified in 35 U.S.C. Section 112, Paragraph 6. In particular, the use of “step of” or “act of” in the claims herein is not intended to invoke the provisions of 35 U.S.C. 112, Paragraph 6.
Further, if used, the labels left, right, front, back, top, bottom, forward, reverse, clockwise, and counterclockwise have been used for convenience purposes only and are not intended to imply any particular fixed direction. Instead, they are used to reflect relative locations and/or directions between various portions of an object.
The following references are cited throughout this application. For clarity and convenience, the references are listed herein as a central resource for the reader. The following references are hereby incorporated by reference as though fully included herein. The references are cited in the application by referring to the corresponding literature reference number.
The present invention relates to searching visual imagery and, more specifically, to a system for intelligent goal-directed search in large-volume visual imagery using a cognitive-neural subsystem.
The cognitive-neural subsystem 102 routes the imagery to a display processor 104. The display processor 104 assembles the image and controls the zoom factor and appearance of the output imagery that an operator sees on a display 106. The display processor 104 also provides image stabilization on reduced-resolution imagery. Furthermore, the display processor 104 acts as a general-purpose processor for processing image data provided by the various subsystems of the device, such as performing internal manipulations of prioritized lists of regions of interest in the image. The display 106 may comprise any of a variety of display types known in the art, including but not limited to micro, head-worn, handheld, desktop, large screen, or direct-view displays. In a desired embodiment, the entire system is mounted in a pair of head-worn binoculars.
Operator controls 108, including an eye-tracking unit 110, enable the user to choose from a variety of operating modes as well as training and adaptation modes. The eye tracker 110 is a special form of operator control for use with certain operational modes described later in this description. The eye tracker 110 can be used in conjunction with electroencephalography (EEG) data obtained from an EEG interface 112 with the operator's brain via head-mounted EEG sensors 114. The operator controls 108 can also include more standard controls, non-limiting examples of which include a mouse, trackball, and touch-screen.
The system has a system controller 116 configured to control system states and power usage. The system controller 116 also manages interfaces between the imager 100, display 106, display processor 104, cognitive-neural subsystem 102, and operator controls 108. An A/D unit 118 converts an analog signal from the imager 100 into a digital form for use within the other system modules. Additionally, the system controller 116 can accommodate navigational devices such as, but not limited to, a laser rangefinder and steering mechanism 120, an external GPS device 122, and a digital compass 124.
(3.1) Batch Mode 300
In batch mode 300, the cognitive module pre-selects a portion of an image containing a potential region of interest using a cognitive algorithm 310. The pre-selected image portion may be a single image portion or one of a plurality of image portions from the image being analyzed. The system can also process a sequence of image portions, as in the case of live video imagery. The pre-selected image portion is then displayed to an operator using Rapid Serial Visual Presentation (RSVP) 312, and for each image portion displayed via RSVP, an electroencephalography (EEG) 312 reading is taken to measure the operator's neural response to the image portion. For background regarding RSVP theory and practice, see [1] and [2] in the List of Cited References section, above. Each EEG response is classified using a learning classification algorithm. The output score from this algorithm typically ranges between 0 and 1 for the responses of interest (e.g., item of interest or no item of interest). High classification scores indicate a high likelihood of belonging to that class, and vice versa. This is common practice for classifiers; a threshold can be chosen to achieve a desired trade-off between true detections and false alarms, generating traditional ROC (receiver operating characteristic) performance curves. When an image portion yields a high EEG response, the image portion is presented to the operator for manual inspection and validation 314. The image portions may be presented to the operator serially, or as outlined regions within the full image, in which case the operator can select portions to view in greater detail at his or her discretion. The validated image portions can then be output 308 to the adaptation module 204, where algorithms determine the type of object and bias the parameters of the cognitive algorithms 310 to improve performance in future use. Conventional algorithms for object classification can be used to determine the type of the object. Given an object in an image chip, algorithms using particle swarm technology, as described in [3] and [4], can be used to classify the object.
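Purely as an illustration of the thresholding step, the sketch below sweeps a decision threshold over classifier scores in [0, 1] and tabulates the true-detection and false-alarm rates that make up an ROC curve; the function and its inputs are assumptions for illustration, not the patent's classifier.

```python
import numpy as np

def roc_points(scores, labels, thresholds=None):
    """Sweep a decision threshold over classifier output scores in [0, 1]
    and return (false-alarm rate, true-detection rate) pairs, i.e., the
    points of a conventional ROC curve.

    scores: per-chip classifier outputs; higher means 'item of interest'.
    labels: 1 if the chip truly contains an item of interest, else 0.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    n_pos = max(int((labels == 1).sum()), 1)
    n_neg = max(int((labels == 0).sum()), 1)
    points = []
    for t in thresholds:
        detected = scores >= t
        tdr = int((detected & (labels == 1)).sum()) / n_pos  # true detections
        far = int((detected & (labels == 0)).sum()) / n_neg  # false alarms
        points.append((far, tdr))
    return points

# The operating threshold is then a design choice, e.g., the lowest
# threshold whose false-alarm rate stays under an acceptable budget.
```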
(3.2) Semi-interactive Mode 302
In semi-interactive mode 302, the cognitive module pre-selects portions of the image containing potential regions of interest using a cognitive algorithm 310. As in batch mode (above), the pre-selected image portion may be a single image portion or one of a plurality of image portions to be analyzed. The system can also process a sequence of image portions, as in the case of live video imagery. The operator then performs real-time EEG image chip inspection 316, in which the operator is presented with a reduced-resolution image with the pre-selected portions of the image highlighted. The operator's gaze is tracked using eye tracking, such that when the operator's gaze crosses a pre-selected portion of the image, that portion is displayed in a full-resolution view for visual inspection by the operator. The operator's neural response to the pre-selected image portions is measured using EEG readings. Image portions which trigger a high EEG response are validated as regions of interest (ROIs). The validated image portions can then be output 308 to the adaptation module 204, where algorithms determine the type of object and bias the parameters of the cognitive algorithms 310 to improve performance in future use. Non-limiting examples of parameters of the cognitive algorithms include intensity, color and contrast sensitivity, local gradient measures, and motion sensitivity. The semi-interactive mode 302 can be used in conjunction with the RSVP functions of the batch mode 300 if desired.
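The gaze-crossing behavior can be pictured with a simple hit test. In this sketch, the highlighted portions are assumed to be axis-aligned rectangles in full-image pixel coordinates, and `tracker` and `display` in the trailing comment are hypothetical stand-ins for the eye-tracking and display-processor interfaces, not components named by the patent.

```python
from dataclasses import dataclass

@dataclass
class Region:
    """A pre-selected image portion, in full-image pixel coordinates."""
    x: int
    y: int
    w: int
    h: int

def gaze_hit(gaze_xy, regions, margin=15):
    """Return the first highlighted region the gaze point falls inside,
    or None. The pixel margin absorbs eye-tracker jitter (assumed value)."""
    gx, gy = gaze_xy
    for r in regions:
        if (r.x - margin <= gx <= r.x + r.w + margin and
                r.y - margin <= gy <= r.y + r.h + margin):
            return r
    return None

# Hypothetical event loop: when the gaze crosses a highlighted region,
# switch that portion to full resolution and record the EEG response.
#   hit = gaze_hit(tracker.current_gaze(), preselected_regions)
#   if hit is not None:
#       display.show_full_resolution(image[hit.y:hit.y + hit.h,
#                                          hit.x:hit.x + hit.w])
```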
(3.3) Real-time Mode 304
In real-time mode 304, the operator first performs real-time sub-image selection 318, in which the operator is presented with real-time images for visual inspection. The real-time images can be analyzed as a temporal sequence of individual image frames. In a desired embodiment, the operator initially views a reduced-resolution view of the image stream. The operator's gaze is tracked using eye-tracking, and portions of the image are extracted based on the location of the operator's gaze. The extracted portions are processed by a cognitive algorithm 310 to identify whether each image portion contains a potential ROI. Processing by the cognitive algorithm 310 can occur in real-time. The cognitive algorithms send the image coordinates of the image chips that contain potential ROIs at full resolution to the system controller 116. The system controller sequences through each potential ROI, causing the display processor 104 to zoom in on the ROI in the image. This occurs rapidly and seamlessly, so the operator knows exactly where in the image chip an ROI may exist. The extracted portions containing potential ROIs are then presented to the operator for real-time EEG chip inspection 316, where the operator's response to each image portion is measured with EEG readings. Those image portions which trigger a high EEG response are marked and stored for later validation. The image chips can then later be presented to the operator for operator-controlled visual inspection and validation. The validated image portions can then be output 308 to the adaptation module 204, where algorithms determine the type of object and bias the parameters of the cognitive algorithms 310 to improve performance in future use.
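As a sketch of the sub-image selection step, assuming frames arrive as numpy arrays and using an illustrative fixed chip size (the patent does not specify one):

```python
import numpy as np

def extract_chip(frame, gaze_xy, chip=256):
    """Cut a square chip of side `chip` pixels from the full-resolution
    frame, centered on the gaze point and clamped to the image bounds.
    The chip is what the cognitive algorithm screens for potential ROIs."""
    h, w = frame.shape[:2]
    gx, gy = gaze_xy
    x0 = int(np.clip(gx - chip // 2, 0, max(w - chip, 0)))
    y0 = int(np.clip(gy - chip // 2, 0, max(h - chip, 0)))
    return frame[y0:y0 + chip, x0:x0 + chip], (x0, y0)

# The returned offset (x0, y0) lets the system controller map any ROI
# found inside the chip back to full-image coordinates for zooming.
```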
(3.4) Roaming Mode 306
In roaming mode 306, the operator first performs real-time sub-image selection 318, in which the operator is presented with real-time images for visual inspection. The operator's gaze is tracked using eye-tracking, and portions of the image are extracted based on the location of the operator's gaze. The extracted portions containing potential ROIs are then presented to the operator for real-time EEG chip inspection 316, where the operator's response to each image portion is measured with EEG readings. Those image portions which trigger a high EEG response are marked and stored for later validation. The image chips can then later be presented to the operator for operator-controlled visual inspection and validation. The validated image portions can then be output 308 to the adaptation module 204, where algorithms determine the type of object and bias the parameters of the cognitive algorithms 310 to improve performance in future use. As previously mentioned, the cognitive-neural subsystem, including the adaptation module, is the subject of related U.S. patent application Ser. No. 12/316,779.
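The adaptation step itself is described in the related application; purely as an assumed illustration of "biasing the parameters," one simple scheme nudges each cognitive-algorithm sensitivity toward the measured feature statistics of operator-validated chips. The parameter names and update rule below are illustrative assumptions, not the related application's adaptation algorithm.

```python
def bias_cognitive_parameters(params, validated_chip_features, rate=0.1):
    """Illustrative parameter biasing via an exponential moving average:
    each sensitivity drifts toward the features of validated chips.

    params: dict of sensitivities, e.g. {'intensity': 0.5, 'contrast': 0.5}.
    validated_chip_features: one dict of measured feature values per chip.
    rate: learning rate controlling how quickly parameters adapt.
    """
    for features in validated_chip_features:
        for name, value in features.items():
            if name in params:
                params[name] += rate * (value - params[name])
    return params
```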
In many of the above modes, the user can search the imagery or video using a coarse-to-fine resolution strategy, and the system provides intelligent inspection and zooming advice continuously. The user operates as though using a manual point-and-zoom device, while eye tracking or another ROI selection means determines the user's gaze location on the image. Cognitive algorithms process at higher levels of resolution, ahead of the user, to find potential ROIs around the current gaze location. These potential ROIs effectively cue the user to search, at increased resolution, specific image regions that have a higher probability of containing ROIs but are not yet visible at sufficient resolution on the display for the user to detect them. Neural classification of the contents of each image visible to the user enables relevant ROIs to be stored at any stage.
The user can choose either a continuous zoom or a discrete zoom method. Continuous zoom uses an image location determined by gaze or pointer to initialize the image rectangle to display at the maximum resolution of the display screen. Given a zoom rate, the system grabs a slightly smaller region of the currently displayed image at the next level of higher resolution, and sub-samples it to fit the image onto the display. The user's gaze direction or input from the cognitive module (e.g., a potential region of interest) determines the center of this region, as described above. This process repeats until the user decides to back out of the zoom or indicates completion (e.g., through a button press). At each stage of the zoom, an image chip from the central region of the user's gaze is available for neural processing. A discrete zoom system, which uses a preset number of image resolutions to choose from, can also be used with the present invention.
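A minimal sketch of the continuous-zoom loop follows, assuming the source imagery is a single full-resolution numpy array, an illustrative display size and zoom rate, and a `get_center` callable standing in for the gaze tracker or cognitive-module cue; none of these specifics come from the patent.

```python
import numpy as np

def continuous_zoom(full_image, get_center, display_hw=(480, 640), rate=0.95):
    """Yield successive views of a continuous zoom. Each step keeps a
    slightly smaller window (fraction `rate`) of the current view,
    centered on gaze or a cognitive cue, and sub-samples it to the
    display size. Stops once the window reaches display resolution.
    """
    H, W = full_image.shape[:2]
    dh, dw = display_hw
    vh, vw = H, W  # current view size, in source pixels
    while vh > dh and vw > dw:
        cx, cy = get_center()                    # center in full-image coords
        vh, vw = int(vh * rate), int(vw * rate)  # shrink the window
        x0 = int(np.clip(cx - vw // 2, 0, W - vw))
        y0 = int(np.clip(cy - vh // 2, 0, H - vh))
        view = full_image[y0:y0 + vh, x0:x0 + vw]
        # Sub-sample to the display (nearest-neighbor here; a real
        # system would low-pass filter before decimating).
        ys = np.linspace(0, vh - 1, dh).astype(int)
        xs = np.linspace(0, vw - 1, dw).astype(int)
        yield view[np.ix_(ys, xs)]
```

At every iteration the window around the view center is available as a chip for neural processing; a discrete zoom would simply replace the `rate` shrink with a preset list of resolutions.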
An illustrative diagram of a computer program product embodying the present invention is depicted in
This application is a Continuation-in-Part application of U.S. patent application Ser. No. 12/316,779, filed on Dec. 16, 2008, and titled “COGNITIVE-NEURAL METHOD FOR IMAGE ANALYSIS.”
Number | Name | Date | Kind
---|---|---|---
20100185113 | Peot et al. | Jul 2010 | A1
Relation | Number | Date | Country
---|---|---|---
Parent | 12316779 | Dec 2008 | US
Child | 12589052 | | US