The invention relates to automated media understanding systems and, more particularly, to a system and method for automatically searching for physical or graphical objects in video, images and raster data.
The number of solutions to the automated processing of video, images and raster data has grown significantly over the past 20 years. Numerous academic and patent documents have outlined approaches to various aspects of the problem. The central issue is that manual review is a poor solution to either live monitoring or forensic analysis of this data.
This technical area is characterized by systems targeted at automating the analysis of various subsets of raster data types/qualities, object types/qualities and operational scenarios. No single system attempts to provide a framework for the predominant case, in which the aforementioned variables cannot be controlled or anticipated. As a result, these so-called legacy systems each perform their functions only in the context of limited data, objects and scenarios. With few exceptions, this has rendered the value of such legacy systems too low to gain significant acceptance, given the typically high cost of research and development. Legacy systems can be broken into four major groups: 1) solutions developed to identify several object types in CCTV-style video, which is common in ground-based, fixed-structure video surveillance; 2) object-specific solutions which address a broader range of video types and scenarios, but which recognize a single object or just a handful of pre-defined objects (e.g. faces, cars, license plates); 3) systems which provide lower-level image processing and basic motion detection capabilities in video; and 4) systems focused upon various analyses of raster data other than RGB video or images.
According to one aspect, the invention features a method which includes the steps of: providing a computer based raster data search system; providing raster data; providing one or more search models as a search criteria; receiving as input the raster data; transforming by computer the raster data into a mathematical representation of an appearance of the raster data; storing the mathematical representation of said appearance of the raster data as a set of models in a database; comparing by computer at least one search model to the set of models in the database; and generating a result which indicates a likelihood of a similarity to the search criteria in the raster data.
In one embodiment, the step of providing raster data includes providing live raster data.
In another embodiment, the step of providing raster data includes providing pre-recorded raster data.
In yet another embodiment, the step of providing at least one search model includes providing at least one search model for a search criteria selected from the group consisting of an object, an entity, and a target.
In yet another embodiment, the step of transforming by computer the raster data includes transforming by decomposing by computer the raster data into a set of models which represent the appearance of the raster data.
In yet another embodiment, the step of transforming by computer the raster data further includes transforming pixels into a dense indexed representation optimized for search.
In yet another embodiment, the step of transforming by computer the raster data further includes transforming the raster data into a mathematical representation which is optimized for a massively parallel search.
In yet another embodiment, the step of comparing by computer at least one search model to the set of models includes measuring a likelihood of similarity between example-based raster data models and transformed raster search data.
In yet another embodiment, the step of generating a result which indicates a likelihood of a similarity to the search criteria in the raster data further includes spatio-temporal likelihoods.
In yet another embodiment, the step of providing at least one search model includes providing at least one search model for a search criteria which is sourced from one or more example raster data samples.
In yet another embodiment, the step of providing one or more search models further includes providing a mathematical model representing the appearance of a user selected entity to be searched for.
In yet another embodiment, the step of providing raster data includes providing raster data selected from the group consisting of visible light video (RGB, YUV or similar), infrared video (IR), multi-spectral, hyper-spectral, LIDAR, sonar imagery, and RADAR imagery.
In yet another embodiment, the step of generating a result further includes an ability to detect, or reject, single or multiple objects of interest depicted in the raster data.
In yet another embodiment, the step of generating a result further includes one or more user controls to dynamically adjust post-processed search results presented to a user in a client GUI through mathematical manipulation of a likelihood of similarity measure.
In yet another embodiment, the step of generating a result further includes one or more user controls to either increase a speed of search by reducing the accuracy of the search or to decrease the speed of search by increasing the accuracy of the search.
In yet another embodiment, one or more objects of interest are selected from the group consisting of physical objects, whole frames (temporal samples thereof) of raster data, multiple frames of raster data, and any arbitrary segment of raster data.
In yet another embodiment, the step of providing one or more search models for a search criteria further includes an updating of at least one search model by submission of one or more added raster data samples.
In yet another embodiment, the step of providing one or more search models for a search criteria further includes an updating of the at least one search model by use of a computer system graphical user interface (GUI) by a positive result feedback.
In yet another embodiment, the step of providing one or more search models further includes providing recognition to the system that a result was expected.
In yet another embodiment, the step of providing one or more search models for a search criteria further includes an updating of at least one search model by use of a computer system graphical user interface (GUI) by a negative result feedback.
In yet another embodiment, the step of providing one or more search models further includes providing feedback to the system that the result was unexpected.
In yet another embodiment, the step of providing a computer based raster data search system includes providing one or more clients configured to accept a user input via a user GUI, a host communicatively coupled to the one or more clients and communicatively coupled to one or more raster data processors, the host configured to maintain communication between the one or more clients and one or more raster data processors, the one or more clients and one or more raster data processors configured to process raster data as per instructions from the host.
In yet another embodiment, the step of providing a computer based raster data search system includes providing a scalable, massively parallel processing computer based raster data search system.
In yet another embodiment, the step of providing a computer based raster data search system includes providing a cloud based architecture including a Host, Client and Raster Data Processor.
In yet another embodiment, the cloud based architecture includes one or more hosts which are responsible for managing the processing of raster data.
In yet another embodiment, the cloud based architecture includes multiple clients which are responsible for accepting user input.
In yet another embodiment, the cloud based architecture includes multiple raster data processors which are responsible for processing the raster data.
In yet another embodiment, the cloud based architecture further includes an embedded architecture.
In yet another embodiment, the cloud based architecture further includes a data transfer protocol that facilitates communication of messages between elements of a cloud based implementation.
In yet another embodiment, the communication is selected from the group consisting of a command, raster data, transformed raster data, a mathematical model, a parameter, a status indicator and a node response.
According to another aspect, the invention features a system which includes a computer system configured to operate as a computer based raster data search system. The computer based raster data search system includes one or more clients configured to accept a user input via a user GUI. A host is communicatively coupled to the one or more clients. One or more raster data processors are communicatively coupled to the host. The host is configured to maintain communication between the one or more clients and the one or more raster data processors. The one or more clients and one or more raster data processors are configured to process raster data as per instructions from the host. The computer based raster data search system is configured to receive as input raster data, to transform by computer said raster data into a mathematical representation of an appearance of said raster data, to store said mathematical representation of said appearance of said raster data as a set of models in a database, to compare by computer said at least one search model to said set of models in said database, and to generate a result which indicates a likelihood of a similarity to a search criteria in said raster data.
The foregoing and other objects, aspects, features, and advantages of the invention will become more apparent from the following description and from the claims.
The objects and features of the invention can be better understood with reference to the drawings described below, and the claims. The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention. In the drawings, like numerals are used to indicate like parts throughout the various views.
As described hereinabove, legacy systems focused upon CCTV-style video are most common. Almost invariably, assumptions about the motion of the camera (the so-called camera model) are employed to estimate the appearance of an otherwise static scene, or ‘background’. These models are typically affine in nature, whereby they do not support translation of the camera. Sub-segments of the video frame that do not follow the motion characteristic of the camera are said to be ‘foreground’. Contiguous pixel locations of foreground are frequently called ‘blobs’. These blobs are typically evaluated against size and shape constraints to filter out noise effects. Specific academic or commercial offerings can vary significantly from one another, but generally share the above foreground/background ‘segmentation’ step, however it may be accomplished.
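The foreground/background segmentation step described above can be sketched as follows. This is an illustrative example of the legacy technique, not the claimed invention; the difference threshold, minimum blob area, and 4-connectivity choice are assumptions made for the sketch.

```python
from collections import deque
import numpy as np

def segment_foreground(frame, background, diff_threshold=25.0, min_blob_area=50):
    """Legacy-style background subtraction and blob extraction.

    Pixels that differ from the estimated static background by more than
    diff_threshold are marked foreground; contiguous foreground pixels are
    grouped into 'blobs', and blobs below min_blob_area are discarded as noise.
    """
    diff = np.abs(frame.astype(np.float64) - background.astype(np.float64))
    foreground = diff > diff_threshold
    visited = np.zeros_like(foreground, dtype=bool)
    h, w = foreground.shape
    blobs = []
    for y in range(h):
        for x in range(w):
            if foreground[y, x] and not visited[y, x]:
                # Flood-fill one 4-connected blob of foreground pixels.
                queue, pixels = deque([(y, x)]), []
                visited[y, x] = True
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and foreground[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            queue.append((ny, nx))
                if len(pixels) >= min_blob_area:  # size constraint filters out noise
                    ys = [p[0] for p in pixels]
                    xs = [p[1] for p in pixels]
                    blobs.append({"area": len(pixels),
                                  "bbox": (min(xs), min(ys), max(xs), max(ys))})
    return blobs
```

Downstream stages of such legacy systems then apply shape constraints to the returned blobs, which is where the limitations discussed below arise.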
While the above-outlined approaches can be applied to video taken from aircraft (bird's-eye view) to attempt the detection of moving objects, they are typically applied to CCTV-style video with the intent of identifying a small subset of objects. Typically, vehicles, full-body humans and ‘left/stolen objects’ are the only objects supported. Blob size, either relative or absolute, coupled with blob shape, is frequently the most heavily relied-upon discriminator. Vehicles are generally the largest objects detected. Humans tend to be taller than they are wide and are somewhat smaller than vehicles. Left objects are objects, such as bags, which remain in camera view for a certain period of time, after which they are declared ‘left’ and assimilated into the background appearance model. Stolen objects are objects which suddenly leave the camera view, revealing a previously occluded portion of the background. Again, after a predefined period of time, these objects are considered ‘stolen’.
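The size/shape discrimination that such legacy systems rely upon can be reduced to a few crude rules, as in the sketch below. The area cutoffs are illustrative assumptions; real systems tune such constants per camera, which is itself a source of the brittleness discussed next.

```python
def classify_blob(width, height, vehicle_min_area=2000):
    """Crude blob classifier of the kind legacy CCTV systems employ.

    Assumed rules (illustrative only): vehicles are the largest blobs;
    humans are taller than wide and somewhat smaller than vehicles;
    anything else is left unclassified.
    """
    area = width * height
    if area >= vehicle_min_area:
        return "vehicle"
    if height > width:
        return "person"
    return "unknown"
```

Because the decision depends only on blob geometry, any violation of the camera or scene assumptions (merged blobs, camera motion, scale changes) silently produces wrong labels.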
Because of the assumptions outlined above, legacy systems perform predictably within a very limited set of raster data, objects and operational scenarios. Unfortunately, due to these same assumptions, if any of the above three variables shift outside of what the legacy system was originally designed to support, behavior will be unpredictable. While a complete review of this topic is beyond the scope of this description, several common problematic scenarios are summarized below.
Lighting changes, which are clearly a common occurrence, will cause the scene to appear different than it otherwise does, even if there are no moving objects in it. While there are lighting models which can be applied, there is not enough information in the video itself to compensate completely.
Motion of the camera itself, if it does not obey the camera model, which is a common occurrence in many use cases, will cause errors in the estimation of the background and lead to erroneous objects being detected and classified. Further, if objects present are moving while the camera is moving, in certain conditions the objects will not be detected, leading to false negatives. If the camera pans to view an object, the object will not be detected by the system until it starts moving. If it never starts moving while in view, it will not be detected at all. If two or more objects touch, or pass in front of one another, the resulting larger blob will frequently be mistaken for a single object, only for the system to be confused again when the single object appears to split apart seconds later.
Legacy systems with object-specific models approach the problem differently. Explicit modeling of camera motion and the scene is infrequent; instead, these systems rely upon prior trained models to exhaustively search every portion of the video/image plane for that object. Both the systems for modeling the specific objects and the subsequent use of those models for identification or classification of objects can vary significantly. Most object modeling approaches are heavily data driven, requiring vast repositories of training data which are labeled manually to provide ground truth instances from which models may be trained. Due to the laborious nature of ground truth data generation, these approaches typically focus upon a single object or a small subset of objects for recognition and classification, such as faces, vehicles, or vehicle license plates.
These legacy systems are clearly limited in their ability to identify the broad swath of objects that any user may seek to identify. Further, these systems are typically finely tuned and targeted for specific applications, which prevents them from supporting a generic modeling and search system. Clearly, these systems can only analyze video or images of the quality or type on which they were originally trained. One can achieve greater breadth in operational scenario, object support, or data types/qualities by employing multiple systems. Of course, doing so requires reprocessing the video multiple times and combining the results of multiple disparate systems, yielding an aggregate system with a very unintuitive operational characteristic.
Several approaches referenced in this description have sought to supplement searches of video or images with common text-based mining techniques. In some cases, this can reduce user frustration, as users may understand why the system returned inappropriate results (e.g. the filename indicated content that was not there). Nonetheless, the ‘actual’ results will worsen, unless the original video/image-based recognition is inadequate to begin with.
Analyses of other forms of raster data, such as LIDAR or hyperspectral imagery, have been the target of other systems. Typically, the conventional approaches outlined in the paragraphs above have been modified to operate on this type of data. Naturally, the same problems and pitfalls continue to exist.
Overall, the legacy systems reviewed in this section suffer from specific problems in an operational sense, leading generally to user dissatisfaction or inefficiency. While the core technology solutions suffer from specific problems, the lack of an ability to perform at many times real-time, coupled with the absence of any built-in ability for the user to cooperate with the system in the recognition task, is the largest source of user alienation. If users are provided an ability to actively mitigate mistakes made by these systems, users will be more satisfied with the solutions, and the results generated will be of higher quality.
The acquisition and production of video/image/raster data has exploded, and the rate of growth is increasing among all producers (individuals, corporations and governments). Quick access to production tools online has made it possible for individuals to produce video at rates reserved for corporate entities only ten years prior. Companies acquire and produce ever increasing amounts of this data for security, market intelligence and advertising purposes. Governments have a multitude of options for acquiring more and more video/image/raster data through smaller, cheaper, and higher quality cameras and raster data sensors.
Despite the growth in the amount of video, the current state of the art in examination of general video/images/raster data is manual. This is despite the fact that humans are quite limited in their ability to review this data. Forcing increases in the rate of manual review instigates significant performance degradation, such that results rapidly become worse than chance, and thereby useless. Without increases in the rate of review, the amount of video that is able to be reviewed is but a fraction of that which is acquired and stored.
The quality of raster data varies dramatically and in many different capacities. Many past and current methods for automatic information extraction from this data must be employed to identify a very limited set of objects or phenomena in a limited set of data types and qualities. In order to analyze a set of data containing many disparate types and qualities with these legacy systems, different automated systems are required. This is problematic for several reasons, which are described below.
Legacy systems, considered individually, do not behave in ways that are intuitive to the user. It is understandable to users that objects occasionally might not be seen clearly, or, if very small, may not be detected with great confidence by an automated system. It is not intuitive to users that the ‘behavior’ of the object (object moving/not moving/moving slowly/coming close to another object) will adversely affect performance when the object is in plain view and with abundant data quality. Legacy systems will perform predictably for several operational scenarios, only to rapidly degrade into random results if the operational scenario changes, seemingly only slightly.
Further, each legacy automated system has drastically different error functions and operating characteristics than other automated systems. Combining the results of many such systems is extremely problematic, even if the combination is appropriately handled from a mathematical point of view. The problems stem from the fact that the behavior of the aggregate system will change unpredictably and seemingly erratically to the user. If the user is led to distrust the output of the aggregate system, they will not come to rely upon it as a useful tool.
Finally, legacy systems are not much faster at processing video than a few times real-time. This lack of speed is largely due to the underlying algorithms being exceptionally difficult to implement in an MPP (Massively Parallel Processing) paradigm. Given the amount of video in need of analysis, the required solution must process at least two orders of magnitude (100×) faster than real-time to meet minimum operational requirements for almost all uses of video outside of consumption for pure human entertainment.
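The 100× figure follows from simple throughput arithmetic, sketched below. The input volumes in the example are illustrative assumptions, not measurements from any particular deployment:

```python
def required_throughput_fps(hours_of_video_per_day, native_fps=30.0,
                            processing_hours_per_day=24.0):
    """Frames per second a pipeline must sustain to keep up with an archive.

    For example, 2,400 hours of 30 fps video arriving per day is a 100x
    real-time workload for a single pipeline running around the clock.
    """
    total_frames = hours_of_video_per_day * 3600 * native_fps
    return total_frames / (processing_hours_per_day * 3600)
```

At 2,400 incoming hours per day, the pipeline must sustain 3,000 frames per second, i.e. 100 times the 30 fps native rate, which is why an MPP-amenable formulation is treated as a hard requirement.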
Now turning generally to the computer based raster data search system, a computer based method for automatically processing raster data, which includes but is not limited to visible light video (RGB, YUV or similar), IR (infrared) video, multi-spectral, hyper-spectral, LIDAR, sonar imagery and radar imagery, is described hereinbelow. The method includes transforming raster data into a dense and efficient mathematical representation which can be optimized for massively parallel search. Raster data examples can be transformed into a mathematical model representing the appearance of a user selected entity to be searched for. As described hereinbelow in more detail, a likelihood of similarity between example-based raster data models and transformed raster search data can be measured. The likelihood of similarity measures can be used to generate search results. There can also be an ability to detect, or reject, single or multiple objects of interest depicted in raster data, which includes but is not limited to physical objects, whole frames (temporal samples thereof) of raster data, multiple frames of raster data, and any arbitrary segment of raster data. The ability to detect, or reject, can include changes in scenes, or can produce ‘story of life’ reports based upon amalgamated search models. There can also be an updating of search models with the system GUI (Graphical User Interface) by the following methods, but not limited to: submission of added raster data samples, positive result feedback (e.g. providing recognition to the system that the result was expected), and negative result feedback (e.g. providing feedback to the system that the result was unexpected).
There can also be an ability provided to the user, through single or multiple controls, to dynamically adjust post-processed search results presented to the user in the client GUI through mathematical manipulation of the likelihood of similarity measure. There can also be an ability, through single or multiple controls, to allow the user to increase the speed of search by reducing the accuracy of the search, and vice-versa. The design and implementation of such a system can adhere to a completely MPP (Massively Parallel Processing) paradigm, which enables theoretically infinite scale of the system. Implementations of the system and method described herein can include a cloud based architecture, namely, a Host, Client and Raster Data Processor, whereby there may be multiple hosts which are responsible for managing the processing of raster data, there may be multiple clients which are responsible for accepting user input and displaying system output, and there may be multiple raster data processors which are responsible for processing the raster data. Implementations can include an embedded architecture, namely, implementation of the functions of the system, or some subset thereof, on embedded processors. Implementation can also include a data transfer protocol that facilitates communication of messages between all elements of a cloud based implementation, which includes but is not limited to commands, raster data, transformed raster data, mathematical models, parameters, status indicators and node responses.
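One way such a post-processing control could be realized is sketched below: because results carry a likelihood of similarity measure, a GUI slider can re-filter already-computed results without re-running the search. The linear mapping from slider value to threshold is an illustrative assumption:

```python
def filter_results(results, sensitivity):
    """Re-filter stored search results against a user-adjustable control.

    'results' is a list of (item, likelihood) pairs, likelihood in [0, 1];
    'sensitivity' in [0, 1] is the assumed GUI slider value. Raising
    sensitivity lowers the acceptance threshold, returning more (but less
    certain) results, without touching the underlying search.
    """
    threshold = 1.0 - sensitivity
    return [(item, score) for item, score in results if score >= threshold]
```

An analogous control trading speed for accuracy would instead adjust search-time parameters (e.g. how many stored models are compared), rather than post-processing.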
The computer based raster data search system described herein is capable of detecting and classifying objects represented in raster data by examples from other sources of raster data submitted by users. Users can create models to search with through a GUI (Graphical User Interface), which is part of this system. The search model is used to search input video, images or other raster data in a completely MPP (Massively Parallel Processing) paradigm. This enables the system to operate with great speed on a single computer and also be readily scalable to many computers.
The system is broken into two main phases, a ‘transformation phase’, and a ‘search phase’. The ‘transformation phase’, transforms any raster data that the user desires to be searched into a dense set of models which represent the ‘appearance’ of the raster data. The ‘search phase’ employs search models, created and submitted by the user, to search the transformed raster data.
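The two phases can be illustrated with a deliberately minimal sketch. The per-block descriptor (mean, standard deviation, gradient energy) and the Gaussian similarity kernel below are illustrative assumptions standing in for the system's actual mathematical representation:

```python
import numpy as np

def transform(raster, block=8):
    """'Transformation phase': decompose raster data into a dense set of
    models representing its appearance -- here, one small descriptor per
    block of pixels (an illustrative stand-in representation)."""
    h, w = raster.shape
    models = {}
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = raster[y:y + block, x:x + block].astype(np.float64)
            gy, gx = np.gradient(patch)
            models[(y, x)] = np.array([patch.mean(), patch.std(),
                                       np.hypot(gy, gx).mean()])
    return models

def search(models, search_model, top_k=5):
    """'Search phase': rank stored appearance models by likelihood of
    similarity to the user-supplied search model. A Gaussian kernel over
    Euclidean distance keeps scores in (0, 1]."""
    scored = [(loc, float(np.exp(-np.linalg.norm(vec - search_model) ** 2)))
              for loc, vec in models.items()]
    return sorted(scored, key=lambda s: -s[1])[:top_k]
```

Because each block's descriptor and each comparison is independent, both phases decompose naturally over many workers, which is what makes the MPP formulation possible.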
Users create search models by submission of one or more example data samples. Upon the return of search results, users may further update search models by providing feedback on those results. They provide this feedback by declaring a search result a ‘false positive’ or a ‘true positive’. In the case of ‘false positive’ user determinations, the system will adjust search models to not return results similar to the ‘false positive’ result. In the case of ‘true positive’ user determinations, the system will adjust search models to favor signatures which appear similar to the ‘true positive’ result. More example data may be added to, or removed from, any search model.
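The feedback loop can be sketched as an exemplar-based model holding positive and negative example sets; the distance-based scoring rule below is an illustrative assumption, not the system's actual model update:

```python
import numpy as np

class SearchModel:
    """Example-based search model updated through user feedback.

    Positive exemplars come from submitted samples and 'true positive'
    feedback; negative exemplars come from 'false positive' feedback.
    A candidate's likelihood favors proximity to positives and is
    penalized by proximity to negatives (illustrative scoring rule).
    """
    def __init__(self, examples):
        self.positives = [np.asarray(e, dtype=float) for e in examples]
        self.negatives = []

    def add_example(self, sample):
        self.positives.append(np.asarray(sample, dtype=float))

    def feedback(self, result, true_positive):
        # 'true positive' reinforces the model; 'false positive' suppresses
        # similar signatures in subsequent searches.
        if true_positive:
            self.positives.append(np.asarray(result, dtype=float))
        else:
            self.negatives.append(np.asarray(result, dtype=float))

    def likelihood(self, candidate):
        c = np.asarray(candidate, dtype=float)
        pos = max(np.exp(-np.linalg.norm(c - p)) for p in self.positives)
        neg = max((np.exp(-np.linalg.norm(c - n)) for n in self.negatives),
                  default=0.0)
        return max(pos - neg, 0.0)
```

After a ‘false positive’ declaration, signatures near the rejected result score lower; after a ‘true positive’ declaration, similar signatures score higher, matching the behavior described above.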
The system maintains a cloud amenable, multi-user, multi-threaded architecture, whereby many users can use the system's client GUIs to communicate with host processes which manage the processing executed by many raster data processing units. This cloud based architecture allows the system to be flexibly deployed in very centralized environments, semi-centralized environments, local environments and anything in between.
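A minimal sketch of the host/client/processor message flow follows. The message kinds mirror those named in the data transfer protocol above; the round-robin dispatch and all class and field names are illustrative assumptions about one possible implementation:

```python
from dataclasses import dataclass

@dataclass
class Message:
    """Unit of the data transfer protocol between cloud elements; 'kind'
    covers commands, raster data, transformed raster data, mathematical
    models, parameters, status indicators and node responses."""
    kind: str
    payload: object = None

class RasterDataProcessor:
    """Processes raster data as instructed by the host."""
    def __init__(self, name):
        self.name = name

    def handle(self, message):
        # A real processor would transform or search raster data here.
        return Message(kind="node_response",
                       payload=f"{self.name} processed {message.kind}")

class Host:
    """Manages processing: routes client requests to raster data
    processors and returns node responses to the requesting client."""
    def __init__(self):
        self.processors = []
        self._next = 0

    def register(self, processor):
        self.processors.append(processor)

    def submit(self, message):
        # Round-robin dispatch approximates the MPP fan-out across
        # many raster data processing units.
        proc = self.processors[self._next % len(self.processors)]
        self._next += 1
        return proc.handle(message)
```

Because hosts, clients and processors interact only through such messages, the same topology can be deployed centralized, semi-centralized or fully local, as noted above.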
It should be clearly understood that like reference numerals are intended to identify the same structural elements, portions or surfaces, consistently throughout the several drawing figures, as such elements, portions or surfaces may be further described or explained by the entire written specification, of which this detailed description is an integral part. Unless otherwise indicated, the drawings are intended to be read (e.g., cross-hatching, arrangement of parts, proportion, degree, etc.) together with the specification, and are to be considered a portion of the entire written description of this invention. As used in the following description, the terms “horizontal”, “vertical”, “left”, “right”, “up” and “down”, as well as adjectival and adverbial derivatives thereof (e.g., “horizontally”, “rightwardly”, “upwardly”, etc.), simply refer to the orientation of the illustrated structure as the particular drawing figure faces the reader. Similarly, the terms “inwardly” and “outwardly” generally refer to the orientation of a surface relative to its axis of elongation, or axis of rotation, as appropriate.
While the present invention has been particularly shown and described with reference to the preferred mode as illustrated in the drawing, it will be understood by one skilled in the art that various changes in detail may be effected therein without departing from the spirit and scope of the invention as defined by the claims.
This application claims priority to and the benefit of co-pending U.S. provisional patent application Ser. No. 61/647,547, SYSTEM FOR SEARCHING RASTER DATA IN THE CLOUD, filed May 16, 2012, which application is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
61647547 | May 2012 | US