The following relates to the Business Process Management (BPM) arts, computer vision (CV) arts, and related arts.
Video cameras are ubiquitous at many commercial sites, government facilities, non-profit organization worksites, and the like. Video cameras are commonly used for diverse tasks such as security monitoring (i.e. “security cameras”), facility usage monitoring, traffic enforcement, video identification systems (for identifying persons or objects), manufacturing article inspection (e.g., “machine vision” systems used for quality control purposes), and so forth. Video cameras are powerful devices because they acquire tremendous amounts of data in a continuous fashion (e.g. 30 frames/sec in some commercial video cameras), and because video mimics visually oriented human perception.
However, video cameras have some disadvantages as monitoring tools. Complex image and/or video processing is usually required in order to extract useful information from the continuous video data stream. Moreover, the close mimicking of human perception can, paradoxically, be deceptive as video content can be misinterpreted by a human viewer. For example, it is known that human visual perception tends to detect faces and human shapes in video content, even where none are actually present. Shadows or other lighting artifacts can also be misinterpreted. The nature of video analysis also tends to be statistical and uncertain, as statistical image classification techniques are usually employed to detect persons, objects, or other features of interest.
In view of these difficulties, automated computer vision systems tend to be restricted to narrowly tailored tasks. For example, automated computer vision systems are used in manufacturing production lines, where the camera can be precisely positioned to image products passing through the production line from a specific vantage point. Automated camera-based traffic enforcement is also common, where again the camera can be precisely positioned to image the vehicle (and more particularly its license plate) in a consistent way from vehicle to vehicle. Repurposing of such narrowly tailored video systems for other tasks is difficult.
For more complex tasks, or for tasks having a low margin of error, automated systems are typically eschewed in favor of manual monitoring of the video feed. For example, a security camera feed is commonly observed by a security guard to detect possible intruders or other security issues. Manual approaches are labor-intensive, and the person monitoring the video feed may miss an important event.
In sum, although video cameras are commonly available input devices, they are difficult to reliably leverage for diverse applications. Automated video monitoring systems tend to be single-purpose computer vision systems that are not amenable to re-purposing for other tasks. Manually monitored video feeds have reduced reliability due to the possibility of human error, and are difficult or impossible to integrate with automated systems.
Systems, apparatuses, processes, and the like disclosed herein overcome various of the above-discussed deficiencies and others.
In some embodiments disclosed herein, a Business Process Management (BPM) system comprises a graphical display device, at least one user input device, and at least one processor programmed to: implement a BPM graphical user interface (GUI) enabling a user to operate the at least one user input device to construct a process model that is displayed by the BPM GUI on the graphical display device, the BPM GUI providing (i) nodes to represent process events, activities, or decision points including computer vision nodes to represent video stream processing and (ii) flow connectors to define operational sequences of nodes and data flow between nodes; implement a BPM engine configured to execute a process model constructed using the BPM GUI to perform a process represented by the process model; and implement a computer vision engine configured to execute a computer vision node of a process model constructed using the BPM GUI by performing video stream processing represented by the computer vision node. The BPM GUI may display the process model using Business Process Model Notation (BPMN) to represent the nodes and flow connectors and further using computer vision extension notation to represent computer vision nodes. In some embodiments, the BPM GUI provides computer vision nodes including a plurality of video pattern detection nodes for different respective video patterns, and the computer vision engine is configured to execute a video pattern detection node by applying a classifier trained to detect a video pattern corresponding to the video pattern detection node in a video stream that is input to the video pattern detection node via a flow connector.
The BPM GUI may further provide computer vision nodes including a plurality of video pattern relation nodes designating different respective video pattern relations, and the computer vision engine is configured to execute a video pattern relation node by determining whether two or more video patterns detected by execution of one or more video pattern detection nodes satisfy the video pattern relation designated by the video pattern relation node.
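By way of non-limiting illustration, execution of a video pattern detection node may be sketched as follows; all function names are hypothetical, and the stand-in classifier is purely illustrative (a deployed node would wrap a trained person, object, or scene detector):

```python
# Hypothetical sketch: executing a video pattern detection node by applying
# a trained classifier to each frame of an input video stream.
def execute_detection_node(frames, classifier, pattern, threshold=0.5):
    """Yield (frame_index, score) for frames where the classifier scores
    the target video pattern at or above the match threshold."""
    for i, frame in enumerate(frames):
        score = classifier(frame, pattern)   # soft output in [0, 1]
        if score >= threshold:
            yield (i, score)

# Stand-in classifier for illustration only; frames are modeled as sets
# of labels rather than pixel arrays:
def toy_classifier(frame, pattern):
    return 0.9 if pattern in frame else 0.1

stream = [{"background"}, {"background", "person"}, {"person"}]
matches = list(execute_detection_node(stream, toy_classifier, "person"))
print(matches)  # [(1, 0.9), (2, 0.9)]
```

The video stream input and the (frame, score) outputs correspond to the data carried by the flow connectors into and out of the detection node.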
In some embodiments disclosed herein, a non-transitory storage medium stores instructions readable and executable by an electronic system including a graphical display device, at least one user input device, and at least one processor to perform a method comprising the operations of: (1) providing a graphical user interface (GUI) by which the at least one user input device is used to construct a process model that is displayed on the graphical display device as a graphical representation comprising (i) nodes representing process events, activities, or decision points and including computer vision nodes representing video stream processing and (ii) flow connectors connecting nodes of the process model to define operational sequences of nodes and data flow between nodes of the process model; and (2) executing the process model to perform a process represented by the process model including executing computer vision nodes of the process model by performing video stream processing represented by the computer vision nodes of the process model. In some embodiments, in the operation (1) the process model is displayed as a graphical representation comprising computer vision nodes selected from: (i) a set of video pattern detection nodes defining a video vocabulary of video patterns of persons, objects, and scenes; and (ii) a set of video pattern relation nodes defining a video grammar of geometrical, spatial, temporal, and similarity relations between video patterns detectable by the set of video pattern detection nodes. In such embodiments, the GUI constructs the process model with the computer vision nodes interconnected by flow connectors in compliance with the video grammar defined by the set of video pattern relation nodes.
In some embodiments disclosed herein, a system comprises a non-transitory storage medium as set forth in the immediately preceding paragraph, and a computer including a graphical display device and at least one user input device, the computer operatively connected to read and execute instructions stored on the non-transitory storage medium.
In improvements disclosed herein, a Business Process Management (BPM) system is employed to provide a flexible way to leverage existing or new camera installations to perform diverse computer vision (CV)-based tasks. Conversely, it will be appreciated that various business processes controlled by the BPM system will benefit from CV capability incorporated into the BPM system as disclosed herein.
A Business Process Management (BPM) system is a computer-based system that manages a process including aspects which may cross departmental or other organizational lines, may incorporate information databases maintained by an information technology (IT) department, or so forth. Some BPM systems manage virtual processes such as electronic financial activity; other BPM systems are employed to manage a manufacturing process, inventory maintenance process, or other process that handles physical items or physical material. In the latter applications, the BPM system suitably utilizes process sensors that detect or measure physical quantities such as counting parts passing along an assembly line, measuring inventory, or so forth.
A BPM system is a computer-implemented system that typically includes a graphical BPM modeling component, a BPM executable generation component, and a BPM engine. In a given BPM system implementation, these components may be variously integrated or separated.
The graphical BPM modeling component provides a graphical user interface (GUI) via which a user constructs a model of the business process. A commonly employed BPM graphical representation is Business Process Model and Notation (BPMN), in which nodes (called “flow objects” in BPMN) representing process events, activities, or gateways are connected by flow connectors (called “connecting objects” in BPMN). An event is, for example, a catching event which when detected starts a process, or a throwing event generated upon completion of a process. An activity performs some process, task, or work. Gateways are types of decision points. The flow connectors define ordering of operations (i.e. operational sequences), designate message, communication, or data flow, or so forth. As another example, the graphical BPM modeling component may be a custom graphical front end for modeling the business process in the Business Process Execution Language (BPEL). In some implementations, BPMN serves as the graphical front end for generating the BPM model in BPEL. The GUI process graphical representation may optionally include other features such as functional bands (called “swim lanes” in BPMN) grouping nodes by function, executing department, or so forth, or annotations (called “artifacts” in BPMN) that label elements of the BPM model with information such as required data, grouping information, or the like.
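As a non-limiting sketch (not the API of any particular BPM suite), a process model of nodes connected by flow connectors may be represented in memory along the following lines:

```python
# Illustrative, hypothetical in-memory representation of a BPMN-style
# process model: nodes (events, activities, gateways) joined by flow
# connectors that define the operational sequence.
process = {
    "nodes": {
        "start": {"type": "event",    "kind": "catching"},
        "check": {"type": "activity", "task": "inspect part"},
        "ok?":   {"type": "gateway"},
        "end":   {"type": "event",    "kind": "throwing"},
    },
    "flows": [  # flow connectors: (source node, destination node)
        ("start", "check"),
        ("check", "ok?"),
        ("ok?",   "end"),
    ],
}

def successors(model, node):
    """Follow flow connectors out of a node."""
    return [dst for src, dst in model["flows"] if src == node]

print(successors(process, "start"))  # ['check']
```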
The BPM executable generation component converts the graphical BPM model to an executable version that can be read and executed by the BPM engine. Execution of the executable model version by the BPM engine performs the actual process management. It will be appreciated that various BPM system implementations provide varying levels of integration between the graphical BPM modeling component and the BPM executable generation component, and/or between the BPM executable generation component and the BPM engine. For example, the Java-based jBPM open-source engine executes a graphical BPMN model directly. Bonita BPM is an open-source BPM suite which includes a BPMN-compliant GUI and a BPM engine implemented as a Java application programming interface (API). Stardust Eclipse is another open-source BPM suite which includes a BPMN-compliant GUI and a Java-based BPM engine. Many BPM suites are web-based.
The term “Business Process Management” is a term of art reflective of the common use of BPM systems in automating or streamlining manufacturing, inventory, and other processes performed in a commercial setting. It will be appreciated that a BPM system incorporating computer vision as disclosed herein is more generally applicable to any type of process beneficially incorporating or performing computer vision tasks. For example, a city, county, state, or other governmental entity may employ a BPM system with computer vision extensions to perform traffic monitoring or enforcement functionality. As another example, a non-profit environmental advocacy organization may employ a BPM system incorporating computer vision for tasks such as environmental monitoring or automated wildlife monitoring (e.g. raptor nest monitoring). Moreover, the disclosed BPM systems with computer vision extensions may be used to automate or re-purpose new or existing computer vision systems, or may be used to integrate computer vision into other processes.
As disclosed herein, a BPM system can be extended to incorporate computer vision activities performed using video cameras, such as already-available security cameras, inspection cameras for industrial processes, traffic monitoring or enforcement cameras, and so forth. This extension leverages computer vision as a new type of sensor input for process management under a BPM system. However, it will be appreciated that a video camera is far more complex than a typical industrial sensor that provides a discrete value, e.g. a quantity or weight or the like. Leveraging computer vision requires performing video or image processing to derive useful information from the video content. In some embodiments, the BPM system may also manipulate the video camera(s) by operations such as panning, zoom, or the like.
In some disclosed approaches, the BPM system is extended to incorporate computer vision by providing a vocabulary of visual concepts, and a grammar defining interactions of these visual concepts with other visual concepts and/or with other data processed by the BPM system in order to represent complex processes. These building blocks can be combined by composition to construct complex tasks. Advantageously, generic or domain-specific computer vision extension modules such as pedestrian detectors, various object detectors, composition rules (e.g., spatio-temporal relations), and so forth can be re-used, and detectors can be trained using training data across domains. Re-use of generic or domain-specific computer vision extension modules in the BPM system enables computer vision to be integrated with processes managed by BPM, without the need for laborious manual creation of computer vision infrastructure. Disclosed approaches also accommodate the typically high uncertainty associated with video-based observations. While the term computer vision “extension” modules is used herein to reflect an implementation in which an existing BPM system is extended (or retrofitted) to incorporate computer vision capability, it will be appreciated that the disclosed computer vision extension modules may be included in the BPM system as originally constructed.
With reference to
The BPM GUI 20 enables a user to operate the at least one user input device 12, 14 to construct a process model that is displayed by the BPM GUI 20 on the graphical display device 10. The BPM GUI 20 provides (i) nodes to represent process events, activities, or decision points including computer vision nodes to represent video stream processing and (ii) flow connectors to define operational sequences of nodes and data flow between nodes. In the illustrative example, the BPM GUI 20 displays the process model using Business Process Model Notation (BPMN) to represent the nodes and flow connectors, and further uses computer vision (CV) extension notation implemented by CV extensions 30 to represent computer vision nodes.
Depending upon the architecture of the specific BPM suite, the process model may be directly executed by the BPM engine or, as in the illustrative example shown in
The BPM engine 22 is configured (e.g. the server 18 is programmed) to execute the process model constructed using the BPM GUI 20 (and optionally after format conversion or compilation, e.g. by the generation component 24) to perform the process represented by the process model. The BPM suite 20, 22, 24 may be a conventional BPM suite with suitable modifications as disclosed herein to execute CV functionality. By way of illustrative example, the BPM suite 20, 22, 24 may be an open-source BPM suite such as jBPM, Bonita BPM, or Stardust Eclipse, or a variant (e.g. fork) of one of these BPM suites. If appropriate for executing the process model, the BPM engine 22 may access resources such as various electronic database(s) 34, for example corporate information technology databases storing information on product inventory, sales information, or so forth. If the process represented by the process model is a manufacturing process, inventory maintenance process, or other process that handles physical items or physical material, the BPM engine 22 may access various process-specific inputs such as automated sensor(s) 36 (e.g. an assembly line parts counter) or process-specific user input device(s) 38 (e.g. user controls or inputs of a process control computer or other electronic process controller). The interactions between the BPM engine 22 and these various ancillary resources 34, 36, 38 are suitably performed in accordance with existing BPM engine technology, for example as provided in the jBPM, Bonita BPM, or Stardust Eclipse BPM suites.
To process computer vision (CV) nodes of the process model, a CV engine 40 is configured (e.g. the server 18 is programmed) to execute a computer vision node of a process model constructed using the BPM GUI 20 by performing video stream processing represented by the computer vision node. The illustrative CV engine 40 is implemented as computer vision extension modules of the BPM engine 22. In other embodiments, the CV engine may be a separate component from the BPM engine that communicates with the BPM engine via function calls or the like. The CV engine 40 operates on video stream(s) generated by one or more deployed video camera(s) 42.
With reference to
In an operation 52, the process model is constructed using the BPM GUI 20. The process modeling operation 52 may be performed by an IT specialist or, due to the intuitive graphical nature of BPMN or other graphical process representations, may be performed by a non-specialist, such as an assembly line engineer trained in the manufacturing process being modeled but not having substantial specialized BPM training. Various combinations may be employed; for example, the initial process model may be constructed by an IT specialist with BPM training, in consultation with assembly line engineers, and thereafter routine updating of the process model may be performed directly by an assembly line engineer. In constructing the process model, the CV extensions 30 are used as disclosed herein to implement computer vision functions such as detecting patterns and pattern relationships and recognizing more complex events composed of patterns and pattern relationships.
In an operation 54, the constructed process model is converted by the BPM design language generation component 24 into an executable version, including using the CV extensions 32 to convert the CV nodes of the process model. For example, the operation 54 may convert the graphical BPMN process model into an executable BPEL version. It will again be appreciated that in some BPM suite architectures the operation 54 may be omitted as the BPM engine directly executes the graphical process model.
In an operation 56, the process model is executed by the BPM engine 22, with the CV extension modules 40 (or other CV engine) executing any CV nodes of the process model by performing the video stream processing represented by the CV nodes.
The CV extensions disclosed herein provide a high degree of flexibility in constructing a CV process (or sub-process of an overall process model) by leveraging the BPM process modeling approach in which nodes represent process events, activities, or decision points, and flow connectors define operational sequences of nodes and data flow between nodes. In disclosed illustrative embodiments, BPM nodes representing events analogize to a set of video pattern detection nodes defining a video vocabulary of video patterns of persons, objects, and scenes. Likewise, BPM nodes representing activities analogize to a set of video pattern relation nodes defining a video grammar of geometrical, spatial, temporal, and similarity relations between the various video patterns that are detectable by the set of video pattern detection nodes. BPM decision nodes (e.g. BPMN gateways) can be used analogously as in conventional BPM, but operating on outputs of the CV nodes. By thus breaking the computer vision process down into constituent building blocks, the existing BPM GUI 20 is leveraged (by adding the CV extensions 30) to enable construction of CV processes or sub-processes. Re-use of the CV building blocks (i.e. re-use of the CV nodes) is readily facilitated. In general, video patterns of various types may be detected, such as video patterns of persons, objects, and scenes. Similarly, various geometrical, spatial, temporal, and similarity relations between video patterns may be recognized. For example, the rotation of an object may be recognized by the operations of (1) detecting the object at two successive times in the video stream using a video pattern detector trained to detect the object and (2) recognizing that the second-detected instance of the object is a rotated version of the first-detected instance of the object.
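The rotation example above may be sketched as follows; the helper name, the use of centroid-centered matched feature points, and the tolerance are illustrative assumptions rather than a prescribed implementation:

```python
import math

def rotation_between(points_t1, points_t2, tol=1e-6):
    """Given matched 2-D feature points of the same detected object at two
    successive times (coordinates centered on the object centroid), decide
    whether the second detection is a rotated version of the first, and
    estimate the rotation angle in degrees."""
    angles = []
    for (x1, y1), (x2, y2) in zip(points_t1, points_t2):
        angles.append(math.atan2(y2, x2) - math.atan2(y1, x1))
    spread = max(angles) - min(angles)   # pure rotation: all angles agree
    return (spread < tol), math.degrees(sum(angles) / len(angles))

# Object detected at time t1, then the same feature points rotated by
# 90 degrees at time t2:
t1 = [(1.0, 0.0), (0.0, 2.0)]
t2 = [(0.0, 1.0), (-2.0, 0.0)]
is_rotation, angle = rotation_between(t1, t2)
print(is_rotation, round(angle))  # True 90
```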
In another example, compliance or non-compliance with a traffic light can be performed by operations including (1) detecting a vehicle over successive time instances spanning the duration of a detected red traffic light using appropriate pattern detectors and temporal pattern relations, (2) determining the spatial relationship of the vehicle to the red traffic light over the duration of the red light using appropriate spatial pattern detectors, and (3) at a BPM gateway, deciding whether the vehicle obeyed the red traffic light by stopping at the red light based on the determined spatial relationships.
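The gateway decision of operation (3) may be sketched as follows, assuming (purely for illustration) that upstream CV nodes supply per-frame vehicle positions and the detected red-light interval:

```python
# Hedged sketch of the red-light compliance decision. The coordinate
# convention, field names, and the stop-line representation are
# illustrative assumptions, not a prescribed implementation.
def obeyed_red_light(vehicle_positions, stop_line_y, red_interval):
    """vehicle_positions: {frame: y}, with y increasing toward the
    intersection. Returns True if the vehicle never crossed the stop line
    during the red-light interval (inclusive frame range)."""
    start, end = red_interval
    return all(
        y < stop_line_y
        for frame, y in vehicle_positions.items()
        if start <= frame <= end
    )

# Vehicle decelerates and stops short of the stop line at y = 50:
positions = {10: 40.0, 11: 48.0, 12: 49.5, 13: 49.5}
print(obeyed_red_light(positions, stop_line_y=50.0, red_interval=(10, 13)))  # True
```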
With continuing reference to
While the video pattern detection nodes typically comprise trained CV detectors, the extension modules 66 (or other components of the CV engine 40) implementing other CV node types, such as video pattern relation nodes, video stream acquisition nodes, video camera control nodes, or so forth, typically do not incorporate a trained classifier, but rather may be programmed based on mathematical relations (geometrical rotation or translation relations, spatial relations such as “above”, “below”, temporal relations such as “before” or “after”, or similarity relations comparing pre-determined pattern features), known video camera control inputs, or so forth.
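By way of hypothetical illustration, such relation nodes reduce to simple mathematical predicates over detection geometry, timing, and pre-determined features; the names and conventions below are assumptions:

```python
# Illustrative predicates for video pattern relation nodes: unlike pattern
# detection nodes, these require no trained classifier, only mathematical
# relations over detection geometry, timing, and features.
def above(bbox_a, bbox_b):
    """Spatial relation: box a lies entirely above box b.
    bbox = (x, y, w, h), with y increasing downward (image coordinates)."""
    return bbox_a[1] + bbox_a[3] <= bbox_b[1]

def before(frame_a, frame_b):
    """Temporal relation: detection a precedes detection b."""
    return frame_a < frame_b

def similar(features_a, features_b, threshold=0.9):
    """Similarity relation: cosine similarity of pattern feature vectors."""
    dot = sum(a * b for a, b in zip(features_a, features_b))
    na = sum(a * a for a in features_a) ** 0.5
    nb = sum(b * b for b in features_b) ** 0.5
    return dot / (na * nb) >= threshold

print(above((0, 0, 10, 10), (0, 20, 10, 10)))  # True
print(before(5, 9))                            # True
print(similar([1.0, 0.0], [0.9, 0.1]))         # True (cosine ~0.99)
```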
In the following, some illustrative examples are presented of some suitable implementations of a system of the type described with reference to
The illustrative system includes the BPM GUI 20, referred to in this example as a Vision Enabled Process Environment (VEPE), which includes specific CV language extensions 30 and BPM-type modeling support for bringing CV capabilities into the process modeling. The illustrative example includes the generation component (GEM) 24 that takes process models (or designs) created in the VEPE 20 and creates plain executable business process design models in a language understood by the BPM engine 22. The illustrative example employs a BPM suite using BPMN 2.0 to represent the nodes and flow connectors, and which includes CV extensions 32 added to the plain language elements using extensibility mechanisms of the standard BPMN 2 notation, which provides extension points for its elements. The BPM engine 22, denoted in this example by the acronym “BPME”, interfaces with the CV engine 40 at runtime when the process model executes. The CV engine (CVE) 40 of this example uses a modular approach that provides expressivity for interfacing CV operations with BPM elements. The CVE 40 may be implemented as an application program interface (API) that the BPME 22 uses to leverage the CV capabilities.
With reference to
The VEPE graphical modeling environment 20 can be implemented as a stand-alone process editor such as the open source Eclipse BPMN 2.0 Modeler, or incorporated into an existing BPM suite. If VEPE is a stand-alone editor, it should have enough process design functionality to cover the structural elements of normal process design and the CV nodes. In the stand-alone VEPE approach, most of the business functionality that is not CV-centric is added in a standard BPM editor at a later stage, after the GEM generation of the BPMN. On the other hand, if VEPE is designed as an extra layer on top of an existing BPM GUI, the CV extensions 30 provide the specific support for designing the CV processes in the form of additional dedicated tool palettes containing the CV elements, property sheets and configuration panels for the specification of the various parameters used by the CV elements, as well as any other graphical support to highlight and differentiate CV nodes from standard BPM elements. Additionally, a specific decorator for CV could be enabled (such as the illustrative camera icon 71 shown in
The language extensions 32 support the definition of process models that can take advantage of CV capabilities. The GEM 24 uses the extensions 32 in the generation phase. The CV language extensions 32 may comprise definitions of new elements or extensions and customizations of existing elements. Both approaches can be implemented in typical open source BPM suites, and some BPM languages are built with extensions in mind. For example, BPMN 2.0 has basic extension capabilities that allow the enrichment of standard elements with a variety of options. Where such extension capabilities do not suffice, new elements can be introduced. Both the additional elements and the extensions to the existing elements need to be supported by the BPME 22 by way of the CV extension modules 40 which execute the CV nodes of the generated process model.
The BPM engine 22 is suitably implemented as an application server that can interface with a variety of enterprise applications such as Enterprise Resource Planning (ERP) and/or Customer Relationship Management (CRM) directories and various corporate databases (e.g. inventory and/or stock databases, etc.), and that can orchestrate and control workflows using high-performance platforms supporting monitoring, long-term persistence, and other typical enterprise functionality. The CV systems and methods are implemented using the CV extensions 30, 32 and modules 40 disclosed herein, and additionally can advantageously leverage existing BPM suite capabilities through the disclosed CV extensions 30, 32 and CV engine/extension modules 40, for example using call-back triggering functionality. For open source BPM engines such as those of the Stardust, Bonita, and jBPM suites, the CV extension modules 40 are straightforward to implement due to the availability of the open source BPM code. For proprietary BPM suites, the CV extension modules 40 can be added through specific APIs or a Software Development Kit (SDK) provided by the BPM suite vendor, for example leveraging a BPMN Service Tasks framework.
Adding the CV extension modules or other CV engine 40 to an existing BPM engine 22 entails adding connectivity to the CV Engine 40, for example using APIs of the CV engine 40 in order to provide the CV functionality specified in the language extensions 30, 32. Some CV language extensions may be natively supported, so that they are first class elements of the BPM engine 22 (for extensions that the GEM 24 cannot map to standard BPMN). Other CV language extensions may be implemented via an intermediate transformation operation in which the process model (e.g. process model 70 of
The Computer Vision (CV) Engine 40 provides the execution support for CV-enabled process models. In one illustrative embodiment, its functionality is divided in three (sub-)components: (1) native support for the Video Domain-Specific Language (VDSL) by allowing the specification of composite actions, patterns and associated queries using the elements specified in the VDSL; (2) call-back functionality for triggering actions in a process model when certain events are detected; and (3) a component allowing the specification of rules in the observed video scenes (to be then used for detecting conformance, compliance, and raising alerts).
A challenge in integrating CV capabilities in BPM relates to the handling of the inherent uncertainty that CV algorithms entail. This is a consequence of the complexity of a video data stream as compared with other types of typical inputs such as a manufacturing line sensor which typically produces a discrete output (e.g. parts count). In one approach, the detection of an event in a video stream is assigned an associated confidence level. This may be done, for example, based on the output of a “soft” CV classifier or regressor that outputs a value in the range (for example) of [0,1] with 0 indicating lowest likelihood of a match (i.e. no match) and 1 indicating highest likelihood of a match. In this case, a match may be reported by the CV classifier if the output is above a chosen threshold—and the match is also assigned a confidence level based on how close the classifier or regressor output is to 1.
In addition to constructing the CV classifier or regressor to provide an indication of the confidence level, the process model is suitably constructed to process a match based in part on this confidence level. For instance, in one approach, if the CV classifier or regressor indicates a 99% confidence level that a certain event was detected, the process designer may consider the risk of error at this high confidence level to be minimal, and therefore the process can assume the event was indeed detected. On the other hand, for a lower confidence value (say, 80%), the process designer may choose to add process logic to deal with the lower level of confidence in the event detection, for instance by executing additional tasks such as involving a human supervisor to double-check the data. In one approach, the process logic to deal with uncertainty is automatically added as part of the generation phase performed by the GEM 24 using the extensions 32, for any uncertainty-prone CV task. As such, the process designer operating the BPM GUI 20 does not need to modify the gateway element to specify the confidence level, but rather specifies the confidence level directly on the CV elements in the process model 70. These values are automatically carried over at the generation phase into the generated BPMN model 80 that contains the gateway and compensation logic, configuring the gateway values automatically. This is illustrated in
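The uncertainty-handling logic described above may be sketched as follows; the thresholds, task names, and function names are illustrative assumptions:

```python
# Hedged sketch of confidence-based routing: a soft classifier score in
# [0, 1] is thresholded into a match with an attached confidence, and an
# automatically generated gateway either accepts the detection or
# compensates by routing it to a human double-check task.
def report_match(score, threshold=0.5):
    """Return (matched, confidence) for a soft classifier/regressor score."""
    return (True, score) if score >= threshold else (False, None)

def route_detection(event, score, accept_at=0.99, threshold=0.5):
    matched, confidence = report_match(score, threshold)
    if not matched:
        return "no match"
    if confidence >= accept_at:
        return f"assume '{event}' detected"
    return f"double-check '{event}' with human supervisor"

print(route_detection("intrusion", 0.995))  # assume 'intrusion' detected
print(route_detection("intrusion", 0.80))   # double-check 'intrusion' with human supervisor
print(route_detection("intrusion", 0.30))   # no match
```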
In the disclosed approaches, the process designer does not need to program any connection between the process model implemented via the BPM suite and the CV engine 40. Rather, the process designer selects CV-enabled elements (e.g. nodes) for use in the process model, and the connections to appropriate CV processing are made automatically by the CV extension modules 40, e.g. via CV engine APIs, BPMN Service Tasks prefilled with web service information, or the like. The CV engine 40 is modular. Various video patterns (e.g. persons, objects, or scenes) are individually described by corresponding video pattern detectors which are represented in the process model by video pattern detection nodes. Relationships (spatial, temporal, geometric transformational, similarity) between detected video patterns are recognized by video pattern relation nodes, which form a video grammar for expressing CV tasks in terms of a video vocabulary comprising the detectable video patterns. In this way, CV tasks can be composed on-the-fly for any process model. Composition mechanisms accessible through CV engine APIs are automatically employed by adding CV nodes to the process model. The modularity of this approach allows for the reuse and combination of any number of video patterns to model arbitrary events. The disclosed approach reduces CV task generation to the operations of selecting CV elements represented as CV nodes and designating CV node parameters and/or properties.
With reference now to
The table shown in
With reference back to
With reference now to
The “relations” category 120 of
The “patterns” category 122 of the illustrative VDSL formalism of
Video patterns are data-driven concepts, and accordingly the video detector nodes comprise empirical video pattern detectors (e.g. video classifiers or regressors) that are trained on video examples that are labeled as to whether they include the video pattern to be detected. Some patterns are generic (e.g., persons, vehicles, colors, atomic motions . . . ), and can therefore be pre-trained using a generic training set of video stream segments in order to allow for immediate re-use across a variety of domains. However, using generically trained detectors may lead to excessive uncertainty. Accordingly, for greater accuracy the video pattern detector may be trained using a training set of domain-specific video samples, again labeled as to whether they contain the (domain-specific) pattern to be detected. The number of training examples may be fairly low in practice, depending on the specificity of the pattern (e.g., as low as one for near-duplicate detection via template matching, for example in facial recognition of a particular person, or license plate matching to identify an authorized vehicle). In some implementations, the video pattern detector may be trained on labeled examples augmented by weakly labeled or unlabeled data (e.g., when using semi-supervised learning approaches).
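The one-example case mentioned above can be sketched as a template-matching detector: a single labeled example serves as the template, and a new detection is declared a match when its feature vector is sufficiently similar. The feature representation and threshold below are assumptions for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class TemplateDetector:
    """One-shot near-duplicate detector: one labeled example is the template."""
    def __init__(self, template_features, threshold=0.9):
        self.template = template_features
        self.threshold = threshold

    def detect(self, features):
        """Return (matched, confidence) for a candidate feature vector."""
        score = cosine_similarity(self.template, features)
        return score >= self.threshold, score
```

The returned score doubles as the confidence value that the process model's gateway logic can act upon.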
Events 124 formally represent high-level video phenomena that the CV engine 40 is asked to observe and report about according to a query from the BPM engine 22. In contrast to video patterns 122, models of events cannot be cost-effectively learned from user-provided examples. Events are defined at runtime by constructing an expression using the video grammar which combines video patterns 122 detected by empirical video pattern classifiers or regressors using video pattern relations 120. This expression (event) responds to a particular query from the BPM engine 22. Both specificity and complexity of queried events are accommodated by composition of empirically detected video patterns 122 using the grammatical operators, i.e. video pattern relations 120.
In the following, an illustrative API-based approach is described via which the BPM engine 22 formulates queries to the CV engine 40. These queries to the CV engine 40 are made during runtime execution of a process model that includes CV nodes connected by flow connectors which represent CV tasks expressed in the VDSL video grammar just described.
A Context object holds a specific configuration of the set of video cameras 42 and their related information. The Context object also incorporates a set of constraints from the BPM engine 22 (e.g., to interact with signals from other parts of the process model). The API allows filtering of the video streams processed (for instance to limit the video streams that are processed to those generated by video cameras in certain locations). This can be expressed by the following API queries:
Context={CameraInfo[ ] cis, BPConstraints[ ] bpcs}
Context getContext(CameraFilter[ ] cfs, BPConstraints[ ] bpcs)
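A minimal sketch of this Context API in executable form, assuming simplified camera records and a location-based filter (field names are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class CameraInfo:            # fields assumed for illustration
    camera_id: str
    location: str

@dataclass
class Context:
    cameras: list
    constraints: list = field(default_factory=list)

def get_context(all_cameras, location_filter, constraints=()):
    """Mimics getContext: keep only the camera streams the query should process."""
    selected = [c for c in all_cameras if c.location == location_filter]
    return Context(cameras=selected, constraints=list(constraints))
```

A query restricted to intersection cameras would then be `get_context(cameras, "intersection")`, with any BPM-side constraints passed through unchanged.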
Pattern objects are entities comprising detectable visual patterns. They are accessible via the following API queries:
PatternType=Enum{Action, Object, Attribute, Scene}
Pattern={PatternType pt, SpatioTemporalExtent ste, Confidence c}
Pattern[ ] getPatterns(Context ctx, PatternFilter[ ] pfs)
The detectable video patterns (actions, objects, attributes, and scenes) are those video patterns for which the CV engine 40 has a pre-trained video pattern detector. These video pattern detectors may optionally be coupled in practice (multi-label detectors) in order to mutualize the cost of search for related video patterns (e.g., objects that rely on the same underlying visual features). Pattern filter and context arguments allow searches for patterns satisfying certain conditions.
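The Pattern object and its filtered query can be sketched as follows; the extent is simplified here to a temporal interval, and all names are assumptions for illustration:

```python
from dataclasses import dataclass
from enum import Enum

class PatternType(Enum):
    ACTION = "Action"
    OBJECT = "Object"
    ATTRIBUTE = "Attribute"
    SCENE = "Scene"

@dataclass
class Pattern:
    pattern_type: PatternType
    extent: tuple          # simplified spatio-temporal extent, e.g. (t_start, t_end)
    confidence: float

def get_patterns(detections, type_filters):
    """Mimics getPatterns: return detections whose type passes the filter."""
    return [p for p in detections if p.pattern_type in type_filters]
```

Richer filters (e.g. on confidence or extent) would follow the same pattern as the type filter shown here.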
Relations describe the interaction between two patterns:
RelationType=Enum{Geometry, Space, Time, Similarity}
Relation={RelationType rt, RelationParameter[ ] rps, Confidence c}
Relation[ ] getRelations(Pattern p1, Pattern p2)
The Geometry, Space, Time, and Similarity relation types correspond respectively to a list of predetermined geometrical transformations (e.g., translation, rotation, affine, homography), spatial relations (above, below, next to, left to . . . ), temporal relations (as defined, for example, in Allen's temporal logic), and visual similarities (e.g., according to different pre-determined features). The video pattern relations are defined a priori with fixed parametric forms. Their parameters can be estimated directly from the information of two patterns input to the video pattern relation node.
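As one concrete instance, a Time relation between two detected patterns can be estimated directly from their temporal extents, in the spirit of Allen's temporal logic. This sketch classifies a few of the Allen relations from `(start, end)` intervals; the simplification to a handful of cases is an assumption for illustration:

```python
def temporal_relation(extent1, extent2):
    """Classify an Allen-style temporal relation between two (start, end) extents."""
    s1, e1 = extent1
    s2, e2 = extent2
    if e1 < s2:
        return "before"           # first interval ends before the second starts
    if e1 == s2:
        return "meets"            # first interval ends exactly as the second starts
    if s1 < s2 < e1 < e2:
        return "overlaps"         # intervals partially overlap
    if s1 == s2 and e1 == e2:
        return "equal"
    return "other"                # remaining Allen relations, omitted for brevity
```

The relation's parameters (here, the interval endpoints) are estimated directly from the two patterns input to the video pattern relation node, as described above.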
Events enable hierarchical composition of patterns and relations in accordance with the video grammar in order to create arbitrarily complex models of video phenomena, such as groups of patterns with complex time-evolving structures. Events are internally represented as directed multi-graphs where nodes are video patterns and directed edges are video pattern relations. Two nodes can have multiple edges, for instance both a temporal and a spatial relation. Events are initialized from a context (suitably detected using a video pattern detection node trained to detect the context pattern), and are built incrementally by adding video pattern relations between detected video patterns. Some illustrative API queries are as follows:
(these API calls add two pattern nodes with specified IDs and relations to the event)
CallbackStatus startEventMonitor(Event e)
(instructs the CV engine 40 to send notifications each time an event happens)
stopEventMonitor(Event e)
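The directed-multigraph event representation described above can be sketched as follows; the Event class and its method names are illustrative stand-ins for the API calls, not the API itself:

```python
class Event:
    """Directed multigraph: nodes are pattern IDs, edges are typed relations.
    Two nodes may carry multiple edges (e.g. both a temporal and a spatial relation)."""
    def __init__(self, context):
        self.context = context
        self.patterns = {}       # pattern_id -> pattern
        self.edges = []          # (from_id, to_id, relation) triples

    def add_pattern(self, pattern_id, pattern):
        self.patterns[pattern_id] = pattern

    def add_relation(self, from_id, to_id, relation):
        self.edges.append((from_id, to_id, relation))

# Build an event incrementally from detected patterns and their relations.
e = Event(context="intersection_cams")
e.add_pattern("car1", "car")
e.add_pattern("car2", "car")
e.add_relation("car1", "car2", ("time", "1s"))
e.add_relation("car1", "car2", ("similarity", "identical"))  # parallel edge
```

Note the two parallel edges between the same pair of nodes, reflecting that a temporal and a similarity relation can coexist on one node pair.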
In the following, examples are presented of illustrative API calls that might be generated by the illustrative embodiment of the GEM 24 and handled during process execution by both the BPM engine 22 and the CV engine 40. These API calls are listed sequentially below, but in practice they would be part of complex process model interactions and may be interleaved with other process elements.
The following example describes a CV task performing illegal U-turn (“iut”) enforcement. In this transportation example, the goal is to monitor intersections of a camera network for illegal U-turns.
In this example, the “car” video pattern detector is applied in two successive intervals with a temporal relation of being spaced apart by one second (parameter “1s”) and a similarity relation of being “identical” (where “identical” is defined by some suitably close correspondence of features; the similarity relation may also apply a spatial registration operation to spatially register the two car video patterns prior to comparing the features). The first geometry relation (“translation”) determines that the car is in motion, while the second geometry relation (“rotation”, operating on the second and a third detected car video pattern, again found to be identical by the similarity relation) determines that the car has undergone a 180° rotation, i.e. has made a U-turn. Because this processing is applied only to cameras operating where there are U-turn restrictions (as per the first line, Context ctx= . . . ), it follows that this detected 180° turn is an illegal U-turn. This triggers the startEventMonitor(iut) callback, which may be represented in the process diagram by a CV gateway; the callback may, for example, trigger a second CV event that captures the license plate of the vehicle, which may then flow into non-CV BPMN logic that accesses a license plate database (e.g., one of the databases 34 of
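The composed "iut" event logic can be sketched as a single predicate over three successive car detections. The heading representation, the one-second spacing check, and the rotation tolerance are assumptions introduced for illustration:

```python
def is_illegal_uturn(headings, same_car, gap_seconds, tolerance_deg=15):
    """Sketch of the 'iut' event: the same car, seen in detections spaced
    one second apart, whose heading has rotated by approximately 180 degrees.

    headings: compass headings (degrees) of three successive 'car' detections
    same_car: result of the 'identical' similarity relation across detections
    gap_seconds: temporal gaps between successive detections
    """
    if not same_car or any(abs(g - 1.0) > 0.1 for g in gap_seconds):
        return False                       # similarity or '1s' relation failed
    rotation = abs(headings[-1] - headings[0]) % 360
    return abs(rotation - 180) <= tolerance_deg
```

In the full process model, this predicate corresponds to the grammar expression combining the “1s” temporal relation, the “identical” similarity relation, and the “rotation” geometry relation.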
The following further example describes a CV task performing red light enforcement. In this example, the goal is to monitor traffic lights for red light enforcement (i.e., to detect vehicles that illegally go through a red light).
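The red light example reduces to a temporal “during” relation between a detected vehicle crossing and a red phase of the traffic light pattern. The phase representation below is an assumption for illustration:

```python
def ran_red_light(light_phases, crossing_time):
    """Sketch of red light enforcement: True if the detected crossing occurred
    during a red phase (a Time relation of 'during').

    light_phases: list of (start, end, color) tuples for the traffic light pattern
    crossing_time: timestamp at which the vehicle crossed the stop line
    """
    for start, end, color in light_phases:
        if color == "red" and start <= crossing_time <= end:
            return True
    return False
```

A positive result would, analogously to the U-turn example, trigger an event monitor whose callback feeds downstream BPMN logic such as license plate lookup.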
As already mentioned, the disclosed agile computer vision task development and execution platform may employ a commercial and/or open-source BPM suite including a BPM GUI 20 with CV extensions 30, a BPM engine 22 in operative communication, e.g. via API or the like, with a CV engine 40, and optional intermediary BPM design language generation component(s) 24 with CV extensions 32. It will be further appreciated that the disclosed agile development and execution platform for developing and executing processes that include CV functionality may be embodied by a non-transitory storage medium storing instructions readable and executable by an electronic system (e.g. server 18 and/or computer 16) including a graphical display device 10, at least one user input device 12, 14, and at least one processor to perform the disclosed development and execution of processes that include CV functionality. The non-transitory storage medium may, for example, include a hard disk drive, RAID, or other magnetic storage medium; an optical disk or other optical storage medium; solid state disk drive, flash thumb drive or other electronic storage medium; or so forth.
It will be appreciated that various of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. It will also be appreciated that various presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may be subsequently made by those skilled in the art, which are also intended to be encompassed by the following claims.