The present disclosure pertains generally to video analytics and more particularly to methods and systems for improving video analytic results.
A security system may include a number of video cameras within a monitored area. The monitored area may be indoors or outdoors, for example. Each of the video cameras has a field of view (FOV) and is configured to capture a video stream of that video camera's FOV. A variety of video analytics algorithms are known for analyzing video streams and identifying objects or events of interest in the video streams. The video analytics algorithms, or at least some of them, may perform better under certain operating conditions. What would be desirable are methods and systems for improving video analytic results by aligning the video analytics algorithms with current operating conditions.
This disclosure relates generally to methods and systems for improving video analytic results by aligning video analytics algorithms with current operating conditions. An example may be found in a method of improving performance of a video analytics algorithm, the video analytics algorithm configured to receive and analyze a video stream captured by a video camera, wherein the video stream has one or more video parameters. The illustrative method includes storing a set of desired video parameters for achieving a desired accuracy level for the video analytics algorithm, and identifying one or more of the video parameters of the video stream. One or more of the video parameters of the video stream are then compared with a corresponding one of the desired video parameters of the set of desired video parameters to ascertain whether one or more of the video parameters of the video stream diverge from the corresponding one of the desired video parameters of the set of desired video parameters by at least a threshold amount. When one or more of the video parameters of the video stream diverge from the corresponding one of the desired video parameters of the set of desired video parameters by at least the threshold amount, one or more of the video parameters of the video stream are adjusted toward the corresponding one of the desired video parameters of the set of desired video parameters to increase the accuracy level of the video analytics algorithm.
Another example may be found in a system for improving video analytics of a video stream captured by a video camera. The illustrative system includes a memory for storing a plurality of video analytics algorithms, each configured to identify a common (e.g. the same) event type in the video stream, and one or more processors that are operatively coupled to the memory. The one or more processors are configured to identify one or more video parameters of the video stream, and based on the one or more identified video parameters of the video stream, select a selected one of the plurality of video analytics algorithms that is best suited to identify the common event type in the video stream. The one or more processors are configured to process the video stream using the selected one of the plurality of video analytics algorithms to identify the common event type in the video stream.
Another example may be found in a method of improving video analytics of a video stream captured by a video camera. The illustrative method includes storing a plurality of video analytics algorithms, each configured to identify a common event type in the video stream, and analyzing the video stream to identify one or more objects in the video stream. Based on the one or more objects identified in the video stream, a selected one of the plurality of video analytics algorithms that is best suited to identify the common event type in the video stream is selected. The video stream is processed using the selected one of the plurality of video analytics algorithms to identify the common event type in the video stream.
The preceding summary is provided to facilitate an understanding of some of the features of the present disclosure and is not intended to be a full description. A full appreciation of the disclosure can be gained by taking the entire specification, claims, drawings, and abstract as a whole.
The disclosure may be more completely understood in consideration of the following description of various illustrative embodiments of the disclosure in connection with the accompanying drawings, in which:
While the disclosure is amenable to various modifications and alternative forms, specifics thereof have been shown by way of example in the drawings and will be described in detail. It should be understood, however, that the intention is not to limit aspects of the disclosure to the particular illustrative embodiments described. On the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure.
The following description should be read with reference to the drawings wherein like reference numerals indicate like elements. The drawings, which are not necessarily to scale, are not intended to limit the scope of the disclosure. In some of the figures, elements not believed necessary to an understanding of relationships among illustrated components may have been omitted for clarity.
All numbers are herein assumed to be modified by the term “about”, unless the content clearly dictates otherwise. The recitation of numerical ranges by endpoints includes all numbers subsumed within that range (e.g., 1 to 5 includes 1, 1.5, 2, 2.75, 3, 3.80, 4, and 5).
As used in this specification and the appended claims, the singular forms “a”, “an”, and “the” include the plural referents unless the content clearly dictates otherwise. As used in this specification and the appended claims, the term “or” is generally employed in its sense including “and/or” unless the content clearly dictates otherwise.
It is noted that references in the specification to “an embodiment”, “some embodiments”, “other embodiments”, etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is contemplated that the feature, structure, or characteristic may be applied to other embodiments whether or not explicitly described unless clearly stated to the contrary.
In the example shown, each of the video cameras 12 is operably coupled with a controller 16. The video cameras 12 may communicate with the controller 16 via any wired or wireless protocol. The controller 16 may, for example, control operation of at least some of the video cameras 12. The controller 16 may receive video streams from one or more of the video cameras 12. In some instances, the controller 16 may be configured to perform video analytics on one or more video streams being provided to the controller 16 by one or more of the video cameras 12.
In some cases, the controller 16 may be operably coupled with an edge gateway 18, such as via a facility network (not expressly shown). The edge gateway 18 may be configured to communicate with a cloud-based server 20. The cloud-based server 20 may function to monitor performance of the security system 10, for example. In some instances, at least some video analytics algorithms may be executed by the cloud-based server 20, particularly in cases where particular video analytics algorithms may require more processing power than may be available within the processors 14, the controller 16 and/or the edge gateway 18.
In some cases, the security system 10 may include a monitoring station 22. The monitoring station 22 may be operably coupled with the controller 16, and/or indirectly coupled with the cloud-based server 20 via the edge gateway 18. The monitoring station 22 may allow security personnel to monitor and/or review video streams from one or more of the video cameras 12, or perhaps to review video clips that show events of possible interest, where the video clips are portions of video streams, and may be provided by one or more of the video cameras 12, for example. The monitoring station 22 may permit security personnel to raise alarms and alerts, and possibly notify authorities such as but not limited to police and fire, depending on what is detected and seen within the video clips and/or video streams.
The illustrative system 23 includes a memory 24 that is configured to store a plurality of video analytics algorithms 26, individually labeled as 26a, 26b through 26n. It should be noted that there is no correlation between the value of “n” in this case, referring to the number of video analytics algorithms 26, and the value of “n” referring to the number of video cameras 12. In some cases, two or more of the video analytics algorithms 26 may be configured to identify a common (e.g. the same) event type in a video stream. The system 23 includes one or more processors 28 that are operatively coupled to the memory 24.
The one or more processors 28 may be configured to identify one or more video parameters of the video stream. Based on the one or more identified video parameters of the video stream, the one or more processors 28 may select a selected one of the plurality of video analytics algorithms that is best suited to identify the common event type in the video stream. The one or more processors 28 may be configured to process the video stream using the selected one of the plurality of video analytics algorithms to identify the common event type in the video stream. As an example, the common event type may include one or more of a facial recognition event, a mask detection event, a person count event, a license plate detection event, a vehicle detection event, an unattended bag detection event, a shoplifting detection event, a crowd detection event, a person fall detection event, and a jaywalking detection event.
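By way of a non-limiting illustration, the selection logic described above may be sketched in Python as shown below. The AlgorithmProfile structure, the parameter names and the shortfall-based scoring rule are illustrative assumptions rather than a required implementation.

```python
from dataclasses import dataclass

@dataclass
class AlgorithmProfile:
    """Illustrative pairing of a video analytics algorithm with its desired video parameters."""
    name: str
    desired: dict  # e.g. {"fps": 30, "lux": 150}

def select_algorithm(stream_params: dict, profiles: list) -> AlgorithmProfile:
    """Pick the profile whose desired parameters best match the observed stream parameters."""
    def mismatch(profile: AlgorithmProfile) -> float:
        # Sum of normalized shortfalls against each desired value; lower is a better match.
        total = 0.0
        for key, desired_value in profile.desired.items():
            observed = stream_params.get(key, 0.0)
            total += max(0.0, (desired_value - observed) / desired_value)
        return total
    return min(profiles, key=mismatch)

profiles = [
    AlgorithmProfile("face_rec_low_light", {"fps": 15, "lux": 40}),
    AlgorithmProfile("face_rec_daylight", {"fps": 30, "lux": 150}),
]
observed = {"fps": 30, "lux": 35}
print(select_algorithm(observed, profiles).name)  # -> face_rec_low_light
```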
In some instances, the one or more processors 28 may be configured to, for each of the plurality of video analytics algorithms 26, store a set of desired video parameters for achieving a desired accuracy level for the respective video analytics algorithm 26. In some cases, the set of desired video parameters may include two or more of a desired minimum frame per second (FPS) parameter, a desired minimum frame resolution parameter, a desired minimum bit rate parameter, a desired video camera placement parameter, a desired video camera setting parameter, and a desired scene lighting parameter. The desired video camera placement parameter may include a camera mounting height parameter. The desired video camera setting parameter may include one or more of a camera focus parameter, a camera zoom parameter, a camera tilt parameter and a camera pan parameter.
The one or more processors 28 may be configured to compare one or more of the video parameters of the video stream with the corresponding ones of the desired video parameters of the set of desired video parameters for each of the plurality of video analytics algorithms. In some instances, the one or more processors 28 may be configured to identify which of the set of desired video parameters of the plurality of video analytics algorithms best match the one or more of the video parameters of the video stream, to identify the video analytics algorithm that is best suited to identify the common event type in the video stream.
In an example, the common event type may be a facial recognition event. There may be more than one facial recognition video analytics algorithm available for detecting facial recognition events. A first facial recognition video analytics algorithm may be more accurate under low lighting conditions than a second facial recognition video analytics algorithm. When one or more of the video parameters of the video stream indicate low lighting conditions, the first facial recognition video analytics algorithm, which is more accurate under low lighting conditions, may be selected for use. When one or more of the video parameters of the video stream indicate high lighting conditions, the second facial recognition video analytics algorithm, which may be more accurate under high lighting conditions, may be selected for use. In some cases, the first facial recognition video analytics algorithm may be automatically selected and used. In other cases, the first facial recognition video analytics algorithm may be recommended to an operator, and the operator may authorize the use of the first facial recognition video analytics algorithm.
In another example, the common event type may be a people count event. There may be more than one people count video analytics algorithm available for counting people in a FOV of the video camera. A first people count video analytics algorithm may be more accurate when the person density is less than a threshold, and a second people count video analytics algorithm may be more accurate when the person density is above the threshold. When one or more of the video parameters (person density) of the video stream indicate a person density below the threshold, the first people count video analytics algorithm may be selected for use. When one or more of the video parameters (person density) of the video stream indicate a person density above the threshold, the second people count video analytics algorithm may be selected for use.
In some cases, the system may monitor one or more of the video parameters of the video stream and dynamically select a video analytics algorithm that best matches the one or more of the video parameters of the video stream to achieve improved video analytic results (e.g. better accuracy). In some cases, the video frames of the video stream may be partitioned into a plurality of partitions. The system may monitor one or more of the video parameters of each partition of the video stream and dynamically select a video analytics algorithm for each partition that best matches the one or more of the video parameters of that partition to achieve improved video analytic results in that partition. In some cases, the partitions may be dynamically defined based on the one or more of the video parameters of the video stream. For example, the one or more of the video parameters may identify a region that has low lighting conditions and another region that has high lighting conditions. The system may dynamically define a first partition around the low lighting conditions and a second partition around the high lighting conditions. The system may then select a first video analytics algorithm for use in the first partition and a second different video analytics algorithm for use in the second partition.
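As a non-limiting sketch of the per-partition selection just described, the following Python fragment labels each cell of a fixed grid as a low lighting or high lighting partition, using mean pixel brightness as a proxy for scene lighting. The fixed grid, the brightness cutoff and the algorithm names are illustrative assumptions; as described above, the partitions may instead be defined dynamically around the detected lighting regions.

```python
import numpy as np

BRIGHTNESS_CUTOFF = 80  # illustrative cutoff separating "low" from "high" lighting

def partition_by_lighting(frame: np.ndarray, grid: int = 4) -> dict:
    """Split a grayscale frame into grid cells and select an algorithm per cell.

    Returns a dict mapping (row, col) -> algorithm name, standing in for the
    per-partition algorithm selection described above.
    """
    h, w = frame.shape
    ch, cw = h // grid, w // grid
    selection = {}
    for r in range(grid):
        for c in range(grid):
            cell = frame[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw]
            if float(cell.mean()) < BRIGHTNESS_CUTOFF:
                selection[(r, c)] = "algo_low_light"   # first video analytics algorithm
            else:
                selection[(r, c)] = "algo_high_light"  # second, different algorithm
    return selection

frame = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
print(partition_by_lighting(frame)[(0, 0)])
```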
The illustrative method 30 includes storing a set of desired video parameters for achieving a desired accuracy level for the video analytics algorithm, as identified at block 32. The set of desired video parameters may, for example, include two or more of a desired minimum frame per second (FPS) parameter, a desired minimum frame resolution parameter, a desired minimum bit rate parameter, a desired video camera placement parameter, a desired video camera setting parameter, and a desired scene lighting parameter. The desired video camera placement parameter may include a camera mounting height parameter. The desired video camera setting parameter may include one or more of a camera focus parameter, a camera zoom parameter, a camera tilt parameter and a camera pan parameter.
One or more of the video parameters of the video stream are identified, as indicated at block 34. One or more of the video parameters of the video stream are compared with a corresponding one of the desired video parameters of the set of desired video parameters to ascertain whether one or more of the video parameters of the video stream diverge from the corresponding one of the desired video parameters of the set of desired video parameters by at least a threshold amount, as indicated at block 36. When one or more of the video parameters of the video stream diverge from the corresponding one of the desired video parameters by at least the threshold amount, the system (or an operator) may adjust one or more of the video parameters of the video stream (e.g. camera settings) toward the corresponding one of the desired video parameters to increase the accuracy level of the video analytics algorithm, as indicated at block 38.
For example, the desired video parameters for a video analytics algorithm may include a minimum FPS of 30 and a minimum bit rate of 10 Mbps. When one or more of the video parameters of the video stream diverge from these desired video parameters, the system (or an operator) may adjust the FPS and/or the bit rate of the corresponding video stream (e.g. camera settings) to increase the accuracy level of the video analytics algorithm.
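A minimal Python sketch of this compare-and-adjust step, under the assumption that camera settings can be pushed back to the camera through some callback (e.g. a camera control API, not specified here), might look like the following. The 10% threshold amount and the parameter names are illustrative.

```python
DESIRED = {"fps": 30, "bitrate_mbps": 10}  # desired minimums from the example above
THRESHOLD_FRACTION = 0.10                  # illustrative "threshold amount" (10% divergence)

def check_and_adjust(stream_params: dict, set_camera_param) -> dict:
    """Compare observed stream parameters with desired values; adjust on divergence.

    `set_camera_param` is a hypothetical callback that pushes a new value to
    the camera; it is an assumption, not a specific camera API.
    """
    adjustments = {}
    for key, desired in DESIRED.items():
        observed = stream_params.get(key, 0.0)
        if desired - observed >= THRESHOLD_FRACTION * desired:  # diverges by at least the threshold
            set_camera_param(key, desired)                      # nudge the parameter toward desired
            adjustments[key] = (observed, desired)
    return adjustments

print(check_and_adjust({"fps": 24, "bitrate_mbps": 9.5}, lambda k, v: None))
# -> {'fps': (24, 30)}  (bit rate is within the 10% threshold, so it is left alone)
```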
The illustrative method 40 includes storing for each of a plurality of video analytics algorithms a corresponding set of desired video parameters for achieving a desired accuracy level for the respective video analytics algorithm, as identified at block 42. The set of desired video parameters may include, for example, two or more of a desired minimum frame per second (FPS) parameter, a desired minimum frame resolution parameter, a desired minimum bit rate parameter, a desired video camera placement parameter, a desired video camera setting parameter, and a desired scene lighting parameter. The desired video camera placement parameter may include a camera mounting height parameter. The desired video camera setting parameter may include one or more of a camera focus parameter, a camera zoom parameter, a camera tilt parameter and a camera pan parameter.
One or more of the video parameters of the video stream are identified, as indicated at block 44. For each of a plurality of video analytics algorithms, one or more of the video parameters of the video stream are compared with the corresponding one of the desired video parameters of the respective set of desired video parameters for the respective one of the plurality of video analytics algorithms to ascertain whether one or more of the video parameters of the video stream diverge from the corresponding one of the desired video parameters of the respective set of desired video parameters for the respective one of the plurality of video analytics algorithms by at least a corresponding threshold amount, as indicated at block 46. When one or more of the video parameters of the video stream diverge from the corresponding one of the desired video parameters of the respective set of desired video parameters for the respective one of the plurality of video analytics algorithms by at least the corresponding threshold amount, one or more of the video parameters of the video stream are adjusted toward the corresponding one of the desired video parameters of the respective set of desired video parameters for the respective one of the plurality of video analytics algorithms, as indicated at block 48.
In some instances, the method 40 may include adjusting one or more of the video parameters of the video stream to satisfy the desired accuracy level for each of two or more of the plurality of video analytics algorithms, if possible. In some cases, a first one of the two or more of the plurality of video analytics algorithms may have a higher priority than a second one of the two or more of the plurality of video analytics algorithms, and adjusting one or more of the video parameters of the video stream may include adjusting one or more of the video parameters of the video stream to achieve a higher accuracy level for the first one of the two or more of the plurality of video analytics algorithms (higher priority video analytics algorithm) relative to an accuracy level for the second one of the two or more of the plurality of video analytics algorithms (lower priority video analytics algorithm).
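The priority rule described above may be sketched as follows, where conflicting desired parameter values are resolved in favor of the highest priority video analytics algorithm. The tuple structure, the priority ordering and the parameter names are illustrative assumptions.

```python
def resolve_targets(profiles: list) -> dict:
    """Derive one set of target stream parameters from several algorithms' desires.

    `profiles` is a list of (priority, desired_params) tuples. Conflicting
    desired values are resolved in favor of the highest priority algorithm,
    mirroring the priority rule described above.
    """
    targets = {}
    # Walk from lowest to highest priority so higher priority values overwrite.
    for _, desired in sorted(profiles, key=lambda p: p[0]):
        targets.update(desired)
    return targets

profiles = [
    (2, {"fps": 30, "zoom": 2.0}),  # higher priority, e.g. facial recognition
    (1, {"fps": 15, "zoom": 1.0}),  # lower priority, e.g. motion detection
]
print(resolve_targets(profiles))     # -> {'fps': 30, 'zoom': 2.0}
```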
Based on the one or more objects identified in the video stream, a selected one of the plurality of video analytics algorithms that is best suited to identify the common event type in the video stream is selected, as indicated at block 56. The video stream is processed using the selected one of the plurality of video analytics algorithms to identify the common event type in the video stream, as indicated at block 58. In some cases, the method 50 may include processing the video stream and analyzing regions within the video stream that include people in order to determine whether a detection-based approach or a density-based approach is optimal for identifying the common event type. In some cases, steps 54, 56 and 58 may be repeatedly executed.
In one example, the common event type may be a people count event or a people crowd event. There may be more than one people count video analytics algorithm available for counting people in a FOV of the video camera. A first people count video analytics algorithm may be more accurate when the person density is less than a threshold, and a second people count video analytics algorithm may be more accurate when the person density is above the threshold. The video stream may be analyzed to identify persons (objects) in the video stream, as indicated at block 54. A person density may then be determined. When the person density is below the threshold, the first people count video analytics algorithm may be selected for use. When the person density is above the threshold, the second people count video analytics algorithm may be selected for use.
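By way of a non-limiting illustration, this density-based selection may be sketched as follows, using the fraction of the frame covered by person detections as a simple person density measure. The density measure, the threshold value and the algorithm names are illustrative assumptions.

```python
DENSITY_THRESHOLD = 0.15  # illustrative persons-per-frame-area cutoff

def select_people_counter(person_boxes: list, frame_area: float) -> str:
    """Choose between detection-based and density-based people counting.

    `person_boxes` are (x, y, w, h) detections from any person detector.
    """
    covered = sum(w * h for (_, _, w, h) in person_boxes)
    density = covered / frame_area
    # Sparse scenes favor per-person detection; dense scenes favor estimation.
    return "detection_based_counter" if density < DENSITY_THRESHOLD else "density_based_counter"

boxes = [(100, 200, 40, 90), (300, 180, 42, 95)]
print(select_people_counter(boxes, 1920 * 1080))  # -> detection_based_counter
```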
Based on the one or more objects identified in each of the plurality of image regions of the one or more video frames of the video stream, a selected one of the plurality of video analytics algorithms that is best suited to identify the common event type in each of the respective one of the plurality of image regions of the one or more video frames of the video stream is selected, as indicated at block 68. The video stream is processed using the selected one of the plurality of video analytics algorithms in each of the respective one of the plurality of image regions of the one or more video frames of the video stream to identify the common event type in each of the respective one of the plurality of image regions of the one or more video frames of the video stream, as indicated at block 70. As an example, the common event type may include a people count event, wherein a first one of the plurality of video analytics algorithms may be selected when one or more individual persons are identified in the video stream, and a second one of the plurality of video analytics algorithms may be selected when a crowd of people is identified in the video stream.
In one example, a variety of different video analytics algorithms may be used for detecting crowds and accurately estimating the number of people in a crowd. Some algorithms may be better suited to detecting and tracking smaller groups of people while other algorithms may be better suited to detecting and tracking larger groups of people. Some algorithms may be better suited to estimating crowd size for large crowds, for example.
Returning to block 126, if no detections were made, control passes to block 134, and edge detections of moving objects are found. If significant edges are found, as indicated at decision block 136, control passes to block 138 and an estimation and/or density-based model (e.g. an estimation and/or density-based people count video analytics algorithm) is selected to run on the portion of the image frame that is above the line. If no significant edges are found at the decision block 136, control passes to block 140 and detection-based methods may be selected.
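A minimal Python sketch of the edge-based decision at blocks 134 through 140 is shown below, using OpenCV's Canny edge detector and the fraction of edge pixels as a proxy for significant edges. The Canny thresholds and the edge-fraction cutoff are illustrative assumptions, and a single frame stands in for the moving-object edges described above (which may, for example, first be isolated by background subtraction).

```python
import cv2
import numpy as np

EDGE_PIXEL_FRACTION = 0.02  # illustrative "significant edges" cutoff

def pick_model_for_region(gray_region: np.ndarray) -> str:
    """Decide between a density-based and a detection-based people count model
    for the portion of the image frame above the line."""
    edges = cv2.Canny(gray_region, 50, 150)
    fraction = float(np.count_nonzero(edges)) / edges.size
    if fraction >= EDGE_PIXEL_FRACTION:
        return "density_based_model"    # block 138: estimation/density-based
    return "detection_based_model"      # block 140: detection-based

region = np.random.randint(0, 255, (240, 640), dtype=np.uint8)
print(pick_model_for_region(region))
```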
In some cases, the system may recommend additional video analytic algorithms that may be useful based on what is detected within a particular image. For example, not all video analytic algorithms may be currently in use, even though some unused video analytic algorithms would provide value to the user. The system may recommend an appropriate video analytics algorithm based on what is detected within a particular image. For example, if a particular camera is fixed or PTZ, and a single object is detected in its field of view, a recommendation may be made to run one or more of a Window Break/Glass Break detection algorithm, a Loitering and/or Motion detection algorithm, a Door Open/Close analytics algorithm, and an Asset Protector analytics algorithm. As another example, if multiple objects are detected in the field of view, the recommendation system may recommend running all of a Window Break/Glass Break detection algorithm, a Loitering and Motion detection algorithm, a Door Open/Close analytics algorithm and an Asset Protector analytics algorithm. It will be appreciated that each partial view or zone within an image frame may have its own recommendation.
The object detection and the recommendation system may run on an edge camera, an NVR (network video recorder), and/or a VMS (video management system), for example. The object detection and the recommendation system may also run in a cloud-based server. In some cases, the output of the recommendation system may be used to enable existing video analytics algorithms that are currently available for use, or to prompt the user to obtain a license for the recommended video analytics algorithms. In some cases, the recommended video analytics algorithms may be purchased from allowed marketplaces. This can be extended to the fire domain, such as when an electrical distribution system object is detected in the view of a camera, enabling fire detection logic for that camera. In some cases, this may be extended to place/location analytics, such as when a retail environment is detected, in which case the recommendation system may recommend running retail analytics algorithms such as Loitering and Shoplifting analytic algorithms.
For example, in some cases, if a detected object in the image is an electrical distribution point (e.g. a fuse box), the system may recommend running a fire detection algorithm to detect fires that might originate in the fuse box. If the detected object is a locker, safe or vault, the system may recommend running an asset protection algorithm. If the detected scene is a commercial mall, the system may recommend running retail analytics algorithms such as but not limited to people heatmap algorithms, consumer interest/intelligence algorithms, mask detection algorithms, social distance algorithms, crowd detection algorithms, shelf replenishment algorithms, checkout-free store algorithms and/or dwell analytics algorithms. If the detected object is an ATM, the system may recommend running one or more asset protection algorithms.
If the detected object is a door, the system may recommend running an entry detection algorithm, an exit detection algorithm, a motion detection algorithm or a face detection algorithm. If the detected object is a lift (elevator), the system may recommend running an entry detection algorithm, an exit detection algorithm, a motion detection algorithm or a face detection algorithm. If the detected object is a parking area, the system may recommend running a license plate recognition algorithm, an empty parking spot detection algorithm, a collision detection algorithm or a stopped vehicle detection algorithm. If the detected object is a window, the system may recommend running a perimeter motion detection algorithm.
If the detected scene is an airport, the system may recommend running one or more of a loitering detection algorithm, a face detection algorithm, an unattended baggage algorithm or a baggage algorithm. If the detected objects are a road and traffic lights, the system may recommend running any of a suite of traffic violation detection algorithms. If the detected scene is an industrial site or a construction site, the system may recommend running a PPE (personal protective equipment) detection algorithm or a hard hat detection algorithm.
If the detected object is a POS (point of sale) terminal, a cash counter or a ticketing window, the system may recommend running a queue management algorithm. If the detected object is a fence, the system may recommend running a trespassing detection algorithm. If the detected object is a swimming pool, the system may recommend running a floating face down detection algorithm. These are just examples.
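The object-to-analytics recommendations above may be condensed into a simple lookup, sketched below in Python. The object labels and the algorithm names are illustrative assumptions, and the table shows only a subset of the examples given.

```python
# Illustrative mapping from detected object/scene labels to recommended analytics.
RECOMMENDATIONS = {
    "fuse_box":      ["fire_detection"],
    "safe":          ["asset_protection"],
    "atm":           ["asset_protection"],
    "door":          ["entry_detection", "exit_detection", "motion_detection", "face_detection"],
    "parking_area":  ["license_plate_recognition", "empty_spot_detection",
                      "collision_detection", "stopped_vehicle_detection"],
    "window":        ["perimeter_motion_detection"],
    "pos_terminal":  ["queue_management"],
    "fence":         ["trespassing_detection"],
    "swimming_pool": ["floating_face_down_detection"],
}

def recommend(detected_labels: list) -> list:
    """Return the union of recommended analytics for all detected objects."""
    recommended = set()
    for label in detected_labels:
        recommended.update(RECOMMENDATIONS.get(label, []))
    return sorted(recommended)

print(recommend(["door", "window"]))
```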
In some cases, a particular video analytics algorithm, or a group of video analytics algorithms, may have a set of desired conditions. As an example, face mask detection algorithms may have a set of desired conditions. Some of these desired conditions include the face size being adequately large to obtain features, such as at least 50×50 pixels. Face pose and skew can be a challenge for accuracy. Adequate lighting on the face may not always be possible, and camera placement can be important. Side poses can be problematic, as can blurring of face images.
As an example, a camera should be mounted 8 to 10 feet above the ground, and its mounting should be adequate to provide frontal face images, with a horizontal FOV (field of view) of not more than 60 degrees. Ceiling mounted or overhead cameras, which provide a near vertical view of faces, are not recommended. Bright light in the background, including sunlight, results in poor image quality and is not recommended. Using a wide-angle camera with a long FOV is also not recommended.
With respect to image quality, full HD (2 megapixel) video streams are recommended at a minimum, with a high bit rate (5 to 8 Mbps at 30 fps), no blur and good focus. For a 3 MP camera, a bit rate of 10 to 15 Mbps is recommended. With respect to illumination, 100 to 150 lux, with uniform illumination on both sides of the face, is recommended. With respect to face pose, a frontal pose plus or minus 45 degrees is recommended. With respect to detection distances, a maximum horizontal distance of 15 feet is recommended. A static background is optimal.
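For illustration only, the recommended face analytics conditions above may be checked programmatically, for example as part of a commissioning tool. The parameter field names below are assumptions; the numeric limits are taken from the recommendations above (including the 50×50 pixel minimum face size noted earlier).

```python
def face_capture_ok(params: dict) -> list:
    """Check a stream against the recommended face analytics conditions.

    Returns a list of human-readable violations; an empty list means the
    recommendations are met.
    """
    problems = []
    if params.get("megapixels", 0) < 2:
        problems.append("resolution below full HD (2 MP)")
    if params.get("bitrate_mbps", 0) < 5:
        problems.append("bit rate below 5 Mbps at 30 fps")
    if not 100 <= params.get("lux", 0) <= 150:
        problems.append("illumination outside 100 to 150 lux")
    if abs(params.get("pose_degrees", 0)) > 45:
        problems.append("face pose beyond +/-45 degrees from frontal")
    if params.get("distance_ft", 0) > 15:
        problems.append("subject beyond 15 ft horizontal distance")
    if params.get("face_px", 0) < 50:
        problems.append("face smaller than 50x50 pixels")
    return problems

print(face_capture_ok({"megapixels": 2, "bitrate_mbps": 6, "lux": 120,
                       "pose_degrees": 30, "distance_ft": 12, "face_px": 64}))  # -> []
```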
As noted, relative camera placement can be important.
Calibration can be important. In some instances, calibration may occur relative to a ground plane. The ground plane may be flat or angled, for example. The ground plane may represent a floor, a road surface, a dirt surface, a grass field, or the like. In some cases, a field of view may have multiple planes, such as a flat surface defining a ground plane, and a staircase defining a second plane at an angle relative to the ground plane. In such cases, it may be useful to calibrate relative only to the ground plane, not the second (or third, or fourth) planes that extend at an angle relative to the ground plane.
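A non-limiting sketch of one possible ground plane calibration, using a planar homography fit to four known floor points, is shown below. The point coordinates are illustrative assumptions; points on a staircase or other plane angled relative to the ground plane would violate the single-plane mapping, which is why calibrating relative only to the ground plane may be preferred as described above.

```python
import numpy as np
import cv2

# Four image points (pixels) and their known positions on the ground plane
# (e.g. meters, measured on the floor). Values are illustrative assumptions.
image_pts = np.array([[420, 700], [1500, 690], [1250, 380], [660, 385]], dtype=np.float32)
ground_pts = np.array([[0, 0], [6, 0], [6, 8], [0, 8]], dtype=np.float32)

# Homography mapping image coordinates to ground-plane coordinates.
H, _ = cv2.findHomography(image_pts, ground_pts)

def image_to_ground(px: float, py: float):
    """Project an image point (e.g. a person's feet) onto the ground plane."""
    p = H @ np.array([px, py, 1.0])
    return p[0] / p[2], p[1] / p[2]

print(image_to_ground(960, 540))
```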
In some cases, the system may provide a recommendation to adjust the camera placement and/or other conditions in order to adjust one or more of the video parameters of the video stream captured by the camera. In some cases, the recommendation may include an adjustment to the camera placement, one or more camera settings, lighting and/or other conditions to change one or more of the video parameters of the video stream captured by the camera toward the desired video parameters of a video analytics algorithm, thereby increasing the accuracy level of the video analytics algorithm (e.g. a facial recognition video analytics algorithm).
Those skilled in the art will recognize that the present disclosure may be manifested in a variety of forms other than the specific embodiments described and contemplated herein. Accordingly, departure in form and detail may be made without departing from the scope and spirit of the present disclosure as described in the appended claims.