System and method for inhibiting or causing automated actions based on person locations estimated from multiple video sources

Information

  • Patent Grant
  • 10958877
  • Patent Number
    10,958,877
  • Date Filed
    Wednesday, November 11, 2015
  • Date Issued
    Tuesday, March 23, 2021
  • Inventors
  • Original Assignees
    • Helmerich & Payne Technologies, LLC (Tulsa, OK, US)
  • Examiners
    • Williams; Jeffery A
    • Dobbs; Kristin
    Agents
    • Kilpatrick Townsend & Stockton LLP
Abstract
The invention relates to systems and methods for inhibiting or causing automated actions based on estimated person locations, comprising multiple video sources configured to detect the location of one or more persons, wherein at least one video source is calibrated for a known location and pose. The invention further comprises at least one processor operably connected to a calibrated video source, wherein said processor aggregates possible person locations. These systems and methods may be useful for initiating or interrupting the automated activity of equipment in the presence of personnel.
Description
FIELD OF THE INVENTION

The invention relates to systems and methods for inhibiting or causing automated actions based on person locations estimated from multiple video sources.


BACKGROUND AND SUMMARY

Modern drilling involves scores of people and multiple inter-connecting activities. Obtaining real-time information about ongoing operations is of paramount importance for safe, efficient drilling. As a result, modern rigs often have thousands of sensors actively measuring numerous parameters related to vessel operation, in addition to information about the down-hole drilling environment.


Despite the multitude of sensors on today's rigs, a significant portion of rig activities and sensing problems remain difficult to measure with classical instrumentation, and person-in-the-loop sensing is often utilized in place of automated sensing.


By applying automated, computer-based video interpretation, continuous, robust, and accurate assessment of many different phenomena can be achieved through pre-existing video data without requiring a person-in-the-loop. Automated interpretation of video data is known as computer vision, and recent advances in computer vision technologies have led to significantly improved performance across a wide range of video-based sensing tasks. Computer vision can be used to improve safety, reduce costs and improve efficiency.


A long-standing goal in many industrial settings is the automation of machinery to improve speed and efficiency. However, automated machinery motion and other automated actions in industrial settings can pose significant risks to personnel in these environments. Accurate information about the locations of persons could help to prohibit certain actions at certain times when people may be in harm's way.


Current technologies for detecting person location (e.g., RFID, manual person location reporting, lock-out keys) all require personnel to take specific actions to ensure their locations are known. These actions (e.g., wearing the RFID, reporting your location, using your key) represent an unnecessary failure point, which results in preventable injuries. In contrast to these approaches, video data from multiple cameras can provide accurate person location information without requiring any actions on the part of the people in the scene, and in an unobtrusive manner. Therefore, there is a need for a video-based technique for tracking person locations in an industrial environment and the incorporation of that information into automated system controls to prohibit certain activities when personnel might be in danger.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 depicts one of many embodiments of a system involving multiple video sources (cameras) for monitoring personnel location.



FIG. 2 depicts a potential series of steps involved in a method for monitoring person location and controlling machinery accordingly.





DETAILED DESCRIPTION

The Personnel Video Monitoring system, “PVM”, consists of several parts. The PVM may include multiple individual video sources 102, each of which may or may not view the same region, and each of which may have been calibrated so its location and pose relative to the other video sources 102 is known. Each video source 102 may be a camera 104. Each camera 104 may contain or be connected to a computer 106 which performs person detection on the scene viewed by that camera 104 and may generate a “bounding box” (or other additional information) for each person 101 in each frame. Each camera 104 may transmit each bounding box to a central computing resource 106, which aggregates possible person 101 locations across all of the cameras 104 (using each camera's bounding boxes, location, and pose information) to estimate actual person 101 locations via triangulation and false-alarm rejection. This process results in a final list of person 101 coordinates in 2 or 3 dimensions. The resulting information about person 101 locations may then be presented to the end-user on a 2- or 3-D visualization of the environment, with specific icons representing each person's 101 location, or provided to machinery control systems 108 to cause or inhibit certain automated activities. These automated activities often involve the automated movement of mechanical equipment 112 which is capable of injuring a person 101 who is in the vicinity. Examples of such equipment 112 could be conveyors, cranes, hoists, pumps, drilling equipment, carts, sleds, elevators, winches, top drives, industrial equipment and many others.


For calibration, any number of fiducial objects with known X, Y, and Z locations can be used within the scene. Typical fiducial objects include a grid of black and white squares, or 3-dimensional objects with multiple distinctly colored parts. Alternatively, calibration can be accomplished with pre-existing fiducials in the scene (e.g., known locations of machinery). Points on these fiducials may be detected automatically, or manually identified (by clicking) in each camera view. Calibration can then proceed using any of a number of camera parameter optimization techniques (e.g., linear or non-linear least-squares, batch-least-squares, etc.).


Any of a number of person detection algorithms can be utilized on a per-camera basis (e.g., HOG or ICF). Different algorithms provide different performances in terms of probability of detecting a person as well as probability of generating a bounding box when no person is present. Each approach also provides different sensitivities to lighting and occlusion, and different algorithms have different computational requirements. As a result, the choice of person detection algorithm can be made on a per-installation basis.


Person detections may consist of a “bounding box” surrounding the person in each frame, as well as additional information (e.g., the index of the current camera with the detection, features extracted from the image around the person, the raw [RGB] data inside and immediately surrounding the bounding box, the time and frame number of the person detection, other meta-data, as well as various other parameters estimated during the person detection process).


Person detection data (bounding boxes, RGB color data, or associated meta-data) may be transferred to the central tracking computer 106 using TCP/IP, UDP, or any other suitable data transfer protocol.
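As a minimal sketch of what a per-camera detection message might look like before transfer, the Python below packs a bounding box and a few meta-data fields into JSON bytes and unpacks them again. The class name, the field names, and the choice of JSON are assumptions made for illustration; the description only lists the kinds of data carried and leaves the format open. The resulting bytes could be sent over TCP/IP, UDP, or another protocol.

```python
import json
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass
class PersonDetection:
    # Hypothetical field names; only the kinds of data are described in the text.
    camera_index: int
    frame_number: int
    timestamp: float                  # seconds
    bbox: Tuple[int, int, int, int]   # (x, y, width, height) in pixels
    confidence: float


def encode_detection(det: PersonDetection) -> bytes:
    """Serialize one detection to UTF-8 JSON bytes for transfer."""
    return json.dumps(asdict(det)).encode("utf-8")


def decode_detection(payload: bytes) -> PersonDetection:
    """Rebuild a detection on the receiving (central tracking) side."""
    fields = json.loads(payload.decode("utf-8"))
    fields["bbox"] = tuple(fields["bbox"])  # JSON stores tuples as lists
    return PersonDetection(**fields)
```

A round trip through encode_detection and decode_detection is expected to return an equal record, which makes the format easy to check independently of the transport.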


Person detections can be aggregated temporally within a camera 104 to reduce false alarms, and improve the probability of detection by keeping track of person detection confidences over time within a camera view. This can prevent spurious false alarms, and help to improve detection of difficult-to-see persons.
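One simple way to keep track of detection confidences over time within a camera view is exponential smoothing of the per-frame detector score for a candidate person. The sketch below is an assumption about how this could be done, not the method prescribed here; the function name, the smoothing weight, and the threshold are hypothetical.

```python
def smooth_confidences(raw_scores, alpha=0.3, threshold=0.5):
    """Exponentially smooth per-frame detector scores for one candidate.

    Returns one boolean per frame: True when the smoothed score has
    reached the threshold. alpha and threshold are illustrative values.
    """
    smoothed = 0.0
    decisions = []
    for score in raw_scores:
        # Blend the new score with the running value.
        smoothed = alpha * score + (1.0 - alpha) * smoothed
        decisions.append(smoothed >= threshold)
    return decisions
```

With these values a single-frame spike never crosses the threshold, while a detection that persists for a few frames does, which is the false-alarm behavior described above.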


The bounding box from each camera 104 may be combined with the information about each camera's pose and location to provide a global coordinate estimate of the person's location using either bearing-only or joint bearing-and-range-based triangulation.
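For the bearing-only case, a minimal plan-view sketch with two cameras is shown below: each camera contributes a ray from its known position along the bearing to the detected person, and the estimate is the intersection of the two rays. The function name and the convention that bearings are measured in radians from the +x axis are assumptions; a deployed system would typically combine more than two cameras and account for measurement noise.

```python
import math


def intersect_bearings(cam_a, bearing_a, cam_b, bearing_b):
    """Intersect two plan-view rays.

    cam_a, cam_b: (x, y) camera positions in world coordinates.
    bearing_a, bearing_b: ray directions in radians from the +x axis.
    Returns (x, y), or None if the rays are near parallel or the
    intersection lies behind either camera.
    """
    dax, day = math.cos(bearing_a), math.sin(bearing_a)
    dbx, dby = math.cos(bearing_b), math.sin(bearing_b)
    denom = dax * dby - day * dbx          # 2-D cross product of directions
    if abs(denom) < 1e-9:
        return None
    rx, ry = cam_b[0] - cam_a[0], cam_b[1] - cam_a[1]
    t = (rx * dby - ry * dbx) / denom      # distance along ray A
    s = (rx * day - ry * dax) / denom      # distance along ray B
    if t < 0 or s < 0:
        return None
    return (cam_a[0] + t * dax, cam_a[1] + t * day)
```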


For horizontally oriented cameras 104, the only information about a person's distance from the camera is due to the height of the detected bounding box. This information may be combined with the bounding box location to estimate both a distance and a scale, though the box height is often too noisy to provide robust range estimation.
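As a rough illustration of range from box height, a pinhole camera model gives distance as focal length times real height divided by image height. The sketch below assumes a focal length expressed in pixels and a nominal person height of 1.75 m; both the function name and these values are hypothetical. Because the result is inversely proportional to the box height, a few pixels of error in a small box changes the estimate substantially, which is consistent with the noise caveat above.

```python
def range_from_box_height(box_height_px, focal_px, person_height_m=1.75):
    """Estimate camera-to-person distance (metres) from bounding-box height.

    Assumes a pinhole camera, an upright person of nominal height, and a
    focal length given in pixels.
    """
    if box_height_px <= 0:
        raise ValueError("box_height_px must be positive")
    return focal_px * person_height_m / box_height_px
```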


For cameras 104 oriented somewhat vertically with respect to the horizon, person range can be estimated using the centroid of the person's bounding-box, and assuming that the mid-point of the person is in some reasonable range (e.g., waistline is above 24″ and below 50″), which enables joint bearings (angle) and range measurements.
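The range estimate for a downward-tilted camera can be sketched with simple trigonometry: the ray through the bounding-box centroid makes a known depression angle with the horizon, and it meets the assumed waist height at a horizontal distance of (camera height minus waist height) divided by the tangent of that angle. The code below is a sketch under stated assumptions (pinhole camera, no roll, focal length in pixels, image rows increasing downward); the function and parameter names are hypothetical, and the default waist bounds are 24″ and 50″ converted to metres.

```python
import math


def ground_range_interval(v_px, cy_px, focal_px, cam_height_m, tilt_down_rad,
                          waist_min_m=0.61, waist_max_m=1.27):
    """Return (near, far) horizontal range in metres for a centroid pixel row.

    v_px: image row of the bounding-box centroid.
    cy_px: image row of the principal point.
    tilt_down_rad: camera pitch below the horizon, in radians.
    Returns None when the geometry gives no usable intersection.
    """
    # Angle of the centroid ray below the horizon.
    depression = tilt_down_rad + math.atan2(v_px - cy_px, focal_px)
    if depression <= 0 or depression >= math.pi / 2:
        return None
    if cam_height_m <= waist_max_m:
        return None
    near = (cam_height_m - waist_max_m) / math.tan(depression)
    far = (cam_height_m - waist_min_m) / math.tan(depression)
    return (near, far)
```

The interval, rather than a single value, reflects the uncertainty in where the mid-point of the person actually sits.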


The bearings and range information from each camera 104 may then be superimposed on a grid-representation of the area under consideration. This grid may take the form of a 2-D matrix of points, where each point represents a square region with nominal side length (e.g., 2″, 6″, 1′) (the grid can be adaptively re-sized to account for additional cameras or to increase processing speed). Each detection from each camera 104 counts as a “vote” into all of the grid squares within a given distance of the line segment generated by projecting the bounding box centroid into the plan-view. Votes are aggregated across all the cameras 104 and all the detections. The resulting vote plan-view may be smoothed using a Gaussian filter whose size is proportional to the expected size of persons in the scene. This smoothed image represents the spatial confidence map for person locations in the current frame.
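The voting step can be sketched as follows, assuming each detection has already been projected into the plan view as a line segment. Every grid cell whose centre lies within a given distance of a segment receives one vote per detection. The function names and the list-of-lists grid are assumptions for illustration, and the Gaussian smoothing step is omitted to keep the example short.

```python
import math


def point_segment_distance(px, py, x0, y0, x1, y1):
    """Shortest distance from point (px, py) to the segment (x0, y0)-(x1, y1)."""
    dx, dy = x1 - x0, y1 - y0
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - x0, py - y0)
    # Projection of the point onto the segment, clamped to its end points.
    t = max(0.0, min(1.0, ((px - x0) * dx + (py - y0) * dy) / length_sq))
    return math.hypot(px - (x0 + t * dx), py - (y0 + t * dy))


def vote_grid(width, height, cell_m, segments, max_dist_m):
    """Accumulate votes on a plan-view grid.

    segments: list of ((x0, y0), (x1, y1)) in metres, one per detection.
    Returns a height x width list of integer vote counts.
    """
    grid = [[0] * width for _ in range(height)]
    for (x0, y0), (x1, y1) in segments:
        for row in range(height):
            for col in range(width):
                cx = (col + 0.5) * cell_m   # cell centre
                cy = (row + 0.5) * cell_m
                if point_segment_distance(cx, cy, x0, y0, x1, y1) <= max_dist_m:
                    grid[row][col] += 1
    return grid
```

Cells where segments from several cameras cross collect the most votes, which is what turns individual bearings into a location estimate.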


The resulting current-time spatial confidence map, c(t), is combined with the confidence map from the previous frame, c(t−1), in an IIR manner, to form the final confidence map, C(t), e.g.,

C(t)=αc(t)+(1−α)c(t−1),

where α∈[0,1] is chosen to provide adequate tradeoffs between fast adaptation (large α) and false alarm reduction (small α).
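The weighted blend can be sketched cell by cell as below. The function name and the list-of-lists map layout are assumptions. Whether the `previous` argument holds the prior raw map, as the equation is written, or the prior blended map is left to the caller; feeding the blended output back in on the next frame is what gives the filter its recursive character.

```python
def blend_confidence(current, previous, alpha):
    """Blend two equally sized confidence maps cell by cell.

    Returns alpha * current + (1 - alpha) * previous for every cell.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [alpha * c + (1.0 - alpha) * p for c, p in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]
```

A large alpha lets a newly arrived person show up quickly, while a small alpha requires evidence to persist over several frames before it dominates the map.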


The final confidence map, C(t), is then transformed into discrete person locations using local maxima of the 2- or 3-D surface. Consistent person identification may be accomplished using Kalman filtering on the resulting discrete person locations over time, where the identity of each person is attached to the most likely current local maximum given the prediction of the Kalman filter. New persons can appear in the scene, and existing persons can be removed when the likelihood of their track falls below a threshold. Other tracking algorithms can be used in place of the Kalman filter, including particle filters, MCMC tracking approaches, and others.
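Extracting discrete locations from a 2-D confidence map can be sketched as a search for cells that exceed all eight neighbours and a minimum confidence. The function name and the minimum-confidence parameter are assumptions, and the tracking stage (Kalman or otherwise) is not shown.

```python
def local_maxima(conf_map, min_conf):
    """Return (row, col, value) for each strict local maximum.

    A cell qualifies when its value is at least min_conf and strictly
    greater than every neighbour in its 3x3 neighbourhood. Flat plateaus
    are therefore not reported by this simple version.
    """
    rows = len(conf_map)
    cols = len(conf_map[0]) if rows else 0
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = conf_map[r][c]
            if value < min_conf:
                continue
            neighbours = [
                conf_map[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return peaks
```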


Various ad-hoc parameters are used to enhance the person detection and tracking. For example, realistic human motion is highly constrained (persons are extremely unlikely to move faster than 40 km/h, or about 11 m/s). These velocity constraints can therefore be leveraged in the tracking by invalidating any human track that involves jumps in excess of ⅓ meters per frame (in a 30 Hz video), for example.
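The velocity constraint can be expressed as a per-frame gate on track updates. In the sketch below, a speed limit of 10 m/s at 30 Hz corresponds to the ⅓ meter per frame figure given above; the function name and argument layout are assumptions.

```python
import math


def is_plausible_step(prev_xy, new_xy, frame_rate_hz=30.0, max_speed_mps=10.0):
    """Return True if moving from prev_xy to new_xy in one frame is plausible.

    Positions are plan-view coordinates in metres. The allowed step is
    max_speed_mps / frame_rate_hz metres per frame.
    """
    step = math.hypot(new_xy[0] - prev_xy[0], new_xy[1] - prev_xy[1])
    return step <= max_speed_mps / frame_rate_hz
```

A tracker could use this check to reject an association between a track and a candidate location rather than accept an implausible jump.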


Persistent clutter (objects in the scene that cause person-detectors to “false alarm”) can pose a serious problem for accurate person tracking and localization. In the case of static background objects, performance may be significantly improved by incorporating either change-detection (via adaptive Gaussian Mixture Models), or static clutter rejection techniques (by training a person detection classifier to reject instances of the static background under consideration). These options are available at the user's specification and may be accessible from the main graphical user interface (see below).


Determining when a new person has entered or left the scene can be accomplished in a number of ways. As discussed above, persons can be removed from the scene when the likelihood of their track has decreased below a certain level, or no person detections in any camera corresponding to that person have occurred in a given time. Person birth/death processes can also take into account egress and ingress points on the spatial rig representation. For example, persons may be expected to enter and exit the spatial map near doors, or stairwells, but are unlikely to disappear from the scene when standing in the middle of the room. These regions are used in the person tracking/birth/death processing, and may be set by the user through the graphical user interface.
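One way to fold ingress and egress regions into track removal is to tolerate fewer missed frames near a door or stairwell than in the middle of the room. The sketch below assumes rectangular exit zones in plan-view coordinates; the function names and all threshold values are hypothetical and would be set per installation.

```python
def in_any_zone(position, zones):
    """True if (x, y) lies inside any (xmin, ymin, xmax, ymax) rectangle."""
    x, y = position
    return any(xmin <= x <= xmax and ymin <= y <= ymax
               for xmin, ymin, xmax, ymax in zones)


def should_remove_track(position, likelihood, missed_frames, exit_zones,
                        min_likelihood=0.2, max_missed_frames=90,
                        max_missed_near_exit=15):
    """Decide whether a person track should be dropped.

    A track is dropped when its likelihood is too low, or when it has gone
    undetected for too long. Tracks last seen inside an exit zone are
    dropped after fewer missed frames than tracks elsewhere.
    """
    if likelihood < min_likelihood:
        return True
    if in_any_zone(position, exit_zones):
        limit = max_missed_near_exit
    else:
        limit = max_missed_frames
    return missed_frames > limit
```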


A graphical user interface 120 (GUI) may enable the end-user to visualize the resulting person locations in the individual video streams, as well as in a 2- or 3-D representation of the space. A preferred embodiment of a GUI may consist of several parts, including a row of video feeds from each camera in the PVM system. These images may be “clickable”. Clicking any video stream changes the main video visualization to correspond to that camera. This action makes the camera selected the “active camera”.


A GUI 120 may also include a main video visualization showing the video from the active camera. Each video may be enhanced by drawing the detected bounding boxes for all person detections, as well as ellipses representing each person's estimated location projected onto the ground. The ellipses are generated by projecting a fixed-radius circle around each person location onto the ground plane using the camera parameters. Each circle on each detected person is assigned a color, and the color-to-person relationship is maintained throughout the video (e.g., the person with the green circle will always be represented by a green circle). The radius, thickness, and transparency of each person identification circle vary with the certainty with which that person is currently localized. As a person leaves a scene, or is occluded, the circle corresponding to their location will increase in size, decrease in thickness, and increase in transparency, until it disappears and the person is “lost”.
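The relationship between localization certainty and circle appearance can be sketched as a simple mapping. The linear mapping, the function name, the dictionary keys, and the default sizes below are all assumptions for illustration; only the direction of each change (larger, thinner, more transparent as certainty falls) comes from the description.

```python
def circle_style(certainty, base_radius_m=0.5, max_thickness_px=6):
    """Map a localization certainty in [0, 1] to drawing parameters.

    Returns None when certainty reaches zero (the person is "lost").
    Otherwise returns radius in metres, line thickness in pixels, and
    opacity, where opacity 1.0 is fully opaque.
    """
    c = min(1.0, max(0.0, certainty))
    if c == 0.0:
        return None
    return {
        "radius_m": base_radius_m * (2.0 - c),           # grows as certainty falls
        "thickness_px": max(1, round(max_thickness_px * c)),
        "opacity": c,                                    # transparency is 1 - opacity
    }
```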


A plan-view or 3-D map of the area under surveillance may show the locations of the persons in world-coordinates, as well as the locations of the cameras. Persons are represented as simplified icons and the colors in the map may be chosen to match the circles drawn around each person in the main frame. In the plan-view visualization, the active camera is identified by changing its color (e.g., the active camera icon may be red while the other camera icons are black).


The GUI may also include various options for saving, loading, exporting, starting, stopping, and changing parameters of the UI.


In addition, depending on needs, the outputs of the person localization information may be coupled into SCADA control systems, and utilized to either raise an alarm 110 if a person is in close proximity to moving equipment, or to inhibit automated actions when a person is in too close proximity to the equipment.
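The coupling to a control system can be reduced, for illustration, to a decision based on the distance from the nearest tracked person to a piece of equipment. The sketch below is an assumption about how such a rule might look: the function name, the three return values, and the two radii are hypothetical, and the interface to an actual SCADA system is not modeled.

```python
import math


def control_action(person_locations, equipment_pos,
                   alarm_radius_m, inhibit_radius_m):
    """Choose an action from person-to-equipment distance.

    person_locations: iterable of coordinate tuples (2-D or 3-D, metres).
    equipment_pos: coordinate tuple of the same dimension.
    Returns "inhibit", "alarm", or "allow".
    """
    nearest = min(
        (math.dist(p, equipment_pos) for p in person_locations),
        default=math.inf,   # nobody tracked
    )
    if nearest <= inhibit_radius_m:
        return "inhibit"
    if nearest <= alarm_radius_m:
        return "alarm"
    return "allow"
```

Note that an empty list yields "allow" in this sketch; whether an absence of detections should permit motion is a design decision for the installation, since it depends on how much trust is placed in camera coverage.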


In FIG. 1, video sources 102 and cameras 104 are mounted with arbitrary pan, tilt and/or zoom settings in arbitrary locations around the scene to be surveyed. In a preferred embodiment, at least three cameras 104 should be able to “view” each spatial location where people 101 should be tracked and personnel at the maximum distance in each camera 104 view should be at least 64 pixels in height. Complete overlap between camera 104 views is not necessary as long as at least three cameras 104 are able to view each spatial location where people should be tracked. Given enough cameras 104, different cameras 104 can view completely different regions. These cameras 104 are connected to processor 106 which is configured to monitor the location of the personnel 101 as discussed. Processor 106 is also connected to machinery control system 108 which is configured to initiate or interrupt any automated actions of equipment 112 based on the location of the personnel 101.


In FIG. 2, the steps of a potential method of controlling automated equipment 112 are described. The method includes acquiring visual data, step 206, analyzing visual data, step 208, generating bounding box data, step 210, processing bounding box data, step 212, initiating or interrupting automated actions, step 214, displaying data, step 216, and alerting staff, step 218. Other embodiments may use some or all of these steps to achieve the primary objective of initiating or interrupting automated action based on the location of personnel.


Disclosed embodiments relate to a system for inhibiting or causing automated actions based on estimated person locations. The system may include multiple video sources 102 which are configured to detect the location of one or more persons 101. The video source 102 may be calibrated for a known location and pose. The system also includes at least one processor 106 operably connected to the calibrated video sources 102. The processor 106 aggregates possible person 101 locations. The system may also include a machinery control system 108 which is configured to initiate or interrupt automated activities in response to possible person 101 locations.


Alternative embodiments of the system may also include a visualization of the environment presented to an end user, and/or an alarm 110 for alerting staff to the occurrence of a pre-determined condition. Certain embodiments will utilize a plurality of video sources 102 which are combined to provide enhanced person location confidence. In many embodiments the at least one video source 102 will be a camera 104.


Another disclosed embodiment relates to a method for inhibiting or causing automated actions based on estimated person locations. The method may include the steps of acquiring visual data 206 from at least one video source, analyzing said visual data 208 and inhibiting or causing automated actions 214 based on said data using a machinery control system. Additional embodiments may also include the steps of displaying the acquired or analyzed data 216, alerting staff to the occurrence of a pre-determined condition 218, generating bounding box data 210 for at least one person in a frame and/or processing bounding box data 212 to determine person locations via triangulation.

Claims
  • 1. A system for inhibiting or causing automated actions based on estimated person locations comprising: a plurality of video sources configured to detect a location of one or more persons on a drilling site having a drilling rig and one or more pieces of machinery associated with the drilling rig, each of the one or more pieces of machinery having a respective known location, wherein at least one video source is calibrated using pre-existing fiducials in visual data from the at least one video source, wherein the pre-existing fiducials comprise the one or more pieces of machinery, wherein the calibration is based on the known locations of the one or more pieces of machinery, and wherein at least one video source is positioned to capture a perspective view, and at least one processor operably connected to the calibrated video sources wherein said processor aggregates possible person locations, generates bounding box data for at least one person in a frame, and processes the bounding box data to determine person locations in 3-dimensions; and a machinery control system configured to initiate, alter or interrupt automated activities in response to determined person locations in 3-dimensions, wherein the machinery control system is operably connected to equipment.
  • 2. The system of claim 1, further comprising a visualization of an environment presented to an end user.
  • 3. The system of claim 1, wherein the plurality of video sources are combined to provide enhanced person location confidence.
  • 4. The system of claim 1, further comprising an alarm for alerting staff to an occurrence of a pre-determined condition.
  • 5. The system of claim 1, wherein at least one video source is a camera.
  • 6. A method for inhibiting, altering or causing automated actions based on estimated person locations on a drilling site having a drilling rig, the method comprising: acquiring visual data from at least one video source, wherein at least one video source is positioned to capture a perspective view and calibrated using pre-existing fiducials in the visual data, wherein the pre-existing fiducials comprise one or more pieces of machinery associated with the drilling rig, and wherein the calibration is based on known locations of the one or more pieces of machinery, analyzing the visual data, generating bounding box data for at least one person in a frame; processing the bounding box data to determine person locations in 3-dimensions; and inhibiting or causing automated actions based on the person locations in 3-dimensions using a machinery control system, wherein the machinery control system is operably connected to equipment.
  • 7. The method of claim 6, further comprising displaying the acquired or analyzed data.
  • 8. The method of claim 6, further comprising alerting staff to an occurrence of a pre-determined condition.
  • 9. The method of claim 6, further comprising processing bounding box data to determine person locations via triangulation.
  • 10. A system for monitoring personnel location on a drilling site having a drilling rig and controlling automated equipment in response to personnel location, the system comprising: at least one video stream provided by at least one camera operably connected to a processor, wherein at least one camera is positioned to capture a perspective view and calibrated using pre-existing fiducials in the video stream, wherein the pre-existing fiducials comprise one or more pieces of machinery associated with the drilling rig, wherein the calibration is based on known locations of the one or more pieces of machinery, and wherein the processor is configured to monitor a location of any personnel present in the video stream in 3-dimensions and generate bounding box data for any personnel present in the video stream in 3-dimensions; a machinery control system operably connected to the processor, wherein the machinery control system is configured to receive commands to either initiate or stop automated operations of equipment based on the bounding box data for personnel present in the video stream in 3-dimensions; and a graphical user interface configured to provide the video stream to an operator.
  • 11. The system of claim 1, wherein the equipment is down-hole drilling equipment.
  • 12. The system of claim 1, wherein the equipment is selected from the group consisting of conveyors, cranes, hoists, pumps, carts, sleds, elevators, winches, and top drives.
  • 13. The system of claim 1, wherein the calibration comprises detecting the pre-existing fiducials in the visual data automatically.
  • 14. The system of claim 1, wherein the calibration comprises using a camera parameter optimization technique selected from the group comprising: linear least-squares, non-linear least-squares, and batch-least squares.
  • 15. The method of claim 6, wherein the calibration comprises detecting the pre-existing fiducials in the visual data automatically.
  • 16. The method of claim 6, wherein the calibration comprises using a camera parameter optimization technique selected from the group comprising: linear least-squares, non-linear least-squares, and batch-least squares.
  • 17. The system of claim 10, wherein the calibration comprises detecting the pre-existing fiducials in the video stream automatically.
Related Publications (1)
Number Date Country
20160134843 A1 May 2016 US
Provisional Applications (1)
Number Date Country
62078569 Nov 2014 US