This patent application claims priority to Chinese Patent Application Ser. No. 201810211942.2, filed Mar. 5, 2018, which is incorporated herein by reference in its entirety.
The following description relates to pedestrian tracking and, more specifically, to a method of pedestrian tracking using a network of partially or completely overlapped depth sensors.
Pedestrian tracking plays an important role in intelligent building technologies. These include, but are not limited to, building security and safety technologies, elevator scheduling optimization technologies and building energy control technologies.
The performance of pedestrian tracking methods is usually affected by two related issues: a crowd of pedestrians typically results in the occlusion of targeted individuals and most sensors have a limited field of view (FOV). As such, systems may have difficulty accurately tracking multiple moving pedestrians across a wide area such as, for example, a large elevator lobby area.
According to an aspect of the disclosure, an object tracking system is provided and includes a depth sensor deployed to have at least a nearly continuous field of view (FOV) and a controller coupled to the depth sensor. The controller is configured to spatially and temporally synchronize output from the depth sensor and to track respective movements of each individual object within the nearly continuous FOV as each individual object moves through the nearly continuous FOV.
In accordance with additional or alternative embodiments, the depth sensor is deployed to have a continuous FOV.
In accordance with additional or alternative embodiments, the spatial synchronization is obtained from a comparison between output from the depth sensor and a coordinate system defined for the object tracking region and the depth sensor.
In accordance with additional or alternative embodiments, the temporal synchronization is obtained by one or more of reference to a network time and time stamps of the output of the depth sensor.
According to another aspect of the disclosure, an object tracking system is provided and includes a structure formed to define an object tracking region, a network of depth sensors deployed throughout the structure to have at least a nearly continuous field of view (FOV) which is overlapped with at least a portion of the object tracking region, and a controller coupled to the depth sensors. The controller is configured to spatially and temporally synchronize output from each of the depth sensors and to track respective movements of each individual object within the nearly continuous FOV as each individual object moves through the nearly continuous FOV.
In accordance with additional or alternative embodiments, the object tracking region includes an elevator lobby.
In accordance with additional or alternative embodiments, the object tracking region includes a pedestrian walkway in a residential, industrial, military, commercial or municipal property.
In accordance with additional or alternative embodiments, the network of depth sensors is deployed throughout the structure to have a continuous overlapped FOV.
In accordance with additional or alternative embodiments, the spatial synchronization is obtained from a comparison between output from each of the depth sensors and a coordinate system defined for the object tracking region and each of the depth sensors.
In accordance with additional or alternative embodiments, the temporal synchronization is obtained by reference to a network time.
In accordance with additional or alternative embodiments, the temporal synchronization is obtained from time stamps of the output of each of the depth sensors.
According to yet another aspect of the disclosure, an object tracking method is provided and includes deploying depth sensors to have at least a nearly continuous field of view (FOV), spatially and temporally synchronizing the depth sensors to world coordinates and a reference time, collecting depth points from each depth sensor, converting the depth points to depth points of the world coordinates, projecting the depth points of the world coordinates onto a plane, and executing data association with respect to the projection of the depth points of the world coordinates onto sequential maps of the plane during passage of the reference time to remove outlier tracklets formed by projected depth points in a relatively small number of the maps and to group remaining tracklets formed by projected depth points in a relatively large number of the maps.
In accordance with additional or alternative embodiments, the deploying includes deploying the depth sensors in a network within a structure formed to define an object tracking region such that the nearly continuous FOV overlaps with at least a portion of the object tracking region.
In accordance with additional or alternative embodiments, the deploying includes deploying the depth sensors to have a continuous FOV.
In accordance with additional or alternative embodiments, the spatially synchronizing of the depth sensors to the world coordinates includes calibrating each of the depth sensors to the world coordinates and the temporally synchronizing of the depth sensors to the reference time includes one or more of linking to a network time and time stamping output of each of the depth sensors.
In accordance with additional or alternative embodiments, the relatively small and large numbers of the maps are updateable.
In accordance with additional or alternative embodiments, the method further includes executing a nearest neighbor search to group the remaining tracklets.
In accordance with additional or alternative embodiments, the converting of the depth points to the depth points of the world coordinates includes converting each of the depth points to the depth points of the world coordinates.
In accordance with additional or alternative embodiments, the method further includes executing a shape model to aggregate multiple points with a spatial distribution for subsequent projection or to aggregate multiple projected points into a point for subsequent tracking.
These and other advantages and features will become more apparent from the following description taken in conjunction with the drawings.
The subject matter, which is regarded as the disclosure, is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other features and advantages of the disclosure are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
As will be described below, a pedestrian tracking system is provided to accurately track multiple moving pedestrians across a wide area. The pedestrian tracking system includes multiple sensors (e.g., 2D, 3D or depth sensors) with a near-continuous field of view (FOV) or, in one embodiment, multiple, spatially overlapped sensors with a continuous FOV. In either case, each of the multiple sensors has the capability to distinguish between multiple moving objects even when a number of those moving objects are occluded.
With reference to
There are both qualitative and quantitative differences between conventional 2D visible-spectrum imaging and depth sensing. In 2D imaging (equivalently 2D video, since 2D video includes successive 2D images), the reflected color (mixture of wavelengths) from the first object in each radial direction from the camera is captured. The image, then, is a 2D projection of the 3D world where each pixel is the combined spectrum of the source illumination and the spectral reflectivity of an object in the scene (and, possibly, the object's own emissivity). In depth sensing, there is no color (spectral) information. Rather, each ‘pixel’ is the distance (depth, range) to the first object in each radial direction from the depth sensor. The data from depth sensing is typically called a depth map or point cloud.
Sometimes, a depth map or point cloud is confusingly called a depth image or 3D image, but it is not an image in any conventional sense of the word. Generally, a 2D image cannot be converted into a depth map and a depth map cannot be converted into a 2D image (although an artificial assignment of contiguous colors or grayscale to contiguous depths allows a person to crudely interpret a depth map somewhat akin to how a person sees a 2D image, as in the accompanying figures).
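By way of non-limiting illustration only, the following Python sketch shows how a single depth 'pixel' and a full depth map may be back-projected into 3D points under an assumed pinhole sensor model. The function names and the intrinsic parameters fx, fy, cx and cy are assumptions made solely for the example and form no part of the disclosure.

    import numpy as np

    def depth_pixel_to_point(u, v, depth, fx, fy, cx, cy):
        """Back-project one depth-map sample (pixel (u, v) at range 'depth') into a
        3D point expressed in the sensor's own coordinate frame (pinhole model)."""
        x = (u - cx) * depth / fx
        y = (v - cy) * depth / fy
        return np.array([x, y, depth])

    def depth_map_to_point_cloud(depth_map, fx, fy, cx, cy):
        """Convert an entire H x W depth map into an (H*W) x 3 point cloud."""
        h, w = depth_map.shape
        v, u = np.mgrid[0:h, 0:w]
        x = (u - cx) * depth_map / fx
        y = (v - cy) * depth_map / fy
        return np.stack([x, y, depth_map], axis=-1).reshape(-1, 3)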
As shown in
With reference to
As used herein, a nearly continuous combined FOV 15 may be characterized in that the respective FOVs 15₁₋ₙ of each of the 3D depth sensors 14₁₋ₙ overlap with significant portions of neighboring FOVs 15₁₋ₙ or, to the extent that such overlapping is not provided or possible, as in the case of a corner or a hidden area within the object tracking region 12, in that spaces between neighboring FOVs 15₁₋ₙ are configured to be relatively small as compared to the overall sizes of the FOVs 15₁₋ₙ.
While the description provided herein refers to 3D depth sensors, it is to be understood that embodiments exist in which the sensors are a mix of 2D and/or 3D depth sensors as well. In the case of 2D depth sensors, in particular, such sensors would provide depth information relating to distances between objects and the 2D depth sensors but may not provide additional detail relating to a shape and size of the objects. Reference to 3D depth sensors herein is, therefore, done for clarity and brevity and should not be interpreted in such a way as to otherwise limit the scope of the claims or the application as a whole.
In accordance with embodiments, each of the 3D depth sensors 14₁₋ₙ may include or be provided as a depth sensor or, more particularly, as a Kinect™ or Astra™ sensor.
With reference to
It should be noted at this point that tracking could have difficulty with segmenting each of the individual objects. As such, a tracking algorithm may include or have track fork and join capabilities. As used herein, fork and join capabilities refer to the separation of one track into more than one track and the merging of two or more tracks into one track.
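A minimal, purely illustrative sketch of one possible way to represent such fork and join capabilities is set out below. The Track data structure and the fork and join helpers are hypothetical and merely restate, in code form, the separation and merging described above.

    from dataclasses import dataclass, field

    @dataclass
    class Track:
        """Hypothetical track record; 'parents' lists the tracks it split or merged from."""
        track_id: int
        positions: list = field(default_factory=list)  # sequence of (x, y) plane points
        parents: list = field(default_factory=list)

    def fork(track, split_positions, next_id):
        """Separate one track into more than one track, e.g. when previously
        occluded pedestrians become individually distinguishable again."""
        return [Track(track_id=next_id + i, positions=[pos], parents=[track])
                for i, pos in enumerate(split_positions)]

    def join(tracks, next_id, merged_position):
        """Merge two or more tracks into one track, e.g. when pedestrians move so
        close together that they can no longer be segmented."""
        return Track(track_id=next_id, positions=[merged_position], parents=list(tracks))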
In accordance with embodiments, the spatial synchronization may be obtained by the processing unit 301 from a comparison between output from each of the 3D depth sensors 14₁₋ₙ and a coordinate system that is defined for the object tracking region 12 and for each of the 3D depth sensors 14₁₋ₙ. The temporal synchronization may be obtained by the processing unit 301 by one or more of reference to a network time and time stamps of the output of each of the 3D depth sensors 14₁₋ₙ.
In accordance with embodiments, the coordinate system may be provided as a Cartesian coordinate system. However, it is to be understood that this is not required and that any other coordinate system can be used as long as it can be established consistently throughout the object tracking region 12.
With reference to
In accordance with embodiments, the temporal synchronization and/or the reference time may also take into account an interval of time between collection of three-coordinate depth points from each of the 3D depth sensors 14₁₋ₙ.
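Solely as a hedged example of such temporal synchronization, the following sketch aligns the locally time-stamped output of each sensor to an assumed common reference time and buckets the results into shared time slots whose width corresponds to the collection interval. The function name, the per-sensor clock offsets and the interval value are assumptions for illustration only.

    def synchronize_frames(frames_by_sensor, clock_offsets, interval):
        """Map each sensor's locally time-stamped frames onto a common reference
        timeline (e.g., a network time) and bucket them into shared time slots
        whose width equals the nominal collection interval (in seconds).

        frames_by_sensor: {sensor_id: [(local_timestamp, frame), ...]}
        clock_offsets:    {sensor_id: offset that maps local time to reference time}
        """
        slots = {}
        for sensor_id, frames in frames_by_sensor.items():
            for local_ts, frame in frames:
                reference_ts = local_ts + clock_offsets[sensor_id]
                slot = int(round(reference_ts / interval))  # shared time-slot index
                slots.setdefault(slot, []).append((sensor_id, frame))
        return slots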
Although the network of 3D depth sensors 14₁₋ₙ is described above as being deployed throughout the structure 11 to have at least the nearly continuous combined field of view (FOV) 15, which is made up of the respective FOVs 15₁₋ₙ of each of the 3D depth sensors 14₁₋ₙ, it is to be understood that the network of 3D depth sensors 14₁₋ₙ may instead be deployed throughout the structure 11 to have a continuous combined FOV 15. For purposes of clarity and brevity, the following description relates to the case in which the network of 3D depth sensors 14₁₋ₙ is deployed throughout the structure 11 to have the continuous combined FOV 15.
With reference to
As shown in
In any case, the object tracking method further includes spatially and temporally synchronizing the 3D depth sensors to world coordinates (or a coordinate system) and a reference time, respectively (blocks 502 and 503). As explained above, the spatial synchronization of block 502 may be obtained from a comparison between 3D depth sensor output and a coordinate system defined for the object tracking region and each of the 3D depth sensors. The temporal synchronization of block 503, as explained above, may be obtained by one or more of reference to a network time and time stamps of the output of the 3D depth sensors.
Thus, in accordance with embodiments, the spatially synchronizing of the 3D depth sensors to the world coordinates of block 502 may include calibrating each of the 3D depth sensors to the world coordinates (block 5021). Similarly, the temporally synchronizing of the 3D depth sensors to the reference time of block 503 may include one or more of linking to a network time (block 5031) and time stamping output of each of the 3D depth sensors (block 5032).
The method may then include collecting three-coordinate depth points from each 3D depth sensor (block 504), converting at least two of the three-coordinate depth points to depth points of the world coordinates (block 505) and projecting the depth points of the world coordinates onto a 2D plane (block 506). The collection of three-coordinate depth points of block 504 can be conducted with respect to the output of the 3D depth sensors and a number of the collected three-coordinate depth points can be established ahead of time or during the collection process itself in accordance with an analysis of the spread of the three-coordinate depth points (i.e., a small spread might require fewer points whereas a larger spread might require a larger number of points).
The conversion and projection of blocks 505 and 506 can be executed in any order.
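By way of non-limiting example, the conversion of block 505 and the projection of block 506 might be realized as follows, where each sensor's 4×4 sensor-to-world transform is assumed to have been obtained during the calibration of block 5021 and the world z-axis is assumed to be vertical, so that projection onto the 2D plane simply discards the z coordinate. The helper names to_world and project_to_plane and the transform T_sensor_to_world are hypothetical.

    import numpy as np

    def to_world(points_sensor, T_sensor_to_world):
        """Convert N x 3 sensor-frame depth points to world coordinates using a
        homogeneous 4 x 4 transform (rotation + translation) for that sensor."""
        n = points_sensor.shape[0]
        homogeneous = np.hstack([points_sensor, np.ones((n, 1))])
        return (homogeneous @ T_sensor_to_world.T)[:, :3]

    def project_to_plane(points_world):
        """Project world-coordinate points onto the horizontal 2D plane by
        discarding the (assumed vertical) z coordinate."""
        return points_world[:, :2]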
The method may also include executing data association (block 507). The executing of the data association of block 507 is conducted with respect to the projection of the depth points of the world coordinates onto sequential maps or frames of the 2D plane during passage of the reference time. The execution of data association thus serves to remove or facilitate removal of outlier tracklets formed by projected depth points in a relatively small and updateable number of the maps or frames and to group remaining tracklets formed by projected depth points in a relatively large and updateable number of the maps or frames. In accordance with embodiments, the relatively small and large numbers of the maps or frames are updateable in accordance with a desired accuracy of the object tracking method, available computation time and resources, and historical records.
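As one hedged, simplified reading of the data association of block 507, the following sketch counts how many sequential maps or frames support each tracklet and removes those supported by fewer than an updateable threshold; the remaining tracklets are kept for grouping. The function name, the tracklet dictionary fields and the threshold value are assumptions for illustration.

    def remove_outlier_tracklets(tracklets, min_frames=3):
        """Split tracklets into those kept for grouping and outliers that are
        supported by only a relatively small (updateable) number of maps/frames.

        tracklets: [{'id': ..., 'frames': [frame_index, ...], 'points': [(x, y), ...]}, ...]
        """
        kept, removed = [], []
        for tracklet in tracklets:
            support = len(set(tracklet['frames']))
            if support < min_frames:
                removed.append(tracklet)  # projected points occur in too few maps
            else:
                kept.append(tracklet)
        return kept, removed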
The object tracking method may further include executing a nearest neighbor search to group the remaining tracklets (block 508). This can be done by, for example, an automatic process of image recognition on a computing device.
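One possible, non-limiting realization of the nearest neighbor search of block 508 is sketched below using a k-d tree (scipy's cKDTree) over tracklet endpoints on the 2D plane. The helper name, the distance threshold and the tracklet fields are assumed for the example.

    import numpy as np
    from scipy.spatial import cKDTree

    def group_tracklets(tracklets, max_gap=0.5):
        """Pair each tracklet's end point with the nearest other tracklet's start
        point on the 2D plane, grouping them when the gap is within 'max_gap'
        (world units, e.g. meters). Assumes at least two tracklets."""
        starts = np.array([t['points'][0] for t in tracklets])
        ends = np.array([t['points'][-1] for t in tracklets])
        tree = cKDTree(starts)
        groups = []
        for i, end in enumerate(ends):
            distances, indices = tree.query(end, k=2)  # nearest may be the tracklet itself
            for distance, j in zip(distances, indices):
                if j != i and distance <= max_gap:
                    groups.append((tracklets[i]['id'], tracklets[j]['id']))
                    break
        return groups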
With reference to
As shown in
In any case, the object tracking method further includes spatially and temporally synchronizing the 3D depth sensors to world coordinates (or a coordinate system) and a reference time, respectively (blocks 602 and 603). As explained above, the spatial synchronization of block 602 may be obtained from a comparison between 3D depth sensor output and a coordinate system defined for the object tracking region and each of the 3D depth sensors. The temporal synchronization of block 603, as explained above, may be obtained by one or more of reference to a network time and time stamps of the output of the 3D depth sensors.
Thus, in accordance with embodiments, the spatially synchronizing of the 3D depth sensors to the world coordinates of block 602 may include calibrating each of the 3D depth sensors to the world coordinates (block 6021). Similarly, the temporally synchronizing of the 3D depth sensors to the reference time of block 603 may include one or more of linking to a network time (block 6031) and time stamping output of each of the 3D depth sensors (block 6032).
The method may then include collecting three-coordinate depth points from each 3D depth sensor (block 604), converting each of the three-coordinate depth points to depth points of the world coordinates (block 605) and projecting the depth points of the world coordinates onto a 2D plane (block 606). The collection of three-coordinate depth points of block 604 can be conducted with respect to the output of the 3D depth sensors and a number of the collected three-coordinate depth points can be established ahead of time or during the collection process itself in accordance with an analysis of the spread of the three-coordinate depth points (i.e., a small spread might require fewer points whereas a larger spread might require a larger number of points).
The conversion and projection of blocks 605 and 606 can be executed in any order.
The method may also include executing data association (block 607). The executing of the data association of block 607 is conducted with respect to the projection of the depth points of the world coordinates onto sequential maps or frames of the 2D plane during passage of the reference time. The execution of data association thus serves to remove or facilitate removal of outlier tracklets formed by projected depth points in a relatively small and updateable number of the maps or frames and to group remaining tracklets formed by projected depth points in a relatively large and updateable number of the maps or frames. In accordance with embodiments, the relatively small and large numbers of the maps or frames are updateable in accordance with a desired accuracy of the object tracking method, available computation time and resources, and historical records.
The object tracking method may further include executing a shape model to aggregate multiple points with a specific spatial distribution (the model) for subsequent projection to the world coordinate plane and tracking or to aggregate multiple projected points into one point for subsequent tracking (block 608). This can be done by, for example, an automatic process of image recognition on a computing device.
In accordance with embodiments, an aggregation of points into one point representing one object by use of a shape model, as in block 608, may be achieved by clustering points and fitting the points of each cluster to the shape model by minimizing the sum of absolute distances of points to the shape model. The clustering may be by K-means, expectation-maximization (EM), fuzzy C-means, hierarchical clustering, mixtures of Gaussians and the like. The associated distance metric may be the Minkowski metric with p=1, 2, or ∞ and the like. The shape model may be a low-order human kinematic model (skeleton), an x-y centroid model (vertical line) and the like. Some models may include additional parameters in the optimization, e.g., for pose and scale.
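As a purely illustrative sketch of such an aggregation, the projected plane points may be clustered with K-means (here via scikit-learn) and each cluster reduced to the x-y centroid model noted above. The function name, the assumed cluster count and the use of a squared-distance centroid rather than a sum-of-absolute-distances fit are simplifying assumptions made only for this example.

    import numpy as np
    from sklearn.cluster import KMeans

    def aggregate_with_centroid_model(plane_points, n_objects):
        """Cluster N x 2 projected points into 'n_objects' clusters and aggregate
        each cluster into a single x-y centroid point for subsequent tracking.
        (The centroid minimizes squared distances; a medoid-style fit would
        instead minimize the sum of absolute distances.)"""
        km = KMeans(n_clusters=n_objects, n_init=10).fit(plane_points)
        centroids = np.array([plane_points[km.labels_ == k].mean(axis=0)
                              for k in range(n_objects)])
        return centroids, km.labels_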
With reference to
As shown in
As shown in
Of the first tracklets 701₁, the outlier first tracklets 701₁, which are defined as those first tracklets 701₁ that are generated by points that occur in only a small number of maps or frames, are removed as shown in
Finally, as shown in
In the case of a nearly continuous FOV, there may be gaps between tracklets of one object corresponding to when it was not within any depth sensor FOV. The tracklet association across gaps may be accomplished by network flow optimization using metric learning and coherent dynamics based on position and additional parameters such as velocity and acceleration.
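The network flow optimization and metric learning themselves are not reproduced here; instead, the following simplified, assumed sketch checks whether one tracklet coherently continues another across a gap by extrapolating a constant-velocity model, standing in for the fuller position, velocity and acceleration dynamics described above. The function name, tracklet fields and error threshold are hypothetical.

    import numpy as np

    def continues_across_gap(tracklet_a, tracklet_b, gap_frames, max_error=0.75):
        """Return True if tracklet_b plausibly continues tracklet_a across a gap of
        'gap_frames' frames spent outside every depth sensor FOV, based on a
        constant-velocity extrapolation of tracklet_a's last two plane points."""
        a = np.asarray(tracklet_a['points'], dtype=float)
        b = np.asarray(tracklet_b['points'], dtype=float)
        velocity = a[-1] - a[-2]                     # per-frame velocity at end of A
        predicted = a[-1] + velocity * (gap_frames + 1)
        return float(np.linalg.norm(predicted - b[0])) <= max_error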
For
Benefits of the features described herein include accurate, wide-area tracking of pedestrians through simultaneous object tracking across multiple depth sensors that employ spatial and temporal consistency, together with the use of multi-perspective shape models for improved tracking accuracy.
While the disclosure is provided in detail in connection with only a limited number of embodiments, it should be readily understood that the disclosure is not limited to such disclosed embodiments. Rather, the disclosure can be modified to incorporate any number of variations, alterations, substitutions or equivalent arrangements not heretofore described, but which are commensurate with the spirit and scope of the disclosure. Additionally, while various embodiments of the disclosure have been described, it is to be understood that the exemplary embodiment(s) may include only some of the described exemplary aspects. Accordingly, the disclosure is not to be seen as limited by the foregoing description, but is only limited by the scope of the appended claims.
Number | Date | Country | Kind |
---|---|---|---
201810211942.2 | Mar 2018 | CN | national |