Today cameras are everywhere: surveillance systems, camera drones, factory automation cameras, smart phones, and so on. Cameras have become part of the daily lives of many end users as a means to acquire visual information about the world. In computer vision, locating one or more persons or objects of interest in videos from frame to frame and across cameras is a very challenging task (e.g., due to potentially limited computing resources). Visual object tracking plays an important role, for example, in surveillance systems, traffic flow monitoring, autonomous driving, mobile robotics, and industrial automation.
Therefore, there is a need for a real-time, near real-time, or substantially real-time visual object tracker for processing videos or image sequences.
According to various example embodiments, an apparatus comprises means for using a first object tracking mechanism to detect and associate one or more objects from frame to frame of a video. The means are also configured to perform initiating one or more second object tracking mechanisms to track the one or more objects detected by the first object tracking mechanism from frame to frame of the video in parallel with the first object tracking mechanism. The means are further configured to perform using a tracking output of the one or more second object tracking mechanisms in place of the first object tracking mechanism for a frame of the video based on determining that the first object tracking mechanism has missed a detection of the object in the frame of the video.
According to various example embodiments, a method comprises using a first object tracking mechanism to detect and associate one or more objects from frame to frame of a video. The method also comprises initiating one or more second object tracking mechanisms to track the one or more objects detected by the first object tracking mechanism from frame to frame of the video in parallel with the first object tracking mechanism. The method further comprises using a tracking output of the one or more second object tracking mechanisms in place of the first object tracking mechanism for a frame of the video based on determining that the first object tracking mechanism has missed a detection of the object in the frame of the video.
According to various example embodiments, a non-transitory computer-readable storage medium having stored thereon one or more program instructions which, when executed by one or more processors, cause, at least in part, an apparatus to use a first object tracking mechanism to detect and associate one or more objects from frame to frame of a video. The apparatus is also caused to initiate one or more second object tracking mechanisms to track the one or more objects detected by the first object tracking mechanism from frame to frame of the video in parallel with the first object tracking mechanism. The apparatus is further caused to use a tracking output of the one or more second object tracking mechanisms in place of the first object tracking mechanism for a frame of the video based on determining that the first object tracking mechanism has missed a detection of the object in the frame of the video.
According to various example embodiments, an apparatus comprises at least one processor, and at least one memory including computer program code for one or more computer programs, the at least one memory and the computer program code configured to, with the at least one processor, cause, at least in part, the apparatus to use a first object tracking mechanism to detect and associate one or more objects from frame to frame of a video. The apparatus is also caused to initiate one or more second object tracking mechanisms to track the one or more objects detected by the first object tracking mechanism from frame to frame of the video in parallel with the first object tracking mechanism. The apparatus is further caused to use a tracking output of the one or more second object tracking mechanisms in place of the first object tracking mechanism for a frame of the video based on determining that the first object tracking mechanism has missed a detection of the object in the frame of the video.
According to various example embodiments, a system comprises one or more devices including one or more of a cloud server device, an edge device, an internet of things (IoT) device, a user equipment device, or a combination thereof. The one or more devices are configured to use a first object tracking mechanism to detect and associate one or more objects from frame to frame of a video. The one or more devices are also configured to initiate one or more second object tracking mechanisms to track the one or more objects detected by the first object tracking mechanism from frame to frame of the video in parallel with the first object tracking mechanism. The one or more devices are further configured to use a tracking output of the one or more second object tracking mechanisms in place of the first object tracking mechanism for a frame of the video based on determining that the first object tracking mechanism has missed a detection of the object in the frame of the video.
According to various example embodiments, a device comprises at least one processor and at least one memory including computer program code for one or more programs, the at least one memory and the computer program code configured to, with the at least one processor, cause the device to use a first object tracking mechanism to detect and associate one or more objects from frame to frame of a video. The device is also caused to initiate one or more second object tracking mechanisms to track the one or more objects detected by the first object tracking mechanism from frame to frame of the video in parallel with the first object tracking mechanism. The device is further caused to use a tracking output of the one or more second object tracking mechanisms in place of the first object tracking mechanism for a frame of the video based on determining that the first object tracking mechanism has missed a detection of the object in the frame of the video.
In addition, for various example embodiments of the invention, the following is applicable: a method comprising facilitating a processing of and/or processing (1) data and/or (2) information and/or (3) at least one signal, the (1) data and/or (2) information and/or (3) at least one signal based, at least in part, on (or derived at least in part from) any one or any combination of methods (or processes) disclosed in this application as relevant to any embodiment of the invention.
For various example embodiments of the invention, the following is also applicable: a method comprising facilitating access to at least one interface configured to allow access to at least one service, the at least one service configured to perform any one or any combination of network or service provider methods (or processes) disclosed in this application.
For various example embodiments of the invention, the following is also applicable: a method comprising facilitating creating and/or facilitating modifying (1) at least one device user interface element and/or (2) at least one device user interface functionality, the (1) at least one device user interface element and/or (2) at least one device user interface functionality based, at least in part, on data and/or information resulting from one or any combination of methods (or processes) disclosed in this application as relevant to any embodiment of the invention, and/or at least one signal resulting from one or any combination of methods (or processes) disclosed in this application as relevant to any embodiment of the invention.
For various example embodiments of the invention, the following is also applicable: a method comprising creating and/or modifying (1) at least one device user interface element and/or (2) at least one device user interface functionality, the (1) at least one device user interface element and/or (2) at least one device user interface functionality based at least in part on data and/or information resulting from one or any combination of methods (or processes) disclosed in this application as relevant to any embodiment of the invention, and/or at least one signal resulting from one or any combination of methods (or processes) disclosed in this application as relevant to any embodiment of the invention.
In various example embodiments, the methods (or processes) can be accomplished on the service provider side or on the mobile device side or in any shared way between service provider and mobile device with actions being performed on both sides.
For various example embodiments, the following is applicable: An apparatus comprising means for performing a method of the claims.
Still other aspects, features, and advantages of the invention are readily apparent from the following detailed description, simply by illustrating a number of particular embodiments and implementations, including the best mode contemplated for carrying out the invention. The invention is also capable of other and different embodiments, and its several details can be modified in various obvious respects, all without departing from the spirit and scope of the invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
The various example embodiments of the invention are illustrated by way of examples, and not by way of limitation, in the figures of the accompanying drawings:
Examples of a method, apparatus, and computer program for providing a real-time, near real-time, or substantially real-time visual object tracker are disclosed. In the following description, for the purposes of explanation, numerous specific details and examples are set forth to provide a thorough understanding of the embodiments of the invention. It is apparent, however, to one skilled in the art that the embodiments of the invention may be practiced without these specific details or with an equivalent arrangement. In other instances, structures and devices are shown in block diagram form to avoid unnecessarily obscuring the embodiments of the invention.
As used herein, the term “real-time” refers to an object tracking result being produced within a designated time period (e.g., within milliseconds or other time period defined as real-time) of receiving an input (e.g., a video or image sequence depicting the object to be tracked). The term “near real-time” refers to providing an object tracking result within a time period greater than the time period designated as real-time but less than a second time duration (or less than a designated percentage above the designated time period). The term “substantially real-time” refers to providing object tracking results that meet the criteria for being classified as real-time by greater than a designated percentage of the time or number of instances of object tracking results.
With the great success of the deep learning based neural networks in object and feature detection/extraction, step one of the tracking-by-detection approach can use deep neural networks to detect and segment the objects depicted in the images of a video or image sequence. By way of example, it is contemplated that any type of neural network (or equivalent algorithm) available/capable for object detection (e.g., providing a bounding box around a detected object) and instance segmentation (e.g., providing contours around a detected object) can be used for tracking-by-detection. In one example, one or more frames of the video are processed by the object tracking system (e.g., the tracking-by-detection system) to identify an object as a detection (e.g., a bounding box around the object as depicted in the image frame) and then associate the detection across different frames of the video. As used herein, the terms “tracklet” or “symbol” are used synonymously to refer to the detections of the same object across multiple frames of the video or image sequence or a related software object/data file that records the detections.
However, there are two potential technical issues with deep neural network (DNN)-based tracking-by-detection schemes. One issue occurs in the detection stage: deep neural networks may not be able to detect an object, such as one or more target objects, in some frames of the video with sufficient detection confidence (e.g., confidence above a specified threshold confidence). This can lead to the loss of tracking of the one or more target objects and/or fragmentation of tracklets. In other words, the DNN may not be able to detect a target object in one or more frames of the video with a target level of detection confidence such that no detection of the target object is reported.
Another issue is that the deep neural networks usually involve intensive computations and need powerful computing devices (e.g., devices with graphics processing units (GPU) or other hardware dedicated to machine learning tasks such as but not limited to neural processing cores, tensor cores, etc.). Because of these intensive computing requirements and/or lack of sufficient hardware to meet those requirements, tracking-by-detection processes can run slowly such that the processing pipelines may not achieve the desired/target frame rate (FPS) needed for real-time object tracking (e.g., greater than 10 FPS, the frame rate of the video source, or any other specified frame rate associated with real-time tracking). The need for this type of visual object tracking to perform in near real-time is important for many applications such as, but not limited to, surveillance, security, automation, traffic monitoring, product scanning, shopping scanning, indoor warehouse scanning, and/or similar applications requiring quick response.
The various example embodiments described herein provide an approach to real-time, near real-time or substantially real-time object tracking in videos or image sequences that addresses the technical problems described above of (1) missed-detection and (2) real-time issues for object tracking based on the tracking-by-detection paradigm. It is contemplated that the various examples described herein can be used for any object detection including but not limited to use-cases for robots/drones/vehicles as well as shopping, warehouse and/or manufacturing applications, e.g., object detection, counting, scanning, tracking, or any combination thereof. The applications of this invention can be extended to both indoor and outdoor use cases.
The various example embodiments described are based on considering that DNN-based object detection is powerful but can be susceptible to at least the two problems described above with respect to DNN-based tracking algorithms (e.g., object-detection imperfections and relatively low achievable frame rate). In other words, to apply deep neural networks to a real-time/near real-time/substantially real-time visual object tracking, the various examples described herein apply a flexible approach to dealing with the technical issues described above.
In one example, the system 100 of
It is noted that the DNN-based object detection & feature extraction 107 as the outer tracker 103 and ROI tracking as the inner trackers 105 are provided by way of illustration and not as limitations. It is contemplated that the outer tracker 103 and the inner trackers 105 can each use any type of object tracking algorithm known in the art. For example, the outer tracker 103 can be a DNN with more layers and extensive training while the inner tracker 105 can be a DNN with fewer layers or different and/or less training than the outer tracker 103. More generally, in one example, the outer tracker 103 can be a finer object tracker that operates more slowly and/or with a greater field of view of the image, while the inner tracker 105 can be a coarser object tracker, e.g., with a narrower field of view of the image and/or lower image/frame resolution than the outer tracker, that operates more quickly, e.g., with a higher frame rate than the outer tracker 103.
In one example, the system 100 also introduces an image traffic throttling mechanism as part of an image grabber 113 to achieve a desired/target frame rate from one or more video sources (e.g., a camera 115 and/or any other device capable of generating videos, images, or image streams, including any sources of synthetic, i.e., computer-created, or real videos).
In one example, the interactions between the outer tracker 103 and the inner trackers 105 can boost object tracking performance. For example, the outer tracker 103 is responsible for associating detected objects of interest from frame to frame of a video, or across frames of a video, based on the objects' location information and discriminative features information generated by the deep networks (e.g., DNN-based object detection & feature extraction 107). The inner tracker(s) 105 are then used to search for the target object in case of missed detections by the outer tracker 103 either due to false negative errors (e.g., the object detector of the outer tracker 103 fails to detect target objects in the scene) or compute resource limitation (e.g., the DNN-based object detection & feature extraction 107 is running at lower frame rate than the desired/target frame rate or the frame rate being provided by the image grabber 113). In one example, the outer tracker 103 is based on the outputs of one or more deep neural networks 107 that are running, e.g., on GPUs or other neural processors at a lower FPS; and the inner tracking 105 is enabled based on fast and efficient ROI trackers that are running, e.g., on CPUs.
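For illustration only, the following Python sketch outlines the outer/inner interaction described above; the names run_dnn_detector and RoiTracker and the confidence threshold are hypothetical placeholders rather than elements of the described system.

```python
# Hypothetical sketch of the outer/inner interaction described above.
# run_dnn_detector() stands in for the DNN-based detection & feature
# extraction 107; RoiTracker stands in for a fast ROI-based inner tracker.

CONF_THRESHOLD = 0.5  # assumed detection-confidence threshold


def track_frame(frame, inner_trackers, run_dnn_detector, RoiTracker):
    """Return per-object boxes for one frame, falling back to inner trackers."""
    detections = run_dnn_detector(frame)  # may be empty or low confidence
    outputs = {}

    for det in detections:
        if det["confidence"] < CONF_THRESHOLD:
            continue
        obj_id = det["id"]
        outputs[obj_id] = det["bbox"]
        # (Re-)initiate the per-object inner tracker from the outer result.
        inner_trackers[obj_id] = RoiTracker(frame, det["bbox"])

    # For objects the outer tracker missed, use the inner tracker estimate.
    for obj_id, tracker in inner_trackers.items():
        if obj_id not in outputs:
            ok, bbox = tracker.update(frame)
            if ok:
                outputs[obj_id] = bbox

    return outputs
```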
The time sequence diagram of
At process 201, the image grabber 113 receives video from one or more sources (e.g., camera 115 such as a surveillance camera, drone, Internet of Things (IoT) device, or a synthetic source, such as a game server, virtual reality (VR) generator, augmented reality (AR) generator, etc.). The video from these sources may be at any frame rate which may not be consistent with the frame rate at which the object tracking system 101, e.g., the DNN-based object detection & feature extraction 107, is configured to operate. Accordingly, the image grabber 113 adjusts, if needed, the frame rate(s) of the incoming video to a target/desired frame rate. If the video sources have a higher frame rate than the target/desired frame rate of the object tracking system 101 and/or the DNN-based object detection & feature extraction 107, the image grabber 113 can throttle or otherwise down-sample the frame rate to the target/desired frame rate. Conversely, if the video sources have a lower frame rate than the target/desired frame rate, the image grabber 113 can up-sample the video to the target/desired frame rate. The image grabber 113 can then pass the frames of the video to the DNN-based object detection & feature extraction 107 at the target/desired frame rate in a signal 203.
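A minimal sketch of such frame-rate throttling is shown below, assuming a generic iterable frame source and a hypothetical TARGET_FPS value; the actual image grabber 113 may implement this differently.

```python
import time

TARGET_FPS = 10.0          # assumed target frame rate of the tracking pipeline
MIN_INTERVAL = 1.0 / TARGET_FPS


def throttled_frames(source_frames):
    """Down-sample an incoming frame stream to roughly TARGET_FPS.

    source_frames: an iterable yielding frames at an arbitrary (possibly
    higher) rate. Frames arriving faster than the target rate are dropped;
    if the source is slower, every frame is passed through (a real
    implementation could additionally duplicate frames to up-sample).
    """
    last_emit = 0.0
    for frame in source_frames:
        now = time.monotonic()
        if now - last_emit >= MIN_INTERVAL:
            last_emit = now
            yield frame
        # else: silently drop the frame (throttling)
```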
At process 205, the DNN-based object detection & feature extraction 107 processes the input frame 203 to detect one or more target objects in the frame. The result of the detection is an annotated frame comprising the original frame along with a bounding box indicating the target object, any extracted features of the target object, associated confidences of the detections, or any combination thereof. If the detection is successful (e.g., the detection confidence is above a threshold value), then the annotated frame can be passed in a signal 207 (e.g., frame and detection results) to the outer tracker 103. However, if the detection is unsuccessful (e.g., the detection confidence is below a threshold value, or the detection cannot be completed before an expiration of a designated timer or before arrival of another frame), then a plain frame (e.g., the original frame with no bounding box or extracted features of the target object) can be passed in the signal 207 to the outer tracker 103.
At process 209, the outer tracker 103 performs data association on the received frame and detection results 207 to correlate the detection of the same target object across multiple frames. The data-associated frame and detection results, i.e. tracklets, are then passed to the tracking information aggregation 111 in a signal 211. In addition, at process 213, the outer tracker 103 uses the frame and detection results 207 to initiate one or more inner trackers 105 for one or more or each detected target object in the frame and detection results 207 that has been successfully associated to an existing tracklet or symbol. In one example, the outer tracker 103 initiates the inner tracker(s) 105 by passing the region of interest (ROI) information corresponding to the one or more, or each, identified target object in a signal 214 to respective inner trackers 105(1-k). The ROI, for instance, can be based on the bounding box detected for each target object.
At process 215, the inner tracker(s) 105 associated with the respective identified target objects use the received ROI data 214 of the target objects to detect and track the target objects using, for instance, ROI trackers. The object detections of the inner tracker(s) 105 are passed to the inner tracker management function 109 in a signal 217.
At process 219, the inner tracker management function 109 tests whether the object detections 217 (e.g., bounding boxes), i.e. inner detections, have correctly tracked the identified target object. By way of example, the test can be based on a trained re-identification model or algorithm that determines whether the features (e.g., color, shape, etc.) of the target object in the inner detections 217 match the features of previously tracked instances of the same object. In addition, the test can include, but is not limited to, using a motion model to predict the location of the target object in different frames to determine whether the location of the target object in the inner detections 217 matches within specified criteria. If the inner detections 217 fail to meet tests such as, but not limited to, the tests described above, then the inner detections 217 are determined to be invalid. Otherwise, the inner detections 217 are classified as valid.
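As one possible illustration of the motion-model part of this test, the following sketch gates an inner detection on its overlap with a predicted bounding box; the box format and the overlap threshold are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def motion_gate(inner_bbox, predicted_bbox, min_iou=0.3):
    """Accept an inner detection only if it overlaps the motion prediction.

    predicted_bbox would come from a motion model (e.g., a Kalman filter
    driven by previous detections); min_iou is an assumed gating threshold.
    """
    return iou(inner_bbox, predicted_bbox) >= min_iou
```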
At process 221, the inner tracker management function 109 merges together the inner detections 217 that remain after the detection validity test(s) of process 219. For example, the inner detections 217 can include separate detection results for each target object that has been separately tracked by the respective inner trackers 105 (1-k). The inner tracker management function 109 then merges these separate detections from different ROIs to the same frame so that the merged detections represent all targets detected by any of the multiple instances of the inner tracker 105. The inner tracker management function then passes the merged inner detections resulting from the process 221 back to the outer tracker 103 (e.g., for data association according to the process 209) in a signal 223 and/or to the tracking information aggregation 111 in a signal 225.
At process 225, the tracking information aggregation 111 merges the outer tracking data (e.g., received in signal 211) with the inner tracking data 225 for a processed frame. In one example, merging of the outer and inner tracking results comprises using the outer tracking result for a target object in the frame if the target object has been detected by the outer tracker 103 with a confidence above a threshold confidence. If the confidence is not above the threshold confidence or if the outer tracker 103 has not provided a valid result (e.g., because the detection process timed out or the outer tracker 103 missed the frame), then the inner tracking result instead of the outer tracking result is used in the object tracking output 227.
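A simplified sketch of this merge rule is given below; the confidence threshold and the dictionary-based result format are assumptions made for illustration.

```python
CONF_THRESHOLD = 0.5  # assumed outer-detection confidence threshold


def merge_tracking(outer_results, inner_results):
    """Merge outer and inner results for one frame, per tracked object.

    outer_results / inner_results: dicts mapping object id -> dict with
    'bbox' and (for the outer tracker) 'confidence'. The outer result is
    preferred when present and confident; otherwise the inner estimate is
    used, as described for the tracking information aggregation 111.
    """
    merged = {}
    for obj_id in set(outer_results) | set(inner_results):
        outer = outer_results.get(obj_id)
        if outer is not None and outer.get("confidence", 0.0) >= CONF_THRESHOLD:
            merged[obj_id] = {"bbox": outer["bbox"], "source": "outer"}
        elif obj_id in inner_results:
            merged[obj_id] = {"bbox": inner_results[obj_id]["bbox"],
                              "source": "inner"}
    return merged
```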
In one example, the tracking information aggregation 111 uses the object tracking output 227 to update the tracklet or symbol for the target object. In addition or alternatively, the tracking information aggregation 111 can pass the outer-inner merged detections of the object tracking output 227 back to the image grabber 113 in a signal 229.
At process 231, the image grabber 113 uses the outer-inner merged detections 229 to decorate the corresponding frame with representations of the bounding box(es) associated with the tracked target objects. As used herein, a decorated frame refers to a frame that includes a visual rendering of the bounding boxes of the detected target objects. For example, by decoration, the original frame or image is overlaid with the corresponding detection and tracking information by plotting the bounding boxes and identifiers for the objects being tracked in the frame. In this way, the detection and tracking results can be easily visualized. The decorated frame is then sent to the user interface 233 (e.g., a web-based user interface dashboard) to display the decorated frame. In this way, a user monitoring the real-time object tracking process is presented with a visual representation of the object detections.
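For example, such decoration could be implemented with standard OpenCV drawing calls, as in the following sketch; the bounding-box format and the colors are illustrative assumptions.

```python
import cv2


def decorate_frame(frame, merged_detections):
    """Overlay bounding boxes and tracking IDs on a frame (BGR image).

    merged_detections: dict mapping object id -> {'bbox': (x, y, w, h)}.
    Returns a copy of the frame with the detections drawn on it.
    """
    decorated = frame.copy()
    for obj_id, det in merged_detections.items():
        x, y, w, h = [int(v) for v in det["bbox"]]
        cv2.rectangle(decorated, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(decorated, f"ID {obj_id}", (x, max(y - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return decorated
```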
In summary, the various examples described herein provide for at least the following features:
In one example, it is contemplated that the system, apparatus and process 100 and/or the real-time object tracking 101 can be implemented in any type of device including but not limited to a cloud-based server, an edge device (e.g., computer, mobile device, mobile communication device, vehicle, etc.), and an IoT device (e.g., embedded real-time object tracking 101 in a camera-equipped IoT or sensor device). As used herein, an IoT device refers to a physical device with connectivity to the Internet or any other data network/communication system. IoT devices, for instance, have built-in/embedded processing capabilities along with supporting sensors, software, firmware, circuitry, and/or the like to implement functions such as but not limited to the real-time object tracking 101. Each of these devices can have different combinations of CPUs, GPUs, neural processing cores, tensor cores, cameras (or any other video sources), and/or the like to perform one or more functions of the real-time object tracking 101.
For example, in the case of multiple cameras with limited GPU resources, the example tracking pipeline architecture (e.g., the real-time object tracking 101) of the various example embodiments described herein can be used to achieve higher frame rates.
In the example architecture and process 300A of
In the example architecture and process 300B of
In addition, in one example, the architecture of the real-time object tracking 101 can include an automatic switch from cloud-based architecture to an embedded on-board architecture based on the objects of interest or region of interest to be detected and tracked and the network availability as explained in further detail with respect to
In yet another example, the system 100 of
As shown in
It is contemplated that the functions of the components of the real-time object tracking system 101 described above may be combined or performed by other components or means of equivalent functionality. The above presented components or means can be implemented in circuitry, hardware, firmware, software, a chip set, or a combination thereof.
As used in this application, the term “circuitry” may refer to one or more or all of the following:
This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in a server, a cellular network device, or other computing or network device.
In another example, one or more of the modules 301-309 may be implemented as a cloud-based service, local service, native application, or combination thereof. The functions of the real-time object tracking system 101 and its components are discussed with respect to figures below.
In step 501, the real-time object tracking system 101 uses a first object tracking mechanism (e.g., a DNN-based object tracking such as but not limited to the DNN-based object detection & feature extraction 107 or equivalent) to detect and associate one or more objects from frame to frame or across multiple frames of a video. In some examples, the first object tracking mechanism may fail or miss detecting and associating the one or more objects in every frame at some time points, but instead detects and associates the one or more objects in a subsequent frame.
In step 503, the real-time object tracking system 101 initiates one or more second object tracking mechanisms (e.g., an ROI tracking or equivalent) to track the respective one or more or each object detected by the first object tracking mechanism from frame to frame or across frames of the video in parallel/concurrently with the first object tracking mechanism. In one example, an ROI to be tracked by the ROI object tracking is provided by the deep-neural-network-based object tracker 107 on the initiating of the ROI object tracking. In one example, the real-time object tracking system 101 resizes or crops the frame for input to the one or more second object tracking mechanisms (e.g., based on the ROI specified by the first object tracking mechanism). In one example, an individual/one/respective second object tracking mechanism of the one or more second object tracking mechanisms is respectively initiated for an individual/one/respective object of the one or more objects from the first object tracking mechanism. In other words, one/respective instance of ROI tracking (i.e., one instance of the second object tracking mechanism) is initiated for a given object detected by the first object tracking mechanism (e.g., the DNN-based object tracking).
In one example, to test the tracking output of the second object tracking mechanism, the real-time object tracking system 101 can crop the frame of the video based on a bounding box of the tracking output of the one or more second object tracking mechanisms. Then, the real-time object tracking system 101 can perform a re-identification of the one or more objects based on the cropped, i.e. smaller, frame.
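One possible sketch of this crop-and-re-identify step is shown below, where compute_embedding is a hypothetical stand-in for the Re-ID model 607 and the cosine-distance threshold is an assumed value.

```python
import numpy as np


def reidentify(frame, bbox, stored_embeddings, compute_embedding,
               max_distance=0.4):
    """Crop the frame to the tracker's bounding box and re-identify it.

    compute_embedding is a placeholder for the Re-ID model 607 (it should
    map an image crop to a feature vector); max_distance is an assumed
    cosine-distance threshold.
    """
    x, y, w, h = [int(v) for v in bbox]
    crop = frame[max(y, 0):y + h, max(x, 0):x + w]
    if crop.size == 0:
        return False

    emb = compute_embedding(crop)
    for ref in stored_embeddings:
        cos_dist = 1.0 - float(
            np.dot(emb, ref)
            / (np.linalg.norm(emb) * np.linalg.norm(ref) + 1e-9))
        if cos_dist <= max_distance:
            return True
    return False
```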
In step 505, the real-time object tracking system 101 determines whether the first object tracking mechanism misses the detection of target objects in the frame being processed. By way of example, the detection can be missed based on determining that (1) the detection confidence for a detection by the first object tracking mechanism is below a threshold confidence; or (2) the first object tracking mechanism could not process the frame before a set time period has expired or before receiving a next frame to process.
In step 507, if the first object tracking mechanism has not missed/failed detections in the frame, the tracking output of the first object tracking mechanism can be used. In one example, the real-time object tracking system 101 can reinitiate the second object tracking mechanism based on a detection of the object by the first object tracking mechanism in a subsequent frame of the video.
However, in step 509, the real-time object tracking system 101 uses a tracking output of the one or more second object tracking mechanisms in place of the first object tracking mechanism for a frame of the video based on determining that the first object tracking mechanism has missed/failed a detection of the object in the frame of the video.
In one example, based on the tracking output of the frames of the video, the real-time object tracking system 101 generates a tracklet respectively for the one or more objects based on the first object tracking mechanism, the one or more second object tracking mechanisms, or a combination thereof. The tracklet, for instance, is a sequence of detections across a plurality of frames of the video.
In one example, the real-time object tracking system 101 classifies the tracklet as active, inactive, tracked, and/or fragmented based on the first object tracking mechanism, the one or more second object tracking mechanisms, or a combination thereof to facilitate tracking the target objects across frames of the video.
In one example, because the real-time object tracking system 101 relies on the object detection and feature extraction by one or more deep neural networks, the functions of the DNN-based object detection & feature extraction 107 are first described in more detail. For example, when the object detection is based on deep neural networks, such a deep neural network can be used to perform tasks such as but not limited to one or more of:
In one example, the object detector 603 has output including but not limited to classes, bounding boxes, and confidence levels. If instance segmentation models like Mask-RCNN (Region Based Convolutional Neural Networks) are used, binary masks are also provided. Mask-RCNN, the YOLO (You Only Look Once) network, the SSD (single-shot detector) network, and any other models refined on custom datasets can be used for object detection. Basic features or considerations of the object detector 603 can include but are not limited to:
In one example, the optional keypoint detector 605 can be used to detect objects (e.g., persons) and localize one or more keypoints (i.e. detailed features) in an object (e.g., eyes, nose, ears, wrists, shoulders, hips, knees, ankles, etc.), e.g. for a pose estimation. The outputs include one or more bounding boxes, keypoint locations, or associated confidence scores, or any combination thereof. Basic features or considerations of the keypoint detector 605 can include but are not limited to:
In one example, the re-identification (Re-ID) model 607 can be a feature representation learning model that has been trained, e.g., using a triplet loss function or equivalent. Given an image portion of a detected object, the Re-ID model 607 can generate a high-dimensional real-valued vector (e.g., embedding) as a representation of the object's appearance features. Basic features or considerations of the Re-ID model 607 can include but are not limited to:
In one example, the optional image classifier 609 is a deep learning model for recognizing an image. The image classifier 609 takes as input an image and returns probabilities that the image belongs to one or more classes of objects. In one example, a binary image classifier may be trained as part of a custom-built Re-ID model 607. Basic features or considerations of the image classifier 609 can include but are not limited to:
In one example, as previously described, the real-time object tracking system 101 is associated with an image grabber 113. By way of example, the image grabber 113 is the frontend of the tracking pipeline. The image grabber 113, for instance, has the following basic functions:
In one example, for real-time applications, the desired/target frame rates can be for example, 10 frames per second or higher (or any other target/desired frame rate); for near-real-time applications, the frame rates can be in the range, for example, from 5 to 10 frames per second (or any other target/desired frame rate range). A desired/target frame rate can be selected based on system performance requirements and the available computation resources. In some examples, the pre-configured frame rates for the tracking systems are lower than the video source frame rates (e.g., 30 frames per second) because of limited computation resources and pipeline throughput in the tracking system. Depending on the communication network bandwidth and delay, the achievable frame rates at which the image grabber 113 is emitting the received images may be lower than the desired FPS, particularly when a synchronous communication interface is employed between the video source and the image grabber 113. When the pre-configured FPS is not achieved due to network issues, the image grabber 113 can operate at the maximum achievable FPS on a best-effort basis.
In order for the image grabber 113 to adapt to the actual detection and tracking pipeline throughput and network condition changes, the image grabber 113 can be configured to receive the detection and tracking information published by the tracking functionality. In one example, the image grabber 113 can also be configured to publish the original video or frames decorated with the detection and tracking information (e.g., superimpose bounding boxes and unique IDs over the original images) for monitoring purposes. Therefore, an interface can be provided for the image grabber 113 to subscribe to the messages produced by the real-time object tracking pipeline that contain the detection and tracking information.
A diagram 700 of the image grabber 113 is illustrated in FIG. 7, where a Request/Reply synchronous interface 701 is also illustrated in dotted lines. The functional blocks, including the video adapter 703, the image buffer 705, and the web streamer 707, describe further details of the image grabber 113 as described in the
In various example embodiments, the video adapter 703 is configured to capture images and publish the latest frame upon receiving a feedback (e.g., a decorated image or a piece of tracking information) from the pipeline. In the case of an asynchronous interface to the video source (e.g., Real Time Streaming Protocol (RTSP), Hypertext Transport Protocol (HTTP), or equivalent), the video adapter 703 consists of two threads: one thread reads the next frame from the interface to the video source and puts the newly arriving frame into the one-slot buffer if the buffer is unlocked (so the previous frame is silently discarded); the other thread listens on the TCP port reserved for the tracking-information message stream and publishes the latest frame together with metadata on a TCP port reserved for the original image stream upon receiving a message, or if no tracking-information message is received and the elapsed time exceeds a time interval T1.
In one example, in the video adapter 703, the time interval T1=alpha/TARGET_FPS where TARGET_FPS is a pre-configured frame rate, and the factor “alpha” is greater than 1 and determined based on the current short-term actually achieved frame rate. For example, alpha=1.2 if the achieved frame rate >0.5*TARGET_FPS, otherwise alpha=2.0. The short-term actually achieved frame rate is defined as the number of tracking information messages received over a sliding time window divided by the time window's length.
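The following sketch illustrates this computation of T1, assuming hypothetical values for TARGET_FPS and the sliding-window length.

```python
import time
from collections import deque

TARGET_FPS = 10.0       # assumed pre-configured frame rate
WINDOW_SECONDS = 2.0    # assumed sliding-window length


class PublishTimer:
    """Compute the fallback publish interval T1 = alpha / TARGET_FPS."""

    def __init__(self):
        self.message_times = deque()

    def note_tracking_message(self):
        """Record the arrival time of a tracking-information message."""
        now = time.monotonic()
        self.message_times.append(now)
        # Keep only messages inside the sliding window.
        while self.message_times and now - self.message_times[0] > WINDOW_SECONDS:
            self.message_times.popleft()

    def t1(self):
        """Return T1 based on the short-term actually achieved frame rate."""
        achieved_fps = len(self.message_times) / WINDOW_SECONDS
        alpha = 1.2 if achieved_fps > 0.5 * TARGET_FPS else 2.0
        return alpha / TARGET_FPS
```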
In one example, the functional block “image buffer” 705 is to overlay the original images with the corresponding detection and tracking information and then publish the decorated images for visualization (e.g., via a decorated image publisher 709). In addition or alternatively, the image buffer 705 is also used to throttle or regulate the flow of images forwarded to the tracking pipeline. By way of example, the following parameters can be used for the functions of the image buffer:
In various example embodiments, as shown in
At step 1001, a counter for the number of decorated images that have been published so far (DECOR_IMG) can be initialized (e.g., set to 0). At step 1003, the original-image publisher in the Video Adapter 703 can be polled. The poll time can be set at any target value (e.g., equal to 1 millisecond or some other designated value). At step 1005, the image buffer 705 checks for any received messages during the poll time. If a message with an image is received within the poll time, the image is pushed into the buffer 705 together with the corresponding metadata (at step 1007).
If no message is received within the poll time, the process continues to step 1009 where a tracking-information publisher (e.g., the tracking information aggregation process 111 or other equivalent function/process) is polled. The poll time at this step can also be set to any value (e.g., 1 millisecond or some other designated value). Step 1011 then checks for any received messages. If a message with tracking information is received within the poll time, the tracking information (Merged Detections 229) is attached to the corresponding image in the buffer (at step 1013).
If no message is received within the poll time, at step 1015, the number of decorated images (DECOR_IMG) in the buffer is checked against the buffer occupancy (BO), which is the number of original images currently in the buffer. If the number of decorated images is less than BO, the process continues to step 1017. At step 1017, the process obtains the first image from the buffer and decorates the image with relevant information (e.g., detection and tracking information). In some examples, the relevant information includes debugging information that describes internal state variables used by the system 100 for visual object tracking. In this way, the internal state variables are visualized in the decorated image to provide information for debugging or optimizing object tracking processes.
At step 1019, the decorated image can be published (by Decorated Image Publisher 709 in
In one example, after the image buffer 705 enters the “Normal” state 905, it is operating as shown in process 1100 of
At step 1105, the original-image publisher can be polled. The poll time can be set at any target value (e.g., equal to 1 millisecond or some other designated value depending on a desired process function). At step 1107, the image buffer 705 checks for any received messages during the poll time. If a message with an image is received within the poll time, the image is pushed into the buffer together with the corresponding metadata (at step 1109).
If no message is received within the poll time, the process proceeds to step 1111 “Buffer management.” For example, at step 1111, the required buffer occupancy (req_BO) is adjusted according to an actual pipeline processing delay if needed. In one example, the actual pipeline processing delay is expected to be smaller than ALLOWED_LATENCY in most cases. If not, adaptive “Buffer management” is used to deal with the issue caused by slower pipeline processing. The details of the step “Buffer management” are provided in process 1200 of
At step 1113, the tracking-information publisher is polled. The poll time at this step can be set to any value (e.g., 1 millisecond or some other designated value). Step 1115 then checks for any received messages. If a message with tracking information is received within the poll time, the tracking information is attached to the corresponding image in the buffer (at step 1117).
At step 1119, whether the flow is lagging behind is checked. For example, if the elapsed time since the last decorated image was generated exceeds 1/TARGET_FPS, the image flow is deemed to be lagging behind. If the flow is lagging behind, it is checked whether the buffer occupancy (BO, indicating the number of images in the buffer) is greater than the required buffer occupancy (req_BO). If yes, at step 1123, the front image from the buffer is popped (i.e., removed from the buffer) and decorated with the attached tracking information. In other words, a decorated image is generated and then removed from the buffer. If no tracking information is available, debugging information, as described in the example above, can be attached in addition to or instead of the tracking information. At step 1125, the decorated image is published on the reserved TCP port associated with the publisher.
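A condensed sketch of this lag check and buffer pop is given below; the buffer representation and the publish/decorate callables are simplified placeholders, not the actual implementation.

```python
import time

TARGET_FPS = 10.0  # assumed pre-configured frame rate


def maybe_publish_decorated(buffer, req_bo, last_decorated_time,
                            publish, decorate):
    """Pop and publish a decorated image when the flow is lagging behind.

    buffer: list of (image, tracking_info) entries, oldest first.
    req_bo: required buffer occupancy.
    publish / decorate: placeholders for the decorated-image publisher and
    the overlay step. Returns the (possibly updated) last publish time.
    """
    now = time.monotonic()
    lagging = (now - last_decorated_time) > 1.0 / TARGET_FPS
    if lagging and len(buffer) > req_bo:
        image, tracking_info = buffer.pop(0)        # front of the buffer
        publish(decorate(image, tracking_info))     # decorated image out
        return now
    return last_decorated_time
```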
As described in the various examples of the image buffer 705 above, the outputs from the DNNs (or any other equivalent object/feature detector), e.g., data messages, are published on a reserved TCP port or equivalent output stream. Therefore, the tracking algorithms can subscribe to the data messages and receive the results coming from the previous stages in the pipeline (e.g., the real-time object tracking system 101). In one example, the inputs to the unit of tracking algorithms (e.g., tracking algorithms 407 of
In step 1301, the current frame and detection results (e.g., generated by a DNN-based object tracker) are received, e.g., in an outer tracker 103. If the detection result is not empty, the current frame is an annotated frame. Otherwise, the current frame is called a plain frame.
In step 1303, the target objects to track are selected. In one example, the target objects are those objects that the applied DNN-based object tracker has been trained to detect (or a subset thereof). The selected target objects are included in a list TARGET_OBJECTS. If the list TARGET_OBJECTS is not empty, then active symbols that are not in the list can be stopped.
In step 1305, the inner trackers 105 (1-k) are updated on the current image. In one example, the inner trackers 105 are based, for example, on ROI trackers or equivalent algorithms. Then, the detection results (e.g., bounding boxes) in the current frame are used to initiate or update the trackers based on the ROIs indicated by the bounding boxes of the current frame.
In step 1307, the outer tracker 103 is used to perform data association on currently tracked objects to track them from frame to frame. In other words, given the outputs of the deep neural networks, the outer tracker 103 can do data association. That is to say, if an object is detected in two consecutive frames, the outer tracker 103 will form the correspondence of the object between the current frame and the previous frame using, for instance, location prediction (e.g., via Kalman filtering based on a motion model) and/or the object's appearance features, which are provided by a deep-learning model.
More specifically, motion estimation forecasts the locations of the objects in the subsequent frame and facilitates data association between the frames. Successful data associations can also be dependent on the appearance features of the objects, for example color histograms. Furthermore, the appearance features and the color histograms are used for re-tracking of the objects in the event of loss of tracking due to occlusion or the object leaving or re-entering the video frames.
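As an illustration of such motion- and appearance-based association, the following sketch greedily matches detections to tracklets using the predicted box center and an HSV color histogram; the thresholds and the data layout are assumptions made for the example.

```python
import math

import cv2


def color_hist(image, bbox):
    """HSV color histogram of a bounding-box crop (appearance feature)."""
    x, y, w, h = [int(v) for v in bbox]
    crop = image[max(y, 0):y + h, max(x, 0):x + w]
    hsv = cv2.cvtColor(crop, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [16, 16], [0, 180, 0, 256])
    cv2.normalize(hist, hist)
    return hist


def associate(tracklets, detections, frame,
              max_center_dist=80.0, min_hist_sim=0.5):
    """Greedy frame-to-frame association of detections to tracklets.

    tracklets: dict id -> {'predicted_bbox': (x, y, w, h), 'hist': ...},
    where predicted_bbox comes from a motion model (e.g., Kalman filtering)
    and hist is the previously stored color histogram.
    Returns a dict mapping tracklet id -> index of the matched detection.
    """
    matches, used = {}, set()
    for tid, trk in tracklets.items():
        px, py, pw, ph = trk["predicted_bbox"]
        pcx, pcy = px + pw / 2.0, py + ph / 2.0
        best, best_sim = None, min_hist_sim
        for i, det in enumerate(detections):
            if i in used:
                continue
            dx, dy, dw, dh = det["bbox"]
            dist = math.hypot(dx + dw / 2.0 - pcx, dy + dh / 2.0 - pcy)
            if dist > max_center_dist:
                continue  # motion gate: too far from the predicted location
            sim = cv2.compareHist(trk["hist"],
                                  color_hist(frame, det["bbox"]),
                                  cv2.HISTCMP_CORREL)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is not None:
            matches[tid] = best
            used.add(best)
    return matches
```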
In step 1309, the one or more inner trackers 105 (1-k) can be re-initiated using the latest detection (e.g., bounding box) resulting from the outer tracker 103. In other words, if an actively tracked symbol is updated using the latest detection, the inner tracker 105 is re-initiated (e.g., re-trained) based on the current image and the bounding box.
In step 1311, the detection results from the inner trackers 105 can be merged (e.g., via the inner tracking management function 109 or equivalent function/process). For example, multiple active symbols that are driven by inner tracking/trackers can be merged if they are deemed to belong to the same target object. In one example, if the one or more deep neural networks (e.g., associated with the outer tracker 103) fail to detect the objects in the current frame and subsequent frames due to the detectability limitation or the mismatch between the achievable FPS and the target FPS (when this happens, some of the frames will be skipped for processing by the deep networks), the data association may be broken, or the objects may be mis-matched due to the false negative detection. The inner trackers 105 are introduced to solve this technical problem in the situation when a missed detection happens or some frames are skipped without being processed.
In one example, an inner tracker 105 can be initiated (e.g., by the inner tracking management function 109 or equivalent function/process) with the ROI to be tracked. The ROI, for instance, is provided by the outer tracker 103. The various examples of the inner tracking mechanism described herein maintain a bank/list of inner trackers 105 (e.g., ROI trackers), each corresponding to an object being tracked (e.g., an active symbol). Based on the outputs of the outer tracker 103 for the current frame, the inner tracking algorithm re-trains (i.e., re-initiates) the ROI tracker for each object being tracked. The training of an ROI tracker is relatively fast. In the presence of multiple ROI trackers, the training can be done in multiple threads on multiple cores if the CPU resources are sufficient.
Upon reception of the current frame, the inner tracking management function 109 or equivalent function/process will update the ROI trackers (e.g., inner trackers 105) using the image. As previously described, the outer tracker 103 will do data association based on the current detection result from the deep networks if any (note: the deep neural networks may not have time to process the current image depending on the available GPU/computing resources and the actual inference time) as well as the states of the inner trackers 105 (e.g., one inner tracker 105 for each target object being tracked).
In one example, to reduce processing time, the re-training/re-initiating or update of an inner tracker 105 (e.g., ROI tracker) can be done on a down-sized image or a cropped sub-image instead of the original full-size image, particularly when the object size is relatively large. The cropping of the images can be based on the location prediction provided by a motion model for each object being tracked (e.g., active symbol). In other words, the image can be cropped so that the cropped portion of the image is expected to depict the target object as it moves frame to frame. In one example, after a successful re-training of the one or more inner trackers 105, the system can re-align and/or re-size the estimated bounding box to the original full-size image.
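A possible sketch of this crop, down-size, and re-alignment procedure is shown below; the margin and scale factors are assumed values, and the OpenCV tracker factory is only one example of a usable ROI tracker.

```python
import cv2


def reinit_inner_tracker_on_crop(create_tracker, frame, predicted_bbox,
                                 margin=1.5, scale=0.5):
    """Re-initiate an ROI tracker on a cropped, down-sized sub-image.

    create_tracker: factory such as cv2.TrackerCSRT_create (availability
    depends on the installed OpenCV build); predicted_bbox: (x, y, w, h)
    location predicted by the motion model. Returns the tracker plus the
    crop offset and scale needed to map its boxes back to full resolution.
    """
    x, y, w, h = predicted_bbox
    cx, cy = x + w / 2.0, y + h / 2.0
    cw, ch = w * margin, h * margin
    x0 = int(max(cx - cw / 2.0, 0))
    y0 = int(max(cy - ch / 2.0, 0))
    x1 = int(min(cx + cw / 2.0, frame.shape[1]))
    y1 = int(min(cy + ch / 2.0, frame.shape[0]))

    crop = frame[y0:y1, x0:x1]
    small = cv2.resize(crop, None, fx=scale, fy=scale)

    tracker = create_tracker()
    roi = (int((x - x0) * scale), int((y - y0) * scale),
           int(w * scale), int(h * scale))
    tracker.init(small, roi)
    return tracker, (x0, y0), scale


def realign_bbox(bbox_small, offset, scale):
    """Map a bounding box from the cropped/scaled image back to full size."""
    bx, by, bw, bh = bbox_small
    x0, y0 = offset
    return (bx / scale + x0, by / scale + y0, bw / scale, bh / scale)
```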
In one example, a lifetime of an inner tracker 105 is limited. As used herein, the lifetime of the inner tracker 105 refers to the time or number of frames from the initiation of the inner tracker 105 by an outer tracking result to the re-initiation of the inner tracker 105 by the next outer tracking result. In the case of perfect outer tracking (e.g., where the outer tracker 103 does not miss any frames of the video), the lifetime of an inner tracker is 1 frame. For an object being tracked, its corresponding inner tracker 105 starts upon the successful association of the current frame with the previous frame for the object. If the deep network fails to detect the object in the subsequent frames, the inner tracker 105 may continue to estimate the object's bounding box and trajectory until the object gets detected again and successfully associated with the previous trajectory.
In step 1313, the tracking information (e.g., from both the outer tracker 103 and inner trackers 105) can be collected for the current frame (e.g., annotated or plain frame) by the tracking information aggregation process 111 or equivalent function/process.
In various examples, the DNN-based object detection service used in the various example embodiments described herein can be implemented as shown in
In one example, the stage of object detection is followed by feature extraction, for example, in the process 107.
At step 1505, the real-time object tracking system 101 determines whether data (e.g., detection results or other data related to detection results) arrives. If not, the process returns to step 1503 to continue listening for data detection results. Otherwise, the process continues to step 1507.
At step 1507, the real-time object tracking system 101 determines whether the received data includes any available detection result. For example, when the previous stage has no time to process the image, the original image skips the DNN model and is simply forwarded with no detection results. If a detection result is available in the received data, the process continues to step 1509.
At step 1509, a DNN inference is performed to calculate embeddings (e.g., vectorized features) for detected instances of target objects included in the received data. In one example, for parallelism of processes such as 1401, 1403, and 1405 (to increase the speed of processing), cropped images of the detected objects are fed in a batch to the DNN model. Embeddings, for instance, refer to high-dimensional real-valued vectors that represent the extracted features associated with the detected instances of the target objects.
At step 1511, the embeddings are added as part of the detection result for the image. In one example, the detection results and the original image can be wrapped in a JSON formatted message (or any other equivalent message format).
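The batched embedding step and the JSON wrapping could look roughly like the following sketch, where embed_batch is a hypothetical placeholder for the feature-extraction model and the message layout is illustrative only.

```python
import json

import numpy as np


def attach_embeddings(image, detections, embed_batch):
    """Compute Re-ID embeddings for all detections in one batch.

    embed_batch is a placeholder for the deep feature-extraction model: it
    takes a list of image crops and returns one embedding per crop.
    """
    crops = []
    for det in detections:
        x, y, w, h = [int(v) for v in det["bbox"]]
        crops.append(image[max(y, 0):y + h, max(x, 0):x + w])

    if crops:
        embeddings = embed_batch(crops)  # single batched DNN inference
        for det, emb in zip(detections, embeddings):
            det["embedding"] = np.asarray(emb).tolist()

    # Wrap the detection results in a JSON-formatted message (the image
    # itself would typically be sent alongside as raw bytes or a reference).
    return json.dumps({"detections": detections})
```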
At step 1513, the real-time object tracking system 101 forwards the original image together with detection results if any and metadata to the next stage of the pipeline. For example, if at step 1507 no detection result is available, then no detection result will be forwarded at this step. In one example, the message can then be published on a TCP port reserved for the publish/subscribe connection (or using any other equivalent communication means to transmit the message).
In summary, in one example, the transitions between the various states of
When a plain frame is received, e.g., in a signal 207, without detection result available, the real-time object tracking system 101 uses the current image to update the inner tracker of each active symbol (i.e., each object being tracked).
In various examples, when an instance is deemed to be a reliable detection and not be associated with any active symbols, it will be assigned a unique ID and becomes an active symbol in a Tracked sub-state, as shown in
At step 1807, the real-time object tracking system 101 determines whether a detection confidence level of the given instance is greater than a threshold confidence H1. For example, to initialize an Active symbol, the instance's confidence level should be greater than a threshold to reduce the effect of false positive detections. If the confidence level is not greater than H1, the process 1800 returns to the originating process (at step 1809). If the confidence level is greater than H1 for the given instance, the process may continue to step 1811.
At optional step 1811, if one or more keypoint detections are available (e.g., by a keypoint detector 605), the real-time object tracking system may filter out some low-quality or unreliable detections (e.g., reflections in windows or glass walls). The filtering can be performed by determining whether an average keypoint confidence score (e.g., average of all detected keypoints of a target object) is above a second threshold confidence H2. If the average keypoint confidence score is not greater than H2, the process 1800 returns to the originating process (at step 1813). If the average keypoint confidence score is greater than H2 for the given instance, the process continues to step 1815.
At step 1815, the given instance is assigned a unique ID. Then at step 1817, a symbol is created and comes into the Active Tracked state.
At step 1905, a proper type of ROI tracker (e.g., inner tracker 105) is selected based on the number of active symbols and/or by considering the computation requirements/availability of the tracking system/process 101. For example, available ROI trackers can include but are not limited to the Channel and Spatial Reliability Tracker (CSRT), Kernelized Correlation Filters (KCF), and/or equivalent. In terms of detection performance, the CSRT is generally better than the KCF. In terms of detection speed, the CSRT is generally slower than the KCF. Therefore, selecting the CSRT versus the KCF is a matter of balancing performance (e.g., detection accuracy/confidence) against processing speed.
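A minimal sketch of such a selection is shown below; the symbol-count threshold is an assumed value, and the OpenCV factory names require a build with the contrib trackers (some versions expose them under cv2.legacy instead).

```python
import cv2

MAX_CSRT_SYMBOLS = 4  # assumed limit before switching to the faster KCF


def create_roi_tracker(num_active_symbols):
    """Pick an ROI tracker type based on the current tracking load.

    CSRT is typically more accurate but slower than KCF, so the faster
    tracker is chosen when many objects are tracked concurrently.
    """
    if num_active_symbols <= MAX_CSRT_SYMBOLS:
        return cv2.TrackerCSRT_create()
    return cv2.TrackerKCF_create()
```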
At step 1907, the ROI tracker (e.g., inner tracker 105) is initiated based on the detection result (e.g., bounding box predicted by the DNN model) and the original image. In one example, the image may be resized (e.g., by half or other factor) to accelerate the ROI model initiation (or training) process, particularly for the CSRT.
At step 1909, the real-time object tracking system 101 determines whether the initiation of the inner tracker 105 is successful. If not successful, the process 1900 fails and returns (at step 1911). If successful, the inner tracker 105 is added to the active symbol (e.g., tracklet). In other words, the inner tracker 105 becomes part of the tracklet if the initiation is successful.
Before describing the implementation details of the outer-inner tracking algorithm, the selection of target objects is discussed. The various examples of the real-time object tracking system 101 described herein are able to deal with multiple target objects. However, in some use cases, the real-time object tracking system 101 may be interested in a single target object or a subset of the detected objects, e.g., selected/defined by a user/operator of the system 101. In one example, if such a list of target objects is selected and published to the outer tracker 103, it will be stored in an internal variable TARGET_OBJECTS as shown in
At step 2003, a message queue is monitored for commands related to management of the list of target objects. In one example, the commands are emitted from a dashboard in the control plane. At step 2005, the real-time object tracking system 101 determines whether a command is received based on the monitoring. If no command is received, the process returns to step 2003 to continue monitoring.
At step 2007, if a command is received, the real-time object tracking system 101 determines whether a selection of one or more target objects is specified in the command. In one example, the one or more selected target objects are specified by a list of corresponding tracking IDs. As described above, if the list is empty, then all detected objects are tracked. Otherwise, only the specified target objects are tracked. If the command is a selection command, then the list of selected target objects is updated based on the command (step 2009). The process can then return to step 2003 to monitor for additional commands.
At step 2011, if the command is not a selection command, the real-time object tracking system 101 determines if the command specifies a deselection of target objects. If the command is a deselection command, the list of target objects is emptied of the selected target objects (step 2013). The process can then return to step 2003 to monitor for additional commands.
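The command handling of steps 2003 through 2013 could be sketched as follows, assuming a hypothetical command dictionary with a "type" field and a list of tracking IDs; the actual message format and transport (e.g., the dashboard in the control plane) are deployment-specific.

```python
import queue

TARGET_OBJECTS = []          # empty list => track all detected objects

def manage_target_objects(command_queue: "queue.Queue", poll_seconds=0.5):
    """Monitor a command queue (e.g., fed from a control-plane dashboard) and
    maintain the TARGET_OBJECTS list of selected tracking IDs."""
    while True:
        try:
            command = command_queue.get(timeout=poll_seconds)   # steps 2003/2005
        except queue.Empty:
            continue
        if command.get("type") == "select":                     # steps 2007/2009
            TARGET_OBJECTS[:] = list(command.get("track_ids", []))
        elif command.get("type") == "deselect":                 # steps 2011/2013
            TARGET_OBJECTS.clear()
```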
At step 2101 of
The updating of the inner trackers 105 (e.g., one inner tracker 105 for each target object) is generally fast relative to the required frame rate (TARGET_FPS). In one example, to accelerate the updating of the inner trackers 105, the real-time object tracking system 101 can also re-size the original image before re-training the related one or more inner trackers 105. In multi-object cases, multithreading may be used to speed up the process if parallelism is supported. If an inner tracker 105 is successfully updated based on the current frame, a new estimate of a bounding box will be returned. Otherwise, the inner tracker 105 will be stopped and disabled. If an inner tracker 105 is successfully updated on a plain frame, a bounding box estimate will be appended to the active symbol's trajectory and will also be used to drive the symbol's motion model. At the same time, the inner tracker's stopwatch will also be incremented by 1. The stopwatch is a counter created when the inner tracker 105 is initiated. The stopwatch is reset when the inner tracker 105 is re-initiated using the bounding box inferred by the DNN-based object detector on the current image.
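A sketch of the per-frame inner tracker update, including the stopwatch bookkeeping described above, might look like the following; the symbol attributes (inner_tracker, trajectory, motion_model_update, stopwatch) are hypothetical names used only for illustration.

```python
import cv2

def update_inner_tracker(symbol, frame, resize_factor=0.5):
    """Update one symbol's inner (ROI) tracker on a plain frame.

    On success, the bounding box estimate is appended to the symbol's trajectory,
    feeds its motion model, and the stopwatch counter is incremented; on failure
    the inner tracker is stopped and disabled.
    """
    small = cv2.resize(frame, None, fx=resize_factor, fy=resize_factor)
    ok, bbox = symbol.inner_tracker.update(small)
    if not ok:
        symbol.inner_tracker = None                    # stop and disable the inner tracker
        return None
    bbox = tuple(v / resize_factor for v in bbox)      # map back to original image scale
    symbol.trajectory.append(bbox)
    symbol.motion_model_update(bbox)                   # hypothetical motion-model hook
    symbol.stopwatch += 1                              # reset elsewhere upon DNN re-initiation
    return bbox
```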
At step 2105, the bounding boxes estimated by the inner trackers are tested. For example, even if an inner tracker 105 is successfully updated, the real-time object tracking system 101 can test whether the estimated bounding box provides correct information.
The real-time object tracking system 101 then calculates a distance (e.g., a difference or similarity measure) between the embedding of the cropped image and the embeddings stored in the tracklet or symbol (at step 2213). If a similarity distance is also less than a pre-defined threshold, the bounding box estimate is deemed to pass the test (at steps 2215 and 2217). In addition, the real-time object tracking system 101 will also calculate a color histogram of the cropped image and use a distance between the current and previous color histograms as a color similarity metric to determine a validity of the current bounding box estimate (at steps 2219 and 2221). The real-time object tracking system 101 can use different criteria in the test of the bounding box estimated by the inner tracker 105 (at step 2215). If the bounding box fails the test, the inner tracker 105 will be stopped. If the bounding box estimate passes the test, the corresponding embedding and color histogram are stored in the symbol for later use (at step 2217).
Note that the testing of bounding box estimates can be done intelligently. In one example, the sizes of the bounding boxes estimated by ROI trackers may have large variations from frame to frame. For example, if a current size of the bounding box becomes smaller than the one in the previous frame, the real-time object tracking system 101 can crop out a larger portion of the image at the center of the current bounding box for image classification. If a Kalman filter (e.g., Unscented Kalman Filter (UKF)) is used for motion prediction including bounding boxes, a smoothed bounding box estimate may be employed in the testing process.
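A simplified version of this bounding box test might combine the embedding distance and color histogram comparison as follows; the thresholds and the symbol's embed, embeddings, and last_hist members are assumptions for illustration, and a Bhattacharyya distance is used here as one possible histogram metric.

```python
import cv2
import numpy as np

def bbox_estimate_passes_test(frame, bbox, symbol,
                              emb_threshold=0.5, hist_threshold=0.4):
    """Validate an inner-tracker bounding box estimate using appearance cues.

    The crop is compared against the symbol's stored embeddings (Euclidean
    distance between normalized vectors) and against the previous color
    histogram (Bhattacharyya distance). Thresholds here are placeholders.
    """
    x, y, w, h = [int(v) for v in bbox]
    crop = frame[max(y, 0):y + h, max(x, 0):x + w]
    if crop.size == 0:
        return False

    emb = symbol.embed(crop)                      # hypothetical Re-ID embedding hook
    emb = emb / np.linalg.norm(emb)
    if symbol.embeddings:
        dists = [np.linalg.norm(emb - e) for e in symbol.embeddings]
        if min(dists) >= emb_threshold:
            return False

    hist = cv2.calcHist([crop], [0, 1, 2], None, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256])
    cv2.normalize(hist, hist)
    if symbol.last_hist is not None:
        color_dist = cv2.compareHist(symbol.last_hist, hist,
                                     cv2.HISTCMP_BHATTACHARYYA)
        if color_dist >= hist_threshold:
            return False

    symbol.embeddings.append(emb)                 # step 2217: store for later use
    symbol.last_hist = hist
    return True
```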
Returning to step 2107 of
At step 2111, the real-time object tracking system 101 associates Active|Tracked symbols to instances.
It is contemplated that any data association algorithm can be used according to the examples described herein. For example, the real-time object tracking system 101 can use any efficient and effective method for data association (e.g., the Hungarian algorithm for combinatorial optimization problems).
In one example, the real-time object tracking system 101 provides a greedy algorithm to show how data association works in the outer tracking loop. The greedy algorithm may lead to sub-optimal performance but with less computational effort than a globally optimal solution.
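For reference, the globally optimal alternative mentioned above can be sketched with SciPy's implementation of the Hungarian algorithm operating on an IoU matrix; the greedy scheme detailed below trades this optimality for lower computational cost. The min_iou gate is a hypothetical parameter.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_hungarian(iou_matrix, min_iou=0.05):
    """Globally optimal symbol-to-instance assignment on an IoU matrix.

    iou_matrix[k, m] is the IoU between active symbol k and instance m.
    Returns (symbol_index, instance_index) pairs whose IoU exceeds min_iou.
    """
    iou_matrix = np.asarray(iou_matrix, dtype=float)
    rows, cols = linear_sum_assignment(-iou_matrix)   # maximize IoU = minimize -IoU
    return [(k, m) for k, m in zip(rows, cols) if iou_matrix[k, m] > min_iou]
```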
For example, at step 2401 of
At step 2407, given a pair (S_k, I_m), a predicted bounding box (bbox_pred) for S_k and a detected bounding box (bbox_det) for I_m in a current frame are obtained. In one example, if instance segmentation is performed during the DNN inference for object detection, binary masks will be provided for S_k and I_m, and the binary masks can be used to get the foremost contours of the tracked object and instance (step 2407). At step 2409, the Intersection-over-Union (IoU) between an active symbol (S_k) and an instance (I_m) can be calculated based on their bounding boxes and/or contours determined from instance segmentation.
At step 2411, the real-time object tracking system 101 determines whether the calculated IoU is greater than 0 (i.e., indicating that there is some overlap between the bounding boxes and/or contours of S_k and I_m). If the IoU is not greater than 0 (i.e., no overlap), the process returns to step 2405 to evaluate the next pair in the list of pairs. If the IoU is greater than 0, the real-time object tracking system 101 determines whether the IoU is greater than a pre-defined threshold (H3) (e.g., H3=0.05 or any other designated value) (at step 2413). In one example, the pre-defined threshold H3 can be determined separately for bounding boxes and contours. If the IoU is greater than the pre-defined threshold H3, the pair (S_k, I_m) is appended to list_pairs_dist together with a calculated distance (step 2415). If the IoU is not greater than the pre-defined threshold H3, the pair (S_k, I_m) is appended to list_pairs_iou together with the IoU. This results in populating the two lists (list_pairs_dist and list_pairs_iou) with respective pairs (S_k, I_m).
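The IoU used in steps 2409 through 2417 can be computed as in the following sketch for (x, y, width, height) bounding boxes; when instance segmentation is available, a mask- or contour-based IoU would be used instead.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(ax, bx)
    iy = max(ay, by)
    iw = max(0.0, min(ax + aw, bx + bw) - ix)
    ih = max(0.0, min(ay + ah, by + bh) - iy)
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```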
At step 2419, the real-time object tracking system 101 determines whether to end the loop of processing pairs in the list of pairs. For example, the loop ends when all pairs or a threshold number of pairs in the list of pairs have been processed. If the processing loop is not ended, the real-time object tracking system 101 returns to step 2405 to process the next pair in the list. At step 2421, if the processing loop ends, the real-time object tracking system 101 sorts the two lists (list_pairs_dist and list_pairs_iou) for output. For example, the list_pairs_iou is sorted in terms of IoU in decreasing order, and the list_pairs_dist is sorted in terms of distance in increasing order.
In the process 2500 of
At step 2507, the real-time object tracking system 101 calculates a minimum appearance similarity distance D_min between the instance I and the active symbol S. In one example, an active symbol S (e.g., tracklet) may contain multiple embeddings from previous frames and its candidate instance I will have a single embedding. D_min is then, for instance, calculated as the shortest distance in the embedding space. For example, the real-time object tracking system 101 assumes that the embeddings associated with S and I have been normalized (e.g., each vector has unit length). Let Emb_n, n=1, 2, . . . , L be the embeddings stored in the active symbol S, and Emb0 the instance I's embedding, respectively. The Euclidean distance D_n between the embeddings Emb_n and Emb0 is calculated for n=1, 2, . . . , L. The similarity distance between the symbol and instance is min {D_n, n=1, 2, . . . , L}. In one example, a tracklet may store the embeddings from the most recent frames over a limited time window to control computational demands.
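The minimum appearance similarity distance D_min can be computed directly from the stored embeddings, for example as below; since the vectors are unit-normalized, minimizing the Euclidean distance is equivalent to maximizing cosine similarity.

```python
import numpy as np

def min_appearance_distance(symbol_embeddings, instance_embedding):
    """Shortest Euclidean distance between the instance embedding Emb0 and the
    embeddings Emb_1..Emb_L stored in an active symbol; all vectors are assumed
    to be unit-normalized."""
    emb0 = np.asarray(instance_embedding)
    return min(np.linalg.norm(np.asarray(e) - emb0) for e in symbol_embeddings)
```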
At step 2509, the real-time object tracking system 101 determines whether D_min for the pair is below a designated threshold H4. By way of one example, the threshold H4 is determined based on a Re-ID model in use (e.g., H4=0.5 or any other designated value). If D_min is not less than H4, then the process 2500 returns to step 2503 to process another pair in the list. At step 2511, if D_min is less than H4, then the active symbol S is successfully associated to the instance I in the pair. The real-time object tracking system 101 then adds the instance I to the active symbol S by appending the instance I to the symbol S (e.g., bounding box, embedding, etc.) and marks the active symbol S and instance I as “associated.”
At step 2513, the real-time object tracking system 101 determines whether the loop for processing the list_pairs_iou should end (e.g., based on all or a threshold number of pairs in the list having been processed). If the loop is not ended, the process 2500 returns to step 2503 to process another pair in the list_pairs_iou. If the loop is ended, the process continues to step 2515 in
At step 2515, the real-time object tracking system 101 next iterates over the pairs (S, I) in the list_pairs_dist by performing the steps described as follows. At step 2517, the real-time object tracking system 101 determines whether a given pair (S, I) is eligible for processing. As described with respect to step 2505, in one example, the eligibility is determined by evaluating whether symbol S of the pair is not yet associated to any instance and instance I of the pair is not yet associated with any symbol. If either of S or I is already associated, the pair is not eligible, and the process 2500 returns to step 2515 to process another pair in the list_pairs_dist. If eligible, the process 2500 continues to step 2519.
At step 2519, the real-time object tracking system 101 calculates the shape similarity IoU (IoU_Shape) between S and I. By way of one example, for two bounding boxes, the shape similarity IoU is defined as the IoU between the two boxes when their upper-left corners are aligned; alternatively, any other corner can also be used. In other words, the upper-left corners of the bbox_pred of the active symbol S and the bbox_det of the instance I can be aligned and then the IoU calculated following the alignment. At step 2521, the real-time object tracking system 101 determines whether the calculated IoU_Shape is greater than a pre-defined threshold H5 (e.g., 0.35 or any other designated value). If the IoU_Shape is not greater than H5, then the process 2500 returns to step 2515 to process another pair in the list_pairs_dist.
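Because the corners are aligned, the intersection reduces to the product of the smaller width and the smaller height, so the shape similarity IoU can be sketched as:

```python
def shape_similarity_iou(box_a, box_b):
    """IoU of two (x, y, w, h) boxes after aligning their upper-left corners,
    so only the width/height shapes are compared (step 2519)."""
    _, _, aw, ah = box_a
    _, _, bw, bh = box_b
    inter = min(aw, bw) * min(ah, bh)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```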
At step 2523, if the IoU_Shape is greater than H5, then the real-time object tracking system 101 calculates a minimum appearance similarity distance D_min between the instance I and the active symbol S in the same manner as described with respect to step 2507 (i.e., the shortest Euclidean distance between the instance I's embedding and the normalized embeddings stored in the active symbol S, which may be limited to the embeddings from the most recent frames to control computational demands).
At step 2525, the real-time object tracking system 101 determines whether D_min for the pair is below a designated threshold H6. By way of example, the threshold H6 is determined based on a Re-ID model in use (e.g., H6=0.5 or any other designated value). If D_min is not less than H6, then the process 2500 returns to step 2515 to process another pair in the list. At step 2527, if D_min is less than H6, then the active symbol S is successfully associated to the instance I in the pair. The real-time object tracking system 101 then adds the instance I to the active symbol S by appending the instance I to the symbol S (e.g., bounding box, embedding, etc.) and marks the active symbol S and instance I as “associated.”
At step 2529, the real-time object tracking system 101 determines whether the loop for processing the list_pairs_dist should end (e.g., based on all or a threshold number of pairs in the list having been processed). If the loop is not ended, the process 2500 returns to step 2515 to process another pair in the list_pairs_dist. If the loop is ended, the process continues to step 2531 in
At step 2531 of
At step 2535, the real-time object tracking system 101 determines whether the IoU of the S and I in the pair is greater than a pre-defined threshold H7 (e.g., H7=0.45 or any other designated value). If the IoU is not greater than H7, then the process 2500 returns to step 2531 to process another pair in the list. At step 2537, if the IoU is greater than H7, then the active symbol S is successfully associated to the instance I in the pair. The real-time object tracking system 101 then adds the instance I to the active symbol S by appending the instance I to the symbol S (e.g., bounding box, embedding, etc.) and marks the active symbol S and instance I as “associated.” In one example, the real-time object tracking system 101 can use spatial and/or temporal constraints on the trajectory of the tracked object rather than appearance features to associate S and I in a given pair.
At step 2539, the real-time object tracking system 101 determines whether the loop for processing the list_pairs_iou should end (e.g., based on all or a threshold number of pairs in the list having been processed). If the loop is not ended, the process 2500 returns to step 2531 to process another pair in the list_pairs_iou. If the loop is ended, the process ends at step 2541.
Note that in the various examples described above, if a list of target objects, TARGET_OBJECTS, is defined and it is not empty, the active symbols not in the list will be stopped and put into the inactive state.
Returning to steps 2111-2117 of
In step 2601, the real-time object tracking system 101 combines active symbols S that are in a Fragmented sub-state (list_active_fragmented_symbols) with candidate instances I that are not yet associated (list_instances=[instances not yet associated]) to generate possible combinations of pairs (S, I) in a list of pairs (list_pairs=[ ]). For a given pair (S, I) (at step 2603), the real-time object tracking system 101 calculates an IoU between the most recent bounding box (bbox_last) stored in the active symbol S and the bounding box (bbox_det) of the instance I detected in a current frame (at step 2605). In one example, if instance segmentation is performed, the contours of the detected object in the S and I can be used in place of or in addition to their respective bounding boxes.
At step 2607, the real-time object tracking system 101 determines whether the calculated IoU is greater than 0 (i.e., indicating that there is some overlap between the bounding boxes and/or contours of S and I). If the IoU is not greater than 0 (i.e., no overlap), the process returns to step 2603 to evaluate the next pair in the list of pairs. If the IoU is greater than 0, the real-time object tracking system 101 calculates the minimum similarity distance D_min between the symbol S and instance I in terms of appearance features (e.g., encoded in an embedding, i.e., a high-dimensional vector encoding the appearance features) (at step 2609). The pair of active symbol S and candidate instance I, along with the calculated D_min, is also appended to the list_pairs.
At step 2611, the real-time object tracking system 101 determines whether the loop for processing the list of pairs should end (e.g., based on all or a threshold number of pairs in the list having been processed). If the loop is not ended, the process 2600 returns to step 2603 to process another pair in the list of pairs. If the loop is ended, the process continues to step 2613.
At step 2613, the list_pairs generated above is sorted in terms of similarity distance (e.g., in increasing order so that the most similar instance goes first in the list). Then at step 2615, the real-time object tracking system 101 iterates over the list_pairs such that for a given pair (S, I), the following steps are performed.
At step 2617, the real-time object tracking system 101 determines whether a given pair (S, I) is eligible for processing. In one example, the eligibility is determined by evaluating whether symbol S of the pair is not yet associated to any instance and instance I of the pair is not yet associated with any symbol. If either of S or I is already associated, the pair is not eligible, and the process 2600 returns to step 2615 to process another pair in the list_pairs. If eligible, the process 2600 continues to step 2619.
At step 2619, the real-time object tracking system 101 calculates the shape similarity IoU (IoU_Shape) between bbox_last of S and bbox_det of I. By way of example, for two bounding boxes, the shape similarity IoU is defined as the IoU between the two boxes when their upper-left corners are aligned; alternatively, any other corner can also be used. In other words, the upper-left corners of the bbox_last of the active symbol S and the bbox_det of the instance I can be aligned and then the IoU calculated following the alignment.
At step 2621, the real-time object tracking system 101 determines whether the calculated IoU_Shape is greater than a pre-defined threshold H8 (e.g., 0.45 or any other designated value). If the IoU_Shape is not greater than H8, then the process 2600 returns to step 2615 to process another pair in the list_pairs. If the IoU_Shape is greater than H8, then the real-time object tracking system 101 determines whether D_min (e.g., as calculated at step 2609) is less than a predefined threshold H9. By way of example, the threshold H9 is determined based on a Re-ID model in use (e.g., H9=0.5 or any other designated value). If D_min is not less than H9, then the process 2600 returns to step 2615 to process another pair in the list. At step 2623, if D_min is less than H9, then the active symbol S is successfully associated to the instance I in the pair. Then at step 2625, the real-time object tracking system 101 adds the instance I to the active symbol S by appending the instance I to the symbol S (e.g., bounding box, embedding, etc.) and marks the active symbol S and instance I as “associated.”
At step 2627, the real-time object tracking system 101 determines whether the loop for processing the list_pairs should end (e.g., based on all or a threshold number of pairs in the list having been processed). If the loop is not ended, the process 2600 returns to step 2615 to process another pair in the list_pairs. If the loop is ended, the process ends and returns at step 2629.
Returning to step 2119 in
Note that if a list of target objects, TARGET_OBJECTS, is defined, and it is not empty, the real-time object tracking system 101 will not initiate any new symbol.
In step 2121, the real-time object tracking system 101 re-initiates inner trackers 105 for active symbols. For example, if an active symbol has been successfully associated to an instance detected in the current frame, the old inner tracker 105 (e.g., ROI tracker) will be discarded. According to the number of active symbols and the computational resources available, the type of inner tracker 105 (e.g., ROI tracker) is chosen and a new inner tracker 105 of this type is created. The new inner tracker 105 is initiated using the new bounding box and the current image. If the inner tracker 105 is successfully initiated, it will be added to the active symbol (e.g., tracklet). Otherwise, no inner tracker 105 will be attached to the active symbol. In one example, every inner tracker 105 has a stopwatch which is reset when the inner tracker is re-initiated. The stopwatch will be incremented by 1 if the inner tracker 105 is successfully updated on the current image.
In step 2123, the real-time object tracking system 101 updates symbol states. For example, based on the results of data association and initiation/re-initiation for inner trackers 105, the states of the active symbols will be updated as shown in
In one example, for a plain frame, if an active symbol's inner tracker is successfully updated on the current image, and the bounding box estimate passes the test, the bounding box and the feature representations (e.g., embedding and color histogram generated on the cropped image) will be appended to the symbol (e.g., tracklet).
In one example, for an annotated frame, if an active symbol does not connect to any instance detected in the current frame but its inner tracker is successfully updated, and the bounding box estimate passes the test, the bounding box and the feature representations (embedding and histogram) will also be appended to the active symbol.
At step 2125, the real-time object tracking system 101 updates the motion model for each active symbol. For example, after an active symbol is successfully associated to an instance in the current frame, the symbol's motion model will be updated using the new bounding box. In the case of a plain frame without a detection result, the inner trackers 105 of the active symbols will be updated using the current image (at step 2127). If an active symbol's inner tracker 105 is successfully updated, the bounding box estimate can be used to update the motion model (at step 2129).
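The description above mentions a Kalman filter (e.g., a UKF) for motion prediction; as a simpler illustrative stand-in, the following sketch uses OpenCV's linear Kalman filter with a constant-velocity model over the bounding box center and size. The noise covariances are placeholder values.

```python
import cv2
import numpy as np

def make_bbox_kalman():
    """Constant-velocity motion model over (cx, cy, w, h) using a linear
    Kalman filter as a simplified stand-in for the UKF mentioned in the text."""
    kf = cv2.KalmanFilter(8, 4)  # state: cx, cy, w, h and their velocities
    kf.transitionMatrix = np.eye(8, dtype=np.float32)
    for i in range(4):
        kf.transitionMatrix[i, i + 4] = 1.0      # x_t = x_{t-1} + v_{t-1}
    kf.measurementMatrix = np.eye(4, 8, dtype=np.float32)
    kf.processNoiseCov = np.eye(8, dtype=np.float32) * 1e-2
    kf.measurementNoiseCov = np.eye(4, dtype=np.float32) * 1e-1
    return kf

def update_motion_model(kf, bbox):
    """Predict then correct the motion model with a new (x, y, w, h) bounding box,
    returning the smoothed bounding box estimate."""
    x, y, w, h = bbox
    measurement = np.array([[x + w / 2.0], [y + h / 2.0], [w], [h]], dtype=np.float32)
    kf.predict()
    kf.correct(measurement)
    cx, cy, pw, ph = kf.statePost[:4].flatten()
    return (cx - pw / 2.0, cy - ph / 2.0, pw, ph)
```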
At step 2131, the real-time object tracking system 101 manages inner trackers and updates symbol states. For example, in the case of false negative errors, the DNN model used for object detection may fail to detect the target objects in many consecutive frames and the real-time object tracking system 101 will rely on inner tracking to keep track of the objects. To find out whether or not an inner tracker 105 is drifting away from the target object, the real-time object tracking system 101 tests the bounding box estimated by the inner tracker 105 as shown in
According to various example embodiments, the flowchart 2700 of
At step 2709, the real-time object tracking system 101 determines whether the loop for determining inner trackers at large should end (e.g., based on processing all or a threshold number of active symbols in a Tracked sub-state). If the loop is not ended, the process returns to step 2703 to process another active symbol. Otherwise, the process 2700 proceeds to step 2711.
At step 2711, if the list_inner_trackers_at_large is not empty, the real-time object tracking system 101 iterates over the list_inner_trackers_at_large to determine whether any two symbols (S_i, S_j) in the list can be combined because they are tracking the same object. At step 2713, for a given pair of symbols S_i and S_j in the list, an IoU between S_i and S_j is calculated using, for instance, their respective latest bounding boxes (or contours if instance segmentation is performed). At step 2715, the real-time object tracking system 101 determines whether the calculated IoU is greater than a pre-defined threshold H10 (e.g., 0.5 or any other designated value). If the IoU is not greater than H10, the process 2700 returns to 2711 to process another pair of symbols. If the IoU is greater than H10, at step 2717, the combination of S_i and S_j is appended to the list_comb_inner_trackers_at_large together with the IoU indicating that symbols S_i and S_j may belong to the same object.
At step 2719, the loop iterating over the list_inner_trackers_at_large can be ended when all or a threshold number of eligible combinations of symbols are evaluated. If additional combinations remain, then the process 2700 returns to step 2711 to evaluate the next pair of possible symbols to combine. If not, the loop ends, and the process 2700 continues to step 2721 in
At step 2721 in the
At step 2729, both S_i and S_j removed from the list are checked to determine whether they are still active and in the Tracked sub-state, e.g., in case the symbol(s) may have been put into the Active|Fragmented state. If either or both of the symbols are no longer Active|Tracked, the process returns to step 2723 to evaluate the next combination in the list (if any). If both are still in the Active|Tracked state, the real-time object tracking system 101 determines which symbol's inner tracker was started earlier by checking whether S_i's stopwatch is greater than S_j's stopwatch. If S_i's stopwatch is greater than S_j's stopwatch, then S_i's inner tracker is disabled and its TIMER_FRAG is started. If S_i's stopwatch is not greater than S_j's stopwatch, then S_j's inner tracker is disabled and its TIMER_FRAG is started. In other words, the real-time object tracking system 101 keeps the later started inner tracker and discards the earlier started one according to one example. The process 2700 then returns to step 2723 to process additional combinations of symbols (if any).
Returning to step 2133 in
In summary, from the above descriptions, the real-time object tracking system 101 provides for at least the following features:
In addition, in one example, the real-time object tracking system 101 can include a pixel-based motion algorithm or equivalent algorithm to track objects continuously in subsequent frames during occlusion or when an object leaves the frame and reappears. For example, the pixel-based motion algorithm or equivalent is used for object detection when the DNN is not available. The various examples of the outer-inner tracking mechanism are still applicable in this case.
At steps 2801 and 2803, a region of interest or an object of interest of a certain size within the expected range of the algorithm is defined by the user or automatically detected by the object detector in a first frame and a second frame. Once this is obtained, within the region of interest, the real-time object tracking system 101 detects and extracts a foreground and background model by using consecutive frames (steps 2805 and 2807). In step 2805, the system 101 extracts the foreground by differentiating background pixels and foreground pixels. Since the camera and the scene are both in motion, the frame differencing can be noisy. The real-time object tracking system 101 applies a Euclidean clustering method or equivalent clustering method to find clusters of points in the resulting pixels and to minimize noise in subsequent frames (step 2809). A centroid of a largest cluster is chosen as a maximum motion region. This is assigned as the foreground. Using the foreground pixel properties, the real-time object tracking system 101 extracts feature points and uses the feature points to track the object in subsequent frames (step 2811). At step 2813, these features are then passed to the outer and inner tracking algorithm as described above.
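A rough sketch of this pixel-based motion fallback is given below; connected-component labeling stands in for the Euclidean clustering named above, and the differencing threshold and feature-extraction parameters are placeholder values.

```python
import cv2
import numpy as np

def motion_foreground_centroid(prev_gray, curr_gray, roi, diff_threshold=25):
    """Frame-differencing sketch of the pixel-based motion fallback.

    Within the region of interest, motion pixels are found by differencing
    consecutive grayscale frames; the centroid of the largest motion cluster is
    taken as the foreground, and feature points on it seed subsequent tracking.
    """
    x, y, w, h = roi
    diff = cv2.absdiff(curr_gray[y:y + h, x:x + w], prev_gray[y:y + h, x:x + w])
    _, mask = cv2.threshold(diff, diff_threshold, 255, cv2.THRESH_BINARY)
    mask = cv2.medianBlur(mask, 5)                       # suppress differencing noise
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    if n <= 1:
        return None, None                                # no motion cluster found
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))   # skip background label 0
    cx, cy = centroids[largest]
    fg_mask = (labels == largest).astype(np.uint8) * 255
    features = cv2.goodFeaturesToTrack(curr_gray[y:y + h, x:x + w], maxCorners=50,
                                       qualityLevel=0.01, minDistance=5, mask=fg_mask)
    return (x + cx, y + cy), features                    # centroid in full-frame coordinates
```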
In addition to the above, the real-time object tracking system 101 can provide an architecture to enable a real-time object tracking module according to the various examples described herein for a moving camera, such as one implemented on a handheld device, a mobile communication device, a vehicle, an unmanned aerial vehicle (UAV), a drone, or a robot.
When a frame 2905 is received by an object tracking module client 2907 from the camera 2901, e.g., from the UAV-based camera, the client 2907 sends the frame to its Network Latency Analyzer 2909 to calculate a round trip time (RTT) for processing the frame and returning a location of the object (e.g., bounding box coordinates) in the frame. If the RTT 2911 is less than or equal to a desired time T, the process pipeline 2900 switches to the module on the on-board processor 2903 (e.g., on the UAV (unmanned aerial vehicle) or device side). If the RTT 2913 is greater than the desired time T, the process pipeline 2900 switches to the Cloud 2901 for further processing of the subsequent frames for an interval I. For each interval I, the Network Latency Analyzer 2909 recalculates the RTT 2911 and/or 2913 for the current frame and picks one of the two pipelines to process the subsequent I frames. The LiteOTM 2915 is the lightweight object detector and tracking system compatible with an Edge or onboard processor with limited resources (e.g., 1 GPU), in comparison to the heavier weight resources (e.g., more than 1 GPU) of the OTM module 2917 of the cloud 2901. In one example, the real-time object tracking system 101 can also enable a modular architecture pipeline to enable customization based on use-cases. For example, the object detector can be swapped for a customized trained model, e.g., for a specific task and/or a specific object type, for the inner trackers 105, outer tracker 103, and/or DNN-based object detection & feature extraction 107.
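The latency-based switching could be sketched as follows, where send_frame_for_processing is a hypothetical callable that submits a frame and returns the object location; following the description above, an RTT at or below the desired time T routes the subsequent interval of frames to the on-board LiteOTM pipeline, and a larger RTT routes them to the cloud OTM pipeline.

```python
import time

def choose_pipeline(send_frame_for_processing, frame, desired_rtt_s=0.1):
    """Measure the round trip time to process one frame and pick the pipeline
    for the next interval of frames, mirroring the description above."""
    start = time.monotonic()
    _ = send_frame_for_processing(frame)         # hypothetical remote/local call
    rtt = time.monotonic() - start
    return "onboard_liteotm" if rtt <= desired_rtt_s else "cloud_otm"
```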
It is contemplated that the various examples of the real-time object tracking system 101 described herein can be used for any application where such tracking is applicable. One such example use case is a drone-based track-and-follow function.
In another use case, the real-time object tracking system 101 can be used in a vehicle-to-infrastructure (V2I) communication manner, e.g., for smart intersection alerts for public safety. In a similar manner, the system 101 can be implemented between one or more vehicles over vehicle-to-vehicle (V2V) communication with cameras moving with the vehicles. Example steps of this use case are provided below:
In another use case, the real-time object tracking system 101 can be used for in-vehicle object tracking. Most traditional object tracking systems are currently deployed to monitor indoor or outdoor areas, but multi-object and person tracking is also very useful inside a vehicle itself, especially in a modern autonomous or smart electric vehicle equipped by the Original Equipment Manufacturer (OEM), or in an older vehicle with an after-market smart camera with object tracking capabilities installed. Example steps of this use case are provided below:
Returning to
In one example, the UEs include one or more device sensors (e.g., a front facing camera, a rear facing camera, digital image sensors, LiDAR (light detection and ranging) sensor, global positioning system (GPS) sensors, sound sensors, microphones, height or elevation sensors, accelerometers, tilt sensors, moisture/humidity sensors, pressure sensors, temperature sensor, barometer, NFC sensors, wireless network sensors, etc.) and clients (e.g., mapping applications, navigation applications, image processing applications, augmented reality applications, image/video application, modeling application, communication applications, etc.). In one example, GPS sensors can enable the UEs to obtain geographic coordinates from one or more satellites for determining current or live location and time. Further, a user location within an area may be determined by a triangulation system such as A-GPS (Assisted-GPS), Cell of Origin, or other location extrapolation technologies when cellular or network signals are available.
In one example, the real-time object tracking system 101 can perform functions related to real-time object tracking as discussed with respect to the various examples described herein. In one instance, the real-time object tracking system 101 can be a standalone server or a component of another device with connectivity to the communications network. For example, the component can be part of an edge computing network where remote computing devices are installed within proximity of a geographic area of interest, one or more assets (e.g., utility company assets), or a combination thereof.
In one instance, the DNN-based object detection & feature extraction 107 of the real-time object tracking system 101 can include one or more neural networks or other machine learning algorithms/systems to process frames of an input (e.g., a video stream, multiple static images, or serial or satellite imagery) (e.g., using an image segmentation algorithm) to generate labels for pixels of the input images. In one instance, the neural network of the DNN-based object detection & feature extraction 107 is a traditional convolutional neural network (CNN), which consists of multiple layers of collections of one or more neurons (which are configured to process a portion of the input data).
In one example, the real-time object tracking system 101 has connectivity over the communications network to a services platform that provides one or more services that can use the tracking output (e.g., the tracking information 409) of the real-time object tracking system 101. By way of example, the one or more services may also include mapping services, navigation services, emergency response services, notification services, social networking services, content (e.g., audio, video, images, etc.) provisioning services, application services, storage services, contextual information determination services, augmented reality services, location-based services, information-based services (e.g., weather, news, etc.), etc., or any combination thereof.
In one example, one or more cameras, IoT devices, drones, and/or UEs may be configured with various sensors for acquiring and/or generating sensor data for real-time object tracking. For example, the sensors can capture one or more images of a geographic area and/or any other sensor data (e.g., LiDAR point clouds, infrared scans, radar scans, etc.) that can be used for real-time object tracking according to the examples described herein.
In one example, the components of the real-time object tracking system 101 may communicate over a communications network that includes one or more networks such as a data network, a wireless network, a telephony network, or any combination thereof. It is contemplated that the data network may be any local area network (LAN), metropolitan area network (MAN), wide area network (WAN), a public data network (e.g., the Internet), short range wireless communication network, or any other suitable packet-switched network, such as a commercially owned, proprietary packet-switched network, e.g., a proprietary cable or fiber-optic network, and the like, or any combination thereof. In addition, the wireless network may be, for example, a cellular network and may employ various technologies including enhanced data rates for global evolution (EDGE), general packet radio service (GPRS), global system for mobile communications (GSM), Internet protocol multimedia subsystem (IMS), universal mobile telecommunications system (UMTS), etc., as well as any other suitable wireless medium, e.g., worldwide interoperability for microwave access (WiMAX), Long Term Evolution (LTE) networks, code division multiple access (CDMA), wideband code division multiple access (WCDMA), wireless fidelity (Wi-Fi), wireless LAN (WLAN), Bluetooth®, Internet Protocol (IP) data casting, satellite, mobile ad-hoc network (MANET), and the like, or any combination thereof.
In one example, the real-time object tracking system 101 may be a platform with multiple interconnected components (e.g., a distributed framework). The real-time object tracking system 101 may include multiple servers, intelligent networking devices, computing devices, components, and corresponding software for real-time object tracking. In addition, it is noted that the real-time object tracking system 101 may be a separate entity of the system 100, a part of the one or more services, a part of the services platform, or included within devices, e.g., camera, UEs, IoT devices, or divided between any other components.
By way of example, the components of the real-time object tracking system 101 can communicate with each other and other components external to the system 101 using well known, new or still developing protocols. In this context, a protocol includes a set of rules defining how the network nodes within the communications network interact with each other based on information sent over the communication links. The protocols are effective at different layers of operation within each node, from generating and receiving physical signals of various types, to selecting a link for transferring those signals, to the format of information indicated by those signals, to identifying which software application executing on a computer system sends or receives the information. The conceptually different layers of protocols for exchanging information over a network are described in the Open Systems Interconnection (OSI) Reference Model.
Communications between the network nodes are typically effected by exchanging discrete packets of data. Each packet typically comprises (1) header information associated with a particular protocol, and (2) payload information that follows the header information and contains information that may be processed independently of that particular protocol. In some protocols, the packet includes (3) trailer information following the payload and indicating the end of the payload information. The header includes information such as the source of the packet, its destination, the length of the payload, and other properties used by the protocol. Often, the data in the payload for the particular protocol includes a header and payload for a different protocol associated with a different, higher layer of the OSI Reference Model. The header for a particular protocol typically indicates a type for the next protocol contained in its payload. The higher layer protocol is said to be encapsulated in the lower layer protocol. The headers included in a packet traversing multiple heterogeneous networks, such as the Internet, typically include a physical (layer 1) header, a data-link (layer 2) header, an internetwork (layer 3) header and a transport (layer 4) header, and various application (layer 5, layer 6 and layer 7) headers as defined by the OSI Reference Model.
The processes described herein for providing real-time object tracking using outer and inner tracking may be advantageously implemented via software, hardware (e.g., general processor, Digital Signal Processing (DSP) chip, an Application Specific Integrated Circuit (ASIC), Field Programmable Gate Arrays (FPGAs), etc.), firmware, circuitry, or a combination thereof. Such exemplary hardware for performing the described functions is detailed below.
The bus 3110 includes one or more parallel conductors of information so that information is transferred quickly among devices coupled to the bus 3110. One or more processors 3102 for processing information are coupled with the bus 3110.
One or more processors 3102 perform a set of operations on information as specified by one or more computer program code related to providing real-time object tracking using outer and inner tracking. The computer program code is a set of instructions or statements providing instructions for the operation of the processor and/or the computer system to perform specified functions. The code, for example, may be written in a computer programming language that is compiled into a native instruction set of the processor. The code may also be written directly using the native instruction set (e.g., machine language). The set of operations include bringing information in from the bus 3110 and placing information on the bus 3110. The set of operations also typically include comparing two or more units of information, shifting positions of units of information, and combining two or more units of information, such as by addition or multiplication or logical operations like OR, exclusive OR (XOR), and AND. Each operation of the set of operations that can be performed by the processor is represented to the processor by information called instructions, such as an operation code of one or more digits. A sequence of operations to be executed by the processor 3102, such as a sequence of operation codes, constitute processor instructions, also called computer system instructions or, simply, computer instructions. Processors may be implemented as mechanical, electrical, magnetic, optical, chemical or quantum components, among others, alone or in combination.
Computer system 3100 also includes one or more memories 3104 coupled to the bus 3110. The memory 3104, such as a random access memory (RAM) or other dynamic storage device, stores information including processor instructions for providing real-time object tracking using outer and inner tracking. Dynamic memory allows information stored therein to be changed by the computer system 3100. RAM allows a unit of information stored at a location called a memory address to be stored and retrieved independently of information at neighboring addresses. The memory 3104 is also used by the processor 3102 to store temporary values during execution of processor instructions. The computer system 3100 also includes one or more read only memories (ROM) 3106 or other static storage devices coupled to the bus 3110 for storing static information, including instructions, that is not changed by the computer system 3100. Some memory is composed of volatile storage that loses the information stored thereon when power is lost. Also coupled to the bus 3110 is one or more non-volatile (persistent) storage devices 3108, such as a magnetic disk, optical disk, or flash card, for storing information, including instructions, that persists even when the computer system 3100 is turned off or otherwise loses power.
Information, including instructions for providing real-time object tracking using outer and inner tracking, is provided to the bus 3110 for use by the processor from an external input device 3112, such as a keyboard containing alphanumeric keys operated by a human user, or a sensor. A sensor detects conditions in its vicinity and transforms those detections into physical expression compatible with the measurable phenomenon used to represent information in the computer system 3100. Other external devices coupled to the bus 3110, used primarily for interacting with humans, include a display device 3114, such as a cathode ray tube (CRT) or a liquid crystal display (LCD), or plasma screen or printer for presenting text or images, and a pointing device 3116, such as a mouse or a trackball or cursor direction keys, or motion sensor, for controlling a position of a small cursor image presented on the display 3114 and issuing commands associated with graphical elements presented on the display 3114. In various example embodiments, for example, in which the computer system 3100 performs all functions automatically without human input, one or more of external input device 3112, display device 3114 and pointing device 3116 is omitted.
In various illustrated example embodiments, special purpose hardware, such as one or more application specific integrated circuits (ASIC) 3120, is coupled to the bus 3110. The special purpose hardware is configured to perform operations not performed by the processor 3102 quickly enough for special purposes. Examples of application specific ICs include graphics accelerator cards for generating images for the display 3114, cryptographic boards for encrypting and decrypting messages sent over a network, speech recognition, and interfaces to special external devices, such as robotic arms and medical scanning equipment that repeatedly perform some complex sequence of operations that are more efficiently implemented in hardware.
The computer system 3100 also includes one or more instances of a communications interface 3170 coupled to the bus 3110. The communication interface 3170 provides a one-way or two-way communication coupling to a variety of external devices that operate with their own processors, such as printers, scanners, and external disks. In general, the coupling is with a network link 3178 that is connected to a local network 3180 to which a variety of external devices with their own processors are connected. For example, the communication interface 3170 may be a parallel port or a serial port or a universal serial bus (USB) port on a personal computer. In some examples, the communications interface 3170 is an integrated services digital network (ISDN) card or a digital subscriber line (DSL) card or a telephone modem that provides an information communication connection to a corresponding type of telephone line. In some examples, the communication interface 3170 is a cable modem that converts signals on the bus 3110 into signals for a communication connection over a coaxial cable or into optical signals for a communication connection over a fiber optic cable. As another example, the communications interface 3170 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN, such as Ethernet. Wireless links may also be implemented. For the wireless links, the communications interface 3170 sends or receives or both sends and receives electrical, acoustic, or electromagnetic signals, including infrared and optical signals, that carry information streams, such as digital data. For example, in wireless handheld devices, such as mobile telephones like cell phones, the communications interface 3170 includes a radio band electromagnetic transmitter and receiver called a radio transceiver. In certain examples, the communications interface 3170 enables connection to a communication network for providing real-time object tracking using outer and inner tracking.
The term computer-readable medium is used herein to refer to any medium that participates in providing information to the processor 3102, including instructions for execution. Such a medium may take many forms, including, but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as storage device 3108. Volatile media include, for example, dynamic memory 3104. Transmission media include, for example, coaxial cables, copper wire, fiber optic cables, and carrier waves that travel through space without wires or cables, such as acoustic waves and electromagnetic waves, including radio, optical and infrared waves. Signals include man-made transient variations in amplitude, frequency, phase, polarization, or other physical properties transmitted through the transmission media. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, CDRW, DVD, any other optical medium, punch cards, paper tape, optical mark sheets, any other physical medium with patterns of holes or other optically recognizable indicia, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave, or any other medium from which a computer can read.
The network link 3178 typically provides information communication using transmission media through one or more networks to other devices that use or process the information. For example, the network link 3178 may provide a connection through local network 3180 to a host computer 3182 or to an equipment 3184 operated by an Internet Service Provider (ISP). The ISP equipment 3184 in turn provides data communication services through the public, world-wide packet-switching communication network of networks now commonly referred to as the Internet 3190.
A computer called a server host or a server 3192 connected to the Internet hosts a process that provides a service in response to information received over the Internet. For example, the server host 3192 hosts a process that provides information representing video data for presentation at the display 3114. It is contemplated that the components of system can be deployed in various configurations within other computer systems, e.g., the host 3182 and the server 3192.
In one example, the chip set 3200 includes a communication mechanism such as a bus 3201 for passing information among the components of the chip set 3200. One or more processors 3203 have connectivity to the bus 3201 to execute instructions and process information stored in, for example, a memory 3205. The processor 3203 may include one or more processing cores with each core configured to perform independently. A multi-core processor enables multiprocessing within a single physical package. Examples of the multi-core processor include two, four, eight, or greater numbers of processing cores. Alternatively or in addition, the processor 3203 may include one or more microprocessors configured in tandem via the bus 3201 to enable independent execution of instructions, pipelining, and multithreading. The processor 3203 may also be accompanied with one or more specialized components to perform certain processing functions and tasks such as one or more digital signal processors (DSP) 3207, or one or more application-specific integrated circuits (ASIC) 3209. A DSP 3207 typically is configured to process real-world signals (e.g., sound) in real time independently of the processor 3203. Similarly, an ASIC 3209 can be configured to perform specialized functions not easily performed by a general purpose processor. Other specialized components to aid in performing the inventive functions described herein comprise one or more field programmable gate arrays (FPGA) (not shown), one or more controllers (not shown), or one or more other special-purpose computer chips.
The processor 3203 and accompanying components have connectivity to the memory 3205 via the bus 3201. The memory 3205 includes both dynamic memory (e.g., RAM, magnetic disk, writable optical disk, etc.) and static memory (e.g., ROM, CD-ROM, etc.) for storing executable instructions that when executed perform the inventive steps described herein to provide real-time object tracking using outer and inner tracking. The memory 3205 also stores the data associated with or generated by the execution of the inventive steps.
While the invention has been described in connection with a number of various embodiments and implementations, the invention is not so limited but covers various obvious modifications and equivalent arrangements, which fall within the purview of the appended claims. Although features of the invention are expressed in certain combinations among the claims, it is contemplated that these features can be arranged in any combination and order.
Number | Date | Country | Kind |
---|---|---|---|
20225099 | Feb 2022 | FI | national |